Tuesday, August 18 • 11:00am - 11:20am
Scalable Analytics of Machine Data

Datacenters generate voluminous machine data, including performance metrics, workload activities, resource utilization, system configuration, topologies, events, logs, and failures. Analysis of such data can yield actionable insights that help system admins and IT decision-makers improve efficiency and reduce risk in their infrastructure. CloudPhysics has built a SaaS application that receives machine data from hundreds of thousands of servers around the world and provides data-driven IT analytics. Because machines can generate data much faster than humans can, building a data pipeline to handle this firehose presents unique challenges. This talk covers our experience building a scalable analytics back-end for both real-time streaming and batch analysis of machine data, using Scala, Spark, and NoSQL technologies on AWS. We will discuss a unified modeling and analysis framework for heterogeneous, dynamic, semi-structured machine data. We will share the characteristics of our analytical workload, the scaling principles learned through iterations of the back-end, and the efficiency gains achieved.

Xiaojun Liu

Xiaojun Liu is a co-founder and Chief Scientist at CloudPhysics, where he focuses on building the machine data analytics back-end that generates actionable insights for users. Prior to CloudPhysics, he worked at Google, Salesforce.com, and Sun Microsystems on performance engineering and on system modeling and simulation. He holds M.Eng. and B.Eng. degrees from Tsinghua University in Beijing and a Ph.D. in EECS from UC Berkeley.

Tuesday August 18, 2015 11:00am - 11:20am
Track B