Thursday, August 13
 

9:00am

FinagleCon
The first annual Finagle community conference 

Colocated with Scala By the Bay: a whole day of TwitterOSS, free to SBTB attendees!


Thursday August 13, 2015 9:00am - 5:00pm
Twitter HQ, One 10th Street Expansion, 875 Stevenson Street, San Francisco, CA
 
Friday, August 14
 

8:00am

Breakfast sponsored by Typesafe
Red Door Catering
Oakland, CA 

Friday August 14, 2015 8:00am - 9:15am
Foyer Outside Venue A

8:00am

Registration
Friday August 14, 2015 8:00am - 3:00pm
Foyer Outside Venue A

8:15am

Wifi sponsored by Workday
Friday August 14, 2015 8:15am - 8:45am
Kaiser Center

8:15am

Coffee sponsored by BoldRadius
Friday August 14, 2015 8:15am - 11:30am
Foyer Outside Venue A

8:30am

Opening Remarks and Updates
Speakers

Alexy Khrabrov

Chief Scientist, Nitro/By the Bay
Chief Scientist at Nitro, founder and organizer, SF {Scala, Text, Spark, Reactive}, {Scala, Big Data Scala, Text, Data, ...} By the Bay.


Friday August 14, 2015 8:30am - 8:45am
Track A

8:45am

Welcome - The Rise of the Full-stack Scala Employee
Speakers

Tiho Bajic

CTO, Nitro


Friday August 14, 2015 8:45am - 9:00am
Track A

9:00am

Keynote I: If you aren't inspired, do something else.
Speakers

Andrew Headrick

I started programming when I was 13 and have been using Scala as my primary language since 2007. I am the co-founder and CTO of InnoVint, the greatest winery operations and wine production platform in the universe.


Friday August 14, 2015 9:00am - 9:30am
Keynote

9:40am

Keynote II: How important is choice of language to build a scalable platform?
We all know technology choices have a profound impact on platform development, especially if that platform needs to scale. Technology choices at the app server and database level, along with specific design choices, have an impact on scale. What about the choice of language? What does that have to do with technology choices? A strong team builds stellar software that shines. Does language have an impact on how the team is formed? We all know the team sets the tone for culture. How does language choice impact team culture, hiring, ramp-up, and productivity? We will share what we learned developing an all-Scala scalable platform here at Verizon, ONcue. Drawing on the development of the OnCue service, we will share our experience that language choice is the tip of the iceberg: it is the first area that is easy to observe, but the majority of the scaling challenges stay hidden, waiting to be discovered as problems arise. As we share our experience on platform development, we will also shed light on one of the least discussed topics: devops. What impact do the level of automation and the engineering focus of the ops team have on developer productivity and the evolution of the platform?

Speakers

Vidhya Narayanan

Vidhya Narayanan is a Director of Engineering for Verizon, ONcue in San Jose, CA, where she leads a team that develops the cloud platform responsible for all of the client-facing APIs, analytics, recommendation, and advertisement, and a team that develops tools and solutions for continuous development, delivery, and deployment into the cloud. Before Verizon, Vidhya managed teams in the Java development group and NetBeans IDE organizations...


Friday August 14, 2015 9:40am - 10:10am
Keynote

10:20am

Keynote III: Data Science at Scale with Spark
Apache Spark has been blessed as the replacement for MapReduce in Hadoop environments. It also runs in other deployment modes. Spark provides better performance, better user productivity, and it supports a wider range of application scenarios than MapReduce, including event stream processing, ad hoc queries, graph representations and algorithms, and iterative algorithms, such as those commonly used in machine learning.

This talk discusses Spark from a Data Science perspective: its strengths and weaknesses, the Scala, Java, Python, and R APIs it offers for common analytics problems, what's missing, and what's planned. We'll look at support for ad hoc queries over large data sets, stream processing, machine learning algorithms, graph processing, and the user experience.

Speakers

Dean Wampler

Office of the CTO, Architect for Big Data Products and Services, Typesafe
Dean Wampler, Ph.D. (@deanwampler) leads the Big Data efforts at Typesafe, focusing on Spark, Mesos, Hadoop, Akka, and other tools. He is the author of "Programming Scala, Second Edition" and "Functional Programming for Java Developers", and the co-author of "Programming Hive", all from O'Reilly. Dean is a contributor to several open source projects and he co-organizes and speaks at many technology conferences and Chicago-based user groups.


Friday August 14, 2015 10:20am - 10:50am
Keynote

11:00am

Drinking the Free Kool-Aid
Put your abstract algebra knowledge to work for you by leveraging free monads, coproducts, and interpreters to separate concerns and create a compelling application architecture. Define work in terms of free monads, keeping it pure, and then swap out interpreters to change runtime semantics. Together we’ll explore how and why free monads are a useful abstraction that undergirds libraries like Facebook’s Haxl and Twitter’s Stitch. We’ll live code an implementation to build intuition, and so you can impress all your friends.* (*Disclaimer: friends may not be impressed.)
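
A minimal sketch of the idea, assuming a hypothetical key-value instruction set (`KV`, `Put`, `Get`) and a from-scratch `Free` type; it is not the speaker's live-coded version and not the API of Haxl, Stitch, or any library. The program is plain data, and the two interpreters (`inMemory` and `logging`) give the same program different runtime behavior.

```scala
import scala.collection.mutable

object FreeKV {
  // Minimal free monad, written from scratch for illustration.
  sealed trait Free[F[_], A] {
    def flatMap[B](f: A => Free[F, B]): Free[F, B] = FlatMapped[F, A, B](this, f)
    def map[B](f: A => B): Free[F, B] = flatMap(a => Pure[F, B](f(a)))
  }
  final case class Pure[F[_], A](a: A) extends Free[F, A]
  final case class Suspend[F[_], A](fa: F[A]) extends Free[F, A]
  final case class FlatMapped[F[_], X, A](sub: Free[F, X], k: X => Free[F, A]) extends Free[F, A]

  // Hypothetical instruction set (algebra) for a key-value store.
  sealed trait KV[A]
  final case class Put(key: String, value: Int) extends KV[Unit]
  final case class Get(key: String) extends KV[Option[Int]]

  def put(key: String, value: Int): Free[KV, Unit] = Suspend[KV, Unit](Put(key, value))
  def get(key: String): Free[KV, Option[Int]] = Suspend[KV, Option[Int]](Get(key))

  // A pure description of work: nothing happens until it is interpreted.
  val program: Free[KV, Option[Int]] =
    for {
      _ <- put("visits", 1)
      n <- get("visits")
      _ <- put("visits", n.getOrElse(0) + 1)
      m <- get("visits")
    } yield m

  // An interpreter maps each instruction to a plain value (no effect type, for brevity).
  trait Interpreter[F[_]] { def apply[A](fa: F[A]): A }

  // Walks the program, running each instruction through the chosen interpreter.
  def runWith[F[_], A](prog: Free[F, A], interp: Interpreter[F]): A = prog match {
    case Pure(a)            => a
    case Suspend(fa)        => interp(fa)
    case FlatMapped(sub, k) => runWith(k(runWith(sub, interp)), interp)
  }

  // Interpreter 1: executes against a mutable in-memory map.
  def inMemory(store: mutable.Map[String, Int]): Interpreter[KV] = new Interpreter[KV] {
    def apply[A](fa: KV[A]): A = fa match {
      case Put(k, v) => store.update(k, v)
      case Get(k)    => store.get(k)
    }
  }

  // Interpreter 2: records every instruction, then delegates to another interpreter.
  def logging(log: mutable.ListBuffer[String], underlying: Interpreter[KV]): Interpreter[KV] =
    new Interpreter[KV] {
      def apply[A](fa: KV[A]): A = { log += fa.toString; underlying(fa) }
    }
}
```

`runWith` recurses on the JVM stack, so it suits a demo only; long programs would need a stack-safe (trampolined) interpreter.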

Speakers

David Hoyt

Father, husband (sorry, too late if you were interested), and rather opinionated FP advocate, specializing in distributed systems. If you'd like to know more, let's chat!


Friday August 14, 2015 11:00am - 11:40am
Track A

11:00am

Developing functional domain models with event sourcing
Event sourcing persists each entity as a sequence of state-changing events. An entity’s current state is derived by replaying the events. Event sourcing is a great way to implement event-driven microservices: when one service updates an entity, the new events are consumed by other services, which then update their own state. In this talk we describe how to implement business logic using a domain model that is based on event sourcing. You will learn how to write functional, immutable domain models in Scala. We will compare and contrast a hybrid OO/FP design with a purely functional approach. You will learn how Domain Driven Design concepts such as bounded contexts and aggregates fit in with event-driven microservices.
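
As a rough illustration of deriving state by replaying events, here is a sketch with a hypothetical bank-account domain (`Opened`, `Deposited`, `Withdrawn` are invented for the example, not taken from the talk): state is an immutable case class, each event produces a new state, and a command returns new events instead of mutating anything.

```scala
object EventSourcedAccount {
  // Hypothetical events for the example domain.
  sealed trait Event
  final case class Opened(owner: String) extends Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  // Immutable entity state; it is never persisted directly.
  final case class Account(owner: String, balance: Long)

  // Applying one event yields a new state value.
  def applyEvent(state: Option[Account], event: Event): Option[Account] = (state, event) match {
    case (None, Opened(owner))        => Some(Account(owner, 0L))
    case (Some(a), Deposited(amount)) => Some(a.copy(balance = a.balance + amount))
    case (Some(a), Withdrawn(amount)) => Some(a.copy(balance = a.balance - amount))
    case (s, _)                       => s // simplification: ignore events that do not apply
  }

  // Current state = a left fold over the event history.
  def replay(events: Seq[Event]): Option[Account] =
    events.foldLeft(Option.empty[Account])(applyEvent)

  // A command validates against current state and emits events (or an error).
  def withdraw(current: Account, amount: Long): Either[String, Seq[Event]] =
    if (amount <= current.balance) Right(Seq(Withdrawn(amount)))
    else Left(s"insufficient funds: balance ${current.balance}, requested $amount")
}
```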

Speakers

Chris Richardson

Founder, Eventuate, Inc
Chris Richardson is a developer and architect. He is a Java Champion, a JavaOne rock star and the author of POJOs in Action, which describes how to build enterprise Java applications with frameworks such as Spring and Hibernate. Chris was also the founder of the original CloudFoundry.com, an early Java PaaS for Amazon EC2. He consults with organizations to improve how they develop and deploy applications and is working on his third startup.


Friday August 14, 2015 11:00am - 11:40am
Track B

11:50am

Does your Scala Abide?
[Scala Abide](https://github.com/scala/scala-abide) helps you keep your Scala source code classy, at a semantic level. Abide doesn't care as much about formatting (there are other fine tools for that) -- it's all about the types. Abide is primarily a platform to build your style checking rules on, and in this talk I will show you how to do just that. Abide comes with a standard set of rules, and can easily be plugged into most build flows (from maven and sbt to Scala IDE), as it is implemented as a Scala compiler plugin.

Speakers

Adriaan Moors

I work behind the scenes on everything Scala. I implemented type constructor polymorphism in Scala 2.7, did some work on type constructor inference and improving the interaction between dependent method types and implicit search, rewrote the new pattern matcher in 2.10 from scratch, slogged through the big XML refactoring of 2.11's Ant build, and, most recently, I bootstrapped the effort to capture Scala's CI infrastructure as a Chef cookbook...


Friday August 14, 2015 11:50am - 12:30pm
Track A

11:50am

Scala.js: confessions of a backend engineer
Imagine you’re at a tiny startup, or perhaps on a small team. You need a quick-and-dirty, proof-of-concept UI and there’s no time to find a frontend developer. Or maybe you and your peers desperately need a web-based tool to make your jobs easier, and you realize that if you don’t build it, no one else will. But you’re “the backend guy” (or gal). You just don’t have time to dust off the little bit of JavaScript you learned 5 years ago. In fact, you chuckle at the thought of trying to get your teammates to maintain a JavaScript codebase. Fortunately, you don’t have to. You can write it all in Scala using Scala.js. In this talk, I’ll share my personal experience using Scala.js from the perspective of a backend Scala developer, and also walk you through what it’s like to build a real world client-server application using Scala.js.

Speakers

Julie Pitt

Co-Founder, Order of Magnitude Labs
As Co-Founder of Order of Magnitude Labs, Julie uses Scala to develop learning algorithms capable of producing intelligent behavior. She spent the previous 10 years developing JVM server-side applications in a variety of environments, from startups to government institutions. Most notably, she was on the team that first brought Netflix to a variety of popular platforms and launched the service internationally. Along the way, Julie made several...


Friday August 14, 2015 11:50am - 12:30pm
Track B

12:30pm

Lunch sponsored by Cloudera
Red Door Catering
Oakland, CA 

Friday August 14, 2015 12:30pm - 1:30pm
Foyer Outside Venue A

1:30pm

Batteries included - How Scala makes Data Centers more power efficient
In this talk Erich Nachbar will present how Scala is used to introduce Software Defined Power in Data Centers. The architecture to manage server power allocations relies on containerized Scala services and a large-scale data processing pipeline to predict, control, and monitor power. He will introduce the architecture patterns used to:
* run containerized Scala services on CoreOS
* utilize Kafka as a processing backbone to increase robustness and scale
* store timeseries & JSON data in Cassandra (KairosDB, Doradus)
* implement service discovery for Scala and other infrastructure services through a Software Defined Network (Weave).

Speakers

Erich Nachbar

Erich Nachbar is Cofounder and CTO at Virtual Power Systems. He has been building scalable systems throughout his career. As a technology enthusiast, he has adopted promising technologies like Hadoop (2007), Spark (2012) & Kafka (2012) early on. Erich has spoken at various conferences, such as the Hadoop and Spark Summits, on how to design contemporary system architectures and deploy them in production. Speaker Video...


Friday August 14, 2015 1:30pm - 2:10pm
Track A

1:30pm

Play Webapp Architecture, Shiny Objects, and a Modest Proposal
ScalaCourses.com has been serving online Scala and Play training material to students for over two years. ScalaCourses.com teaches courses on the same technology stack that the web site runs on. The Cadenza application that powers ScalaCourses.com is a Play Framework 2 application, written in Scala and using Akka, Slick, AWS and Postgres. Some of the architectural features in Cadenza that allow a modest-sized Play application to serve large amounts of multimedia data efficiently will be discussed, including technical details of how to work with an immutable domain model that can be modified. Over the last 2+ years the underlying technology has changed a lot; a brief history of Play Framework will be recounted, along with how that history impacted Cadenza. The session will conclude with a proposal regarding Play Framework's future.

Speakers

Michael Slinn

Managing Editor, ScalaCourses.com
Mike Slinn is the architect and lead developer of Cadenza. He co-ordinates the ScalaCourses.com instructors as well as being a major contributor to the Scala / Play courses provided through ScalaCourses.com. Mike has written 3 books, including "Composable Futures With Akka 2.0". Mike has traveled the world for decades teaching computer software commercially; happily, he now travels much less and uses video conferencing instead. Mike has been...



Friday August 14, 2015 1:30pm - 2:10pm
Track B

2:20pm

Enterprise APIs With Ease Using Scala
Our REST API server project at Netflix was a big success, resulting in 150 web services in five months and no downtime. By taking advantage of self-documenting API frameworks, continuous integration with JSON-friendly API tests and automated deployments, we were able to focus on building services that would publicize themselves. This session highlights the engineering tools and processes that enable rapid API development, deployment and adoption. I'll cover the Scalatra, Swagger, and ScalaTest frameworks as well as Netflix's push-button deployment system using Jenkins and Asgard. 

Speakers

Jason Swartz

Software Developer, ClassPass
Jason Swartz is a software developer who enjoys intuitive user interfaces, expressive programming languages and concise user documentation. He built APIs at eBay and Netflix, and is currently learning to live sweaty at ClassPass in SF. His book, “Learning Scala”, was published by O’Reilly Media in December 2014.


Friday August 14, 2015 2:20pm - 2:40pm
Track A

2:20pm

An overview of Axle: a Scala-embedded DSL
Axle is a Scala-embedded domain-specific language built on Spire. This talk will cover the architecture and objectives of the project. Live coding examples will show how "design patterns" from abstract algebra apply to a range of domains including machine learning, bioinformatics, game theory, and statistics. It will also demonstrate how typeclasses allow the abstraction of the "platform". This enables algorithms to be projected onto a wide variety of platforms (including Spark).
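
A small sketch of how a type class can abstract over the "platform" that holds the data, so an algorithm is written once. `Aggregatable` and its instances are hypothetical and are not Axle's or Spire's actual type classes; the `Vector` instance merely splits the data in two to mimic a partitioned platform.

```scala
object PlatformDemo {
  // Hypothetical type class: anything that can aggregate its elements, possibly in partitions.
  trait Aggregatable[F[_]] {
    def aggregate[A, B](fa: F[A])(zero: B)(seqOp: (B, A) => B, combOp: (B, B) => B): B
  }

  object Aggregatable {
    // Sequential "platform": a single left fold.
    implicit val listInstance: Aggregatable[List] = new Aggregatable[List] {
      def aggregate[A, B](fa: List[A])(zero: B)(seqOp: (B, A) => B, combOp: (B, B) => B): B =
        fa.foldLeft(zero)(seqOp)
    }

    // Mimics a partitioned platform: fold each half, then merge partial results.
    implicit val vectorInstance: Aggregatable[Vector] = new Aggregatable[Vector] {
      def aggregate[A, B](fa: Vector[A])(zero: B)(seqOp: (B, A) => B, combOp: (B, B) => B): B = {
        val (left, right) = fa.splitAt(fa.length / 2)
        combOp(left.foldLeft(zero)(seqOp), right.foldLeft(zero)(seqOp))
      }
    }
  }

  // Written once; runs on any F with an Aggregatable instance.
  def mean[F[_]](xs: F[Double])(implicit agg: Aggregatable[F]): Double = {
    val (sum, count) = agg.aggregate(xs)((0.0, 0L))(
      { case ((s, n), x) => (s + x, n + 1) },
      { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
    )
    sum / count
  }
}
```

A Spark-backed instance would follow the same shape, delegating to the platform's own aggregation; that instance is not shown here.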

Speakers

Adam Pingel

VP of Engineering, Ravel Law
Adam Pingel has been writing code in some form or another since 1982, and specifically in Scala since 2009. In September he joins Ravel Law as VP of Engineering.


Friday August 14, 2015 2:20pm - 2:40pm
Track B

2:50pm

Proactively Lazy: Enforcing Laziness with Types
Laziness, as expressed with by-name parameters to functions, can lead to surprising behavior, with evaluation occurring where the developer might not expect it. This is especially true when using complex helpers or building DSLs that pass by-name values through several functions. This talk will show how to use Scala's type system to enforce where evaluation occurs by lifting lazy and by-name values into monadic constructs.
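
A sketch of the surprise and of one possible fix, assuming a hypothetical memoizing `Lazy` wrapper (not a standard library type and not necessarily the construct used in the talk): a by-name parameter is re-evaluated on each reference, whereas a value lifted into `Lazy` is evaluated at most once, and only where `force` is called.

```scala
object LazyDemo {
  // By-name: `x` is re-evaluated on every reference inside the body.
  def twice(x: => Int): Int = x + x

  // Hypothetical wrapper: evaluation becomes an explicit, memoized step.
  final class Lazy[A] private (thunk: () => A) {
    private lazy val value: A = thunk()
    def force: A = value
    // map/flatMap build new deferred values without forcing this one.
    def map[B](f: A => B): Lazy[B] = Lazy(f(force))
    def flatMap[B](f: A => Lazy[B]): Lazy[B] = Lazy(f(force).force)
  }
  object Lazy {
    def apply[A](a: => A): Lazy[A] = new Lazy(() => a)
  }

  // The type now says where evaluation happens: at each `force`, at most once overall.
  def twiceLazy(x: Lazy[Int]): Int = x.force + x.force
}
```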

Speakers

Andy Wortman

Andy has a grand vision of seeing the words "proof" and "correctness" as common as "test" and "spec" already are. More realistically, he'd be elated to see better failure modes become the norm. By day he writes Scala at Originate for all kinds of work; the rest of the time he keeps Scala around as a trusty tool to do all kinds of things, not just web apps!


Friday August 14, 2015 2:50pm - 3:10pm
Track A

2:50pm

What JavaScript taught me about programming in Scala
As a longtime Scala developer who recently switched to (nearly) full-time JavaScript development, I have learned some interesting and unexpected lessons about the capabilities and ecosystem of both JavaScript and Scala. In this talk, we'll go through some of the more compelling examples, and see what one language might learn from the other.

Speakers

James Earl Douglas

Sr. Software Engineer, Wikimedia
A functional programmer with advanced experience developing production software in Scala and Java, James is passionate about continuous learning and keeping just outside of his comfort zone. When not knee-deep in type theory, James can be found running, cycling, and stargazing around the San Francisco Bay Area.


Friday August 14, 2015 2:50pm - 3:10pm
Track B

3:20pm

Actor-based multi-cloud orchestration and management using Apache Jclouds
Handling multiple cloud environments can be a very cumbersome task. In this talk I will present a Centralized Cloud Management and Orchestration framework to solve that. A brief description of the software stack follows: Scala bindings of Apache Jclouds serve as the underlying layer for handling integration and launching instances in multiple clouds. The orchestration is handled by Akka actors. A sub-module called Recipes, which is built on Jclouds’s ScriptBuilder class, allows creation of installation and configuration files for components of the Big Data stack. The project uses Spray for handling API requests and Play for serving the front-end. The SORM framework was chosen for making database calls to MySQL. The framework has been developed and tested at Sigmoid, where an on-premise version of the product is in the works.

Speakers

Ajay Viswanathan

Software Engineer, Sigmoid
Ajay Viswanathan is a Software Engineer at Sigmoid handling Cloud and Infrastructure Management and DevOps with focus on real-time Big data analytics. He enjoys cross-country cycling and photography in his free time.


Friday August 14, 2015 3:20pm - 3:45pm
Track B

3:20pm

A Tour of Functional Type Classes via Scodec and Simulacrum
Of the major functional type classes, functor and monad garner the most attention, with applicative functors a close runner-up. There are many more type classes which are less known: invariant and contravariant functors, monoidal functors, invariant monads, etc. In this talk, we'll tour a variety of type classes, including the big three (functors, applicatives, and monads) as well as some lesser known ones, by looking at practical examples of their use in the scodec library. We'll also look at how these type classes can be encoded in Scala using the Simulacrum type class support library.
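
To make "contravariant functor" concrete, here is a sketch of an encoder type class with `contramap`. The `Encoder` trait below is a simplified stand-in written for illustration, not scodec's `Encoder` API, and it does not use Simulacrum's annotations.

```scala
object EncoderDemo {
  // Simplified encoder type class (bytes as a Vector for readability).
  trait Encoder[A] { self =>
    def encode(a: A): Vector[Byte]
    // contramap adapts the input type and keeps the output: a contravariant functor.
    def contramap[B](f: B => A): Encoder[B] = new Encoder[B] {
      def encode(b: B): Vector[Byte] = self.encode(f(b))
    }
  }

  // Big-endian 16-bit encoder; assumes 0 <= a < 65536.
  val uint16: Encoder[Int] = new Encoder[Int] {
    def encode(a: Int): Vector[Byte] = Vector(((a >> 8) & 0xff).toByte, (a & 0xff).toByte)
  }

  final case class Port(value: Int)

  // Reuse uint16 for a new type by mapping the input, not the output.
  val portEncoder: Encoder[Port] = uint16.contramap[Port](_.value)
}
```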

Speakers

Michael Pilquist

Software Architect, Comcast
Michael Pilquist is the author of Scodec, a suite of open source Scala libraries for working with binary data, and Simulacrum, a library that simplifies working with type classes. He is also a committer on a number of other projects in the Scala ecosystem, including Cats and FS2. He is also the chief software architect at Combined Conditional Access Development (CCAD), a joint venture between Comcast and ARRIS, Inc., where he is responsible for...


Friday August 14, 2015 3:20pm - 4:20pm
Track A

3:55pm

Legacy modernization from monolithic PHP to reactive SOA
There are two types of 5-year-old software startups: the ones that are embarrassed about the technical debt they accrued during rapid growth, and the ones that failed. To survive and scale towards 100M users, Hootsuite is undergoing an architectural transformation from our legacy monolithic PHP to reactive SOA with Scala, Akka, and Play. How do we pivot to a new architecture while retaining identical functionality with low risk and zero downtime? How do we interface new reactive systems with legacy systems that can’t keep up? This talk will detail two modernization projects accomplished in 2014. The first was our user authentication and information microservice, which we developed, migrated to and deployed with zero downtime. The second is a URL shortening and image hosting service that provides both an API to our web and mobile products and a user interface.

Speakers

Mike White

Sr. Software Engineer, Hootsuite
Mike is a Senior Engineer at Hootsuite, building key pieces of the service-oriented architecture during its drive to “everything as a service”. With roots in enterprise legacy modernization (COBOL and Fortran to JavaEE), Mike has embraced Scala and SaaS and he's not looking back. When not pushing to production, Mike enjoys pushing his kids (on swings).


Friday August 14, 2015 3:55pm - 4:20pm
Track B

4:30pm

Suffuse: usable virtual filesystems
Suffuse is a Scala library facilitating creation and manipulation of virtual filesystems. The applications are endless: a physical filesystem can be treated as a collection of files and presented in any other form, for instance filtering files, transforming filenames, or altering file contents. Or a virtual filesystem can be created from physical files or from pure data, allowing infinitely large filesystems, infinitely varied files which may be synthesized on demand, two-way lenses which transform all reads and writes to a file, and much more.
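
A sketch of the "filesystem as a collection" view, assuming a filesystem can be modeled as an immutable map from path to contents; `VirtualFs` and its functions are hypothetical and are not Suffuse's API.

```scala
object VirtualFs {
  // Hypothetical model: path -> file contents.
  type Fs = Map[String, String]

  // Present only the files whose path satisfies the predicate.
  def filterFiles(fs: Fs)(p: String => Boolean): Fs =
    fs.filter { case (path, _) => p(path) }

  // Present every file under a transformed name.
  def renameAll(fs: Fs)(f: String => String): Fs =
    fs.map { case (path, data) => f(path) -> data }

  // Present every file with transformed contents.
  def transformContents(fs: Fs)(f: String => String): Fs =
    fs.map { case (path, data) => path -> f(data) }
}
```

Because `renameAll` builds a new map, two paths renamed to the same name collapse into one entry; a real implementation would have to decide how to handle such collisions.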

Speakers

Paul Phillips

Paul Phillips is co-founder of Typesafe and the all-time most prolific Scala committer. No longer affiliated with either, he now volunteers as Scala's conscience.


Friday August 14, 2015 4:30pm - 5:10pm
Track A

4:30pm

Towards Reliable Lookups
Why do data structure lookups often return Options? Could we safely eliminate all the recovery code that we hope is never called? We will see how Scala’s type system lets us express referential integrity constraints to achieve unparalleled reliability. We apply the technique to in-memory data structures using the Total-Map library and consider how to extend the benefits to persisted data.
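
A sketch contrasting an `Option`-returning lookup with a total one, assuming the keys form a closed set (a sealed trait). `TotalMap` here is a hypothetical, minimal type for illustration, not the Total-Map library's API: it is built from an exhaustive function, so lookup returns `V` rather than `Option[V]`.

```scala
object TotalLookup {
  // Partial lookup: the type admits missing keys, so callers must handle None.
  val prices: Map[String, Int] = Map("small" -> 1, "large" -> 3)
  def partialPrice(size: String): Option[Int] = prices.get(size)

  // Closed key set.
  sealed trait Size
  case object Small extends Size
  case object Large extends Size

  // Hypothetical total map: defined for every key of K by construction.
  final class TotalMap[K, V] private (f: K => V) {
    def apply(key: K): V = f(key)
    def updated(key: K, value: V): TotalMap[K, V] =
      new TotalMap[K, V](k => if (k == key) value else f(k))
  }
  object TotalMap {
    def apply[K, V](f: K => V): TotalMap[K, V] = new TotalMap[K, V](f)
  }

  // Exhaustive over Size, so no recovery code is needed at lookup time.
  val totalPrices: TotalMap[Size, Int] = TotalMap[Size, Int] {
    case Small => 1
    case Large => 3
  }
}
```

Adding a new `Size` case makes the `totalPrices` match non-exhaustive, which the compiler reports as a warning, so the missing entry surfaces at compile time rather than as a `None` at run time.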

Speakers

Patrick Prémont

Functional Programming Architect, BoldRadius Solutions
Patrick Prémont is a Functional Programming Architect at BoldRadius Solutions where he helps clients build reliable applications in Scala and ScalaJS. For a decade he led the development of a portable game development platform: first as a startup founder, and later as a Technical Director at Electronic Arts. He first encountered functional programming with the Miranda language, more than 20 years ago, and took up Haskell professionally 6 years...


Friday August 14, 2015 4:30pm - 5:10pm
Track B

5:20pm

An embedded DSL to manipulate Mathprog Mixed Integer Programming models within Scala
GNU Mathprog is an algebraic modeling language for describing mathematical programming models. The language is a subset of AMPL, the most popular language used for describing such models. One advantage of these languages is the similarity of their syntax to the mathematical notation used to describe optimization problems. In this talk we describe the use of Scala in the construction of an embedded DSL to work with Mathprog models. At construction time, the DSL works as an alternative way to build the AST representing the model, taking advantage of the Scala compiler to check the validity of most expressions. The AST can also be created by parsing Mathprog model files. Once an AST is obtained, the DSL can manipulate and transform it to extract data or create new models. We show how the use of type classes and implicit conversion prioritization gives us a systematic way to define operators, valid combinations of parameters and result types, and conversions between expressions. Finally, we explore possible applications of the DSL to stochastic programming models.
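
A much smaller sketch of the embedded-DSL idea: operator methods build an AST for linear expressions and constraints, and ordinary functions then transform or evaluate that AST. The node names (`Var`, `Add`, `Mul`, `Le`) and the `Coef` implicit class are hypothetical; the sketch does not reproduce the talk's type-class machinery for operator resolution.

```scala
object MiniMathprog {
  // Hypothetical AST for linear expressions.
  sealed trait Expr {
    def +(that: Expr): Expr = Add(this, that)
    def <=(bound: Double): Constraint = Le(this, bound)
  }
  final case class Var(name: String) extends Expr
  final case class Const(value: Double) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(c: Double, e: Expr) extends Expr

  sealed trait Constraint
  final case class Le(lhs: Expr, rhs: Double) extends Constraint

  // Lets users write `2.0 * x` with a plain number on the left.
  implicit class Coef(val c: Double) {
    def *(e: Expr): Expr = Mul(c, e)
  }

  // AST transformation: collect the variables used in an expression.
  def vars(e: Expr): Set[String] = e match {
    case Var(n)    => Set(n)
    case Const(_)  => Set.empty
    case Add(l, r) => vars(l) ++ vars(r)
    case Mul(_, x) => vars(x)
  }

  // Evaluate an expression under an assignment of values to variables.
  def eval(e: Expr, env: Map[String, Double]): Double = e match {
    case Var(n)    => env(n)
    case Const(v)  => v
    case Add(l, r) => eval(l, env) + eval(r, env)
    case Mul(c, x) => c * eval(x, env)
  }
}
```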

Speakers

Germán Ferrari

Germán Ferrari is an MSc student in Computer Science at the Basic Sciences Programme (PEDECIBA), Universidad de la República (UdelaR), Uruguay. He is a teaching/research assistant at the Operational Research Department of the Instituto de Computación, UdelaR, and also works as a Software Engineer at the Central Bank of Uruguay. He has been a Scala enthusiast since late 2008 and co-organizes the Scala Meetup Montevideo.


Friday August 14, 2015 5:20pm - 5:40pm
Track B

5:20pm

How you convince your manager to adopt Scala.js in production: Take 1.
The talk will present a fully functional sample application developed with Scala.js, scalatags, scalacss, and other Scala and Typesafe technologies. We aim to show all the pros and cons of a Scala coast-to-coast approach to web application development, and we encourage people not to shy away from asking difficult questions that challenge this approach. Participants can expect to gain a clear view of the current state of Scala-based client-side technologies and to take away an Activator template with application code that could be used as a base for technical discussions with their peers and managers. Second speaker: Dave Sugden (dave.sugden@boldradius.com, @fritzsss), Senior Software Engineer, BoldRadius Solutions.

Speakers

Katrin Shechtman

Katrin is a Senior Software Engineer at BoldRadius, where she successfully employs her love for Scala, Akka and Play. A Scala autodidact since 2012 (for the record: the beginning of the year :-)) who has been striving for functional perfection ever since, Katrin helps BoldRadius clients and her fellow Toronto-based Scala fighters practice type safety for better software results.


Friday August 14, 2015 5:20pm - 6:00pm
Track A

6:00pm

Scala by the Bay Beer and Wine Reception sponsored by Nitro
Join our host sponsor Nitro, By the Bay staff, speakers and sponsors for a meet-and-greet reception featuring passed appetizers and a selection of local California beers and wines.

Friday August 14, 2015 6:00pm - 8:00pm
Foyer Outside Venue A
 
Saturday, August 15
 

8:00am

Breakfast sponsored by Twitter
Red Door Catering
Oakland, CA 

Saturday August 15, 2015 8:00am - 9:15am
Foyer Outside Venue A

8:00am

Registration
Saturday August 15, 2015 8:00am - 3:00pm
Foyer Outside Venue A

8:15am

Coffee sponsored by BoldRadius
Saturday August 15, 2015 8:15am - 5:30pm
Foyer Outside Venue A

8:15am

Wifi sponsored by 47 Degrees
Saturday August 15, 2015 8:15am - 5:30pm
Kaiser Center

8:45am

Opening Remarks and Updates
Speakers

Alexy Khrabrov

Chief Scientist, Nitro/By the Bay
Chief Scientist at Nitro, founder and organizer, SF {Scala, Text, Spark, Reactive}, {Scala, Big Data Scala, Text, Data, ...} By the Bay.


Saturday August 15, 2015 8:45am - 9:00am
Track A

9:00am

Keynote IV: The Sadness at the End of the Happy Path
Resilience: most developers understand what the word means, at least superficially, but way too many lack a deeper understanding of what it really means in the context of the system they are working on now. I find that really sad, since understanding and managing failure is more important today than ever. Outages are incredibly costly (for many definitions of cost) and can sometimes take down whole businesses. In this talk we will explore the essence of resilience. What does it really mean? What are its mechanics and characteristic traits? How do other sciences and industries manage it? We will see that everything hints at the same conclusion: there is no "happy path", failure is an option, and resilience is by design. In this talk we will explore how.

Speakers

Jonas Bonér

Jonas Bonér is a programmer, speaker, writer, Java Champion, and entrepreneur. He is the co-founder and CTO of Typesafe and an active contributor to the open source community; most notably, he started the Akka project and the AspectWerkz AOP runtime (now AspectJ). Learn more at: jonasboner.com


Saturday August 15, 2015 9:00am - 9:45am
Keynote

9:45am

Keynote V: Scala community matters
If you're here at Scala by the Bay, you're part of the Scala community. Let's take a look at what matters in the Scala community, how you can help, and what matters we still need to address as a community. Any way you can combine the words "Scala", "community" and "matters" we will explore.

Speakers

Dick Wall

Scala Community Guy
Dick founded the Bay Area Scala Enthusiasts (BASE), one of the first Scala user groups. Dick is also the first recipient of the Phil Bagwell Memorial Scala Community Award. He is a committer on several Scala open source projects and creator of SubCut, a dependency injection solution for Scala. Dick is also a former Google developer advocate, where he taught hundreds of developers, both internal and external to Google...


Saturday August 15, 2015 9:45am - 10:15am
Keynote

10:20am

Scraping Reddit with Akka Streams
"Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure on the JVM." (reactive-streams.org) Akka Streams is Typesafe's implementation of the Reactive Streams standard. It provides a high-level DSL for building composable stream processing pipelines. This talk will introduce Akka Streams concepts like sources, flows, sinks, and flow graphs by building a throttled stream processing pipeline for scraping Reddit comments and persisting the resulting word counts. Plenty of in-REPL examples. Based on this tutorial: https://github.com/pkinsky/akka-streams-example

Speakers

Paul Kinsky

Software Engineer, Nitro PDF
Paul Kinsky has been writing Scala for the last 4 years. Outside of work, he enjoys climbing, science fiction, and tinkering with Haskell and laser cutters.


Saturday August 15, 2015 10:20am - 10:50am
Track A

10:20am

Functional Domain Modeling: Beyond Buzzword-Compliance
Functional domain modeling isn't just a combination of favorite terms. I'll explain why "functional" modeling of problem domains may actually be the original kind of domain modeling. A functional approach can be just as semantically clear as stateful approaches -- or more so. I'll show how an example solar domain can provide a loose semantics for an API project in Scala to demonstrate how it can help tie types and services in your codebase to a real-world domain.

Speakers

Luke Smith

I am a senior software engineer at Sungevity in Oakland, California. I'm particularly interested in creating software that helps mission-driven organizations meaningfully connect with large populations of people.


Saturday August 15, 2015 10:20am - 10:50am
Track B

11:00am

Scala: Power and Versatility
Scala is an amazingly powerful tool for all sorts of applications, from web and mobile to desktop and the cloud. Monadic patterns in Scala let me express both sequential and asynchronous data flows more easily than ever. I used these patterns in an SBT plugin for submitting solutions to Google Code Jam problems, where I could link together REST requests with local computations. Scala futures and Akka let me scale bioinformatics algorithms both up and out. Scala.js let me use bioinformatics algorithms in the browser: one language to rule them all! Combining Scala.js and Play helped me build web applications that are easy to use and fun to build! In this talk, I will present a variety of projects that show Scala's power and versatility.
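
A sketch of chaining asynchronous requests with a local computation in one for-comprehension, in the spirit of the plugin described above. `fetchInput` and `submit` are hypothetical stand-ins that return canned results instead of calling Google Code Jam or any real service.

```scala
import scala.concurrent.{ExecutionContext, Future}

object PipelineDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for REST calls; a real plugin would issue HTTP requests here.
  def fetchInput(problem: String): Future[List[Int]] = Future(List(3, 1, 2))
  def submit(problem: String, answer: Int): Future[String] = Future(s"$problem: accepted $answer")

  // Local, synchronous computation.
  def solve(input: List[Int]): Int = input.sum

  // Asynchronous and local steps sequenced with the same monadic syntax.
  def run(problem: String): Future[String] =
    for {
      input  <- fetchInput(problem)
      answer  = solve(input)
      result <- submit(problem, answer)
    } yield result
}
```

The same `for` syntax works for `Option`, `Either`, or collections, which is what lets sequential and asynchronous flows read alike.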

Speakers

Shadaj Laddad

Shadaj loves to program. He has programmed in Ruby, Python, C, Java, JavaScript, and Scala, his favorite. Shadaj hosts his projects on GitHub and has a channel on YouTube. He has presented at OSCON 2014, Scala Days (2012, 2013, 2014, 2015), and the Bay Area Scala Enthusiast group, showing his Scala projects. Besides programming, he likes Math and Science. He has interned at Coursera working with technologies such as Play! Framework and Scala.js...


Saturday August 15, 2015 11:00am - 11:40am
Track A

11:00am

Get Productive with Scala Macros: Interactive live-coding session where we code a Scala macro library using Scala quasiquotes.
Many developers shy away from Scala macros. Admittedly, they can be esoteric and tend to have a steep learning curve, but in this interactive coding session we will live-code a library (or two) to demystify some of the concepts and show some common debugging techniques.

Speakers
Pathikrit Bhowmick

Principal Engineer, Coatue
Pathikrit writes Scala full-time at a hedge fund. He is also the author of many widely used Scala libraries: https://github.com/pathikrit


Saturday August 15, 2015 11:00am - 11:40am
Track B

11:50am

Full stack Scala
Recently I started working on my startup’s first product. The product's backend serves a REST API, and its frontend is a single-page web app (with mobile apps coming soon). For the backend, the obvious choice was Scala, but we also chose Scala on the frontend through Scala.js along with React. These choices have worked incredibly well for us. In this talk, I will discuss lessons learned in using full-stack Scala.

Speakers
Ramnivas Laddad

Ramnivas is a technologist, author, and presenter who is passionate about doing software right. He has been leading innovation in Spring Framework and Cloud Foundry since their beginning. Ramnivas has led a group in Cloud Foundry and started the Spring Cloud project. Ramnivas is the author of AspectJ in Action (1st ed, 2nd ed), the best-selling book on aspect-oriented programming that has been lauded by industry experts for its presentation of...


Saturday August 15, 2015 11:50am - 12:30pm
Track A

11:50am

Powerful and elegant Scala one*-liners
Traditionally, proponents of dynamic scripting languages have been particularly proud of their mighty one-liners, bragging about how much they can accomplish with but a few keystrokes. Statically typed languages, on the other hand, have earned quite a reputation as painfully verbose. A meager hello world in Java, for instance, requires no fewer than 5 lines and 124 characters of code (a little less if you indent with tabs). It's been said that Scala is the static language that feels dynamic. With its powerful and expressive syntax, concise yet not terse, a single line of Scala code can leverage an impressive amount of JVM power. In this interactive, hands-on session I show you some of the most powerful and elegant Scala one*-liners that give us dynamic productivity in a static language. Ready your REPL, we are going to write some code! (*for different values of "one")
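As an illustration of the style (our own example, not one from the session), here is a word-frequency count written as a single expression using only the Scala standard library; the `OneLiners` wrapper object is a hypothetical container so the snippet compiles on its own.

```scala
object OneLiners {
  // Lowercase, split on runs of non-word characters, drop empty tokens,
  // group identical words together, and count each group.
  def wordFreq(text: String): Map[String, Int] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).groupBy(identity).map { case (w, ws) => w -> ws.length }
}
```

For example, `OneLiners.wordFreq("Hello, hello world!")` yields a map with `hello -> 2` and `world -> 1`.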

Speakers
Jason Arhart

Jason works for Originate as a Scala engineer. He is in charge of setting best practices for the company and ensuring that Scala projects are constantly improving in quality.


Saturday August 15, 2015 11:50am - 12:30pm
Track B

12:30pm

Lunch sponsored by Ticketfly
Red Door Catering
Oakland, CA

Saturday August 15, 2015 12:30pm - 1:30pm
Foyer Outside Venue A

1:30pm

Programs as Values: JDBC Programming with Doobie
FP is often glibly described as "programming with functions", but we can equivalently say that FP is about programming with *values*. Functions are values, failures are values, effects are values, and indeed programs themselves can be treated as values. This talk explores some consequences of this idea of programs-as-values by looking at *doobie*, a pure-functional database library for Scala. We will examine the low-level implementation via free monads over algebras of the JDBC types, and show how this naturally enables an expressive and safe high-level API with familiar, lightweight, and composable abstractions including streaming result sets, and trivial compatibility with other pure libraries like Remotely and http4s. Even if you are not in the market for a database layer, this talk will provide a practical outline for a general solution to dealing with terrible APIs (like JDBC) in a principled way.
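A simplified sketch of the programs-as-values idea, assuming an in-memory key-value store in place of JDBC. Note that it swaps doobie's free-monad encoding for a plainer reader-style encoding; `Program`, `get`, and `put` are hypothetical names, not doobie's API. Building a program performs no work; effects happen only when `run` is called with a store.

```scala
import scala.collection.mutable

object ProgramsAsValues {
  // Hypothetical stand-in for a database connection.
  type Store = mutable.Map[String, String]

  // A program is just a value wrapping a function from the store to a result.
  final case class Program[A](run: Store => A) {
    def map[B](f: A => B): Program[B] = Program(s => f(run(s)))
    def flatMap[B](f: A => Program[B]): Program[B] = Program(s => f(run(s)).run(s))
  }

  // Primitive operations: each only describes an effect.
  def get(k: String): Program[Option[String]] = Program(_.get(k))
  def put(k: String, v: String): Program[Unit] = Program(_.update(k, v))
}
```

Because programs are ordinary values, they compose with a for-comprehension, for example `for { _ <- put("a", "1"); v <- get("a") } yield v`, before any store exists.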

Speakers
Rob Norris

Irritant, Gemini Observatory
By day Rob writes software for big telescopes, and by night he hacks on open-source FP libraries, mostly in Scala. He is a regular irritant on IRC and Twitter where he goes by the handle tpolecat. Rob lives in Portland with his wife, daughter, and cat, all of whom are very good sports.


Saturday August 15, 2015 1:30pm - 2:10pm
Track A

1:30pm

The Scala.js compilation pipeline
This talk will give you a deep insight into the moving parts that were required to make Scala.js production ready. After laying out the internal parts of the Scala.js compilation, linking and optimization pipeline, we deep-dive into selected components in an attempt to demystify Scalac's JavaScript back-end.

Speakers
Tobias Schlatter

Tobias Schlatter is one of the Scala.js core contributors and has accompanied Scala.js from an early, post-proof-of-concept stage to production readiness during his time at LAMP, EPFL. He actively maintains major parts of the Scala.js core project and takes an active role in shaping Scala.js' future.


Saturday August 15, 2015 1:30pm - 2:10pm
Track B

2:20pm

Scalactic EquaSets: Sets with a Different Idea of Equality
Scalactic EquaSets provide a convenient set implementation when you need to determine set membership via a function other than the .equals method. They enforce at compile time that two sets can be composed via intersect, union, or diff only if they have the same notion of equality. They allow you to map and flatMap only through lazy intermediate objects. This talk will give you an overview of this API and walk you through the reasons behind its design choices.
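A minimal sketch of the underlying idea, using only the standard library and assuming equality is expressed as a key function. `Equality` and `EqSet` here are hypothetical names, not Scalactic's API, and the lazy map/flatMap part is not modeled. Making `EqSet` an inner class gives each equality instance its own path-dependent set type, which mirrors the compile-time guarantee the abstract describes.

```scala
// Two values of type A are considered equal iff their keys are ==.
final class Equality[A](key: A => Any) {

  // Each Equality instance has its own EqSet type, so union/intersect/diff
  // only accept sets built with this same instance.
  final class EqSet private[Equality] (private val byKey: Map[Any, A]) {
    def size: Int = byKey.size
    def contains(a: A): Boolean = byKey.contains(key(a))
    // Keeps the first element seen for each key.
    def +(a: A): EqSet = if (contains(a)) this else new EqSet(byKey + (key(a) -> a))
    def union(that: EqSet): EqSet = that.byKey.values.foldLeft(this: EqSet)(_ + _)
    def intersect(that: EqSet): EqSet = new EqSet(byKey.filter { case (k, _) => that.byKey.contains(k) })
    def diff(that: EqSet): EqSet = new EqSet(byKey.filter { case (k, _) => !that.byKey.contains(k) })
  }

  def empty: EqSet = new EqSet(Map.empty)
  def apply(as: A*): EqSet = as.foldLeft(empty)(_ + _)
}
```

With `val ci = new Equality[String](_.toLowerCase)`, the set `ci("Apple", "apple")` has one element. Combining it with a set from a second `Equality[String]` instance via `union` is rejected by the compiler, because the two `EqSet` types differ.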

Speakers
Bill Venners

President & CEO, Artima, Inc.
Bill Venners is president of Artima, Inc., publisher of Scala books and developer tools, and co-founder of Escalate Software, LLC, provider of Scala training and consulting. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality, and coauthor with Martin Odersky and Lex Spoon of the book, Programming in Scala.


Saturday August 15, 2015 2:20pm - 2:40pm
Track A

2:20pm

Automatically deriving efficient data structures in Scala
It's common to need a collection data structure which supports a particular set of operations efficiently. For example, hash maps are a good choice if you just need to support get and set, and a min heap is a good choice if you need add and getMin. I describe a system I'm building which takes a Scala specification of the methods needed in a collection class, then works interactively with the user to decide on the best set of data structures to use to implement the desired interface with maximum efficiency, then generates the necessary code for this optimized collection class.
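To make the trade-off concrete, here is a minimal sketch of the kind of choice the abstract describes, for a collection that only needs `add` and `getMin`. The `MinCollection` trait and both implementations are hypothetical and are not output of the system described. The heap version uses the standard library's `PriorityQueue` with a reversed ordering, so that its head is the smallest element.

```scala
import scala.collection.mutable

// Hypothetical interface: only the two operations the abstract mentions.
trait MinCollection[A] {
  def add(a: A): Unit
  def getMin: Option[A]
}

// Min heap: PriorityQueue is a max heap by default, so reverse the ordering.
// add is O(log n); getMin just peeks at the head, O(1).
final class HeapMinCollection[A](implicit ord: Ordering[A]) extends MinCollection[A] {
  private val heap = mutable.PriorityQueue.empty[A](ord.reverse)
  def add(a: A): Unit = heap.enqueue(a)
  def getMin: Option[A] = heap.headOption
}

// Unsorted buffer: add is O(1) amortized, but getMin scans everything, O(n).
final class BufferMinCollection[A](implicit ord: Ordering[A]) extends MinCollection[A] {
  private val buf = mutable.ArrayBuffer.empty[A]
  def add(a: A): Unit = { buf += a; () }
  def getMin: Option[A] = if (buf.isEmpty) None else Some(buf.min)
}
```

Both satisfy the same interface; which one is "best" depends on the expected ratio of `add` to `getMin` calls, which is the kind of question the described system asks the user interactively.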

Speakers
Buck Shlegeris

Software Engineer, Triplebyte
Buck Shlegeris works at Triplebyte by day. By night, he enjoys writing Scala, thinking about programming language theory and data structures, and writing music.


Saturday August 15, 2015 2:20pm - 2:40pm
Track B

2:50pm

CQRS/ES with Scala and Akka Persistence
Command Query Responsibility Segregation with Event Sourcing (CQRS/ES) is a powerful way to build distributed systems. Scala, Akka and Akka Persistence now give us the tools to build clustered environments that implement these patterns. This talk will explain the rationale behind CQRS/ES, how it fits the Reactive paradigm, and its advantages at both the technology and business level. We will also discuss the implications of consistency models and how this approach affects Domain Driven Design.
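A minimal, framework-free sketch of the event-sourcing half of CQRS/ES, assuming a toy bank-account domain. All names are hypothetical, and nothing here uses the Akka Persistence API. Commands are validated into events, the write-side state is recovered by folding over the event journal, and a separate read model is derived from the same events.

```scala
object EventSourcingSketch {
  // Commands express intent and may be rejected.
  sealed trait Command
  final case class Deposit(amount: Long) extends Command
  final case class Withdraw(amount: Long) extends Command

  // Events record facts that have already happened.
  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  final case class Account(balance: Long) {
    // Applying an event never fails: validation happened when it was emitted.
    def applyEvent(e: Event): Account = e match {
      case Deposited(a) => copy(balance = balance + a)
      case Withdrawn(a) => copy(balance = balance - a)
    }
  }

  // Command side: validate against current state, emit events or reject.
  def handle(state: Account, cmd: Command): Either[String, List[Event]] = cmd match {
    case Deposit(a) if a > 0                        => Right(List(Deposited(a)))
    case Withdraw(a) if a > 0 && a <= state.balance => Right(List(Withdrawn(a)))
    case other                                      => Left(s"rejected: $other")
  }

  // Recovery: current state is a left fold over the persisted journal.
  def replay(journal: Seq[Event]): Account = journal.foldLeft(Account(0L))(_ applyEvent _)

  // Query side: an independent read model projected from the same events.
  def totalDeposited(journal: Seq[Event]): Long = journal.collect { case Deposited(a) => a }.sum
}
```

Keeping `handle` (which may reject) separate from `applyEvent` (which may not) is what allows the journal to be replayed safely on recovery.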

Speakers
Duncan DeVore

Engineer, Typesafe, Inc.
I'm passionate about the design and implementation of distributed systems using the tenets of the Reactive Manifesto with Scala, Akka and the Typesafe stack. I believe in responsible design through functional programming with an abundance of test coverage. I love to code, present and help others work through the challenges of distributed computing. Co-author of "Reactive Application Development", Manning Publications, Co...


Saturday August 15, 2015 2:50pm - 3:10pm
Track A

2:50pm

Evolving Your Code: More Functional Error Handling in Scala
There are many ways to approach error handling in Scala, some of which will look more familiar to those coming from Java backgrounds, and some that are more familiar to those coming from the FP side. In this talk, we'll explore refactoring your error handling code from Java-like Scala to a more functional style. We'll also take a brief look at some of the already existing constructs for functional error handling in the Scala ecosystem.
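A small before-and-after sketch of such a refactoring, assuming a hypothetical port-parsing helper. The Java-like version throws; the functional version returns an `Either` and composes in a for-comprehension, short-circuiting on the first `Left`. It uses only standard-library types (`Try`, `Either`); the talk itself may use different constructs.

```scala
import scala.util.Try

object ErrorHandlingSketch {
  // Java-like: failure is signaled by throwing, and callers must remember to catch.
  def parsePortUnsafe(s: String): Int = {
    val n = s.trim.toInt // throws NumberFormatException on bad input
    if (n < 1 || n > 65535) throw new IllegalArgumentException(s"out of range: $n")
    n
  }

  // Functional: failure is a value visible in the return type.
  def parsePort(s: String): Either[String, Int] =
    Try(s.trim.toInt).toEither.left
      .map(_ => s"not a number: $s")
      .filterOrElse(n => n >= 1 && n <= 65535, s"out of range: $s")

  private def splitHostPort(s: String): Either[String, (String, String)] =
    s.split(":", 2) match {
      case Array(host, port) if host.nonEmpty => Right((host, port))
      case _                                  => Left(s"expected host:port, got: $s")
    }

  // Steps compose; the first Left stops the chain.
  def parseHostPort(s: String): Either[String, (String, Int)] =
    for {
      hp   <- splitHostPort(s)
      port <- parsePort(hp._2)
    } yield (hp._1, port)
}
```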

Speakers
Long Cao

Long is a software engineer at MediaMath in New York City and has been doing full time Scala development for the last 3 years. Prior to MediaMath he was an engineer writing Scala at Pellucid Analytics and Gawker Media, and hails from the great state of Texas. When not coding, he can be found indulging in coffee, beer, or both kinds of football.


Saturday August 15, 2015 2:50pm - 3:10pm
Track B

3:20pm

A better Scala REPL?
The Scala REPL has often been touted as an advantage of the language: an interactive, exploratory experience very different from the static, often IDE-based workflow that makes up most people's experience of Scala. Nevertheless, the Scala REPL itself really sucks: buggy and unfriendly, it is not a place you want to spend most of your time.
What if the Scala REPL had the same autocomplete as you'd get in Eclipse or IntelliJ? What if it had syntax-highlighting for everything? What if you could load libraries like Shapeless or Akka-HTTP to try out, without needing to muck with SBT? What if your Scala REPL was as versatile, usable and configurable as Bash or Zsh, and could be used as your home on the command line?

Come hear about how you could turn all these "what if"s into a reality!

Speakers
Li Haoyi

Software Engineer, Dropbox
Haoyi is a software engineer at Dropbox who works on Python/CoffeeScript during the day and contributes to the Scala open-source ecosystem at night. He is known for his contributions to the Scala.js project, writing a JVM from scratch in 3000LOC, and doin...


Saturday August 15, 2015 3:20pm - 4:20pm
Track A

3:20pm

Lift-ng: Secure, rapid web development with Scala and AngularJS
In this talk I present lift-ng, an open-source library that extends the Lift web framework into a powerful and secure Angular backend, equipping developers to meet today’s demands for rapidly developed, rich and secure web applications. In this live-coding demo, I show how Scala’s elegantly malleable syntax allows a DSL that gives the appearance of extending Angular down to the server. Members of the audience will leave equipped to build web applications with one of the hottest front-end technologies backed by the cornerstone Scala web framework.

Speakers
Joe Barnes

Senior Software Engineer, AOL / go90
Joe Barnes is currently a Senior Software Engineer at AOL where he develops backend analytics applications and devops for go90. He has spent most of the last decade developing applications on the JVM, with Scala taking focus in late 2012. His contributions to the development community come through open source development, blogging, and speaking.


Saturday August 15, 2015 3:20pm - 4:20pm
Track B

4:30pm

Cats - A fresh look at Pure Functional Programming in Scala
Cats is a very recent project that aims to foster pure functional programming in Scala. It provides type classes from category theory that one might expect to find if coming from Haskell, but the project focuses heavily on making a library that is approachable to people new to the concepts, without sacrificing speed, purity, or adherence to its mathematical underpinnings. Being able to start such a project from scratch has allowed us to rethink many old ideas and to use many newly available libraries and techniques. We will explore the library and pay special attention to some of the supporting technologies it is built upon.
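For readers new to type classes, here is a minimal sketch of the pattern the abstract refers to, written from scratch. This `Monoid` is a hypothetical stand-in, not Cats' definition, which adds laws, syntax, and far more instances. Generic code such as `combineAll` is written once and works for any type with an instance in implicit scope.

```scala
object TypeclassSketch {
  // A type class: an interface implemented *for* a type rather than *by* it.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  object Monoid {
    // Instance for Int under addition.
    implicit val intAddition: Monoid[Int] = new Monoid[Int] {
      def empty: Int = 0
      def combine(x: Int, y: Int): Int = x + y
    }

    // Derived instance: maps merge key by key, combining values with Monoid[V].
    implicit def mapMonoid[K, V](implicit v: Monoid[V]): Monoid[Map[K, V]] = new Monoid[Map[K, V]] {
      def empty: Map[K, V] = Map.empty
      def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, vy)) => acc.updated(k, acc.get(k).fold(vy)(v.combine(_, vy))) }
    }
  }

  // Written once against the type class, usable for every instance.
  def combineAll[A](as: List[A])(implicit m: Monoid[A]): A = as.foldLeft(m.empty)(m.combine)
}
```

Here `combineAll(List(Map("a" -> 1), Map("a" -> 2, "b" -> 3)))` merges the maps into `Map("a" -> 3, "b" -> 3)`, with no map-specific code in `combineAll`.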

Speakers
Stew O'Connor

Stew O'Connor learned about, and quickly became addicted to, Scala about 3 years ago, and was drawn right to the pure functional programming endeavors in Scala. He made many contributions to the scalaz library and is now one of the primary contributors to the Cats library. He works at Verizon as a software architect, where they are trying to gradually steer a very large team of Scala developers to a more pure and disciplined style of writing...


Saturday August 15, 2015 4:30pm - 5:10pm
Track A

4:30pm

Automatic Concurrency through Computation Expressions
Computation Expressions are a generalization of do-notation and idiom brackets. I will show how this notation allows one to write normal-looking code that runs concurrently in a somewhat automatic fashion while retaining desired runtime properties. I will also present my library: https://github.com/jedesah/computation-expressions which is an encoding of Computation Expressions in Scala. If I have time, I will go over some of the implementation of the library. This is a follow-up, in spirit, to my talk at PNWScala (https://www.youtube.com/watch?v=tU4pU5vaddU#t=823).
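For context, here is a sketch of the problem such a library addresses, using plain standard-library `Future`s rather than the library's API. In a for-comprehension, independent steps run one after another unless you start them beforehand by hand. `fetch` is a hypothetical slow call.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrencySketch {
  // Hypothetical slow remote call.
  def fetch(n: Int): Future[Int] = Future { Thread.sleep(100); n * 10 }

  // Sequential: fetch(2) is not even started until fetch(1) completes (~200 ms).
  def sequential: Future[Int] =
    for { a <- fetch(1); b <- fetch(2) } yield a + b

  // Manually concurrent: both futures start before the for-comprehension (~100 ms).
  def concurrent: Future[Int] = {
    val fa = fetch(1)
    val fb = fetch(2)
    for { a <- fa; b <- fb } yield a + b
  }
}
```

Both versions compute the same result; the difference is only in scheduling, and the second form is the rewrite one would like a notation to derive automatically.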

Speakers
Jean-Rémi Desjardins

Software Engineer, Verizon Oncue
Jean-Rémi Desjardins has been doing Scala for the past 3 years. He has contributed to many open source Scala projects, including, among others, Scalaz, Shapeless, sbt and the Scala language itself.


Saturday August 15, 2015 4:30pm - 5:10pm
Track B

5:20pm

Past, Present, and Future of Scala: Closing Panel with Martin Odersky
Speakers
Martin Odersky

Martin is a German computer scientist and professor of programming methods at EPFL in Switzerland. He specializes in code analysis and programming languages. He designed the Scala programming language and Generic Java, and built the current generation of javac, the Java compiler. In 2007 he was inducted as a Fellow of the Association for Computing Machinery. In 1989, he received his Ph.D. from ETH Zurich under the supervision of Niklaus Wirth...

Erik Osheim

Erik is a key member of Typelevel, co-author of Spire and other great Scala OSS.

Julie Pitt

Co-Founder, Order of Magnitude Labs
As Co-Founder of Order of Magnitude Labs, Julie uses Scala to develop learning algorithms capable of producing intelligent behavior. She spent the previous 10 years developing JVM server-side applications in a variety of environments, from startups to government institutions. Most notably, she was on the team that first brought Netflix to a variety of popular platforms and launched the service internationally. Along the way, Julie made several...

Bill Venners

President & CEO, Artima, Inc.
Bill Venners is president of Artima, Inc., publisher of Scala books and developer tools, and co-founder of Escalate Software, LLC, provider of Scala training and consulting. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality, and coauthor with Martin Odersky and Lex Spoon of the book, Programming in Scala.

Dick Wall

Scala Community Guy
Dick founded the Bay Area Scala Enthusiasts (BASE), one of the first Scala user groups. Dick is also the first recipient of the Phil Bagwell Memorial Scala Community Award. He is a committer on several Scala open source projects and creator of SubCut, a dependency injection solution for Scala. Dick is also a former Google developer advocate, where he taught hundreds of developers, both internal and external to Google...


Saturday August 15, 2015 5:20pm - 6:20pm
Track A
 
Sunday, August 16
 

8:00am

Registration
Sunday August 16, 2015 8:00am - 9:30am
Galvanize, 44 Tehama St., San Francisco

9:00am

Complete End-to-End Data Pipeline Training

Scala By the Bay and Big Data Scala are joined by a day (8/16) of unique Complete End-to-End Data Pipeline Training, in which hundreds of engineers will build, in one day, a complete analytics startup with:

  • Mesos (Mesosphere DCOS) as a platform by Mesosphere
  • Akka-based API by Typesafe
  • Kafka message bus by Confluent
  • Spark streaming by Databricks
  • Cassandra for persistence by Datastax
  • Spark Notebook by Andy Petrella

Sunday August 16, 2015 9:00am - 6:00pm
Galvanize, 44 Tehama St., San Francisco
 
Monday, August 17
 

8:00am

Breakfast sponsored by H2O
Red Door Catering
Oakland, CA 

Monday August 17, 2015 8:00am - 9:15am
Foyer Outside Venue A

8:00am

Registration
Monday August 17, 2015 8:00am - 3:00pm
Foyer Outside Venue A

8:15am

Coffee sponsored by MemSQL
Monday August 17, 2015 8:15am - 5:30pm
Foyer Outside Venue A

8:45am

Opening Remarks and Updates
Speakers
Alexy Khrabrov

Chief Scientist, Nitro/By the Bay
Chief Scientist at Nitro, founder and organizer, SF {Scala, Text, Spark, Reactive}, {Scala, Big Data Scala, Text, Data, ...} By the Bay.


Monday August 17, 2015 8:45am - 9:00am
Track A

9:00am

Keynote I: Developing Big Data Components for a Recommender System
Speakers
Debora Donato

Debora Donato is Sr. Director of Personalization and Principal Data Scientist at StumbleUpon. Before moving to StumbleUpon, Debora was Senior Scientist at Yahoo! Labs. Her research interests include User Behavior Analysis, Recommendation Systems, Web Information Retrieval, Link Analysis, Algorithms for the Characterization of the Web, Complex Networks and Social Networks.


Monday August 17, 2015 9:00am - 9:30am
Keynote

9:40am

Keynote II: Apache Spark, the Killer App for Scala
Speakers
Mike Olson

Mike co-founded Cloudera in 2008 and served as its CEO until 2013 when he took on his current role of chief strategy officer (CSO). As CSO, Mike is responsible for Cloudera’s product strategy, open source leadership, engineering alignment and direct engagement with customers. Prior to Cloudera Mike was CEO of Sleepycat Software, makers of Berkeley DB, the open source embedded database engine. Mike spent two years at Oracle Corporation as vice...


Monday August 17, 2015 9:40am - 10:10am
Keynote

10:20am

Keynote III BDS: Spark -- the Ultimate Scala Collections
Speakers
Martin Odersky

Martin is a German computer scientist and professor of programming methods at EPFL in Switzerland. He specializes in code analysis and programming languages. He designed the Scala programming language and Generic Java, and built the current generation of javac, the Java compiler. In 2007 he was inducted as a Fellow of the Association for Computing Machinery. In 1989, he received his Ph.D. from ETH Zurich under the supervision of Niklaus Wirth...


Monday August 17, 2015 10:20am - 10:50am
Keynote

11:00am

Breakthrough OLAP performance on Cassandra and Spark
Apache Cassandra is rock-solid and widely deployed for OLTP and real-time applications, but is typically not thought of as an OLAP database for analytical queries.  This talk will show architectures and techniques for combining Apache Cassandra and Spark to yield a 10-1000x improvement in OLAP analytical performance.  We will then introduce a new open-source project that combines the above performance improvements with the ease of use of Apache Cassandra, and compare it to implementations based on Hadoop and Parquet.
First, the existing Cassandra Spark connector allows one to easily load data from Cassandra to Spark.  We’ll cover how to accelerate queries through different caching options in Spark, and the tradeoffs and limitations around performance, memory, and updating data in real time.  We then dive into the use of columnar storage layout and efficient coding techniques that dramatically speed up I/O for OLAP use cases.  Cassandra features like triggers and custom secondary indexes allow for easy data ingestion into columnar format.  Next, we explore how to integrate this new storage with Spark SQL and its pluggable data storage API.  Future developments will enable extreme analytical database performance, including smart caching of column projections, a columnar version of Spark’s Catalyst execution planner, and how vectorization makes for fast cache- and GPU-friendly calculations — see Spark’s Project Tungsten.

FiloDB is a new open-source database using the above techniques to combine very fast Spark SQL analytical queries with the ease of use of Cassandra.  We will briefly cover interesting use cases, such as:
  • Easy exactly-once ingestion from Kafka for streaming and IoT applications
  • Incremental computed columns and geospatial annotations
We’ll discuss how FiloDB improves aggregations needed for choropleth maps over standard PostGIS solutions.

Speakers
Evan Chan

Evan loves to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies.  He is the creator of the FiloDB open-source distributed analytical database, as well as the Spark Job Server.  He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. He...


Monday August 17, 2015 11:00am - 11:40am
Track A

11:00am

Un-collaborative filtering: Giving the right recommendations when your users aren’t helping you
Competitions, such as the Netflix Prize and Kaggle, have driven a great deal of research on recommendation engines. However, many e-commerce datasets lack explicit ratings, consisting solely of binary purchase information containing no labeled negative data. Such data requires special consideration and treatment for both model selection and validation of results. In this talk I will describe implementation of a recommendation system for binary purchase data in Spark’s MLlib, compare fitting and prediction benchmarks for various models, and illustrate the performance differences across different scales of big data. Finally, I will share the lessons learned in how to efficiently select and implement the best recommendation model for your dataset.

Speakers
Leah McGuire

Senior Member of Technical Staff, Salesforce
Leah McGuire is a Senior Member of Technical Staff at Salesforce, implementing data-driven features and recommendations in Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of...


Monday August 17, 2015 11:00am - 11:40am
Track B

11:50am

Caution: The Code You are About to Enjoy is Extremely Hot! Developing Big Data Applications with Scalding
Learn how to use Scalding, a Scala library developed by Twitter, and conquer your Big Data problems. If you are a data scientist looking into better ways of conquering Hadoop, Scalding just may be the gateway to the powerful world of Scala. If you are a seasoned Scala developer, you will see how the functional paradigm and Scala collections extend to the Big Data domain, dramatically reducing the size of Big Data programs. There will be no PowerPoint in this talk: we will do some real-time drawings and run real code; we’ll put all of it on GitHub for everybody to enjoy.

Speakers
Vladimir Bacvanski

Founder, SciSpike
Dr. Vladimir Bacvanski's interest is in better and more productive ways to develop Big Data applications. He is a founder of SciSpike, a company doing custom development, consulting and training, and engages clients on both Big Data and Scala topics. His recent projects include Big Data and Internet of Things in healthcare, reactive Big Data and Web Scale systems, and introducing Scala, Akka, and Spark in a large financial organization. Vladimir is...


Monday August 17, 2015 11:50am - 12:10pm
Track B

11:50am

The Live Layer and The Lambda Architecture
How can we take advantage of fast storage systems and fast computational substrates in our big data pipelines? In this talk we’ll propose a ‘Live Layer’ that can solve key problems in the Lambda Architecture by combining fast storage with fast compute. Our Sparkle data visualization system uses a Live Layer. We’ll show some data visualizations, and discuss how Sparkle takes advantage of Cassandra, Spark, and reactive streams.

Speakers
Lee Mighdoll

Lee started the online service organization at Nest Labs, maker of the Nest thermostat. Prior to Nest, Lee served in executive and engineering leadership roles at Apple, Twitter, General Magic, ObjectSpeed and WebTV, and has advised a variety of startups. Lee's current interests are in scalable reliable distributed systems, sensor networks, big data, and machine learning.


Monday August 17, 2015 11:50am - 12:30pm
Track A

12:30pm

Lunch sponsored by Nitro
Red Door Catering
Oakland, CA

Monday August 17, 2015 12:30pm - 1:30pm
Foyer Outside Venue A

1:30pm

Scalding at Shazam
This talk will dive into Scalding, a Scala API to Hadoop, and how it is used at Shazam for data processing. It will begin by briefly introducing the core abstractions provided by Scalding and describing a typical data flow. Then we will dive deeper into the Typed API, Algebird, the Matrix API, and unit testing. Building a simple recommendation engine will serve as a motivating example.

Speakers
Dan Osipov

Dan Osipov has been working with Scala for the past three years, mostly in the context of data analysis. Before that, he was developing Android applications and web systems. He sometimes blogs about his experiences at http://danosipov.com


Monday August 17, 2015 1:30pm - 2:10pm
Track A

1:30pm

Ask Craig - Building Smarter Applications
Long regarded as the most widely used online classifieds service in the world for jobs, real estate and goods for sale, Craigslist currently serves over 700 geographic regions in more than 70 countries. Given the inconsistency of posting behaviors and categorical tagging from job postings, H2O’s teams wanted to create an application that could contextually predict the right categories to improve listing classifications. Using Spark’s Word2Vec model, and training a Sparkling Water GBM model based on the vectors of over 20,000 job postings, H2O was able to predict 80% of the appropriate job categories.

Speakers
Hank Roark

Data Scientist and Hacker, H2O.ai
Hank is a Data Scientist & Hacker at H2O. Hank comes to H2O with a background turning data into products and system solutions and loves helping others find value in their data. He has a deep background in the application domains of telematics, remote sensing, logistics, manufacturing, agriculture, and the Internet of Things. Before becoming passionate about machine intelligence, Hank managed international software teams and worked as IT...


Monday August 17, 2015 1:30pm - 2:10pm
Track B

2:20pm

Managing Kafka, the easy way, with help from Play, Akka, and Curator
We will start with a brief introduction to Kafka and some of the challenges of maintaining a cluster. We'll then look at operational concerns around adding new nodes to the cluster, swapping out nodes, or removing nodes, and how Kafka Manager makes this easier. Kafka has no out-of-the-box UI, which explains the level of interest we've seen in Kafka Manager: it is the 4th most popular project on Yahoo's public GitHub account. Then we'll delve into the internals of Kafka Manager and how it was built around Play Framework, Akka, and Apache Curator. We'll talk about how we use in-memory state to make some operations quick, and about upcoming and future work.

Speakers
Hiral Patel

Technologist, Yahoo Inc
Hiral has been working with Scala for the past 6 years and Big Data for the past 12 years. He has built data platforms, data-intensive applications, and real-time analytics frameworks. Hiral is currently a Senior Principal Architect/Engineer at Yahoo Inc.


Monday August 17, 2015 2:20pm - 2:40pm
Track A

2:20pm

Introducing Java/Hadoop Developers to Scala/Spark
Over the past several years, a variety of education and training programs have taught Java developers how to solve big data problems using the Hadoop MapReduce framework. With the addition of the Spark in-memory processing paradigm, and the associated attractive Scala API, these developers now face the dual challenge of a new programming language and paradigm. In this talk, I will discuss the basic issues faced in this transition, discussing both concepts that carry over and new ones encountered, along with a suggested roadmap for learning these two technologies.

Speakers
Brad Rubin

Director, Center of Excellence for Big Data, University of St. Thomas
Brad Rubin is an Associate Professor at the University of St. Thomas in St. Paul in the Graduate Programs in Software department where he teaches Big Data Architecture, Software Analysis and Design, Computer Security, and Advanced Computer Security. Most recently, he is pursuing a research agenda using the Hadoop ecosystem.


Monday August 17, 2015 2:20pm - 2:40pm
Track B

2:50pm

Building Highly Available & Scalable Search with Kafka, Spark Streaming, and Elasticsearch
Historically, achieving horizontal scalability, high availability, and fault tolerance all in the same system was difficult for seasoned architects and nearly impossible for novices. But with the latest generation of open-source tools, you can get these qualities for free (or at least cheap). Learn how we're using Scala with Kafka, Spark Streaming, and Elasticsearch to build a distributed search service that achieves all of the above.

Speakers
Vlad Giverts

Sr. Director Software Engineering, Workday
Vlad is a Sr Director at Workday responsible for building prediction and recommendation products for the company's cloud HR and Financial Management systems. He was previously CTO at Identified, a data and analytics startup acquired by Workday in early 2014. Prior to that he was an Architect at Tagged where he built highly scalable backend services including distributed search systems. Vlad holds a BA in Computer Science from UC Berkeley.


Monday August 17, 2015 2:50pm - 3:10pm
Track A

2:50pm

Enabling Enterprise Big Data Analytics with Scala at Alpine
The vision of Alpine Data Labs is to make data science so straightforward that it becomes a tool for business users as well as data scientists. To this end, we developed an intuitive visual UI that allows users to interact with Hadoop data and perform advanced analytics. However, architecting a highly scalable and effective platform presents some specific challenges, including supporting multiple Hadoop distributions, supporting Pig, SQL, R, Hive, MapReduce, and Spark, and showing visual progress for all analyses. We have leveraged Scala to address each of these issues by building an agent architecture that uses Akka to scale out to different Hadoop distributions, and we designed an R-Akka Server that allows Alpine to scale out R sessions. We use Spray + Akka to expose our Alpine RESTful APIs and have implemented Machine Learning algorithms in Spark using Scala. We have also enhanced the Spark Yarn module, using Akka messaging as the communication channel.

In this talk, we will specifically focus on the Alpine Spark Integration:
  • Submitting a Spark job from a servlet engine
  • Enhancing the Spark client in Yarn cluster mode to enable a Yarn application listener and stopping the Yarn application
  • Yarn resource capacity callback
  • Messaging channel for logging, progress, and error handling via Akka
  • Redirecting the print stream and Spark job progress listener to the Alpine UI
  • Job progress live streaming to Alpine UI via websocket

Speakers
Chester Chen

Director of Engineering, Alpine Data
Chester Chen is the Director of Engineering and hands-on architect at Alpine Data Labs. He manages the analytics platform development and contributes to some of the major developments. He has been working with Scala on and off since Scala 2.7. He is the founder and organizer of the SF Big Analytics Meetup, as well as the main co-organizer of the SF machine learning meetup. Before joining Alpine Data Labs, he played many roles as Technical...
Steven Hillion

Co-Founder, Alpine Data Labs
Steven Hillion is the co-founder of Alpine Data Labs, which is dedicated to making advanced analytics scalable, accessible, and operational.

Steven has been leading large engineering and analytics projects for fifteen years. Before joining Alpine Data Labs, he founded the analytics group at Greenplum, leading a team of data scientists and also designing and developing new open-source and enterprise analytics software. Before that, he was...


Monday August 17, 2015 2:50pm - 3:10pm
Track B

3:20pm

Scala, FP and Spark: the Perfect Combo for Machine Learning
While FP and Scala have already become the mainstays of middleware, web development and big data stacks (Akka, Play, Kafka, Spark), they tend not to have a big presence in the machine learning and NLP communities. For instance, the emerging deep learning toolkits are mostly Python‐based (Pylearn2, Theano, etc.). The same goes for general-purpose machine learning (Python's scikit-learn, countless R libraries). Performance seekers dissatisfied with slow scripting languages write typed Cython code or contorted C++ libraries bound to scripting language wrappers, or resort to random exotic solutions such as Lua. Some even dispense with all abstraction and write incomprehensible CUDA kernels. There has to be a better way. As a machine learning engineer, I want to write strongly typed functional code. Math has no place for side effects, and I don't want to waste time running a simulation for hours, only to find that I made a typo in my "stringly-typed" script. Unbeknownst to most, Scala's machine learning and NLP ecosystem is growing rapidly, from numeric processing (Spire, Breeze) to big data machine learning (MLlib, Mahout) to GPU‐based text parsing (Puck), to general‐purpose probabilistic programming (FACTORIE). In this talk, I'll do a quick overview of Scala's machine learning ecosystem, and show how easy it is to re-use existing components to build a new, scalable algorithm implementation. If you'd like to see how to write a vectorized linear regression that runs native BLAS code, based on an SGD/Adagrad implementation written from scratch and capable of running at scale on petabytes of data using Spark, this talk is for you.

Speakers
Marek Kolodziej

Principal Research Engineer, Nitro
Marek Kolodziej is a Principal Research Engineer at Nitro, Inc. He's been working on a diverse set of machine learning, distributed computing and big data problems for the past 6 years, and on statistics and econometrics for the past 11. His current passion is deep learning and GPU computing. Marek got his PhD in Energy and Environmental Economics from Boston University.


Monday August 17, 2015 3:20pm - 4:00pm
Track A

3:20pm

Why Apache Flink is the 4G of Big Data Analytics Frameworks?

Apache Flink is a community-driven open source and memory-centric Big Data analytics framework.  It provides the only hybrid (Real-Time Streaming + Batch) open source distributed data processing engine supporting many use cases. 

Flink uses a mixture of Scala and Java internally, has very good Scala APIs and some of its libraries are basically pure Scala (FlinkML and Table).

At its core, Flink is a streaming dataflow execution engine. On top of it, Flink provides several APIs, for batch processing (DataSet API), real-time streaming (DataStream API) and relational queries (Table API), as well as domain-specific libraries for machine learning (FlinkML) and graph processing (Gelly).

In this talk, you will learn in more detail about:

  1. What is Apache Flink, how does it fit into the Big Data ecosystem, and why is it the 4G (4th Generation) of Big Data analytics frameworks?
  2. How does Apache Flink integrate with Apache Hadoop and other open source tools for data input and output as well as deployment?
  3. Why is Apache Flink an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? What are the benchmarking results between Apache Flink and those other Big Data analytics frameworks?

Speakers
Slim Baltagi

Director, Big Data Engineering Fellow, Capital One
Slim Baltagi is currently a director of Big Data engineering at Capital One in Chicago. He has more than 17 years of IT and business experience and has spent the last four years of his life hadooping and more recently sparking and flinking! He has worked on more than 12 Big Data projects as a solution architect. He enjoys evangelizing Big Data technologies, maintaining a blog and a Big Data Knowledge Base and also running the Chicago Apache Flink...


Monday August 17, 2015 3:20pm - 4:00pm
Track B

4:20pm

Writing your first Scala near Real-time Streaming application Using Spark Streaming and Running it on top of Mesos
Spark is one of the largest Apache big data analytics projects written in Scala, and Spark Streaming is its component for near real-time stream computing. This talk will be hands-on, focusing on writing your first Spark Streaming job in Scala, integrating it with Apache Kafka (to read messages at high throughput), scaling the pipeline to read and process over a million events per second on cheap machines, and the various fault-tolerance mechanisms involved.

Speakers
Ajay Viswanathan

Software Engineer, Sigmoid
Ajay Viswanathan is a Software Engineer at Sigmoid handling Cloud and Infrastructure Management and DevOps with focus on real-time Big data analytics. He enjoys cross-country cycling and photography in his free time.


Monday August 17, 2015 4:20pm - 5:00pm
Track A

4:20pm

TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies
Twitter’s 250+ million users generate tens of billions of tweet views per day. Aggregating these events in real time – in a robust enough way to incorporate into our products – presents a massive scaling challenge. In this talk I’ll introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. I’ll discuss how we built TSAR using Python and Scala from the ground up, almost entirely on open-source technologies (Storm, Summingbird, Kafka, Aurora, and others), and describe some of the challenges we faced in scaling it to process tens of billions of events per day.

Speakers
Anirudh Todi

Software Engineering Manager, Twitter Inc
At Twitter, Anirudh works on the Data Platform team. Anirudh and his team are chartered with processing and understanding the vast body of data that is generated by the operation of the Twitter platform. Their technologies are used to build a range of cutting-edge services that can process petabytes of data per month in real time for insights into the usage patterns of the Twitter platform. Anirudh has previously worked at Facebook helping scale...


Monday August 17, 2015 4:20pm - 5:00pm
Track B

5:10pm

Spark After Dark: Generating high-quality Dating Recommendations
This talk covers advanced analytics, streaming data pipelines, machine learning, graph processing, and text processing. We'll use the latest Spark libraries, including Spark SQL, BlinkDB, Spark Streaming, MLlib and GraphX, as well as Twitter's Algebird for sketch algorithms, probabilistic data structures, and approximations. https://www.youtube.com/watch?v=g0i_d8YT-Bs

Speakers
Chris Fregly

Research Scientist, PipelineIO
Chris Fregly is Founder and Research Scientist at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup in San Francisco.

Chris is a regular speaker at many conferences and Meetups throughout the world. He’s also an Apache Spark Contributor, Netflix Open Source Committer, and Founder of the Global Advanced Spark and TensorFlow Meetup, and Author of the upcoming O'Reilly Video Series on Deploying and...


Monday August 17, 2015 5:10pm - 5:50pm
Track A

5:10pm

Quantifind's Story: Building Custom Interactive Data Analytics Infrastructure
Building interactive data analytics products on top of large volumes of data is challenging. This talk will outline Quantifind's infrastructure story for building analytics software. We will describe the infrastructure that we originally built on top of existing systems such as Spark/Hadoop. However, the bulk of this talk will focus on a new, custom distributed system that we have built in-house for our predictive analytics software. This new system includes a distributed, in-memory, interactive computing platform that supports fast interactive querying against large volumes of compacted raw data that isn't pre-aggregated. This infrastructure can be viewed as an in-memory combination of map/reduce style computation and indexed structures built from bit sets to offsets in the compacted off-heap data. Akka Cluster sits at the core of this system for distributed communication between nodes. We will discuss why we chose to build our own system as well as tips and tricks that we've learned along the way for pushing the JVM for these types of systems.

Speakers
Ryan LeCompte

Software Engineer, Quantifind
Ryan LeCompte is the infrastructure tech lead at Quantifind. Ryan has been developing backend systems and infrastructure with Scala for the past two years at Quantifind. Interests include concurrency, distributed programming, and data structures.


Monday August 17, 2015 5:10pm - 5:50pm
Track B

6:00pm

Big Data Scala Beer and Wine Reception
Join By the Bay staff, speakers and sponsors for a meet-and-greet reception featuring passed appetizers and a selection of local California beers and wines.

Monday August 17, 2015 6:00pm - 8:00pm
Kaiser Rooftop Garden 300 Lakeside Drive
 
Tuesday, August 18
 

8:00am

Breakfast
Red Door Catering
Oakland, CA 

Tuesday August 18, 2015 8:00am - 9:15am
Foyer Outside Venue A

8:00am

Registration
Tuesday August 18, 2015 8:00am - 3:00pm
Foyer Outside Venue A

8:15am

Coffee Sponsored by Typesafe
Tuesday August 18, 2015 8:15am - 5:30pm
Foyer Outside Venue A

8:45am

Opening Remarks and Updates
Speakers
Alexy Khrabrov

Chief Scientist, Nitro/By the Bay
Chief Scientist at Nitro, founder and organizer, SF {Scala, Text, Spark, Reactive}, {Scala, Big Data Scala, Text, Data, ...} By the Bay.


Tuesday August 18, 2015 8:45am - 9:00am
Track A

9:00am

Keynote IV: New Developments in Spark
Spark has a simple, high-level API, and one thing this enables is making big changes inside the engine without affecting user code. Over the past year, our team has been working on some of the biggest changes to Spark since its creation, improving its performance, usability and debuggability.

I plan to cover some of the latest updates to Spark, including Project Tungsten for off-heap memory management, I/O layer improvements, and new monitoring tools. I'll also discuss new APIs that allow even richer optimization.

Speakers
Matei Zaharia

Matei is an assistant professor at MIT and CTO of Databricks, the company commercializing Apache Spark. He started Spark as a research project at UC Berkeley and has been involved in the big data community since 2007, through projects including Hadoop, Mesos and Shark.


Tuesday August 18, 2015 9:00am - 9:30am
Keynote

9:40am

Keynote V BDS: Apache Kafka and the Rise of The Stream Data Platform
What happens if you take everything that is happening in your company—every click, every database change, every application log—and make it all available as a real-time stream of well structured data?

I will discuss the experience at LinkedIn and elsewhere moving from batch-oriented ETL to real-time streams using Apache Kafka. I’ll talk about how the design and implementation of Kafka were driven by this goal of acting as a real-time platform for event data. I will cover some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn, supporting thousands of engineers, applications, and data systems in a self-service fashion.

I will describe how real-time streams can become the source of ETL into Hadoop or a relational data warehouse, and how real-time data can supplement the role of batch-oriented analytics in Hadoop or a traditional data warehouse.

I will also describe how applications and stream processing systems such as Storm, Spark, or Samza can make use of these feeds for sophisticated real-time data processing as events occur.

Speakers
Jay Kreps

Jay (@jaykreps) is co-founder and CEO at Confluent. Prior to Confluent, Jay Kreps was the initial developer on several open source projects, including Apache Kafka, Apache Samza and Voldemort. He was the lead architect for data infrastructure at LinkedIn.


Tuesday August 18, 2015 9:40am - 10:10am
Keynote

10:20am

Interactive Spark in your Browser
Running Spark scripts directly from a browser improves the user experience: everybody has a web browser, the command line can be avoided, and built-in graphing and visualization make it easy to explore and understand data with just a few clicks. It also simplifies administration, since everything becomes centralized in a service that is accessible to non-native clients. For this purpose, an open source Spark Job Server was developed to provide Scala, SQL and Python in a web shell. The main Hadoop components of the platform are also integrated in the same interface. This talk describes the architecture of the Spark Server and its main features:

  1. Scala, Python and SQL submissions
  2. Impersonation
  3. Security
  4. Job progress / canceling
  5. YARN / HDFS / Hive integration

The server also ships with a friendly user interface built as a Hue app. We will focus on explaining how they were built, how to use the API and which lessons were learned. The end-user interaction will be demoed live.

Speakers
Romain Rigaux

Cloudera
Romain is an engineer at Cloudera and the Lead of Hue. Previously, he worked on distributed systems at Yahoo! and Google, and he has been building web apps since the early days.
Erick Tryzelaar

Erick Tryzelaar is a Software Engineer working on Hue. Before Cloudera, he worked on the collaborative graph tool Dendrite at In-Q-Tel/Lab41, Zynga's configuration management system, and Pixar's render farm.


Tuesday August 18, 2015 10:20am - 10:50am
Track A

10:20am

Developing Spark SQL Integration for MongoDB using Spark's External Datasource API
The external data sources API introduced with Apache Spark 1.2.0 provides a clean and systematic way to integrate a wide range of external database systems with Spark SQL. MongoDB provides an interesting challenge for such integration because its data model, based on JSON, involves no prescriptive schema and is aggressively non-rectangular. This presentation will cover the integration issues from the viewpoint of a "Spark outsider": a moderately competent Scala programmer who is an early adopter of the external data source API, but not familiar with Spark internals. The target audience is developers in that position who need to integrate the database of their choice as quickly and practically as possible. Topics:

  • The external data source API (including significant enhancements coming in Spark 1.3.0)
  • The SchemaRDD mechanism (to become DataFrame in Spark 1.3.0)
  • MongoDB, its data model and Scala API (Casbah)
  • The implementation approach, including efficient schema inference, filter and projection push-down, and data partitioning
  • Examples of querying MongoDB through Spark SQL, HiveQL and DataFrame

Lots of Scala code samples will be based on the NSMC project: https://github.com/spirom/spark-mongodb-connector

(This probably makes sense as either a full-length talk or a tutorial, although a half-length talk could provide some value. A lightning talk is unlikely to make sense.)

Speakers
Spiro Michaylov

Development Manager, Tableau Software
Spiro Michaylov is a development manager in the data platform organization at Tableau Software in Kirkland, Washington. He has been working in distributed systems and big data for almost twenty years, developing compilers for parallel scientific computing, high frequency trading infrastructure in the securities industry, and parts of the ETL and data integration platform at Ab Initio Software. He designed several of the enterprise DBMS features...


Tuesday August 18, 2015 10:20am - 10:50am
Track B

11:00am

Scalable Analytics of Machine Data
Datacenters generate a voluminous amount of machine data ranging from performance metrics, workload activities, resource utilization, system configuration, topologies, events, logs, and failures. Analysis of such data can yield actionable insights for system admins and IT decision-makers to improve efficiency and reduce risk in their infrastructure. CloudPhysics has built a SaaS application which receives machine data from hundreds of thousands of servers around the world and provides data-driven IT analytics. As machines can generate data much faster than humans, building a data pipeline to handle this firehose presents unique challenges. This talk covers our experience in building a scalable analytics back-end for both real-time streaming and batch analysis of machine data, using Scala, Spark, and NoSQL technologies on AWS. We will discuss a unified modeling and analysis framework for heterogeneous, dynamic, semi-structured machine data. We will share the characteristics of our analytical workload, the scaling principles learned through iterations of the back-end, and efficiency gains achieved.

Speakers
Xiaojun Liu

Xiaojun Liu is a co-founder and Chief Scientist at CloudPhysics and focuses on building the machine data analytics back-end that generates actionable insights for users. Prior to CloudPhysics he worked at Google, Salesforce.com, and Sun Microsystems on performance engineering and system modeling and simulation. He holds M.Eng. and B.Eng. degrees from Tsinghua University in Beijing and a Ph.D. in EECS from UC Berkeley.


Tuesday August 18, 2015 11:00am - 11:20am
Track B

11:00am

Economical machine learning via functional programming
Machine learning technologies play an important role in modern software systems, especially as a complement to platforms capable of supplying abundant training data. However, machine learning is notoriously difficult to work with from a software engineering perspective, compounding existing technical debt with new sources of uncertainty and complexity. Technical debt management is therefore crucial when using machine learning, placing a premium on basic correctness, minimization of accidental complexity, and abstractions about which one can easily reason. Fortunately, functional programming ideas can help us achieve these goals. We will discuss how functional programming design patterns can be applied to machine learning and data processing problems, with examples using Scala and the Scalaz library.

Speakers
David Andrzejewski

Data Sciences Engineer, Sumo Logic
David Andrzejewski is a Data Sciences Engineer at Sumo Logic and co-organizer of the SF Bay Area Machine Learning meetup group. Prior to Sumo Logic, David held a postdoctoral research position working on knowledge discovery at Lawrence Livermore National Laboratory (LLNL). He completed his PhD in Computer Sciences at the University of Wisconsin-Madison in 2010, where he had also previously received an M.S. in Computer Sciences and a B.S. in...


Tuesday August 18, 2015 11:00am - 11:40am
Track A

11:20am

Creating an index of all US Small and Medium Business with Scala and Spark
Radius Intelligence (www.radius.com) harnesses data science to deliver a unique marketing intelligence platform used by hundreds of US companies. At Radius we have moved our entire data processing platform from Hadoop to Spark, and this presentation will discuss how data scientists, data engineers and product managers come together to explore data and build new data processing and predictive models on top of our database of tens of millions of US businesses. The presentation will explain how Spark is used to deliver high-speed matching across hundreds of millions of records, leveraging Scala and Spark for data processing, and how the MLlib machine learning library is used to resolve and impute values for the Index.

Speakers
Thomas Gerber

Thomas Gerber is a Big Data Engineering lead at Radius, where he crunches lots of data on lots of machines, using Spark and Scala.

He was a Solution Architect for 6 years at the search engine software vendor Exalead (acquired by Dassault Systemes), which gave him his passion for distributed systems.

Thomas was also a cofounder and CTO of AODocs, which provides Smart Document Management as a service, on top of Google Drive.


Tuesday August 18, 2015 11:20am - 11:40am
Track B

11:50am

A simple breakdown of the Java Memory Model, and how it applies to asynchronous, non-blocking, concurrent and parallel applications
Rich Hickey has previously discussed value, identity and state – but outside the context of the specifics of how we write code on the JVM. How do these concepts correlate to the programming constructs we use to write code in Java and Scala, and how do we use those constructs to minimize or mitigate the impact of concurrency? This talk will be an accessible exploration of the JVM heap, thread stacks and concurrency primitives on the JVM, and of how to compose multi-threaded code in Scala.

Speakers
Jamie Allen

Jamie is the Senior Director of Global Services for Typesafe, responsible for the enablement of customers around the world through consulting and training. He is the author of Effective Akka from O’Reilly, and the co-author of the upcoming Reactive Design Patterns book from Manning. Jamie is a computer languages enthusiast who enjoys writing performant code that most efficiently leverages the resources at hand.


Tuesday August 18, 2015 11:50am - 12:30pm
Track A

11:50am

An Introduction to NLP4L: Natural Language Processing Tool for Apache Lucene
NLP4L is a natural language processing tool for Apache Lucene written in Scala. The main objective of this OSS project is to use NLP technologies to improve Lucene users' search experience. What distinguishes NLP4L from other NLP tools is that its processing target is a Lucene index rather than raw text. The session will describe how to use Scala to obtain word statistics and N-gram statistics from a Lucene index, and will introduce various tools developed using these functions, including a Hidden Markov Model.

Speakers
Koji Sekiguchi

Founder & CEO, RONDHUIT Co.,LTD.
Graduated from Chiba University, Department of Electrical and Electronics Engineering, Faculty of Engineering. Worked for several software companies before founding RONDHUIT, the Apache Lucene/Solr consulting and training service provider. While building a career in the search engine industry, added natural language processing to his areas of interest. Enrolled in Japan Advanced Institute of Science and Technology in 2012 to major in natural...


Tuesday August 18, 2015 11:50am - 12:30pm
Track B

12:30pm

Lunch sponsored by Cloudera
Red Door Catering
Oakland, CA

Tuesday August 18, 2015 12:30pm - 1:30pm
Foyer Outside Venue A

1:30pm

WOLFE: A Declarative Machine Learning Stack
Performing machine learning with existing toolkits on large datasets is quite a frustrating experience: each toolkit focuses on its own subclass of machine learning techniques, exposes its own interface with its own choice of how much of the underlying system is surfaced to the user, and does not support the iterative development that is required to tune machine learning algorithms and achieve satisfactory predictors. In this talk we present Wolfe, a declarative machine learning stack consisting of three crucial components: (1) Language: a math-like syntax embedded in Scala to concisely specify arbitrarily complex machine learning systems that unify most existing, and future, techniques; (2) Interpreter: transforms the declarative description into efficient code that scales to large datasets; and (3) REPL: a new iPython-like IDE for Scala that supports features unique to machine learning, such as visualizing structured data, probability distributions, and the state of optimization. (Joint work with Sebastian Riedel and Tim Rocktaschel at UCL.)

Tuesday August 18, 2015 1:30pm - 2:10pm
Track A

1:30pm

NLP with Scala at D2C
D2C is a mobile marketing and advertising company in Japan. We are now developing an original Natural Language Processing (NLP) engine and text mining system for mobile advertising, based on Apache Spark, Scala and Play2, which are planned to be launched in April 2015. In search advertising, it is important, and also difficult, to select the appropriate keywords for ads. Therefore we are studying how to estimate appropriate keywords for each ad from search query logs. To address these issues, we use several text mining algorithms via Apache Spark and Scala. We will present our results on these topics.

Speakers
Masaki Rikitoku

Expert, D2C
Masaki Rikitoku is an NLP engineer by profession. He is currently employed as a data engineer at D2C, where he is developing a Japanese NLP engine and big data processing environments for mobile advertising. Before joining D2C, he had researched and developed a text mining tool for Enterprise Information Retrieval and an in-memory columnar aggregation engine for Business Intelligence.


Tuesday August 18, 2015 1:30pm - 2:10pm
Track B

2:20pm

DASE – A Design Pattern for Machine Learning with Scala, Spray and Spark
In this talk, we will introduce the latest developments in open source machine learning in relation to Scala, Apache Spark, MLlib and PredictionIO. We will show how such tools can be used to build and deploy predictive engines in real production environments. Introducing the DASE design pattern, we will illustrate how developers, data engineers and data scientists can build machine learning applications with separation of concerns (SoC) in mind. “D” stands for Data Source and Data Preparator, which take care of preparing data for model training. “A” stands for Algorithm, which is where the code of one or more algorithms is implemented, with native support for MLlib and other machine learning libraries. “S” stands for Serving, which handles the application logic during the retrieval of predicted results. Finally, “E” stands for Evaluation. We will also cover upcoming developments, including new Engine Templates for various business scenarios.


Speakers
Simon Chan

CEO, PredictionIO
Simon Chan is a co-founder of PredictionIO, with years of experience in the tech industry in London, Hong Kong, Mainland China and Silicon Valley. His doctoral research work at University College London was on machine learning techniques for large-scale user preference prediction in noisy non-experimental environments.


Tuesday August 18, 2015 2:20pm - 2:40pm
Track A

2:20pm

Building a Health Data platform for the future
Bringing our health data together from many sources can reveal powerful insights about our health, but proves to be a staggering technical challenge. We are building a system to organize and process health data at scale. This talk will cover how we use a variety of technologies, including Scala, Kafka, Spark, Mesos and Marathon, to build a platform for future growth and stability.

Speakers
Ola Wiberg

Co-founder, Human API
Ola Wiberg (@olawiberg), Co-founder and VP of Engineering at Human API, is responsible for infrastructure development, data management, and information security.


Tuesday August 18, 2015 2:20pm - 2:40pm
Track B

2:50pm

Large volume data analysis on the Typesafe Reactive Platform
This talk focuses on processing large volumes of data on the Typesafe platform using Scala, Akka, Reactive Streams and Spark, mainly to utilise machine learning algorithms in parallel or distributed environments. It aims to explain different parallel and distributed programming models, use cases, developer considerations, internals, and the important ideas and concepts behind these frameworks in developer-level detail. It focuses on the usability of these approaches for machine learning pipelines and for efficient, scalable analytics over large amounts of data. It also demonstrates these principles on a large-scale sensor event processing application from practice (using Scala, Akka Cluster, Reactive Streams, CQRS, event sourcing, Spark, machine learning and more). NOTE: A similar talk was submitted for SBTB. This talk would concentrate more on Big Data, ML and how Scala and the Typesafe technologies fit these use cases.

Speakers
Martin Zapletal

Cake Solutions
Martin Zapletal is heading up the technical team of Cake Solutions Inc. in New York. Martin specialises in the design and implementation of reactive, scalable, resilient distributed systems, machine learning and working with large amounts of data. He also has background in functional programming and promotes the use of functional languages, mainly Scala, and patterns in enterprise level production systems.


Tuesday August 18, 2015 2:50pm - 3:10pm
Track A

2:50pm

Lambda architecture - a blueprint for building big data systems
The lambda architecture is more than one weird trick to enable querying massive data sets in real time. It is a generalized blueprint for building data systems that fit your business requirements. It has gone by a number of different names since the dawn of computer science: event sourcing, command query responsibility segregation, materialized views. The main idea behind this pattern has stayed the same: keep your data processing functional. The goal of this talk is to show you how to leverage this fundamentally functional approach to data processing to build systems that scale, both to load and to business requirements.

Speakers
Jieren Chen

Jieren is shamelessly attention deficit, and has worked on everything from an iPhone remote desktop application at iTeleport (now Screenhero), to rebuilding Klout’s social network data processing platform. He currently works at Whil, building the future of meditation. Oh look, a squirrel.


Tuesday August 18, 2015 2:50pm - 3:10pm
Track B

3:20pm

Scala data pipelines at Spotify
This talk takes the audience through a brief history of music recommendation pipelines at Spotify. We'll present how we moved from a mostly Python shop to mostly Scala for data science, the lessons learned, and where we are going next. We'll also dig a bit into the technical details of scaling music recommendation algorithms in Scala.

Speakers
Neville Li

Software engineer, Spotify
Neville is a software engineer at Spotify who works mainly on data infrastructure and tools for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and Parquet. Before that he worked on search quality at Yahoo! and old school distributed systems like MPI.


Tuesday August 18, 2015 3:20pm - 4:00pm
Track A

3:20pm

Map/Reduce as an Example of Programming with Categories
Category Theory is normally approached conceptually, as inspiration for data structures, but it also has a computational side (viz. monads). The question then arises: can one treat categories as a programming language, and what would that look like? Deferring the general question, this talk will show a reduction of the Map/Reduce programming paradigm to categories using a lot of basic categorical machinery, such as monoids, free monoids, functors and fibres, and then show how the resulting fully functional program, implemented in Scala, can be further reduced to execute on multiple platforms, such as Hadoop and Spark. This will provide a practical view of categories for the working non-mathematician and should be at least amusing, if not enlightening.
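
As a rough illustration of only the monoid part of this reduction (free monoids, functors and fibres are not covered), the Scala sketch below assumes a hand-rolled, hypothetical `Monoid` trait rather than any library's: the map step turns each input into a monoid value, and the reduce step folds with the monoid's `combine`. Associativity is what allows partitions to be reduced separately and then merged, which is the property a distributed engine relies on.

```scala
object MonoidMapReduce {
  // Hypothetical Monoid trait: an associative combine with an identity element.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // Integer addition is a monoid with identity 0.
  val intAddition: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Maps whose values form a monoid are themselves a monoid under key-wise merge.
  def mapMonoid[K, V](v: Monoid[V]): Monoid[Map[K, V]] = new Monoid[Map[K, V]] {
    def empty: Map[K, V] = Map.empty
    def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
      y.foldLeft(x) { case (acc, (k, vy)) =>
        acc.updated(k, v.combine(acc.getOrElse(k, v.empty), vy))
      }
  }

  // "Map" each input into the monoid, then "reduce" by folding with combine.
  def mapReduce[A, B](inputs: Seq[A])(f: A => B)(m: Monoid[B]): B =
    inputs.map(f).foldLeft(m.empty)(m.combine)

  // Associativity lets each partition be reduced independently and the
  // partial results combined in order, giving the same answer.
  def partitionedMapReduce[A, B](partitions: Seq[Seq[A]])(f: A => B)(m: Monoid[B]): B =
    partitions.map(p => mapReduce(p)(f)(m)).foldLeft(m.empty)(m.combine)

  // Word count per document, as a value in the Map[String, Int] monoid.
  def wordCount(doc: String): Map[String, Int] =
    doc.split("\\s+").filter(_.nonEmpty).groupBy(identity).map { case (w, ws) => w -> ws.length }
}
```

For example, `mapReduce(Seq("a b a", "b c"))(wordCount)(mapMonoid[String, Int](intAddition))` yields the same map as reducing the two documents as separate partitions.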

Speakers
Matthew Fuchs

Director of Research, salesforce.com
Matthew Fuchs is a Director of Research at salesforce.com but mostly plays Data Scientist. He's been at startups and big companies, worked on big data, speech recognition systems, cloud computing, mobile objects, functional programming, some logic calculi, events and continuations, the invention of XML, and various other desiderata. Category Theory has been an occasional obsession for years.


Tuesday August 18, 2015 3:20pm - 4:00pm
Track B

4:20pm

A Gentle Introduction to Apache Spark and Locality Sensitive Hashing
Apache Spark is an increasingly popular Big Data computation platform which lets developers be more productive than with MapReduce. But what of writing programs that run fast? We will see in this talk how we can find approximate nearest neighbours in a web log quickly with hashing. We'll also use this journey as a reason to study and employ the basic tenets of how to write a fast Spark program. Employed in cases suffering from “the curse of dimensionality” (where feature dimensions are intractably many), locality-sensitive hashing is a technique that can help find approximate nearest neighbours by simply hashing examples in a clever way. We will see how to use it to find close user behaviours in a web log efficiently by exploiting the parallelism offered by Spark. Along the way, we will encounter the landmark notions of how to write efficient Spark programs in Scala, including partition-specific commands, variable capture avoidance, early filters, sparse shuffling, and broadcast variables. After this talk, attendees will be acquainted with a very easy-to-parallelise technique with many other uses (e.g. de-duplication), and have a couple more techniques in their grab bag for removing the bottlenecks in their Spark programs.
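
The abstract does not say which hashing scheme the talk uses; the sketch below assumes MinHash with banding, one common locality-sensitive hashing family for Jaccard similarity over sets, applied to hypothetical per-user sets of visited URLs. It runs on plain Scala collections rather than Spark, and the `Params` type, the hash-function form and the sample data are illustrative assumptions.

```scala
import scala.util.Random
import scala.util.hashing.MurmurHash3

object MinHashLsh {
  private val Prime = 2147483647L // 2^31 - 1, modulus for the hash family

  // Hypothetical parameters: the signature has bands * rows MinHash values.
  final case class Params(bands: Int, rows: Int, seed: Long)

  // Random coefficients for hash functions h(x) = (a * x + b) mod Prime, a != 0.
  def coefficients(p: Params): Vector[(Long, Long)] = {
    val rnd = new Random(p.seed)
    Vector.fill(p.bands * p.rows)((1L + rnd.nextInt(Int.MaxValue - 1), rnd.nextInt(Int.MaxValue).toLong))
  }

  // MinHash signature: for each hash function, the minimum hash over the set.
  def signature(items: Set[String], coeffs: Vector[(Long, Long)]): Vector[Long] = {
    require(items.nonEmpty, "MinHash is undefined for an empty set")
    val xs = items.toVector.map(s => Math.floorMod(MurmurHash3.stringHash(s).toLong, Prime))
    coeffs.map { case (a, b) => xs.map(x => (a * x + b) % Prime).min }
  }

  // Split a signature into bands; (band index, band values) is a bucket key.
  def bucketKeys(sig: Vector[Long], p: Params): Seq[(Int, Vector[Long])] =
    sig.grouped(p.rows).zipWithIndex.map { case (band, i) => (i, band) }.toVector

  // Candidate pairs: users whose signatures agree on at least one full band.
  def candidatePairs(users: Map[String, Set[String]], p: Params): Set[(String, String)] = {
    val coeffs = coefficients(p)
    val keyed = for {
      (user, items) <- users.toSeq
      key <- bucketKeys(signature(items, coeffs), p)
    } yield (key, user)
    keyed.groupBy(_._1).values.flatMap { group =>
      val us = group.map(_._2).distinct.sorted
      for { i <- us.indices; j <- (i + 1) until us.length } yield (us(i), us(j))
    }.toSet
  }

  // Exact Jaccard similarity, useful for checking the candidates afterwards.
  def jaccard(a: Set[String], b: Set[String]): Double =
    (a intersect b).size.toDouble / (a union b).size
}
```

Raising `rows` makes a band match stricter, while adding `bands` gives similar users more chances to share a bucket. In a Spark job, the grouping by bucket key would correspond to a shuffle, so keeping the keyed records small is what keeps the approach fast.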

Speakers
François Garillot

François Garillot joined Typesafe in 2012 after an early stint in research, where he spoke frequently at international conferences. He is now working in Typesafe's Spark team, leveraging his Scala knowledge to improve Spark's support for scalable machine learning and data science applications. Based in Lausanne, he speaks at Swiss conferences and Scala user groups in Lyon and Paris. He recently spoke at Strata Hadoop Barcelona on how to...


Tuesday August 18, 2015 4:20pm - 5:00pm
Track A

4:20pm

Reactive Stream Processing with Kafka-rx
Kafka enables high-throughput distributed log processing, while Reactive Extensions enable push-based systems through rich functional patterns. Let's see where the two meet through examples from a small library, and explore the foundations of stream processing.

Speakers
Thomas Omans

Team Lead, Commission Junction
Engineering lead at Commission Junction and author of kafka-rx.


Tuesday August 18, 2015 4:20pm - 5:00pm
Track B

5:00pm