Sydney | Sep 18th, 02:00 - 02:30 PM | Grand Lodge | 83 Interested

This presentation explains the concept of Kappa and Lambda architectures and showcases how useful business knowledge can be extracted from the constantly flowing river of data.

It also demonstrates how a simple POC can be built in a day, without diving in too deep, by leveraging Docker and other technologies such as Kafka, Spark and Cassandra.


Outline/Structure of the Demonstration

After a brief introduction to Kappa and Lambda architectures, a live demo will be performed. It will include a short explanation of each component involved (Web Service, Kafka, Spark Streaming and Cassandra) and their setup using Docker Compose. It will also trace the data flow, using a modified version of the Kaggle Expedia data set as an example. Finally, it will discuss the pros and cons for several business scenarios.
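To make the Kappa idea concrete, here is a minimal sketch of the core pattern: every event goes into an append-only log, and any view is derived by replaying that log through a processing function. It is not the code from the demo. The Log class is an in-memory stand-in for a Kafka topic, the fold functions stand in for a Spark Streaming job, and the resulting dict stands in for a Cassandra table. The field names hotel_cluster and is_booking and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


class Log:
    """Append-only event log (in-memory stand-in for a Kafka topic)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset=0):
        """Replay records starting at the given offset."""
        return iter(self._records[offset:])


def bookings_per_cluster(view, event):
    """Version 1 of the job: count only events that are bookings."""
    if event["is_booking"]:
        view[event["hotel_cluster"]] += 1


def events_per_cluster(view, event):
    """Version 2 of the job: count every event, booking or not."""
    view[event["hotel_cluster"]] += 1


def materialize(log, fold, offset=0):
    """Build a view by folding the log (stand-in for the serving table)."""
    view = defaultdict(int)
    for event in log.read_from(offset):
        fold(view, event)
    return dict(view)


# Hypothetical events, loosely shaped like rows of a hotel search data set.
log = Log()
for event in [
    {"hotel_cluster": 1, "is_booking": 1},
    {"hotel_cluster": 2, "is_booking": 0},
    {"hotel_cluster": 1, "is_booking": 1},
    {"hotel_cluster": 2, "is_booking": 1},
]:
    log.append(event)

bookings_view = materialize(log, bookings_per_cluster)
# Changing the logic needs no separate batch layer: replay the same log.
events_view = materialize(log, events_per_cluster)

print(bookings_view)  # {1: 2, 2: 1}
print(events_view)    # {1: 2, 2: 2}
```

The second call to materialize is the point of the sketch: when the business question changes, the same log is replayed through new logic. A Lambda architecture would instead keep a separate batch layer for recomputation alongside the streaming path.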

Learning Outcome

The audience will learn the concepts of Kappa and Lambda architectures, which will help them identify the business cases best suited to these architectures. They will also walk away with functional POC code (a GitHub repository) that they can extend and adapt for their own use.

Target Audience

Developers and technical managers interested in discovering how to easily get business value from their real-time data.

Prerequisites for Attendees

General knowledge of data processing and Big Data technologies.



Submitted 4 years ago


    • Davor Bonaci

      Davor Bonaci - Realizing the Promise of Portable Data Processing with Apache Beam

      Davor Bonaci
      Sr. Software Engineer
      Google Inc.
      4 years ago
      30 Mins

      The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is the glue that can connect the Big Data ecosystem together; it enables users to "run-anything-anywhere".

      This talk will briefly cover the capabilities of the Beam model for data processing, as well as the current state of the Beam ecosystem. We'll discuss the Beam architecture and dive into the portability layer. We'll offer a technical analysis of Beam's powerful primitive operations, which enable true and reliable portability across diverse environments. Finally, we'll demonstrate a complex pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Spark on Amazon Web Services, Apache Flink on Google Cloud, Apache Apex on-premise), and give a glimpse of some of the challenges Beam aims to address in the future.