Introduction to Apache Spark
Livecoding
Fri 12:10 - 12:45
You need a laptop
Atlas 2
Software Design
Summary
This livecoding session introduces Apache Spark and is aimed at seasoned developers with an interest in understanding the streaming data pipelines that power today’s real-time analytics engines.
Apache Spark is the open-source cluster computing framework that has largely replaced Hadoop MapReduce as the default engine for large-scale data processing in recent years. It features in-memory processing and streaming capabilities, as well as an SQL interface and a mature set of tools for machine learning and graph processing workloads.
We’ll first take a look at how to build a few basic static pipelines using Spark’s new Dataset API. Towards the end, we’ll examine a relatively complex Kafka-Spark-Cassandra streaming pipeline that more closely mimics a real-life, high-load production setting.
Who is it for?
Albert Architect
Chris CTO
Diana DevOps
Megan Manager
Tamara Team Leader
David Developer
Bianca Business Analyst
Tudor Tester