Spark SQL Training

Spark SQL
Learn Spark SQL, a widely used framework, in a few days and write complex
distributed applications with ease
French / English
Certificate
Overview
This training enables developers and architects to write
complex distributed applications that support better, faster decisions
and real-time actions across a wide variety of use cases, architectures and
industries

Prerequisites
Good knowledge of the Java language
Knowledge of functional programming and of database management
Goals
Master the fundamental concepts of Spark
Develop applications with Spark Streaming
Do parallel programming with Spark on a cluster
Exploit data with Spark SQL
Get a first introduction to Machine Learning
Training Program
INTRODUCTION TO APACHE SPARK
- History of the Framework
- The different language APIs of Spark (Scala, Python and Java)
- Comparison with the Apache Hadoop environment
- The different modules of Spark
PROGRAMMING WITH RESILIENT DISTRIBUTED DATASET (RDD)
- Presentation of RDDs
- Creating, manipulating and reusing RDDs
- Accumulators and broadcast variables
- Using partitions
HANDLING STRUCTURED DATA WITH SPARK SQL
- SQL, DataFrames and Datasets
- The different types of data sources
- Interoperability with RDDs
- Performance of Spark SQL
- JDBC/ODBC server and Spark SQL CLI
SPARK ON A CLUSTER
- The different types of architecture: Standalone, Apache Mesos or Hadoop YARN
- Configuring a cluster in Standalone mode
- Packaging an application with its dependencies
- Deploying applications with spark-submit
- Sizing a cluster
REAL-TIME ANALYSIS WITH SPARK STREAMING
- How it works
- Presentation of Discretized Streams
- The different types of sources
- Working with the API
- Comparison with Apache Storm
HANDLING GRAPHS WITH GRAPHX
- Presentation of GraphX
- The different operations
- Creating graphs
- Vertex and Edge RDD
- Presentation of different algorithms
MACHINE LEARNING WITH SPARK
- Introduction to Machine Learning
- The different classes of algorithms
- Presentation of Spark ML and MLlib
- Implementations of the different algorithms in MLlib
Project
01. Model and render data
02. Create interactive dashboards
03.