Big data in practice using Spark


Nowadays everybody seems to be working with "big data". Perhaps you, too, would like to interrogate your large data sources (click streams, social media, relational data, sensor data, ...) but are running into the shortcomings of traditional data tools. Maybe you want the processing power of a cluster and parallel computing to analyse your distributed data stores.

If fast prototyping and processing speed are a priority, Spark will most likely be the platform of your choice. Apache Spark is an open source processing engine focusing on low latency, ease of use, and analytics. It is an alternative to the slower MapReduce approach offered by, for example, Hadoop (cf. our course Big data in practice using Hadoop).

This course builds on the topics set forth in the Big data concepts course. You will get hands-on practice on Linux with Spark and its libraries for machine learning and visualisation. You will learn how to implement robust data processing in Scala with an SQL-style interface, and with the other APIs for Java and Python.

After successful completion of the course, you will have sufficient basic expertise to set up a big data environment, to import data into it, and to interrogate it using Spark. You will be able to write simple Scala and SparkSQL programs that use the MLlib and GraphX libraries.

Main topics

  • Motivation for Spark & base concepts
  • The Apache Spark project and its components
  • Getting to know the Spark architecture and programming model
  • Data sources
  • Accessing data residing in Hadoop HDFS, Cassandra, or HBase
  • Interfaces
  • Working with the various programming interfaces and the web interface
  • Writing and debugging programs for simple data analytic problems
  • Short introduction to Hadoop HDFS, HBase, and Cassandra

Intended for

Whoever wants to start practising "big data": developers, data architects, and anyone who needs to work with big data technology.


Familiarity with the concepts of data stores, and more specifically of "big data", is necessary; see our course Big data concepts. Additionally, minimal knowledge of SQL and UNIX is useful. Experience with at least one programming language (Java, PHP, Python, Scala, C++ or C#) is a must.

Training method

Classroom instruction, with practical examples and supported by extensive practical exercises.

Course leader

Peter Vanroose.


Duration: 2 days.


You can enrol by clicking on a date
date    dur.  lang.  location      price
30 Nov  2     ?      Leuven (BE)   1000 EUR (excl. VAT)
11 Dec  2     N      Woerden (NL)  1000 EUR (exempt from VAT)