Big data in practice using Spark

Nowadays everybody seems to be working with "big data". You too may want to query your large data sources (click streams, social media, relational data, sensor data, ...) and find yourself running into the shortcomings of traditional data tools. Perhaps you want the processing power of a cluster, and parallel computing, to analyse your distributed data stores.

If fast prototyping and processing speed are a priority, Spark will most likely be the platform of your choice. Apache Spark is an open source processing engine focusing on low latency, ease of use, and analytics. It is an alternative to the slower MapReduce approach offered by, for example, Hadoop (cf. our course Big data in practice using Hadoop).

This course builds on the topics covered in the Big data architecture and infrastructure course. You will get hands-on practice on Linux with Spark and its libraries for machine learning and visualisation. You will learn how to implement robust data processing in Scala with an SQL-style interface, and you will hear about the other APIs (for Java, Python, and R).

After successful completion of the course, you will have sufficient basic expertise to set up a big data environment, to import data into it, and to interrogate it using Spark. You will be able to write simple Scala and SparkSQL programs that use the MLlib, GraphX, and Streaming libraries.


No public sessions are currently scheduled. We will be pleased to set up an on-site course or, if there are enough candidates, to schedule an extra public session. Interested? Please contact ABIS.

Intended for

Whoever wants to start practising "big data": developers, data architects, and anyone who needs to work with big data technology.


Familiarity with the concepts of data stores, and more specifically of "big data", is necessary; see our course Big data architecture and infrastructure. Additionally, minimal knowledge of SQL and UNIX is useful. Minimal experience with at least one programming language (Java, PHP, Python, Scala, C++ or C#) is a must.

Main topics

Training method

Classroom instruction, with practical examples and supported by extensive practical exercises.


2 days.

Course leader

Peter Vanroose.


Satisfactory; the examples were clear and relevant. However, I would have liked to spend more time on hands-on practice with the material.
I learn a lot from this training. Quite useful knowledge and can lead my following self-study.
Obtain an overview of Spark and its capability. Some trying-out exercises to better know how spark works.
I thought the first day went a bit too slow. I guess the content is quite broad, as was the audience, so many things were explained in ample detail and in a lengthy way. The second day was much nicer as it was more to the point of Spark.