Big data architecture and infrastructure

These days everybody seems to be working with "big data". But what does this mean precisely? What kind of data are we talking about? Which infrastructure does one need for it? And what does it buy us? In this training, we look for answers to these questions.

Data is becoming ever more vital to every kind of enterprise. Analysing large amounts of data in order to optimize business processes, marketing, and important decisions is not new. But because of steadily increasing data volumes, the growing diversity of data sources, and the broader availability of data, such analysis demands ever more from the infrastructure, the software, and the data models. So much so that a new framework seems necessary: the traditional, established relational model appears to fall short in describing and addressing the new challenges of "data analysis for business intelligence".

"Big data analytics" is the name of this coordinating framework, in which both old models and techniques (like date warehousing, online analytic processing, Hadoop, cluster analysis, ...) and newer insights (data in motion, emotional text analytics, ...) have found each other. The capability to condense relevant insights from more diverse, larger, and rapidly changing data, can help managers and other decision makers to better support their decisions.

This course gives a general picture of big data and what it represents: it presents an overview of the technologies on which big data is based, and places the frequently heard technical terms in context and perspective.


No public sessions are currently scheduled. We will be pleased to set up an on-site course or to schedule an extra public session (in case of a sufficient number of candidates). Interested? Please let us know.

Intended for

The course is designed for everybody who wants to learn about big data: IT personnel, people confronted with big data technologies, and also staff without a technical (IT) background.


Background

Elementary knowledge of database management systems is an advantage.

Main topics

  • Introduction: about data, databases, and data warehouses - and now big data
  • What is big data?
  • Perspective: problem formulation - why big data?
  • data-centric management
  • the 4 Vs: volume, variety, velocity, variability - types of data - examples
  • data quality, consistency, and reliability (veracity)
  • Big data architecture - components - technologies - towards an integrated data architecture
  • Overview of new data sources: web statistics ("click streams"); social media; Twitter feeds; Google Maps; sensor data (e.g. surveillance cameras) and the Internet of Things (IoT); ...
  • NoSQL databases versus relational databases - types and use - popular today: MongoDB, Cassandra, ...
  • Big Data Frameworks
  • The "divide & conquer" model: Hadoop and MapReduce - distribute data and analyse it through massively parallel algorithms
  • Spark: in-memory processing, hence speed - supporting a plethora of data sources
  • Machine learning
  • Performance considerations
  • Big data analytics - know your data -- or: the role of the data scientist!
  • How to judge data quality; risk analysis - and the importance of statistics
  • Use of programming languages: Python, R, Scala, ...
  • Use of visualisation tools in order to keep an overview and to estimate the relative importance of the different data sources
  • Overview of often used (open source) products/technologies on the market

Training method

Classroom training.


Duration

1 day.

Course leader

Peter Vanroose, Kris Van Thillo.