Big data in practice using Hadoop

Objectives

Nowadays everybody seems to be working with "big data". Do you also want to query your various data sources (click streams, social media, relational data, sensor data, ...), and are you running into the limitations of traditional data tools? Then you may need a distributed data store like HDFS and a MapReduce infrastructure like Hadoop's.

This course builds on the concepts set forth in the Big data concepts course. You will get hands-on practice on Linux with Apache Hadoop: HDFS, YARN, Pig, and Hive. You learn how to implement robust data processing with an SQL-style interface that generates MapReduce jobs. You also learn to work with the graphical tools that make it easy to follow up on the jobs and workflows on the distributed Hadoop cluster.

After successful completion of the course, you will have sufficient basic expertise to set up a Hadoop cluster, to import data into HDFS, and to query it cleverly using MapReduce.

If you want to use Hadoop with Spark, see the course Big data in practice using Spark.

Main topics

  • Motivation for Hadoop & base concepts
  • The Apache Hadoop project and the components of Hadoop
  • HDFS: the Hadoop Distributed File System
  • MapReduce: what and how
  • The workings of a Hadoop cluster
  • Writing a MapReduce program
  • Implementing MapReduce drivers, mappers, and reducers in Java
  • Writing mappers and reducers in another programming or scripting language (e.g. Perl)
  • Unit testing
  • Writing partitioners for optimizing the load balancing
  • Debugging a MapReduce program
  • Data Input / Output
  • Reading and writing sequential data from a MapReduce program
  • The use of binary data
  • Data compression
  • Some frequently used MapReduce components
  • Sorting, searching, and indexing of data
  • Word counts and counting pairs of words
  • Working with Hive and Pig
  • Pig as a high-level basic interface for generating a sequence of MapReduce jobs
  • Hive as a high-level SQL-style interface for generating a sequence of MapReduce jobs
  • Short introduction to HBase and Cassandra as alternative data stores
 

Intended for

Whoever wants to start practising "big data": developers, data architects, and anyone who needs to work with big data technology.

Background

Familiarity with the concepts of data stores, and more specifically of "big data", is necessary; see our course Big data concepts. Additionally, minimal knowledge of SQL, UNIX and Java is useful. Experience with a programming language (Java, PHP, Python, Perl, C++ or C#) is a must.

Training method

Classroom instruction, with practical examples and supported by extensive practical exercises.

Course leader

Peter Vanroose.

Duration

2 days.

Schedule

date     duration   lang.   location       price
02 Nov   2          N       Woerden (NL)   1000 EUR (VAT exempt)
20 Nov   2          ?       Leuven (BE)    1000 EUR (excl. VAT)

Overall score

4.1/5 (based on 26 evaluations)

Reviews

     
Happy with the training, although I would spend less time on HDFS and MapReduce and more time on other components (Pig, Hive, ...).

One day longer?

Good for getting an overview.

The course contains the necessary information and fits well into the 2 days.

Good introduction.

Good overview of big data architecture and of how products and tools fit together.

Fine course, a good basis for building up Hadoop knowledge.

A good start for getting into big data.

Quite OK, but I think the general explanation could go much faster. Sometimes a lot of focus on details that seem almost irrelevant to me. Could also be just me.

Very good introduction.
