SQL for BI and Data Science

During this training, you will learn how to use SQL for analysing data. It may come as a surprise that so-called "in-database" analytics is used much less than it could be: most often, the (production) data is first copied to an analytics platform or BI tool to do the "data science" there. For large data volumes, or for data "in the cloud", this is not the best choice since precious time is lost by copying the data. This course will teach you how you can perform data analytics in the database itself. The learned techniques will of course also be applicable on data which has been copied, to an environment that supports SQL, e.g. to a data warehouse (DW), or to a Big Data environment like Hadoop (with Hive) or Spark.

After a short introduction on data warehouses, data lakes, business intelligence (BI) and Data Science topics, we dive into a thorough discussion on the syntactic possibilities offered by SQL. More specifically, statistical, analytical and OLAP functions will be treated, as well as the use of SQL for ETL, for data exploration, and (if applicable) for MapReduce. The most important ones are individually discussed and demonstrated; realistic labs should allow the participant to get an idea of the applicability of these SQL functions and options.

The SQL syntax covered in this course is platform independent. When relevant, platform specific exceptions will be pointed-out with respect to Oracle, Db2, SQL Server and Hive/Spark.

At the end of this training, participants will be able to:

  • understand how SQL (ie. the SELECT instruction) can be used for reporting and explorative purposes in the context of DW, BI, Big Data and Data Science;
  • write effective SQL from a DW / BI / Big Data perspective;
  • write problem-based SQL statements, and evaluate possible alternative SQL constructs for their effectiveness

Schedule

datedur.lang.locationprice 
24 Feb2Woerden 1000 EUR (exempt from VAT)
11 May2Leuven 1000 EUR (excl. VAT)
SESSION INFO AND ENROLMENT

Intended for

This course targets everyone involved in the development of business reports, by means of SQL statements on relational database systems or Hadoop (Hive) or Spark SQL.

Background

In order to have the full benefit of this course, it is important to be really proficient with SQL (level SQL workshop and being familiar with complex queries; having followed the SQL advanced course is an advantage). Furthermore it's helpful to have a basic understanding of data warehouses (see e.g. our course Modern data warehousing and business intelligence) or Big Data (see e.g. Big data architecture and infrastructure) and be aware of some elementary statistical terminology (see e.g. Statistics fundamentals).

Main topics

  • Data warehouse, business intelligence, Data Science -- positioning
  • Extract / Transform / Load (ETL)
  • the dimensional model
  • Statistics and analytics -- support in SQL
  • Purpose and context -- multi-platform support.
  • AVG, CORRELATION, COUNT, COVARIANCE, MAX, MIN, RAND, STDDEV, SUM, VARIANCE, MEDIAN, ...
  • Online analytical processing in SQL
  • Purpose and context -- multi-platform support.
  • RANK, DENSE_RANK, ROW_NUMBER, PARTITION BY, ORDER BY, ROWS, RANGE, GROUP BY, GROUPING SETS, ROLLUP, CUBE, OVER
  • WINDOWING functions: SUM ... OVER ..., LEAD; LAG and similar aggregation functions, and their application in a.o. trend analysis
  • Modelling: histograms; linear regression; normality testing; trend analysis; confidence intervals; ...
  • Text analytics; regular expressions

Training method

Class training alternated with exercises. These can be done on Db2, Oracle, SQL Server, Hive, or Spark. (Please indicate your choice on the enrolment form.)

Duration

2 days.

Course leader

Peter Vanroose, Arnout Veugelen.

Reviews

goed cursus met fijne afwisseling tussen theorie en praktijk

 
  (, )

goede verdieping op onderwerp SQL

 
  (, )

Goede cursus, veel theorie, misschien iets meer met praktijk voorbeelden en dan bedoel ik ook visueel tonen van uitkomst, niet alleen de query

 
  (, )

Prima!

 
  (, )

Uitstekend. Goed.

 
  (, )

omdat ik persoonlijk wat minder goed de basis beheers, vond ik het bijtijden bijzonder pittig

 
  (, )

Goede cursus; voldoende hoog niveau, met gelegenheid tot even "ophalen" van kennis. Analyse-onderdeel bleek een beetje onderbelicht, maar dat heeft mogelijk te maken met de geschiktheid van SQL hiervoor.

 
  (, )

Goed, gedreven trainer die veel onderwerpen aankaart

 
  (, )

Click here to see more comments...

Zeer enthousiast.

 
  (, )

Also interesting

Enrollees for this training also took the following courses:


SESSION INFO AND ENROLMENT