Category Archives: Data Warehouse

Online Statistics Gathering: Update 2024

Online Statistics Gathering was introduced in Oracle 12c. I think it is a very important building block for establishing a consistent approach to optimizer statistics in your ETL jobs. I wrote several posts about it a while ago (first, second and third). While preparing a talk about optimizer statistics for data engineers recently, I took a closer look at it again. Over several releases, Oracle has continuously improved the feature and lifted many restrictions. Let me show you what I mean.

Object Statistics in Your Data Pipelines

As my electrical engineering lecturer at university used to joke, there are only two faults in electricity: 1) no contact where it is needed, and 2) contact where it is not needed. You can often think the same way when looking for the cause of a poor execution plan: either no statistics were gathered when the optimizer needed them to produce a good plan, or someone gathered statistics at an inappropriate moment.

Issue with the Hint ENABLE_PARALLEL_DML

When running ETL jobs on large data sets, it is often a good idea to execute DML in parallel. However, unlike parallel query or DDL, parallel DML has to be enabled explicitly. In the past, you had to issue ALTER SESSION ENABLE PARALLEL DML. Starting with 12c, you can enable parallel DML for an individual statement using the hint ENABLE_PARALLEL_DML. I have been using the hint now and then for a few years and was quite happy with it. However, an observation I made a few days ago may call for a rethink: for the SQL statement with the embedded hint, a new child cursor was created each time. Let's test it!

Debugging SCD2

This post is again about Slowly Changing Dimensions Type 2 (SCD2), but it focuses on a different problem. Once you need to validate the versioning mechanism, how can you do it? In other words, given several versions of the same record (identified by its natural key), how can you check which fields changed from one version to the next? When working with systems like Siebel CRM, where some tables have more than 500 columns, this capability proved really useful.
Of course, you could write some PL/SQL code that iterates through the columns and compares their values. But I prefer "pure SQL" solutions, so let's see how this can be done.
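A minimal sketch of the idea, not the author's solution: it uses SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of Oracle, and a hypothetical table customer_hist with natural key cust_id and version column valid_from. Python only reads the column list from the catalog and generates one SQL statement; the comparison itself runs in SQL, with one LAG per column and SQLite's null-safe IS NOT operator, so a change from NULL to a value is reported too.

```python
import sqlite3

# Hypothetical SCD2 history table; cust_id is the natural key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_hist (
        cust_id INTEGER, valid_from TEXT, cust_name TEXT, city TEXT, phone TEXT);
    INSERT INTO customer_hist VALUES
        (1, '2023-01-01', 'Ann', 'Rome',  NULL),
        (1, '2023-06-01', 'Ann', 'Milan', NULL),
        (1, '2024-02-01', 'Ann', 'Milan', '555-01'),
        (2, '2023-03-01', 'Bob', 'Bern',  '555-02');
""")


def version_diff_sql(conn, table, key, order_col):
    """Build one SELECT listing every column that changed between versions.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    # Read attribute columns from the catalog instead of typing 500 names.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
            if row[1] not in (key, order_col)]
    window = f"OVER (PARTITION BY {key} ORDER BY {order_col})"
    # One LAG per column: the previous version's value for the same key.
    lags = ", ".join(f"LAG({c}) {window} AS prev_{c}" for c in cols)
    # One branch per column; rn > 1 skips the first version of each key.
    branches = " UNION ALL ".join(
        f"SELECT {key}, {order_col}, '{c}' AS col, prev_{c} AS old_value, "
        f"{c} AS new_value FROM v WHERE rn > 1 AND prev_{c} IS NOT {c}"
        for c in cols)
    return (f"WITH v AS (SELECT t.*, {lags}, ROW_NUMBER() {window} AS rn "
            f"FROM {table} t) {branches} ORDER BY 1, 2, 3")


rows = conn.execute(
    version_diff_sql(conn, "customer_hist", "cust_id", "valid_from")).fetchall()
print(rows)
# [(1, '2023-06-01', 'city', 'Rome', 'Milan'), (1, '2024-02-01', 'phone', None, '555-01')]
```

With 500 columns the generated statement gets long, but it remains a single query. In Oracle, the column list would come from a dictionary view such as USER_TAB_COLUMNS rather than PRAGMA table_info.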

How to simplify data historization?

Maintaining data historization is a very common but time-consuming task in a data warehouse environment. You face it when loading a historized Core Layer (also known as Enterprise Data Warehouse or Foundation Layer), Data Vault models, Slowly Changing Dimensions, etc. The common techniques involve outer joins and some kind of change detection. This change detection must handle NULL values correctly and is possibly the trickiest part. A very good overview by Dani Schnider can be found in his blog: Delta Detection in Oracle SQL
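To illustrate why NULL handling is the tricky part, here is a minimal sketch of outer-join delta detection. It assumes SQLite via Python's sqlite3 module rather than Oracle, and hypothetical tables stage (the new delivery) and core (the current versions) keyed by id. SQLite's IS NOT is a null-safe inequality; a plain <> would miss the change of id 1 from NULL to 'Rome', because NULL <> 'Rome' yields NULL rather than true. A full outer join is emulated with two LEFT JOINs because older SQLite versions lack FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical example data: "core" holds current versions, "stage" the new load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core  (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE stage (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO core  VALUES (1, 'Ann', NULL), (2, 'Bob', 'Bern'), (3, 'Cid', 'Oslo');
    INSERT INTO stage VALUES (1, 'Ann', 'Rome'), (2, 'Bob', 'Bern'), (4, 'Dan', NULL);
""")

DELTA_SQL = """
    -- New or changed keys: stage rows without a match or with any differing column.
    SELECT s.id, CASE WHEN c.id IS NULL THEN 'new' ELSE 'changed' END AS action
    FROM stage s LEFT JOIN core c ON c.id = s.id
    WHERE c.id IS NULL
       OR s.name IS NOT c.name   -- IS NOT: NULL vs. a value counts as different
       OR s.city IS NOT c.city
    UNION ALL
    -- Deleted keys: core rows missing from the new delivery.
    SELECT c.id, 'deleted'
    FROM core c LEFT JOIN stage s ON s.id = c.id
    WHERE s.id IS NULL
    ORDER BY 1
"""

delta = conn.execute(DELTA_SQL).fetchall()
print(delta)  # [(1, 'changed'), (3, 'deleted'), (4, 'new')]
```

In Oracle SQL, a common way to write the same null-safe comparison is DECODE(s.city, c.city, 0, 1) = 1, because DECODE treats two NULLs as equal.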

On the other hand, SQL offers standard functionality with exactly the desired behaviour: GROUP BY, or partitioning with analytic functions, both of which treat NULL values as equal when forming groups. Can it be used for this task? Does it make sense? What would the ETL process look like? And can we speed up the task further using partition exchange, and when does that make sense? I'll look at these questions in the next few posts.
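A minimal sketch of the GROUP BY idea, assuming SQLite through Python's sqlite3 module and hypothetical core and stage tables rather than the author's Oracle implementation. Both sources are stacked with UNION ALL and grouped by every column. Because GROUP BY puts NULLs into the same group, identical rows (NULLs included) collapse into one group containing both sources, and no null-safe comparison has to be written by hand. Groups seen in only one source form the delta.

```python
import sqlite3

# Hypothetical example data: "core" holds current versions, "stage" the new load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core  (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE stage (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO core  VALUES (1, 'Ann', NULL), (2, 'Bob', 'Bern'), (3, 'Cid', 'Oslo');
    INSERT INTO stage VALUES (1, 'Ann', 'Rome'), (2, 'Bob', 'Bern'), (4, 'Dan', NULL);
""")

# Stack both sources, then group by all columns. Unchanged rows end up in
# groups that contain both sources and are filtered out by HAVING.
DELTA_SQL = """
    SELECT id, name, city, MIN(src) AS src
    FROM (
        SELECT 'stage' AS src, id, name, city FROM stage
        UNION ALL
        SELECT 'core' AS src, id, name, city FROM core
    ) AS u
    GROUP BY id, name, city
    HAVING COUNT(DISTINCT src) = 1   -- keep groups seen in only one source
    ORDER BY id, src
"""

delta = conn.execute(DELTA_SQL).fetchall()
print(delta)
# [(1, 'Ann', None, 'core'), (1, 'Ann', 'Rome', 'stage'),
#  (3, 'Cid', 'Oslo', 'core'), (4, 'Dan', None, 'stage')]
```

Read per key, id 1 appears with both its old (core) and new (stage) image, so it changed; id 3 exists only in core, so it was deleted; id 4 exists only in stage, so it is new. The sketch assumes at most one current row per key in each source.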