Performing an ETL with large data sets, it is often a good idea to run DML in parallel. But, in contrast to parallel query or DDL, parallel DML have to be explicitly enabled. You had to issue ALTER SESSION ENABLE PARALLEL DML in the past. Starting with 12c you can enable parallel DML specifically for each query using the hint ENABLE_PARALLEL_DML. For a few years now, I’ve been using the hint now and then and was quite happy. An observation I made a few days ago can lead to a rethinking. What I could observe is that for the SQL with embedded hint a new child cursor was created each time. Let’s test it!Continue reading
Online Statistics Gathering has been introduced in 12c and is a very handy feature for ETL and batch jobs developers. However the devil is in the detail. There are some points to remember. Let’s take a closer look. Continue reading
This post is again about the Slowly Changing Dimensions Type 2, but focusing on another problem. Once you have a need to validate the versioning mechanism, how you can do this? Or, in other words, having several versions of the same data (identified by the natural key), how to check what fields have been changed from version to version? Working with systems like Siebel CRM, which have some tables with 500+ columns, this possibility was really useful.
Of course you can write some PL/SQL code and iterate through the columns to compare their values. But I’m a friend of “pure SQL” solutions – let’s see how this can be done. Continue reading
Maintaining a data historization is a very common but time consuming task in a data warehouse environment. You face it while loading historized Core-Layer (also known as Enterprise Data Warehouse or Foundation Layer), Data Vault Models, Slowly Changing Dimensions, etc. The common techniques used involve outer joins and some kind of change detection. This change detection must be done with respect of Null-values and is possibly the most trickiest part. A very good overview by Dani Schnider can be found in his blog: Delta Detection in Oracle SQL
But, on the other hand, SQL offers standard functionality with exactly desired behaviour: Group By or Partitioning with analytic functions. Can it be used for this task? Does it make sense? And how would the ETL process look like? Can we further speed up the task using partition exchange and when does it make sense? I’ll look at this in the next few posts. Continue reading