Category Archives: Data Warehouse

Object Statistics in Your Data Pipelines

As my electrical engineering lecturer at university used to joke, there are only two faults in electricity: 1) No contact where it is needed. 2) There is contact where it is not needed. You can often think the same way when looking for the causes of a poor execution plan: no stats were gathered when the optimizer needed them for a good execution plan or someone gathered stats when it was inappropriate.

Continue reading →

Issue with the Hint ENABLE_PARALLEL_DML

1 Reply

Performing an ETL with large data sets, it is often a good idea to run DML in parallel. But, in contrast to parallel query or DDL, parallel DML has to be explicitly enabled. You had to issue ALTER SESSION ENABLE PARALLEL DML in the past. Starting with 12c you can enable parallel DML specifically for each query using the hint ENABLE_PARALLEL_DML. For a few years now, I’ve been using the hint now and then and was quite happy. An observation I made a few days ago can lead to a rethinking. What I could observe is that for the SQL with embedded hint a new child cursor was created each time. Let’s test it!

Continue reading →

Online Statistics Gathering for ETL – Part 2

Leave a reply

In the first part we looked at general preconditions for online statistics gathering to work and some restrictions. In this part we’ll take a look at what happens with direct path loads into partitioned tables. Continue reading →

Online Statistics Gathering for ETL – Part 1

Leave a reply

Online Statistics Gathering has been introduced in 12c and is a very handy feature for ETL and batch jobs developers. However the devil is in the detail. There are some points to remember. Let’s take a closer look. Continue reading →

Debugging SCD2

Leave a reply

This post is again about the Slowly Changing Dimensions Type 2, but focusing on another problem. Once you have a need to validate the versioning mechanism, how you can do this? Or, in other words, having several versions of the same data (identified by the natural key), how to check what fields have been changed from version to version? Working with systems like Siebel CRM, which have some tables with 500+ columns, this possibility was really useful.
Of course you can write some PL/SQL code and iterate through the columns to compare their values. But I’m a friend of “pure SQL” solutions – let’s see how this can be done. Continue reading →

Data Historization II

Leave a reply

In the previous post I have shown how to use a combination of a UNION ALL and a GROUP BY to do a slowly changing dimension type 2 data historization. I’ve done some tests since then to compare performance of this approach with common methods in various situations. Continue reading →

How to simplify the data historization?

4 Replies

Maintaining a data historization is a very common but time consuming task in a data warehouse environment. You face it while loading historized Core-Layer (also known as Enterprise Data Warehouse or Foundation Layer), Data Vault Models, Slowly Changing Dimensions, etc. The common techniques used involve outer joins and some kind of change detection. This change detection must be done with respect of Null-values and is possibly the most trickiest part. A very good overview by Dani Schnider can be found in his blog: Delta Detection in Oracle SQL

But, on the other hand, SQL offers standard functionality with exactly desired behaviour: Group By or Partitioning with analytic functions. Can it be used for this task? Does it make sense? And how would the ETL process look like? Can we further speed up the task using partition exchange and when does it make sense? I’ll look at this in the next few posts. Continue reading →

SQLORA

Just some more Oracle stuff

Category Archives: Data Warehouse

Object Statistics in Your Data Pipelines

Issue with the Hint ENABLE_PARALLEL_DML

Online Statistics Gathering for ETL – Part 2

Online Statistics Gathering for ETL – Part 1

Debugging SCD2

Data Historization II

How to simplify the data historization?