{"id":301,"date":"2016-11-24T12:14:47","date_gmt":"2016-11-24T10:14:47","guid":{"rendered":"http:\/\/blog.sqlora.com\/en\/?p=301"},"modified":"2018-04-08T23:00:17","modified_gmt":"2018-04-08T21:00:17","slug":"how-to-simplify-the-data-historization","status":"publish","type":"post","link":"https:\/\/blog.sqlora.com\/en\/how-to-simplify-the-data-historization\/","title":{"rendered":"How to simplify the data historization?"},"content":{"rendered":"<p>Maintaining a data historization is a very common but time consuming task in a data warehouse environment. You face it while loading historized Core-Layer (also known as Enterprise Data Warehouse or Foundation Layer), Data Vault Models, Slowly Changing Dimensions, etc. The common techniques used involve outer joins and some kind of change detection. This change detection must be done with respect of Null-values and is possibly the most trickiest part. A very good overview by Dani Schnider can be found in his blog: <a href=\"https:\/\/danischnider.wordpress.com\/2016\/10\/08\/delta-detection-in-oracle-sql\/\" target=\"_blank\">Delta Detection in Oracle SQL<\/a><\/p>\n<p>But, on the other hand, SQL offers standard functionality with exactly desired behaviour: Group By or Partitioning with analytic functions. Can it be used for this task? Does it make sense? And how would the ETL process look like? Can we further speed up the task using partition exchange and when does it make sense? I&#8217;ll look at this in the next few posts. <!--more--><\/p>\n<p><strong>Introduction and common approach<\/strong><\/p>\n<p>First of all, I suppose you are familiar with the historization and temporal validity concepts to some extent. We consider so called Slowly Changing Dimensions Type 2. But this historization schema does not apply only for dimensions.<\/p>\n<p>Consider following example. We have a source table which must be loaded into the target table on a daily basis. 
The target table has validity date columns and allows multiple versions for one business key. If we detect changes in the data for some business key, we close the current version (set its valid_to date) and open a new one with the new data. No data is overwritten; all changes remain completely traceable over time.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nwhenever sqlerror continue\r\ndrop table t_source purge;\r\ndrop table t_target purge;\r\ndrop sequence scd2_seq ;\r\n\r\ncreate sequence scd2_seq ;\r\n\r\n-- Source Table \r\n\r\ncreate table t_source\r\n( load_date date\r\n, business_key varchar2(5)\r\n, first_name varchar2(50)\r\n, second_names varchar2(100)\r\n, last_name varchar2(50)\r\n, hire_date date\r\n, fire_date date\r\n, salary number(10));\r\n\r\n\r\n-- Target Table\r\n \r\ncreate table t_target\r\n( dwh_key number\r\n, dwh_valid_from date\r\n, dwh_valid_to date\r\n, current_version char(1 byte)\r\n, etl_op varchar2(3 byte)\r\n, business_key varchar2(5)\r\n, first_name varchar2(50)\r\n, second_names varchar2(100)\r\n, last_name varchar2(50)\r\n, hire_date date\r\n, fire_date date\r\n, salary number(10));\r\n\r\n\r\n\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-01', '123', 'Roger', null, 'Federer', DATE '2010-01-01', null, 900000);\r\n               \r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-01', '456', 'Rafael', null, 'Nadal', DATE '2009-05-01', null, 720000);\r\n\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-01', '789', 'Serena', null, 'Williams', DATE '2008-06-01', null, 650000);               \r\n               \r\n-- We simulate the first ETL run and simply insert our data into the target table\r\n \r\ninsert into t_target (dwh_key, dwh_valid_from, dwh_valid_to, current_version, etl_op, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(scd2_seq.nextval, DATE '2016-12-01', null, 'Y', 'INS', '123', 'Roger', null, 'Federer', DATE '2010-01-01', null, 900000);\r\n               \r\ninsert into t_target (dwh_key, dwh_valid_from, dwh_valid_to, current_version, etl_op, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(scd2_seq.nextval, DATE '2016-12-01', null, 'Y', 'INS', '456', 'Rafael', null, 'Nadal', DATE '2009-05-01', null, 720000);\r\n\r\ninsert into t_target (dwh_key, dwh_valid_from, dwh_valid_to, current_version, etl_op, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(scd2_seq.nextval, DATE '2016-12-01', null, 'Y', 'INS', '789', 'Serena', null, 'Williams', DATE '2008-06-01', null, 650000);  \r\n               \r\ncommit;\r\n<\/pre>\n<p>&nbsp;<br \/>\nThe most common approach I have seen involves outer joining the source table with the current versions in the target table based on the business key. If we have the business key only on one side of the join, we are dealing with new or deleted records. If the key is on both sides, we must check for data changes. 
To be able to close the old versions and insert the new ones in a single MERGE statement, we use a split technique and then combine the old and new records with a UNION ALL. Again, the whole approach is very well described here: <a href=\"https:\/\/danischnider.wordpress.com\/2016\/10\/08\/delta-detection-in-oracle-sql\/\" target=\"_blank\">Delta Detection in Oracle SQL<\/a><\/p>\n<p>The typical pattern of the ETL process in Oracle Warehouse Builder looks something like this:<\/p>\n<p><A name=\"twice\"><\/A><\/p>\n<p><a href=\"https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-307\" src=\"https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2.jpg\" alt=\"SCD2\" width=\"1094\" height=\"301\" srcset=\"https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2.jpg 1094w, https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2-300x83.jpg 300w, https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2-1024x282.jpg 1024w, https:\/\/blog.sqlora.com\/en\/wp-content\/uploads\/sites\/2\/2016\/11\/scd2-624x172.jpg 624w\" sizes=\"auto, (max-width: 1094px) 100vw, 1094px\" \/><\/a><\/p>\n<p>That is quite a lot of work. The drawback of the Oracle Warehouse Builder implementation is that the source and target tables will be scanned and joined twice. 
So how can we do it better?<\/p>\n<p><strong>Another approach<\/strong><\/p>\n<p>Imagine that the next day we get new data in the source:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- Next we get new data in the source \r\ndelete from t_source;\r\n\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-02', '123', 'Roger', null, 'Federer', DATE '2010-01-01', null, 900000);\r\n               \r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-02', '456', 'Rafael', null, 'Nadal', DATE '2009-05-01', null, 720000);\r\n-- New Second_name \r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-02', '789', 'Serena', 'Jameka', 'Williams', DATE '2008-06-01', null, 650000);     \r\n-- New record\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-02', '345', 'Venus', null, 'Williams', DATE '2016-11-01', null, 500000);     \r\n\r\ncommit;\r\n<\/pre>\n<p>&nbsp;<br \/>\nFor Business Key 789 (Serena Williams) we now get a second name instead of an empty (NULL) value. 
The Business Key 345 (Venus Williams) is new.<br \/>\nNow we have the following content in our tables:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nselect * from t_source;\r\n\r\nLOAD_DATE  BUS FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY\r\n---------- --- ---------- ---------- ---------- ---------- ---------- ----------\r\n2016-12-02 123 Roger                 Federer    2010-01-01                900000\r\n2016-12-02 456 Rafael                Nadal      2009-05-01                720000\r\n2016-12-02 789 Serena     Jameka     Williams   2008-06-01                650000\r\n2016-12-02 345 Venus                 Williams   2016-11-01                500000\r\n\r\n4 rows selected.\r\n\r\nselect * from t_target;\r\n\r\n   DWH_KEY DWH_VALID_ DWH_VALID_ C ETL_OP BUS FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY\r\n---------- ---------- ---------- - ------ --- ---------- ---------- ---------- ---------- ---------- ----------\r\n         1 2016-12-01            Y INS    123 Roger                 Federer    2010-01-01                900000\r\n         2 2016-12-01            Y INS    456 Rafael                Nadal      2009-05-01                720000\r\n         3 2016-12-01            Y INS    789 Serena                Williams   2008-06-01                650000\r\n\r\n3 rows selected.\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\n<strong>Step 1<\/strong><\/p>\n<p>We must compare the source data with the current versions in the target table. It might not be obvious why, but let us first combine them with a UNION ALL operator:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- The current versions in the target table\r\nselect  'T' source_target  -- to distinguish these records later\r\n,       business_key\r\n,       dwh_valid_from\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_target\r\nwhere   current_version = 'Y'\r\n-- Now the new data\r\n-- put them together with UNION ALL\r\nunion all\r\nselect  'S' source_target\r\n,       business_key\r\n,       load_date dwh_valid_from  -- we use load_date as the new validity start date\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_source;\r\n\r\nS BUS DWH_VALID_ FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY\r\n- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------\r\nT 123 2016-12-01 Roger                 Federer    2010-01-01                900000\r\nT 456 2016-12-01 Rafael                Nadal      2009-05-01                720000\r\nT 789 2016-12-01 Serena                Williams   2008-06-01                650000\r\nS 123 2016-12-02 Roger                 Federer    2010-01-01                900000\r\nS 456 2016-12-02 Rafael                Nadal      2009-05-01                720000\r\nS 789 2016-12-02 Serena     Jameka     Williams   2008-06-01                650000\r\nS 345 2016-12-02 Venus                 Williams   2016-11-01                500000\r\n\r\n7 rows selected.\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nWe add the column SOURCE_TARGET, which can be &#8216;T&#8217; or &#8216;S&#8217;, to distinguish between records coming from the target and source tables.<\/p>\n<p>Note that we can have at most two records per business key in this result set. The possible cases are:<\/p>\n<ul>\n<li>The tracked columns of those two records are exactly the same &#8211; no action needed<\/li>\n<li>The tracked columns have changed &#8211; we must detect this, close the current version and open a new one<\/li>\n<li>There is only one record for a business key &#8211; we have a new record that is not yet in the target, or the record was deleted in the source<\/li>\n<\/ul>\n<p>&nbsp;<br \/>\n<strong>Step 2<\/strong><\/p>\n<p>So let us simply count records per business key and all tracked columns using the analytic function count(). The most interesting part is in line 32: the partitioning clause consists of the business_key and all columns to be tracked. This partitioning clause does the whole change detection work for us. Its behaviour regarding NULLs is exactly what we want: NULL=NULL and NULL!=VALUE. If there is a change, we get another partition. 
If we have no change, both old and new records will be in one partition.<\/p>\n<pre class=\"brush: sql; highlight: [32]; title: ; notranslate\" title=\"\">\r\nwith union_source_target as \r\n(\r\n-- What are the current versions in the target table\r\nselect  'T' source_target  -- to distinguish these records later\r\n,       business_key\r\n,       dwh_valid_from\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_target\r\nwhere   current_version = 'Y'\r\n-- Now the new data\r\n-- put them together with UNION ALL\r\nunion all\r\nselect  'S' source_target\r\n,       business_key\r\n,       load_date dwh_valid_from  -- we use load_date as the new validity start date\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_source\r\n)\r\nselect un.*\r\n--     we need NVL for records that were deleted in the source, in case we must &quot;close&quot; them. The constant stands for the load date, for example, and could be returned by a PL\/SQL function \r\n,      nvl(lead (dwh_valid_from ) over(partition by business_key order by dwh_valid_from) - 1,  DATE '2016-12-02' ) new_dwh_valid_to    \r\n,      count(*) over (partition by business_key, first_name, second_names, last_name, hire_date, fire_date, salary)  cnt --  cnt = 2 - two versions are the same, otherwise INS\/UPD\/DEL\r\n,      count(*) over (partition by business_key) cnt_key  -- the count of versions per business_key\r\nfrom   union_source_target un;\r\n\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Step 3<\/strong><\/p>\n<p>So how can we use the information gained in step 2? If the count equals two, then we have two identical records &#8211; nothing to do. Otherwise we must handle an Insert, Update or Delete. We can show the expected actions with the following query. 
Actually we don&#8217;t need this query; it only shows what will happen.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- Filter only the records where some action is needed and show what kind of action \r\n\r\nwith union_source_target as \r\n(\r\n-- What are the current versions in the target table\r\nselect  'T' source_target  -- to distinguish these records later\r\n,       business_key\r\n,       dwh_valid_from\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_target\r\nwhere   current_version = 'Y'\r\n-- Now the new data\r\n-- put them together with UNION ALL\r\nunion all\r\nselect  'S' source_target\r\n,       business_key\r\n,       load_date dwh_valid_from  -- we use load_date as the new validity start date\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_source\r\n)\r\n, action_needed as\r\n(\r\nselect un.*\r\n--     we need NVL for records that were deleted in the source, in case we must &quot;close&quot; them. The constant stands for the load date, for example, and could be returned by a PL\/SQL function \r\n,      nvl(lead (dwh_valid_from ) over(partition by business_key order by dwh_valid_from) - 1,  DATE '2016-12-02' ) new_dwh_valid_to    \r\n,      count(*) over (partition by business_key, first_name, second_names, last_name, hire_date, fire_date, salary)  cnt --  cnt = 2 - two versions are the same, otherwise INS\/UPD\/DEL\r\n,      count(*) over (partition by business_key) cnt_key  -- the count of versions per business_key (in this result set only, not in the target table)  \r\nfrom   union_source_target un\r\n)\r\nselect an.*\r\n,      case \r\n          when source_target = 'T' and cnt_key &gt; 1 then 'Close this version (new version will be added)' \r\n          when source_target = 'T' and cnt_key = 1 then 'Close this version (no new version will be added, the record was deleted in source)'\r\n          when source_target = 'S' then 'Insert new version'                    \r\n       end action\r\nfrom   action_needed an\r\nwhere  cnt != 2 \r\n\r\nS BUS DWH_VALID_ FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY NEW_DWH_VALID_TO CNT CNT_KEY ACTION                                            \r\n- --- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- --- ------- --------------------------------------------------\r\nS 345 2016-12-02 Venus                 Williams   2016-11-01                500000 2016-12-02         1       1 Insert new version                                \r\nT 789 2016-12-01 Serena                Williams   2008-06-01                650000 2016-12-01         1       2 Close this version (new version will be added)    \r\nS 789 2016-12-02 Serena     Jameka     Williams   2008-06-01                650000 2016-12-02         1       2 Insert new version                                \r\n\r\n3 rows selected.\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\n<strong>Step 4<\/strong><\/p>\n<p>The actions are as expected, and now we are ready to merge this result set into the target table:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nmerge into  t_target t\r\nusing (  \r\nwith union_source_target as \r\n(\r\n-- What are the current versions in the target table\r\nselect  'T' source_target  -- to distinguish these records later\r\n,       business_key\r\n,       dwh_valid_from\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_target\r\nwhere   current_version = 'Y'\r\n-- Now the new data\r\n-- put them together with UNION ALL\r\nunion all\r\nselect  'S' source_target\r\n,       business_key\r\n,       load_date dwh_valid_from  -- we use load_date as the new validity start date\r\n,       first_name\r\n,       second_names\r\n,       last_name\r\n,       hire_date\r\n,       fire_date\r\n,       salary\r\nfrom    t_source\r\n)\r\n, action_needed as\r\n(\r\nselect un.*\r\n--     we need NVL for records that were deleted in the source, in case we must &quot;close&quot; them. The constant stands for the load date, for example, and could be returned by a PL\/SQL function \r\n,      nvl(lead (dwh_valid_from ) over(partition by business_key order by dwh_valid_from) - 1,  DATE '2016-12-02' ) new_dwh_valid_to    \r\n,      count(*) over (partition by business_key, first_name, second_names, last_name, hire_date, fire_date, salary)  cnt --  cnt = 2 - two versions are the same, otherwise INS\/UPD\/DEL\r\n,      count(*) over (partition by business_key) cnt_key  -- the count of versions per business_key (in this result set only, not in the target table)  \r\nfrom   union_source_target un\r\n)\r\nselect an.*\r\n,      case \r\n          when source_target = 'T' and cnt_key &gt; 1 then 'Close this version (new version will be added)' \r\n          when source_target = 'T' and cnt_key = 1 then 'Close this version (no new version will be added, the record was deleted in source)'\r\n          when source_target = 'S' then 'Insert new version'                    \r\n       end action\r\nfrom   action_needed an\r\nwhere  cnt != 2 \r\n)  q\r\non ( t.business_key  = q.business_key and t.dwh_valid_from = q.dwh_valid_from  )\r\nwhen not matched then insert (    dwh_key\r\n                                , dwh_valid_from\r\n                                , dwh_valid_to\r\n                                , current_version\r\n                                , etl_op\r\n                                , business_key\r\n                                , first_name\r\n                                , second_names\r\n                                , last_name\r\n                                , hire_date\r\n                                , fire_date\r\n                                , salary )\r\n    values (  scd2_seq.nextval\r\n            , q.dwh_valid_from \r\n            , NULL  -- NULL means no end date \r\n            , 'Y'\r\n            , 'INS'\r\n            , q.business_key\r\n            , q.first_name\r\n            , q.second_names\r\n            , q.last_name\r\n            , q.hire_date\r\n            , q.fire_date\r\n            , q.salary )\r\nwhen matched then update set \r\n             t.dwh_valid_to = q.new_dwh_valid_to\r\n           , t.current_version = 'N'\r\n           , t.etl_op = 'UPD';\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nWe are done! What is in the target table after the merge? Exactly what we expected: a new version for Venus Williams and a new version for Serena Williams. Both are valid from 2016-12-02, our load date. The old version for Serena Williams was closed with a valid_to date of 2016-12-01.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n\r\nselect   * \r\nfrom     t_target\r\norder by business_key, dwh_valid_from;\r\n\r\n   DWH_KEY DWH_VALID_ DWH_VALID_ C ETL_OP BUS FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY\r\n---------- ---------- ---------- - ------ --- ---------- ---------- ---------- ---------- ---------- ----------\r\n         1 2016-12-01            Y INS    123 Roger                 Federer    2010-01-01                900000\r\n         6 2016-12-02            Y INS    345 Venus                 Williams   2016-11-01                500000\r\n         2 2016-12-01            Y INS    456 Rafael                Nadal      2009-05-01                720000\r\n         3 2016-12-01 2016-12-01 N UPD    789 Serena                Williams   2008-06-01                650000\r\n         5 2016-12-02            Y INS    789 Serena     Jameka     Williams   2008-06-01                650000\r\n\r\n5 rows selected.\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>One more test: the next load on 2016-12-03. 
We change the number column SALARY and the date column FIRE_DATE, and Rafael Nadal is no longer in the source:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n\r\ndelete from t_source;\r\n\r\n-- Salary changed\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-03', '123', 'Roger', null, 'Federer', DATE '2010-01-01', null, 920000);\r\n            \r\n-- Business_key = 456 is not there anymore\r\n   \r\n-- No change \r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-03', '789', 'Serena', 'Jameka', 'Williams', DATE '2008-06-01', null, 650000);     \r\n-- Change in fire_date\r\ninsert into t_source (load_date, business_key, first_name, second_names, last_name, hire_date, fire_date, salary)\r\n               values(DATE '2016-12-03', '345', 'Venus', null, 'Williams', DATE '2016-11-01', DATE '2016-12-01', 500000);     \r\n\r\ncommit;  \r\n\r\nMERGE\r\n... 
\r\n\r\nselect   * \r\nfrom     t_target\r\norder by business_key, dwh_valid_from;\r\n\r\n   DWH_KEY DWH_VALID_ DWH_VALID_ C ETL_OP BUS FIRST_NAME SECOND_NAM LAST_NAME  HIRE_DATE  FIRE_DATE      SALARY\r\n---------- ---------- ---------- - ------ --- ---------- ---------- ---------- ---------- ---------- ----------\r\n         1 2016-12-01 2016-12-02 N UPD    123 Roger                 Federer    2010-01-01                900000\r\n        11 2016-12-03            Y INS    123 Roger                 Federer    2010-01-01                920000\r\n         6 2016-12-02 2016-12-02 N UPD    345 Venus                 Williams   2016-11-01                500000\r\n        10 2016-12-03            Y INS    345 Venus                 Williams   2016-11-01 2016-12-01     500000\r\n         2 2016-12-01 2016-12-02 N UPD    456 Rafael                Nadal      2009-05-01                720000\r\n         3 2016-12-01 2016-12-01 N UPD    789 Serena                Williams   2008-06-01                650000\r\n         5 2016-12-02            Y INS    789 Serena     Jameka     Williams   2008-06-01                650000\r\n\r\n7 rows selected.\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nAs you can see, Roger Federer got a salary raise and a new version; Venus Williams was fired and got a new version too. The record for Rafael Nadal was deleted in the source. We recognized that and set an appropriate valid_to date.<\/p>\n<p>One could certainly argue about the best strategy for handling temporal validity. How should I represent infinity: with NULL values, as I did, or with some special value, e.g. 9999-12-31? Does it make sense to use some kind of current_version (Yes\/No) column? Is it better to have valid_to equal to the valid_from of the following version? In our example there is a one-day gap between them. What is the best strategy for handling deleted records? All of these are reasonable questions, of course, but they are not the point here. 
With this approach we have enough information to implement any of the possible strategies.<\/p>\n<p><strong>Conclusion <\/strong><br \/>\nIt looks very simple but nevertheless does the right thing! We avoid an outer join. We avoid complex change detection with NVL, building row hashes or the like. The source will be scanned only once, which can be very important if we are dealing with a complex view as a source. Even better, we have a UNION ALL instead of a join &#8211; in 12c both branches of the UNION ALL operator can be executed concurrently. <\/p>\n<p>I will do some further investigation, compare performance, look at parallel execution, etc., and share the results here. Is it possible to change the statement so that SCD1 columns can be handled as well? Another interesting alternative I have tested is based on the same approach but uses a multi-table insert instead of a merge, followed by a partition exchange. I&#8217;ll describe this in one of my next posts.<\/p>\n<div class=\"crp-list-container\"><h3 class=\"crp-list-title\">Related Posts<\/h3><ul class=\"crp-list\"><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/en\/data-historization-ii\/\" target=\"_blank\">Data Historization II<\/a><\/div><\/li><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/en\/debugging-scd2\/\" target=\"_blank\">Debugging SCD2<\/a><\/div><\/li><\/ul><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Maintaining data historization is a very common but time-consuming task in a data warehouse environment. You face it while loading a historized Core Layer (also known as Enterprise Data Warehouse or Foundation Layer), Data Vault models, Slowly Changing Dimensions, etc. The common techniques involve outer joins and some kind of change detection. 
This change [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50,2,3],"tags":[56,55,53,54],"class_list":["post-301","post","type-post","status-publish","format-standard","hentry","category-data-warehouse","category-oracle","category-sql","tag-change-detection","tag-owb","tag-scd2","tag-slowly-changing-dimensions"],"_links":{"self":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/301","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/comments?post=301"}],"version-history":[{"count":52,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/301\/revisions"}],"predecessor-version":[{"id":1311,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/301\/revisions\/1311"}],"wp:attachment":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/media?parent=301"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/categories?post=301"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/tags?post=301"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}