After the previous post about building an UNPIVOT query via SQL macro to output table rows as key-value pairs, I thought of another use case for the UNPIVOT operator where developing a reusable SQL macro makes sense. From time to time you need to spot the difference between almost identical rows. You know they are different and you can easily check this using MINUS or GROUP BY, but if you want to know in which column(s) exactly they differ, you need another approach.
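For example, a plain MINUS (a quick sketch against the DIFF_TEST table created below) only tells you that the rows differ; you still get whole rows back and have to scan every column yourself:

-- returns the full rows of version 1 that have no exact match in version 2,
-- but does not tell you which column caused the mismatch
SELECT c1, c2, c3 FROM diff_test WHERE vers_id = 1
MINUS
SELECT c1, c2, c3 FROM diff_test WHERE vers_id = 2;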
Let’s start with a simple table to test with. We have three unique rows identified by ID and VERS_ID, a system-generated CREATED timestamp and three columns C1, C2 and C3 where some values are different across those three rows.
SQL> CREATE TABLE diff_test (
2 id NUMBER ,
3 vers_id NUMBER,
4 created TIMESTAMP(6) default systimestamp,
5 c1 NUMBER,
6 c2 VARCHAR2(20),
7 c3 DATE
8 );
Table DIFF_TEST created.
SQL>
SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 1, 100, 'A', DATE '2020-09-25');
1 row inserted.
SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 2, 200, 'A', DATE '2020-09-25');
1 row inserted.
SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 3, 200, 'B', DATE '2020-09-27');
1 row inserted.
SQL>
SQL> SELECT * FROM diff_test;
ID VERS_ID CREATED C1 C2 C3
---------- ------- --------------------------- ----- --- -------------------
4711 1 22.12.20 11:45:28,892617000 100 A 25.09.2020 00:00:00
4711 2 22.12.20 11:45:28,951802000 200 A 25.09.2020 00:00:00
4711 3 22.12.20 11:45:29,009886000 200 B 27.09.2020 00:00:00
Listing 1: Test table
Of course, for this small table we can spot the differences immediately: between VERS_ID 1 and 2 the column C1 changes from 100 to 200, and between VERS_ID 2 and 3 C2 changes from “A” to “B” and C3 from 25.09 to 27.09. But what about those systems (you name it) having hundreds of columns in their tables?
Let’s put together the query step by step. To simplify the example, let’s say we will only compare exactly two rows. However, this does not make our function any less useful.
Step 1: UNPIVOT
First we will construct a new “single-column” unique key P_KEY and select all columns we want to check:
SQL> with data as (select * from diff_test where vers_id < 3)
2 , src_query as (
3 select "ID"||'#'||"VERS_ID" as p_key
4 , "C1","C2","C3"
5 from (data) )
6 select * from src_query;
P_KEY C1 C2 C3
-------- ----- --- -------------------
4711#1 100 A 25.09.2020 00:00:00
4711#2 200 A 25.09.2020 00:00:00
Listing 2: Select all columns and build a key
Next, just UNPIVOT all columns except the key. You can have a look at the previous post to see how to do this and avoid ORA-01790:
SQL> with data as (select * from diff_test where vers_id < 3)
2 , src_query as (
3 select ID||'#'||VERS_ID as p_key
4 , C1,C2,C3
5 from (data) )
6 , unpivot_query as (
7 select P_KEY, col_name, col_value
8 from ( select P_KEY, to_char(C1) as C1
9 , C2
10 , to_char(C3,'YYYY-MM-DD HH24:MI:SS') as C3
11 from src_query )
12 unpivot include nulls (col_value for col_name in (C1,C2,C3)))
13 select * from unpivot_query;
P_KEY CO COL_VALUE
-------- -- ----------------------------------------
4711#1 C1 100
4711#1 C2 A
4711#1 C3 2020-09-25 00:00:00
4711#2 C1 200
4711#2 C2 A
4711#2 C3 2020-09-25 00:00:00
6 rows selected.
Listing 3: Cast all columns as string and UNPIVOT
Step 2: GROUP BY
Now that we have each column name/column value pair in separate rows, we can just GROUP BY column names and COUNT DISTINCT the values. If the values are different, you’ll have COUNT > 1. That’s it.
SQL> with data as (select * from diff_test where vers_id < 3)
2 , src_query as (
3 select ID||'#'||VERS_ID as p_key
4 , C1,C2,C3
5 from (data) )
6 , unpivot_query as (
7 select P_KEY, col_name, col_value
8 from ( select P_KEY, to_char(C1) as C1
9 , C2
10 , to_char(C3,'YYYY-MM-DD HH24:MI:SS') as C3
11 from src_query )
12 unpivot include nulls (col_value for col_name in (C1,C2,C3)))
13 select listagg(p_key,'->') within group (order by p_key ) as p_keys
14 , col_name
15 , min (col_value) keep (dense_rank first order by p_key) old_value
16 , max (col_value) keep (dense_rank last order by p_key) new_value
17 from unpivot_query
18 group by col_name
19 having count(distinct dump(col_value)) > 1 ;
P_KEYS COL_NAME OLD_VALUE NEW_VALUE
---------------- -------- -------------------- --------------------
4711#1->4711#2 C1 100 200
Listing 4: GROUP BY can help to see differences
Why am I using DUMP(COL_VALUE)? Without this trick we would not be able to detect changes from NULL to some value or from some value to NULL! This is because COUNT ignores NULLs: COUNT(NULL) equals 0 (not NULL but 0!), so a NULL would simply not contribute a distinct value. Consider this small example:
SQL> with data as (select null as col from dual
2 union all
3 select 1 from dual)
4 select count(distinct col)
5 , count(distinct dump(col))
6 from data;
COUNT(DISTINCTCOL) COUNT(DISTINCTDUMP(COL))
------------------ ------------------------
1 2
Listing 5: Count(NULL) = 0
Step 3: SQL macro
Now that we know the query from Listing 4 does what we need, we can implement the SQL macro function:
create or replace function find_diff (p_tab          dbms_tf.table_t
                                     , p_key          dbms_tf.columns_t
                                     , p_exclude_cols dbms_tf.columns_t default null)
  return clob sql_macro
as
  v_sql             clob;
  v_key             varchar2(1000);
  v_all_cols        varchar2(32767);
  v_all_casted_cols varchar2(32767);
  v_name            dbms_quoted_id;
begin
  select listagg(column_value,'||''#''||') into v_key from table(p_key);

  FOR i IN 1..p_tab.column.count LOOP
    v_name := p_tab.column(i).description.name;
    IF (NOT v_name MEMBER OF p_exclude_cols OR p_exclude_cols IS NULL)
       AND NOT v_name MEMBER OF p_key THEN
      IF p_tab.column(i).description.type = dbms_tf.type_number THEN
        v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||') as '||v_name;
      ELSIF p_tab.column(i).description.type = dbms_tf.type_date THEN
        v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||',''YYYY-MM-DD HH24:MI:SS'') as '||v_name;
      ELSIF p_tab.column(i).description.type = dbms_tf.type_timestamp THEN
        v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||',''YYYY-MM-DD HH24:MI:SS.FF9'') as '||v_name;
      ELSE
        v_all_casted_cols := v_all_casted_cols||','||v_name;
      END IF;
      v_all_cols := v_all_cols||','||v_name;
    END IF;
  END LOOP;

  v_all_cols        := ltrim(v_all_cols,',');
  v_all_casted_cols := ltrim(v_all_casted_cols,',');

  v_sql := q'{ with src_query as (
                 select %PH_KEY% as p_key, %PH_ALL_COLS%
                 from p_tab )
               , unpivot_query as (
                 select P_KEY, col_name, col_value
                 from ( select P_KEY, %PH_ALL_CASTED_COLS%
                        from src_query )
                 unpivot include nulls (col_value for col_name in (%PH_ALL_COLS%)))
               select listagg(p_key,'->') within group (order by p_key) as p_keys
                    , col_name
                    , min (col_value) keep (dense_rank first order by p_key) old_value
                    , max (col_value) keep (dense_rank last  order by p_key) new_value
               from unpivot_query
               group by col_name
               having count(distinct dump(col_value)) > 1 }';

  v_sql := replace(replace(replace(v_sql,'%PH_KEY%',v_key)
                                        ,'%PH_ALL_COLS%',v_all_cols)
                                        ,'%PH_ALL_CASTED_COLS%',v_all_casted_cols);

  dbms_output.put_line(v_sql);
  DBMS_TF.TRACE(p_tab);
  DBMS_TF.TRACE(p_key);

  return v_sql;
end;
/
Listing 6: The implementation of the SQL macro FIND_DIFF
The function accepts one table parameter (DBMS_TF.TABLE_T) and two column list parameters (DBMS_TF.COLUMNS_T): one to pass all columns that make up the key (p_key) and one to pass the columns we don’t want to compare (p_exclude_cols). The implementation is straightforward: loop over the columns, check whether they should be processed at all, and handle the conversion depending on the datatype. This kind of processing has been part of almost all table macro examples so far. Then we just build the query. I prefer to write down the whole query first, using placeholders such as %PH_ALL_CASTED_COLS% where the content of other variables should go. Instead of concatenating everything upfront, I replace those placeholders afterwards with nested REPLACE calls. I think this makes it easier to see, understand and edit the query as a whole.
Quite important is the FROM clause of src_query: simply put the table parameter name (p_tab) in your query where you want to select from it. Whatever is passed at runtime – a table, a view, a subquery, etc. – will just be substituted there later.
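To see this substitution in isolation, here is a minimal sketch (the function name pass_through is made up for illustration): the returned SQL text only names the parameter, and whatever you pass in takes its place at parse time.

create or replace function pass_through (p_tab dbms_tf.table_t)
  return varchar2 sql_macro
is
begin
  -- the returned query just references the parameter name;
  -- the actual table, view or subquery is spliced in later
  return 'select * from p_tab';
end;
/

-- usage: the subquery DATA is substituted for p_tab
with data as (select * from diff_test where vers_id < 3)
select * from pass_through(data);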
Let’s test the function:
SQL> with data as (select * from diff_test where vers_id < 3)
2 select * from find_diff(data, columns(id, vers_id));
P_KEYS COL_NAME OLD_VALUE NEW_VALUE
---------------- -------- -------------------- --------------------
4711#1->4711#2 C1 100 200
4711#1->4711#2 CREATED 2020-12-22 11:45:28. 2020-12-22 11:45:28.
892617000 951802000
-- Exclude the column CREATED
SQL> with data as (select * from diff_test where vers_id < 3)
2 select * from find_diff(data, columns(id, vers_id), columns(created));
P_KEYS COL_NAME OLD_VALUE NEW_VALUE
---------------- -------- -------------------- --------------------
4711#1->4711#2 C1 100 200
-- Compare 2nd and 3rd versions
SQL> with data as (select * from diff_test where vers_id > 1)
2 select * from find_diff(data, columns(id, vers_id), columns(created));
P_KEYS COL_NAME OLD_VALUE NEW_VALUE
---------------- -------- -------------------- --------------------
4711#2->4711#3 C2 A B
4711#2->4711#3 C3 2020-09-25 00:00:00 2020-09-27 00:00:00
Listing 7: Test the function
As you can see in the first output, comparing technical audit columns such as CREATED, MODIFIED and so on may not be what you want, and this is where the parameter p_exclude_cols comes in handy.
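By the way, restricting ourselves to exactly two rows was only for the sake of the walkthrough. The generated query groups by column name and flags every column with more than one distinct value, so you can also feed it all three versions at once; OLD_VALUE and NEW_VALUE then come from the first and the last key, intermediate values are simply not listed. A quick sketch (output not shown here):

with data as (select * from diff_test)
select * from find_diff(data, columns(id, vers_id), columns(created));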
DBMS_TF.TRACE
Did you notice the two calls to DBMS_TF.TRACE at the end of the function? If you have already played around with polymorphic table functions (PTF) in 18c and above, you probably know this procedure. In fact, it’s a wrapper around dbms_output.put_line and makes it easy to output the internal structures used in the context of PTFs. Since we are using some of these structures (DBMS_TF.TABLE_T and DBMS_TF.COLUMNS_T), we can call this logging procedure just as we would in the DESCRIBE method of a PTF.
........tab.column[1] =
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."ID"
__________Name Length:........4
__________Precision:..........0
__________Scale:..............-127
........tab.column[2] =
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."VERS_ID"
__________Name Length:........9
__________Precision:..........0
__________Scale:..............-127
........tab.column[3] =
pass through column
NOT a for read column
__________Type:...............TIMESTAMP
__________Max Length:.........20
__________Name:..............."CREATED"
__________Name Length:........9
........tab.column[4] =
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."C1"
__________Name Length:........4
__________Precision:..........0
__________Scale:..............-127
........tab.column[5] =
pass through column
NOT a for read column
__________Type:...............VARCHAR2
__________Max Length:.........20
__________Name:..............."C2"
__________Name Length:........4
__________Charset Id:.........AL32UTF8
__________Collation:..........USING_NLS_COMP
........tab.column[6] =
pass through column
NOT a for read column
__________Type:...............DATE
__________Max Length:.........21
__________Name:..............."C3"
__________Name Length:........4
........columns[1] =
"ID"
........columns[2] =
"VERS_ID"
Listing 8: The output of DBMS_TF.TRACE
What happens in the background
It was very interesting to see what query is actually executed. As shown in Listing 9, you can use DBMS_UTILITY.EXPAND_SQL_TEXT for this. I have formatted the output a little. Quite extensive transformations here! You can see how Oracle implements UNPIVOT: for three columns there are three queries combined with UNION ALL. Also interesting are the innermost inline views, where we can see our subquery used as the input for the macro function. Remember, we only passed it as a single parameter and referenced it as p_tab in the FROM clause of Listing 6.
DECLARE
l_clob CLOB;
BEGIN
DBMS_UTILITY.expand_sql_text (
input_sql_text => 'with data as (select * from diff_test where vers_id > 1)
select * from find_diff(data, columns(id, vers_id), columns(created))',
output_sql_text => l_clob );
DBMS_OUTPUT.put_line(l_clob);
END;
/
SELECT "A1"."P_KEYS" "P_KEYS","A1"."COL_NAME" "COL_NAME","A1"."OLD_VALUE" "OLD_VALUE","A1"."NEW_VALUE" "NEW_VALUE"
FROM (SELECT "A6"."P_KEYS" "P_KEYS","A6"."COL_NAME" "COL_NAME","A6"."OLD_VALUE" "OLD_VALUE","A6"."NEW_VALUE" "NEW_VALUE"
FROM (SELECT LISTAGG("A7"."P_KEY",'->') WITHIN GROUP ( ORDER BY "A7"."P_KEY") "P_KEYS"
,"A7"."COL_NAME" "COL_NAME"
,MIN("A7"."COL_VALUE") KEEP (DENSE_RANK FIRST ORDER BY "A7"."P_KEY") "OLD_VALUE"
,MAX("A7"."COL_VALUE") KEEP (DENSE_RANK FIRST ORDER BY "A7"."P_KEY" DESC ) "NEW_VALUE"
FROM (SELECT "A2"."P_KEY" "P_KEY","A2"."COL_NAME" "COL_NAME","A2"."COL_VALUE" "COL_VALUE"
FROM (SELECT "A10"."P_KEY" "P_KEY",'C1' "COL_NAME","A10"."C1" "COL_VALUE"
FROM (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3"
FROM (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3"
FROM (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3"
FROM (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3"
FROM "ONFTEST"."DIFF_TEST" "A5"
WHERE "A5"."VERS_ID">1) "A4"
) "A3"
) "A9"
) "A10"
UNION ALL
SELECT "A10"."P_KEY" "P_KEY",'C2' "COL_NAME","A10"."C2" "COL_VALUE"
FROM (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3"
FROM (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3"
FROM (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3"
FROM (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3"
FROM "ONFTEST"."DIFF_TEST" "A5"
WHERE "A5"."VERS_ID">1) "A4"
) "A3"
) "A9"
) "A10"
UNION ALL
SELECT "A10"."P_KEY" "P_KEY",'C3' "COL_NAME","A10"."C3" "COL_VALUE"
FROM (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3"
FROM (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3"
FROM (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3"
FROM (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3"
FROM "ONFTEST"."DIFF_TEST" "A5"
WHERE "A5"."VERS_ID">1) "A4"
) "A3"
) "A9"
) "A10"
) "A2"
) "A7"
GROUP BY "A7"."COL_NAME"
HAVING COUNT(DISTINCT DUMP("A7"."COL_VALUE"))>1) "A6"
) "A1"
Listing 9: The actual query looks quite different
Bonus: Flashback query
You do not always have two rows in one table to compare. What if you only have one row which changes over time? Suppose someone has just changed the row of interest and you want to know what is different. There are several ways to do this. But if chances are good that the version you need can still be found in the undo segments, you can use a flashback query together with our SQL macro function to answer the question:
SQL> drop table test_emp_diff;
Table TEST_EMP_DIFF dropped.
SQL>
SQL> create table test_emp_diff as select * from scott.emp;
Table TEST_EMP_DIFF created.
SQL> column tstp new_value tstp
SQL> select to_char(systimestamp, 'yyyy-mm-dd HH24:mi:ss.ff6') tstp from dual;
TSTP
--------------------------
2020-12-23 09:28:27.692766
SQL> update test_emp_diff set Job = 'DBA' where ename = 'SCOTT';
1 row updated.
SQL> with dat1 as (
2 select 2 vers, a.*
3 from test_emp_diff a
4 union all
5 select 1 vers, b.*
6 from test_emp_diff
7 as of timestamp (to_timestamp('&tstp', 'yyyy-mm-dd HH24:mi:ss.ff6')) b
8 )
9 , dat as (Select * from dat1 where ename ='SCOTT')
10 select * from find_diff (dat, columns(ename, vers));
P_KEYS COL_NAME OLD_VALUE NEW_VALUE
---------------- -------- -------------------- --------------------
SCOTT#1->SCOTT#2 JOB ANALYST DBA
Listing 10: Using SQL macro with flashback query
Where to test?
You can test table SQL macros in Oracle Database starting with 19.6. You can also play with them on LiveSQL. But the best thing is that you can test both scalar and table macros in the new release Oracle 21c with your Free Tier in Oracle Cloud! Really great news! Go for it!
A minor difference I’ve noticed in 21c is that prefixing the table parameter with the function name doesn’t work anymore. I have not yet found out if this was intentional or a bug.
BANNER
-----------------------------------------------------------------------------
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
SQL>
SQL> create or replace function test_with_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
2 begin
3 return 'test_with_function.p_tab';
4 end;
5 /
Function TEST_WITH_FUNCTION compiled
SQL>
SQL> create or replace function test_without_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
2 begin
3 return 'p_tab';
4 end;
5 /
Function TEST_WITHOUT_FUNCTION compiled
SQL>
SQL> select * from test_with_function(dual);
SQL Error: ORA-00942: table or view does not exist
00942. 00000 - "table or view does not exist"
SQL>
SQL> select * from test_without_function(dual);
D
-
X
-- The same test in Oracle Database 19c:
SQL> select banner from v$version;
BANNER
-----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
SQL>
SQL> create or replace function test_with_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
2 begin
3 return 'test_with_function.p_tab';
4 end;
5 /
Function TEST_WITH_FUNCTION compiled
SQL>
SQL> create or replace function test_without_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
2 begin
3 return 'p_tab';
4 end;
5 /
Function TEST_WITHOUT_FUNCTION compiled
SQL>
SQL> select * from test_with_function(dual);
D
-
X
SQL>
SQL> select * from test_without_function(dual);
D
-
X
Listing 11: Differences in parameter naming 19c vs. 21c
This will probably be my last blog post this year (but who knows?). Next year I promise to post not only about SQL macros 😉 But in the near future I am eager to share at least two more interesting use cases. Merry Christmas! Have a good and healthy 2021!