Comparing Rows using a SQL Macro

After the previous post about building an UNPIVOT query via SQL macro to output table rows like key-value pairs, I thought of another use case for the UNPIVOT operator, where developing a reusable SQL macro will make sense. From time to time you need to spot the difference between almost identical rows. You know they are different and you can easily check this using MINUS or GROUP BY, but if you want to know in what column(s) exactly the difference is, you need another approach.

Let’s start with a simple table to test with. We have three unique rows identified by ID and VERS_ID, system generated CREATED timestamp and three columns C1, C2 and C3 where some values are different across those three rows.

SQL> CREATE TABLE diff_test  (
  2     id NUMBER ,
  3     vers_id NUMBER,
  4     created TIMESTAMP(6) default systimestamp,
  5     c1 NUMBER,
  6     c2 VARCHAR2(20),
  7     c3 DATE
  8     );

Table DIFF_TEST created.

SQL> 
SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 1, 100, 'A', DATE '2020-09-25');

1 row inserted.

SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 2, 200, 'A', DATE '2020-09-25');

1 row inserted.

SQL> INSERT INTO diff_test (id, vers_id, c1, c2, c3) values (4711, 3, 200, 'B', DATE '2020-09-27');

1 row inserted.

SQL> 
SQL> SELECT * FROM diff_test;

        ID VERS_ID CREATED                        C1 C2  C3                 
---------- ------- --------------------------- ----- --- -------------------
      4711       1 22.12.20 11:45:28,892617000   100 A   25.09.2020 00:00:00
      4711       2 22.12.20 11:45:28,951802000   200 A   25.09.2020 00:00:00
      4711       3 22.12.20 11:45:29,009886000   200 B   27.09.2020 00:00:00

Listing 1: Test table

Of course, for this small table we can spot the difference immediately: between VERS_ID 1 and 2 the column C1 changes from 100 to 200 and between VERS_ID 2 and 3 C2 changes to “B” and C3 from 25.09 to 27.09. But what about those systems (you name it) having hundreds of columns in their tables?

Let’s put together the query step by step. To simplify the example, let’s say we will only compare exactly two rows. However, this does not make our function any less useful.

Step 1: UNPIVOT

First we will construct a new “single-column” unique key P_KEY and select all columns we want to check:

SQL> with data as (select * from diff_test where vers_id < 3)
  2  ,  src_query as (
  3              select  "ID"||'#'||"VERS_ID" as p_key
  4              , "C1","C2","C3"
  5              from   (data) )
  6  select * from  src_query;

P_KEY       C1 C2  C3                 
-------- ----- --- -------------------
4711#1     100 A   25.09.2020 00:00:00
4711#2     200 A   25.09.2020 00:00:00

Listing 2: select all columns and build a key

Next just UNPIVOT all columns except the key. You can have a look on the previous post to see how to do this and avoid ORA-01790:

SQL> with data as (select * from diff_test where vers_id < 3)
  2  ,  src_query as (
  3              select  ID||'#'||VERS_ID as p_key
  4              , C1,C2,C3
  5              from   (data) )
  6  ,  unpivot_query as (  
  7              select P_KEY, col_name, col_value  
  8              from   ( select P_KEY, to_char(C1) as C1
  9                       ,      C2
 10                       ,      to_char(C3,'YYYY-MM-DD HH24:MI:SS') as C3
 11                       from   src_query )
 12              unpivot include nulls (col_value for col_name in (C1,C2,C3)))
 13  select * from unpivot_query;

P_KEY    CO COL_VALUE                               
-------- -- ----------------------------------------
4711#1   C1 100                                     
4711#1   C2 A                                       
4711#1   C3 2020-09-25 00:00:00                     
4711#2   C1 200                                     
4711#2   C2 A                                       
4711#2   C3 2020-09-25 00:00:00                     

6 rows selected.

Listing 3: Cast all columns as string and UNPIVOT

Step 2. GROUP BY

Now that we have each column name/column value pair in separate rows we can just GROUP BY column names and COUNT DISTINCT the values. If the values are different, you’ll have COUNT > 1 . That’s it.

SQL> with data as (select * from diff_test where vers_id < 3)
  2  ,  src_query as (
  3              select  ID||'#'||VERS_ID as p_key
  4              , C1,C2,C3
  5              from   (data) )
  6  ,  unpivot_query as (  
  7              select P_KEY, col_name, col_value  
  8              from   ( select P_KEY, to_char(C1) as C1
  9                       ,      C2
 10                       ,      to_char(C3,'YYYY-MM-DD HH24:MI:SS') as C3
 11                       from   src_query )
 12              unpivot include nulls (col_value for col_name in (C1,C2,C3)))
 13  select listagg(p_key,'->') within group (order by p_key ) as p_keys
 14  ,      col_name 
 15  ,      min (col_value) keep (dense_rank first order by p_key) old_value
 16  ,      max (col_value) keep (dense_rank last order by p_key) new_value
 17  from   unpivot_query
 18  group by col_name
 19  having count(distinct dump(col_value)) > 1 ;

P_KEYS           COL_NAME OLD_VALUE            NEW_VALUE           
---------------- -------- -------------------- --------------------
4711#1->4711#2   C1       100                  200

Listing 4: GROUP BY can help to see differences

Why am I using DUMP(COL_VALUE)? Without this trick we would not be able to detect changes from NULL to some value or from some value to NULL! This is because COUNT(NULL) equals 0 (not NULL but 0!). Consider this small example:

SQL> with data as (select null as col from dual 
  2                union all 
  3                select 1 from dual)
  4  select count(distinct col)
  5  ,      count(distinct dump(col))
  6  from data;

COUNT(DISTINCTCOL) COUNT(DISTINCTDUMP(COL))
------------------ ------------------------
                 1                        2

Listing 5: Count(NULL) = 0

Step 3: SQL macro

Now that we know the query from Listing 4 does what we need, we can implement the SQL macro function:

create or replace function find_diff (p_tab dbms_tf.table_t
                                    , p_key dbms_tf.columns_t
                                    , p_exclude_cols  dbms_tf.columns_t 
                                                      default null) 
return clob sql_macro
as
v_sql clob;
v_key varchar2(1000);
v_all_cols varchar2(32767);
v_all_casted_cols varchar2(32767);
v_name dbms_quoted_id;
begin 
  select listagg(column_value,'||''#''||') into v_key from table(p_key);
 
  FOR i IN 1..p_tab.column.count LOOP
    v_name := p_tab.column(i).description.name;
    IF (NOT v_name MEMBER OF p_exclude_cols or P_exclude_cols is null)
      AND NOT v_name MEMBER OF p_key THEN 
        IF p_tab.column(i).description.type = dbms_tf.type_number THEN
          v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||') as '||v_name ; 
        ELSIF p_tab.column(i).description.type = dbms_tf.type_date THEN
          v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||',''YYYY-MM-DD HH24:MI:SS'') as '||v_name ;
        ELSIF p_tab.column(i).description.type = dbms_tf.type_timestamp THEN
          v_all_casted_cols := v_all_casted_cols||', to_char('||v_name||',''YYYY-MM-DD HH24:MI:SS.FF9'') as '||v_name ;
        ELSE
          v_all_casted_cols := v_all_casted_cols||','||v_name ;
        END IF;
        v_all_cols := v_all_cols||','||v_name ; 
    END IF;
  END LOOP;
  v_all_cols := ltrim(v_all_cols,',');
  v_all_casted_cols := ltrim(v_all_casted_cols,',');
  v_sql := q'{ with src_query as (
            select  %PH_KEY% as p_key,  %PH_ALL_COLS%
            from   p_tab )
        ,  unpivot_query as (  
                select P_KEY, col_name, col_value  
                from   ( select P_KEY, %PH_ALL_CASTED_COLS%
                         from   src_query )
                unpivot include nulls (col_value for col_name in (%PH_ALL_COLS% ) ) )
        select listagg(p_key,'->') within group (order by p_key ) as p_keys
        ,      col_name 
        ,      min (col_value) keep (dense_rank first order by p_key) old_value
        ,      max (col_value) keep (dense_rank last order by p_key) new_value
        from   unpivot_query
        group by col_name
        having count(distinct dump(col_value)) > 1 }';   
   v_sql := replace(replace(replace(v_sql,'%PH_KEY%',v_key)
                            ,'%PH_ALL_COLS%',v_all_cols)
                    ,'%PH_ALL_CASTED_COLS%', v_all_casted_cols) ;
   dbms_output.put_line (v_sql);
   DBMS_TF.TRACE(p_tab);
   DBMS_TF.TRACE(p_key);
  return v_sql;
end;
/

Listing 6: The implementation of the SQL macro FIND_DIFF

The function accepts one table parameter (DBMS_TF.TABLE_T) and two column list parameters (DBMS_TF.COLUMNS_T): the one to pass all columns that make up the key (p_keys) and the one to pass the columns which we don’t want to compare (p_exclude_cols). The implementation is straightforward. Loop over the columns, check whether we should process them, handle conversion depending from the datatype (line 14-31). This kind of processing was a part of almost all examples for table macros so far. Then we just build the query. I prefer to write down the whole query first. I’m using placeholders, i.e. %PH_ALL_CASTED_COLS%, where the content of other variables should be. But I don’t bother with concatenating upfront, but instead replace those placeholders afterwards (lines 47-49). I think it makes it just easier to see, understand and edit the query as a whole.

Quite important is the line 35. Simply put the table parameter name in your query where you want to select from it. Whatever it will be – table, view, subquery, etc. – it will be just substituted later.

Let’s test the function:

SQL> with data as (select * from diff_test where vers_id < 3)
  2  select * from find_diff(data, columns(id, vers_id));

P_KEYS           COL_NAME OLD_VALUE            NEW_VALUE           
---------------- -------- -------------------- --------------------
4711#1->4711#2   C1       100                  200                 
4711#1->4711#2   CREATED  2020-12-22 11:45:28. 2020-12-22 11:45:28.
                          892617000            951802000    

-- Exclude the column CREATED
SQL> with data as (select * from diff_test where vers_id < 3)
  2  select * from find_diff(data, columns(id, vers_id), columns(created));

P_KEYS           COL_NAME OLD_VALUE            NEW_VALUE           
---------------- -------- -------------------- --------------------
4711#1->4711#2   C1       100                  200                 

-- Compare 2nd and 3rd versions
SQL> with data as (select * from diff_test where vers_id > 1)
  2  select * from find_diff(data, columns(id, vers_id), columns(created));

P_KEYS           COL_NAME OLD_VALUE            NEW_VALUE           
---------------- -------- -------------------- --------------------
4711#2->4711#3   C2       A                    B                   
4711#2->4711#3   C3       2020-09-25 00:00:00  2020-09-27 00:00:00

Listing 7: test the function

As you can see in the first output, comparing technical audit columns such as CREATED, MODIFIED and so on, may not be what you want and this is when the introduced parameter p_exclude_cols becomes handy.

DBMS_TF.TRACE

Did you notice two calls to DBMS_TF.TRACE at the end of the function? If you already played around with polymorphic table functions (PTF) in 18c and above, you probably already know this procedure. In fact, it’s a wrapper about dbms_output.put_line and makes it easy to output the internal structures used in context of PTF. Since we are using some of these structures (DBMS_TF.TABLE_T and DBMS_TF.COLUMNS_T) we can also call this logging procedure just as we did in the DESCRIBE method of the PTF.

........tab.column[1] = 
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."ID"
__________Name Length:........4
__________Precision:..........0
__________Scale:..............-127

........tab.column[2] = 
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."VERS_ID"
__________Name Length:........9
__________Precision:..........0
__________Scale:..............-127

........tab.column[3] = 
pass through column
NOT a for read column
__________Type:...............TIMESTAMP
__________Max Length:.........20
__________Name:..............."CREATED"
__________Name Length:........9

........tab.column[4] = 
pass through column
NOT a for read column
__________Type:...............NUMBER
__________Max Length:.........22
__________Name:..............."C1"
__________Name Length:........4
__________Precision:..........0
__________Scale:..............-127

........tab.column[5] = 
pass through column
NOT a for read column
__________Type:...............VARCHAR2
__________Max Length:.........20
__________Name:..............."C2"
__________Name Length:........4
__________Charset Id:.........AL32UTF8
__________Collation:..........USING_NLS_COMP

........tab.column[6] = 
pass through column
NOT a for read column
__________Type:...............DATE
__________Max Length:.........21
__________Name:..............."C3"
__________Name Length:........4

........columns[1] = 
"ID"
........columns[2] = 
"VERS_ID"

Listing 8: The output of DBMS_TF.TRACE

What happens in the background

It was very interesting to see what query will be actually executed. As shown in Listing 9, you can use DBMS_UTILITY.EXPAND_SQL_TEXT for it. I have formatted the output a little. Quite extensive transformations here! You can see how Oracle does UNPIVOT: for three columns there are three queries combined with UNION ALL. But also important are the highlighted lines where we can see our subquery used as the input for the macro function. Remember, it was just one parameter on line 35 of the Listing 6 where we passed it.

DECLARE
  l_clob CLOB;
BEGIN
  DBMS_UTILITY.expand_sql_text (
    input_sql_text  => 'with data as (select * from diff_test where vers_id > 1)
                        select * from find_diff(data, columns(id, vers_id), columns(created))',
    output_sql_text => l_clob  );
  DBMS_OUTPUT.put_line(l_clob);
END;
/

SELECT "A1"."P_KEYS" "P_KEYS","A1"."COL_NAME" "COL_NAME","A1"."OLD_VALUE" "OLD_VALUE","A1"."NEW_VALUE" "NEW_VALUE" 
FROM  (SELECT "A6"."P_KEYS" "P_KEYS","A6"."COL_NAME" "COL_NAME","A6"."OLD_VALUE" "OLD_VALUE","A6"."NEW_VALUE" "NEW_VALUE" 
       FROM  (SELECT LISTAGG("A7"."P_KEY",'->') WITHIN GROUP ( ORDER BY "A7"."P_KEY") "P_KEYS"
                    ,"A7"."COL_NAME" "COL_NAME"
                    ,MIN("A7"."COL_VALUE") KEEP (DENSE_RANK FIRST  ORDER BY "A7"."P_KEY") "OLD_VALUE"
                    ,MAX("A7"."COL_VALUE") KEEP (DENSE_RANK FIRST  ORDER BY "A7"."P_KEY" DESC ) "NEW_VALUE" 
              FROM  (SELECT "A2"."P_KEY" "P_KEY","A2"."COL_NAME" "COL_NAME","A2"."COL_VALUE" "COL_VALUE" 
                     FROM  (SELECT "A10"."P_KEY" "P_KEY",'C1' "COL_NAME","A10"."C1" "COL_VALUE" 
                            FROM  (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3" 
                                   FROM  (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3" 
                                          FROM  (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3" 
                                                 FROM  (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3" 
                                                        FROM "ONFTEST"."DIFF_TEST" "A5" 
                                                        WHERE "A5"."VERS_ID">1) "A4"
                                                ) "A3"
                                          ) "A9"
                                   ) "A10" 
                            UNION ALL 
                            SELECT "A10"."P_KEY" "P_KEY",'C2' "COL_NAME","A10"."C2" "COL_VALUE" 
                            FROM  (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3" 
                                   FROM  (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3" 
                                          FROM  (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3" 
                                                 FROM  (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3" 
                                                        FROM "ONFTEST"."DIFF_TEST" "A5" 
                                                        WHERE "A5"."VERS_ID">1) "A4"
                                                 ) "A3"
                                          ) "A9"
                                   ) "A10" 
                            UNION ALL 
                            SELECT "A10"."P_KEY" "P_KEY",'C3' "COL_NAME","A10"."C3" "COL_VALUE" 
                            FROM  (SELECT "A9"."P_KEY" "P_KEY",TO_CHAR("A9"."C1") "C1","A9"."C2" "C2",TO_CHAR("A9"."C3",'YYYY-MM-DD HH24:MI:SS') "C3" 
                                   FROM  (SELECT TO_CHAR("A3"."ID")||'#'||TO_CHAR("A3"."VERS_ID") "P_KEY","A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3" 
                                          FROM  (SELECT "A4"."ID" "ID","A4"."VERS_ID" "VERS_ID","A4"."CREATED" "CREATED","A4"."C1" "C1","A4"."C2" "C2","A4"."C3" "C3" 
                                                 FROM  (SELECT "A5"."ID" "ID","A5"."VERS_ID" "VERS_ID","A5"."CREATED" "CREATED","A5"."C1" "C1","A5"."C2" "C2","A5"."C3" "C3" 
                                                        FROM "ONFTEST"."DIFF_TEST" "A5" 
                                                        WHERE "A5"."VERS_ID">1) "A4"
                                                 ) "A3"
                                          ) "A9"
                                   ) "A10"
                            ) "A2"
                    ) "A7" 
              GROUP BY "A7"."COL_NAME" 
              HAVING COUNT(DISTINCT DUMP("A7"."COL_VALUE"))>1) "A6"
        ) "A1"

Listing 9: the actual query looks quite different

Bonus: Flashback query

You do not always have two rows in one table to compare. What if you only have one row which changes over time? Someone has just changed the row of interest and you want to know what is different? There are several ways to do it. But if the chances are good that the version you need can still be found in the undo segment, you can use flashback query and our SQL macro function together to answer the question:

SQL> drop table test_emp_diff;

Table TEST_EMP_DIFF dropped.

SQL> 
SQL> create table test_emp_diff as select * from scott.emp;

Table TEST_EMP_DIFF created.

SQL> column tstp new_value tstp
SQL> select  to_char(systimestamp, 'yyyy-mm-dd HH24:mi:ss.ff6') tstp from dual;

TSTP                      
--------------------------
2020-12-23 09:28:27.692766

SQL> update test_emp_diff set Job = 'DBA' where ename = 'SCOTT';

1 row updated.

SQL> with dat1 as (
  2    select 2 vers, a.* 
  3    from   test_emp_diff a
  4    union all 
  5    select 1 vers, b.* 
  6    from   test_emp_diff 
  7           as of timestamp (to_timestamp('&tstp', 'yyyy-mm-dd HH24:mi:ss.ff6')) b
  8    )
  9  , dat as (Select * from dat1 where ename ='SCOTT')
 10  select * from find_diff (dat, columns(ename, vers));

P_KEYS           COL_NAME OLD_VALUE            NEW_VALUE           
---------------- -------- -------------------- --------------------
SCOTT#1->SCOTT#2 JOB      ANALYST              DBA

Listing 10: Using SQL macro with flashback query

Where to test?

You can test table SQL macros in Oracle Database from 19.6. You can also play with them in LiveSQL. But the best thing is that you can test both scalar and table macros in the new release Oracle 21c with your free tier in Oracle cloud! Really great news! Go for it!

A minor difference I’ve noticed in 21c is that prefixing the table parameter with the function name doesn’t work anymore. I have not yet found out if this was intentional or a bug.

BANNER                                                                          
-----------------------------------------------------------------------------
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production

SQL> 
SQL> create or replace function test_with_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
  2  begin
  3    return 'test_with_function.p_tab';
  4  end;
  5  /

Function TEST_WITH_FUNCTION compiled

SQL> 
SQL> create or replace function test_without_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
  2  begin
  3    return 'p_tab';
  4  end;
  5  /

Function TEST_WITHOUT_FUNCTION compiled

SQL> 
SQL> select * from test_with_function(dual);

SQL Error: ORA-00942: table or view does not exist
00942. 00000 -  "table or view does not exist"

SQL> 
SQL> select * from test_without_function(dual);

D
-
X

---------------
SQL> select banner from v$version;

BANNER                                                                          
-----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> 
SQL> create or replace function test_with_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
  2  begin
  3    return 'test_with_function.p_tab';
  4  end;
  5  /

Function TEST_WITH_FUNCTION compiled

SQL> 
SQL> create or replace function test_without_function (p_tab dbms_tf.table_t) return varchar2 sql_macro is
  2  begin
  3    return 'p_tab';
  4  end;
  5  /

Function TEST_WITHOUT_FUNCTION compiled

SQL> 
SQL> select * from test_with_function(dual);

D
-
X

SQL> 
SQL> select * from test_without_function(dual);

D
-
X

Listing 11: Differences in parameter naming 19c vs. 21c

This will probably be my last blog post this year (but who knows?). Next year I promise to post not only about SQL macros 😉 But at least in the next time I am eager to share two more interesting use cases. Merry Christmas! Have a good and healthy 2021!

SQLORA

Just some more Oracle stuff