In a previous post on SQL macros in Oracle Database 20c we saw how SQL macros can be used to create a kind of “parameterized” views to establish a simplified access tier to temporal data. In this post I’d like to explore more possibilities to hide the complexity of SQL statements behind a functional syntax provided by using SQL macros. As an example we’ll stay with a temporal data introduced in the previous post and explore how we can do a temporal join of this versioned data.
What is a temporal join?
About a month ago I had the opportunity to join a Data Warehouse Design Patterns Workshop by Roelant Vos. Yes, despite of #COVID19, at that time it was not yet quite unthinkable to fly to Zurich to attend a workshop in person 🙁
Being one of the many interesting topics of this three-day training, the question of joining temporal data arises e.g. when calculating the dimensions from several versioned tables.
Unlike “point in time” queries, temporal joins produce all possible combinations of validity intervals in the source tables. Although the way you can do it was not new for me, it is quite complex and it led me to the idea of testing the use of SQL macros for that task.
My colleague Philipp Salvisberg, whose data model and data I’m using in this example has blogged about this kind of queries and I have tested the use of row pattern matching clause to merge the resulting time intervals a time ago. I will use Philipp’s picture to show what we are doing:
We produce all the time intervals containing different data from joined tables. The exact way of doing this may differ depending on whether you are using open or closed time intervals, can you have gaps, would you like to merge adjacent intervals with the same data and so on. But the exact shape of the query is not important for our purpose. I’d only like to demonstrate how we can implement some kind of functional syntax to support complex types of queries. Let’s say, we prefer following solution for the problem:
WITH joined AS (
-- gap-aware temporal join
-- produces result_cols to calculate new_group in the subsequent query
SELECT e.empno,
g.valid_from,
LEAST(
e.valid_to,
d.valid_to,
LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
PARTITION BY e.empno ORDER BY g.valid_from
)
) AS valid_to,
e.ename,
e.mgr,
e.hiredate,
e.sal,
e.comm,
e.deptno,
d.dname
FROM empv e
INNER JOIN (SELECT valid_from FROM empv
UNION
SELECT valid_from FROM deptv
UNION
SELECT valid_to + 1 FROM empv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM deptv
WHERE valid_to != DATE '9999-12-31') g
ON g.valid_from BETWEEN e.valid_from AND e.valid_to
INNER JOIN deptv d
ON d.deptno = e.deptno AND g.valid_from BETWEEN d.valid_from AND d.valid_to
)
SELECT empno
, valid_from
, valid_to
, ename
, mgr
, hiredate
, sal
, comm
, deptno
, dname
FROM joined
MATCH_RECOGNIZE (
PARTITION BY empno
, ename
, mgr
, hiredate
, sal
, comm
, deptno
, dname
ORDER BY valid_from
MEASURES FIRST(valid_from) valid_from, LAST(valid_to) valid_to
ONE ROW PER MATCH
PATTERN ( strt nxt* )
DEFINE nxt as valid_from = prev(valid_to) + 1
)
Listing 1: The shape of the query doing temporal join
Well, quite lengthy and complex, isn’t it? Note, that I reduced the join from the Figure 1 to join only two tables. What I’d like to have, is a SQL macro which I could call instead of this whole complex query. Just like this:
SQL>
SQL>
SQL> SELECT * FROM temporal_join ( <parameter> );
SQL>
SQL>
Listing 2: the desired functional syntax
Would it be possible? If you’ve read my previous posts about SQL macros, you’ll be confident it is. Why not? Let’s start.
What parameter do we need to dynamically build a statement like the one from Listing 1? We are joining two tables (let’s start with two for now) so we should know these tables and have two table parameters (DBMS_TF.Table_t). Many things are convention based in this simple example: I suppose to know the the validity column names, that we are using closed time intervals, that we are using a special date 31-DEC-9999 and so on. But there are also things I cannot know and must pass them as parameters. I have to know the business key of the first table (EMPNO) because it is used in PARTITION clause of the analytical function LEAD (line 9 in Listing 1). We define the parameter bk_cols
of type DBMS_TF.COLUMNS_T
for this purpose and can pass just one or even a list of columns in case of a multi-column key.
And I have to know how to join those two tables. For simplification I suppose we will always have an equality conditions and I guess it will be true in 99% of cases. Then we just have to know the column names of the columns used as join criteria. I decided to use just one column list (join_cols DBMS_TF.COLUMNS_T
) but in a special way: a pair of columns to join followed by another pair if any and so on. For example, if the join condition looks like t1.a=t2.b AND t1.c=t2.d
then you’d have COLUMNS(a,b,c,d).
The rest of the implementation is rather boring putting together the desired SQL. I put the function in the package TIME_MACHINE introduced in the previous post about point in time queries in a kind of “parameterized” views:
CREATE OR REPLACE PACKAGE time_machine IS
...
...
FUNCTION temporal_join(t1 DBMS_TF.Table_t
, t2 DBMS_TF.Table_t
, bk_cols DBMS_TF.COLUMNS_T
, join_cols DBMS_TF.COLUMNS_T)
RETURN VARCHAR2 SQL_MACRO(TABLE);
END time_machine;
/
CREATE OR REPLACE PACKAGE BODY time_machine IS
...
...
FUNCTION temporal_join(t1 DBMS_TF.Table_t
, t2 DBMS_TF.Table_t
, bk_cols DBMS_TF.COLUMNS_T
, join_cols DBMS_TF.COLUMNS_T)
RETURN VARCHAR2 SQL_MACRO(TABLE)
AS
-- convention based
v_dates varchar2(1000);
v_select varchar2(4000);
v_buskey varchar2(1000);
v_join_keys varchar2(1000);
v_name DBMS_ID;
v_column_list varchar2(1000);
v_mr varchar2(2000);
BEGIN
v_dates := q'[INNER JOIN (SELECT t1.valid_from FROM t1
UNION
SELECT t2.valid_from FROM t2
UNION
SELECT t1.valid_to + 1 FROM t1
WHERE t1.valid_to != DATE '9999-12-31'
UNION
SELECT t2.valid_to + 1 FROM t2
WHERE t2.valid_to != DATE '9999-12-31') g
ON g.valid_from BETWEEN t1.valid_from AND t1.valid_to]';
FOR I IN 1..t1.column.count LOOP
v_name := t1.column(i).description.name;
IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN
V_Select := v_select ||', t1.'||v_name||' as '||
replace(t1.table_name,'"')||'_'||replace(v_name,'"');
v_column_list := v_column_list ||', '||
replace(t1.table_name,'"')||'_'||replace(v_name,'"');
END IF;
IF v_name MEMBER OF bk_cols THEN
v_buskey := v_buskey || ', t1.' || v_name;
END IF;
END LOOP;
FOR I IN 1..t2.column.count LOOP
v_name := t2.column(i).description.name;
IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN
V_Select := v_select ||', t2.'||v_name||' as '||
replace(t2.table_name,'"')||'_'||replace(v_name,'"');
v_column_list := v_column_list ||', '||
replace(t2.table_name,'"')||'_'||replace(v_name,'"');
END IF;
END LOOP;
v_column_list := ltrim(v_column_list , ',');
v_buskey := ltrim(v_buskey , ',');
v_select := 'SELECT '||
ltrim(v_select , ',')||
q'[, g.valid_from,
LEAST(
t1.valid_to,
t2.valid_to,
LEAD(g.valid_from - 1, 1, t1.valid_to) OVER(
PARTITION BY ]'||v_buskey||q'[ ORDER BY g.valid_from
)
) AS valid_to
FROM t1 ]'
|| v_dates;
FOR i in 1..join_cols.count LOOP
IF MOD(i,2) = 0 THEN
v_join_keys := 'AND t1.'||join_cols(i-1)|| '= t2.'||join_cols(i);
END IF;
END LOOP;
v_join_keys := 'ON '||ltrim(v_join_keys , 'AND');
v_select := v_select ||' INNER JOIN t2 '||v_join_keys||
' AND g.valid_from BETWEEN t2.valid_from AND t2.valid_to ';
v_mr := ' MATCH_RECOGNIZE ( PARTITION BY '||v_column_list||q'[
ORDER BY valid_from
MEASURES FIRST(valid_from) valid_from, LAST(valid_to) valid_to
ONE ROW PER MATCH
PATTERN ( strt nxt* )
DEFINE nxt as valid_from = prev(valid_to) + 1
)]';
v_select := 'SELECT '||v_column_list||
', r_valid_from valid_from, r_valid_to valid_to '||
'FROM ('||v_select ||') '|| v_mr;
return v_select ;
END;
END time_machine;
/
Listing 3: SQL macro function implementing temporal join
Note, how we are using the names of the table parameters T1 and T2 as table names across the whole strings. Inside the strings, without any concatenation. Remember, they will be replaced with real table names just at the time when the result string is returned out of the function.
To be able to return a unique set of columns we are prefixing all column names with a table name. If you don’t like it, you can do something else, but you should have some convention at place. Otherwise you’ll get “duplicate column name” error. One of the solutions could also be passing column list parameters for each table to define what columns should be in the result set and thus being able to control the uniqueness yourself .
Well, now try to call the function:
SQL> SELECT *
FROM time_machine.temporal_join (t1 => empv
,t2 => deptv
,bk_cols => columns(empno)
,join_cols => columns( deptno, deptno))
WHERE empv_empno = 7788
EMPV_EMPVID EMPV_EMPNO EMPV_ENAME EMPV_JOBNO EMPV_MGR EMPV_HIREDATE EMPV_SAL EMPV_COMM EMPV_DEPTNO DEPTV_DEPTVID DEPTV_DEPTNO DEPTV_DNAME DEPTV_LOC VALID_FROM VALID_TO
----------- ---------- -------------- ---------- ---------- ------------- ---------- ---------- ----------- ------------- ------------ -------------- ------------- ---------- ---------
8 7788 SCOTT 5 7566 19-APR-87 3000 20 2 20 RESEARCH DALLAS 19-APR-87 31-DEC-89
22 7788 Scott 5 7566 19-APR-87 3000 20 2 20 RESEARCH DALLAS 01-JAN-90 28-FEB-90
22 7788 Scott 5 7566 19-APR-87 3000 20 6 20 Research DALLAS 01-MAR-90 31-MAR-90
22 7788 Scott 5 7566 19-APR-87 3000 20 10 20 Research Dallas 01-APR-90 31-MAR-91
36 7788 Scott 5 7566 19-APR-87 3300 20 10 20 Research Dallas 01-APR-91 31-DEC-99
5 rows selected.
Listing 4: Calling a SQL macro function
Looks good! It did what we expected. We can hide really complex implementation behind just one function call. And remember that in fact no PL/SQL functions are involved at runtime, slowing down your query. The SQL macro function is called only once at parse time and generates the query text which will be incorporated into the invoking query. What actually runs, is a SQL statement much like the one in Listing 1. You can verify it by calling DBMS_UTILITY.EXPAND_SQL_TEXT
as I did in a previous post.
Joining more than two tables
Now what about joining three tables? Can we just nest the function calls? Unfortunately not. Neither as inline view, nor as conventional view, nor as common table expression (CTE) aka named subquery aka WITH clause subquery. That seems to be an intended restriction since we get a specific error message: ORA-64630: unsupported use of SQL macro: use of SQL macro inside WITH clause is not supported,
even with conventional views.
>> WITH empdept AS(
SELECT *
FROM time_machine.temporal_join (t1 => empv
,t2 => deptv
,bk_cols => columns(empno)
,join_cols => columns( deptno, deptno))
)
SELECT *
FROM time_machine.temporal_join (t1 => empdept
,t2 => jobv
,bk_cols => columns(empno)
,join_cols => columns( jobno, jobno))
WHERE empv_empno = 7788
*
Error at line 0
ORA-64630: unsupported use of SQL macro: use of SQL macro inside WITH clause is not supported
Listing 5: Using of SQL macro inside WITH clause or views is not supported
So I’d suggest just to implement the second, third, etc. function which accept three, four and more tables. I know, this doesn’t look like an elegant solution. But I don’t see the better one at the moment. In fact you put all these overloaded functions in a package. Another thing I would do, is to provide some generic helper functions to return building blocks for the query in every situation. The task of the SQL macro function would just be to correctly pass their parameters calling these helper functions. If you start with two macros for joining two and three tables at once and one day you’ll see that you need to join four tables, then adding a new overloaded SQL macro function would be a matter of minutes. I’ll try to test it soon and will provide the implementation here. For now, I just implemented two very similar overloaded functions in a package to show that it works.
CREATE OR REPLACE PACKAGE time_machine IS FUNCTION vw (p_tab DBMS_TF.Table_t, p_eff_date date) RETURN VARCHAR2 SQL_MACRO; FUNCTION vw (p_tab DBMS_TF.Table_t, p_eff_date date, p_exp_date date) RETURN VARCHAR2 SQL_MACRO; FUNCTION temporal_join(t1 DBMS_TF.Table_t , t2 DBMS_TF.Table_t , bk_cols DBMS_TF.COLUMNS_T , join_cols DBMS_TF.COLUMNS_T) RETURN VARCHAR2 SQL_MACRO(TABLE); FUNCTION temporal_join(t1 DBMS_TF.Table_t , t2 DBMS_TF.Table_t , t3 DBMS_TF.Table_t , bk_cols DBMS_TF.COLUMNS_T , join_cols1 DBMS_TF.COLUMNS_T , join_cols2 DBMS_TF.COLUMNS_T) RETURN VARCHAR2 SQL_MACRO(TABLE); END time_machine; / CREATE OR REPLACE PACKAGE BODY time_machine IS FUNCTION vw (p_tab DBMS_TF.Table_t, p_eff_date date) RETURN VARCHAR2 SQL_MACRO IS v_query varchar2(500); BEGIN v_query := q'!SELECT * FROM p_tab WHERE p_eff_date between valid_from AND valid_to!'; RETURN v_query; END; FUNCTION vw (p_tab DBMS_TF.Table_t, p_eff_date date, p_exp_date date) RETURN VARCHAR2 SQL_MACRO IS v_query varchar2(500); BEGIN v_query := q'!SELECT * FROM p_tab WHERE valid_from &amp;lt;= p_exp_date AND valid_to &amp;gt;= p_eff_date!'; RETURN v_query; END; FUNCTION temporal_join(t1 DBMS_TF.Table_t , t2 DBMS_TF.Table_t , bk_cols DBMS_TF.COLUMNS_T , join_cols DBMS_TF.COLUMNS_T) RETURN VARCHAR2 SQL_MACRO(TABLE) AS -- convention based v_dates varchar2(1000); v_select varchar2(4000); v_buskey varchar2(1000); v_join_keys varchar2(1000); v_name DBMS_ID; v_column_list varchar2(1000); v_mr varchar2(2000); BEGIN v_dates := q'[INNER JOIN (SELECT t1.valid_from FROM t1 UNION SELECT t2.valid_from FROM t2 UNION SELECT t1.valid_to + 1 FROM t1 WHERE t1.valid_to != DATE '9999-12-31' UNION SELECT t2.valid_to + 1 FROM t2 WHERE t2.valid_to != DATE '9999-12-31') g ON g.valid_from BETWEEN t1.valid_from AND t1.valid_to]'; FOR I IN 1..t1.column.count LOOP v_name := t1.column(i).description.name; IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN V_Select := v_select ||', t1.'||v_name||' as '||replace(t1.table_name,'"')||'_'||replace(v_name,'"'); v_column_list := v_column_list ||', '||replace(t1.table_name,'"')||'_'||replace(v_name,'"'); END IF; IF v_name MEMBER OF bk_cols THEN v_buskey := v_buskey || ', t1.' || v_name; END IF; END LOOP; FOR I IN 1..t2.column.count LOOP v_name := t2.column(i).description.name; IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN V_Select := v_select ||', t2.'||v_name||' as '||replace(t2.table_name,'"')||'_'||replace(v_name,'"'); v_column_list := v_column_list ||', '||replace(t2.table_name,'"')||'_'||replace(v_name,'"'); END IF; END LOOP; v_column_list := ltrim(v_column_list , ','); v_buskey := ltrim(v_buskey , ','); v_select := 'SELECT '|| ltrim(v_select , ',')|| q'[, g.valid_from, LEAST( t1.valid_to, t2.valid_to, LEAD(g.valid_from - 1, 1, t1.valid_to) OVER( PARTITION BY ]'||v_buskey||q'[ ORDER BY g.valid_from ) ) AS valid_to FROM t1 ]' || v_dates; FOR i in 1..join_cols.count LOOP IF MOD(i,2) = 0 THEN v_join_keys := 'AND t1.'||join_cols(i-1)|| '= t2.'||join_cols(i); END IF; END LOOP; v_join_keys := 'ON '||ltrim(v_join_keys , 'AND'); v_select := v_select ||' INNER JOIN t2 '||v_join_keys|| ' AND g.valid_from BETWEEN t2.valid_from AND t2.valid_to '; v_mr := ' MATCH_RECOGNIZE ( PARTITION BY '||v_column_list||q'[ ORDER BY valid_from MEASURES FIRST(valid_from) R_valid_from, LAST(valid_to) R_valid_to ONE ROW PER MATCH PATTERN ( strt nxt* ) DEFINE nxt as valid_from = prev(valid_to) + 1 )]'; v_select := 'SELECT '||v_column_list|| ', r_valid_from valid_from, r_valid_to valid_to '|| 'FROM ('||v_select ||') '|| v_mr; return v_select ; END; FUNCTION temporal_join(t1 DBMS_TF.Table_t, t2 DBMS_TF.Table_t, t3 DBMS_TF.Table_t, bk_cols DBMS_TF.COLUMNS_T, join_cols1 DBMS_TF.COLUMNS_T, join_cols2 DBMS_TF.COLUMNS_T) RETURN VARCHAR2 SQL_MACRO(TABLE) AS v_h varchar2(300); -- convention based v_dates varchar2(1000); v_select varchar2(4000); v_buskey varchar2(1000); v_join_keys varchar2(1000); v_name DBMS_ID; v_column_list varchar2(1000); v_mr varchar2(2000); BEGIN v_dates := q'[INNER JOIN (SELECT t1.valid_from FROM t1 UNION SELECT t2.valid_from FROM t2 UNION SELECT t2.valid_from FROM t2 UNION SELECT t1.valid_to + 1 FROM t1 WHERE t1.valid_to != DATE '9999-12-31' UNION SELECT t2.valid_to + 1 FROM t2 WHERE t2.valid_to != DATE '9999-12-31' UNION SELECT t2.valid_to + 1 FROM t2 WHERE t2.valid_to != DATE '9999-12-31' ) g ON g.valid_from BETWEEN t1.valid_from AND t1.valid_to]'; FOR I IN 1..t1.column.count LOOP v_name := t1.column(i).description.name; IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN V_Select := v_select ||', t1.'||v_name||' as '||replace(t1.table_name,'"')||'_'||replace(v_name,'"'); v_column_list := v_column_list ||', '||replace(t1.table_name,'"')||'_'||replace(v_name,'"'); END IF; IF v_name MEMBER OF bk_cols THEN v_buskey := v_buskey || ', t1.' || v_name; END IF; END LOOP; FOR I IN 1..t2.column.count LOOP v_name := t2.column(i).description.name; IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN V_Select := v_select ||', t2.'||v_name||' as '||replace(t2.table_name,'"')||'_'||replace(v_name,'"'); v_column_list := v_column_list ||', '||replace(t2.table_name,'"')||'_'||replace(v_name,'"'); END IF; END LOOP; FOR I IN 1..t3.column.count LOOP v_name := t3.column(i).description.name; IF v_name NOT IN ('"VALID_FROM"', '"VALID_TO"') THEN V_Select := v_select ||', t3.'||v_name||' as '||replace(t3.table_name,'"')||'_'||replace(v_name,'"'); v_column_list := v_column_list ||', '||replace(t3.table_name,'"')||'_'||replace(v_name,'"'); END IF; END LOOP; v_column_list := ltrim(v_column_list , ','); v_buskey := ltrim(v_buskey , ','); v_select := 'SELECT '|| ltrim(v_select , ',')|| q'[, g.valid_from, LEAST( t1.valid_to, t2.valid_to, LEAD(g.valid_from - 1, 1, t1.valid_to) OVER( PARTITION BY ]'||v_buskey||q'[ ORDER BY g.valid_from ) ) AS valid_to FROM t1 ]' || v_dates; FOR i in 1..join_cols1.count LOOP IF MOD(i,2) = 0 THEN v_join_keys := 'AND t1.'||join_cols1(i-1)|| '= t2.'||join_cols1(i); END IF; END LOOP; v_join_keys := 'ON '||ltrim(v_join_keys , 'AND'); v_select := v_select ||' INNER JOIN t2 '||v_join_keys|| ' AND g.valid_from BETWEEN t2.valid_from AND t2.valid_to '; FOR i in 1..join_cols2.count LOOP IF MOD(i,2) = 0 THEN v_join_keys := 'AND t1.'||join_cols2(i-1)|| '= t3.'||join_cols2(i); END IF; END LOOP; v_join_keys := 'ON '||ltrim(v_join_keys , 'AND'); v_select := v_select ||' INNER JOIN t3 '||v_join_keys|| ' AND g.valid_from BETWEEN t3.valid_from AND t3.valid_to '; v_mr := ' MATCH_RECOGNIZE ( PARTITION BY '||v_column_list||q'[ ORDER BY valid_from MEASURES FIRST(valid_from) r_valid_from, LAST(valid_to) r_valid_to ONE ROW PER MATCH PATTERN ( strt nxt* ) DEFINE nxt as valid_from = prev(valid_to) + 1 )]'; v_select := 'SELECT '||v_column_list|| ', r_valid_from valid_from, r_valid_to valid_to '|| 'FROM ('||v_select ||') '|| v_mr; return v_select ; END; END time_machine; /
Listing 6: Implementation of overloaded SQL macro functions
Because the code gets very long really fast, I’ll provide it initially collapsed. Listing 7 shows calling the function for three tables:
SQL> SELECT *
FROM time_machine.temporal_join( t1 => empv
, t2 => deptv
, t3 => jobv
, bk_cols => columns(empno)
, join_cols1 => columns(deptno, deptno)
, join_cols2 => columns(jobno, jobno))
where empv_empno = 7788
EMPV_EMPVID EMPV_EMPNO EMPV_ENAME EMPV_JOBNO EMPV_MGR EMPV_HIREDATE EMPV_SAL EMPV_COMM EMPV_DEPTNO DEPTV_DEPTVID DEPTV_DEPTNO DEPTV_DNAME DEPTV_LOC JOBV_JOBVID JOBV_JOBNO JOBV_JOB VALID_FROM VALID_TO
----------- ---------- -------------- ---------- ---------- ------------- ---------- ---------- ----------- ------------- ------------ -------------- ------------- ----------- ---------- --------- ---------- ---------
8 7788 SCOTT 5 7566 19-APR-87 3000 20 2 20 RESEARCH DALLAS 5 5 ANALYST 19-APR-87 31-DEC-89
22 7788 Scott 5 7566 19-APR-87 3000 20 2 20 RESEARCH DALLAS 5 5 ANALYST 01-JAN-90 28-FEB-90
22 7788 Scott 5 7566 19-APR-87 3000 20 6 20 Research DALLAS 10 5 Analyst 01-MAR-90 31-MAR-90
22 7788 Scott 5 7566 19-APR-87 3000 20 10 20 Research Dallas 10 5 Analyst 01-APR-90 31-MAR-91
36 7788 Scott 5 7566 19-APR-87 3300 20 10 20 Research Dallas 10 5 Analyst 01-APR-91 31-DEC-99
5 rows selected.
Listing 7: Calling a SQL macro function to join three tables
Summary
Besides the certain practical value of the solution for doing temporal joins with just one function call, I think it is a good example of how we can hide complex, recurring, boring and, most important, error-prone SQL syntax inside a SQL macro function. If in the past you had certain biases against using the PL/SQL functions in SQL, remember again, SQL macros are different! They will only be executed at parse time. No extra PL/SQL at runtime, no context switch!
Last but not least, thank you, Roelant Vos, for the insightful three days! And I think the implementation and most importantly the usage of even more of your patterns can be simplified using SQL macros in Oracle 20c.