SQL macros in Oracle 19c

Now that I have created a patched Oracle Database Docker image with Release Update 19.8 (19.8.0.0.DBRU:200714) to play with, I can finally start testing the backport of SQL macros! (UPDATE: As of now you can play with SQL macros on Autonomous Databases in Oracle Cloud (also free tier!) and on livesql.oracle.com)

SQL Macros were introduced in Oracle 20c. As of now, Oracle 20c is only available as a preview version in Oracle Cloud. That’s why I was quite excited when Connor McDonald pointed out on Twitter that they were backported to 19c. Let’s see if it works, and how.

The backport is mentioned in bug 30324180, and the fix is first included in 19.6. So, strictly speaking, you do not need the 19.8 I’m working with – 19.6 should be just fine. (UPDATE: 19.6 is mentioned in the bug document, but I was able to run the example from this post on an ADW database in the cloud with 19.5 right now)

Let’s start with a basic example from my previous post – a scalar macro function JOB_DURATION:

SQL> CREATE OR REPLACE FUNCTION job_duration RETURN VARCHAR2 
  2  SQL_MACRO(SCALAR) 
  3  AS
  4  BEGIN
  5     RETURN q'! FLOOR(MONTHS_BETWEEN (SYSDATE, hiredate)/12) !';
  6  END;
  7  /

Function JOB_DURATION compiled

LINE/COL  ERROR
--------- -----------------------------------------------
2/10      PLS-00103: Encountered the symbol "(" 
when expecting one of the following:     ...

Listing 1: no scalar macros in 19c

Couldn’t compile… Why? The error message complains about the parentheses around SCALAR, but not about the keyword SQL_MACRO. The description of the “backport” bug 30324180 gives a good hint to understand this: scalar macros were intentionally not backported! So you only have table semantics macros (which are also the default in 20c), and it seems that you cannot explicitly specify the type. The good news is that you don’t have to change anything when you migrate to 20c. So let’s focus on the table semantics functions then.
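To double-check this, a minimal sketch of a table semantics macro (the function name DUAL_MACRO and its query are made up for illustration): dropping the (SCALAR) clause, the same construct should compile fine in 19c, because SQL_MACRO without any clause means table semantics.

SQL> CREATE OR REPLACE FUNCTION dual_macro RETURN VARCHAR2
  2  SQL_MACRO  -- no (SCALAR) clause: table semantics is the default
  3  AS
  4  BEGIN
  5     RETURN q'! SELECT dummy FROM dual !';
  6  END;
  7  /

Querying select * from dual_macro() should then behave exactly like selecting from DUAL itself.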

Get your data CSV formatted

As an example, I wanted to take something that would be useful in itself and at the same time could showcase what can be achieved with table SQL macros. Since I work a lot as an ETL engineer, I am often faced with the task of exporting data as CSV. Not my favorite format, to be honest, especially because you lose a lot of metadata. But there are often project requirements you have no control over. And if you have to export data as CSV, it is very important in larger projects that it is done consistently in the same way. And that’s where SQL macros can be very helpful! Let’s see why and how.

Our goal is to develop a generic function:

  • which we can feed with an arbitrary data stream
  • which handles incoming data in a consistent way according to established conventions
  • in particular, the function should produce a header row with column names, format all date and timestamp fields according to ISO 8601, and enclose all text fields in quotation marks, escaping any quotation marks inside the text first
  • which we can query to get CSV-formatted output
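As a quick sketch of the quoting convention with the default parameters (quote character " and escape character \), a text value containing quotation marks would be rendered like this:

SQL> select '"'||replace(q'[He said "hi"]','"','\"')||'"' as quoted from dual;

QUOTED
----------------
"He said \"hi\""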

For example, given the view EMPV defined in previous posts, the invocation will look like this:

SQL> select * from get_csv(empv);

CSV_ROW                                                                                             
----------------------------------------------------------------------------------------------------
"EMPVID","EMPNO","ENAME","JOBNO","MGR","HIREDATE","SAL","COMM","DEPTNO","VALID_FROM","VALID_TO"
2,7499,"ALLEN",2,7698,1981-02-20,1600,300,30,1981-02-20,1989-12-31
3,7521,"WARD",2,7698,1981-02-22,1250,500,30,1981-02-22,1989-12-31
4,7566,"JONES",4,7839,1981-04-02,2975,,20,1981-04-02,1989-12-31

Listing 2: the desired usage and output

What happens with this output afterwards, persisting it as a file for example, is out of scope here.

I wrote about the basics of table SQL macros in a couple of previous posts, for example about building hash keys with SQL macros, so I’m not going to repeat all the explanations but rather show that things work exactly the same in 19c.

create or replace function get_csv (p_input_data dbms_tf.table_t
                                   ,p_delimiter varchar2 default ','
                                   ,p_quote varchar2 default '"'
                                   ,p_escape varchar2 default '\'
                                   ) return clob sql_macro as
v_sql clob;
v_name dbms_id;
v_header clob;
v_delimiter_rec varchar2(9):= '||'''||p_delimiter||'''||';
v_record clob;
v_rec varchar2(300);
begin
  for i in 1..p_input_data.column.count loop
    v_name := p_input_data.column(i).description.name;
    if p_input_data.column(i).description.type 
         in (dbms_tf.type_varchar2
           , dbms_tf.type_char
           , dbms_tf.type_clob) then 
      if p_quote is not null then 
          -- escape quotation characters in text fields
          v_rec := 'replace('||v_name||','''||p_quote||''','''||p_escape||p_quote||''')';
          -- enclose text in quotation marks
          v_rec :=  ''''||p_quote||'''||'||v_rec||'||'''||p_quote||'''';
      else 
          v_rec := v_name;
      end if;
    elsif p_input_data.column(i).description.type = dbms_tf.type_number then
      v_rec := 'to_char('||v_name||')';
    elsif p_input_data.column(i).description.type = dbms_tf.type_date then
      v_rec := 'to_char('||v_name||',''YYYY-MM-DD'')';
    elsif p_input_data.column(i).description.type = dbms_tf.type_timestamp then
      v_rec := 'to_char('||v_name||',''YYYY-MM-DD"T"HH24:MI:SS.FF6'')';      
    end if;
    v_header := v_header || p_delimiter || v_name;
    v_record := v_record || v_delimiter_rec || v_rec;
  end loop;
  v_header := LTRIM (v_header, p_delimiter);
  v_record := LTRIM (v_record, v_delimiter_rec);
  v_sql := q'[
     with d as (
       select '%PH_HEADER%' as csv_row, 1 as row_order  from dual 
       union all 
       select %PH_RECORD% as csv_row , 2 as row_order from get_csv.p_input_data
       ) 
     select csv_row from d order by row_order
           ]';
  v_sql := replace(replace (v_sql, '%PH_RECORD%', v_record), '%PH_HEADER%' , v_header);
  return v_sql;
end;
/

Listing 3: Implementation of SQL macro function GET_CSV

Let’s look at the implementation in more detail.

  • Line 1-5: Declare the function as sql_macro. If we want a generic function capable of working with arbitrary input data sets, we need a parameter of type DBMS_TF.TABLE_T. This is how we get metadata about the data set, helping us implement the required logic. As with any other PL/SQL function, we can also declare “scalar” parameters with default values when needed.
  • Line 13: Loop over the column collection we got in P_INPUT_DATA. It contains column names as well as data types.
  • Lines 15-33: Implement the business requirements for handling different data types, e.g. escaping characters, quoting texts, formatting dates, etc. These rules can be as simple or as complex as you need. The important thing is that you establish them only once and expose them through the SQL macro function.
  • Lines 34-38: Prepare the strings containing the select list for the column header and for the data itself: in the loop and with some post-processing afterwards.
  • Lines 39-47: Now we are ready to build the result string that will be returned from the function. It must be a valid SQL query, which will be substituted into the FROM clause of the original query in place of the SQL macro function invocation. In our case this is a UNION ALL combining the header row and the actual formatted data. Ordering by the artificial column row_order helps us output the header first. IMPORTANT! Note how we use the parameter P_INPUT_DATA at line 43. It doesn’t make sense at first sight, but keep in mind that this reference will be substituted with the real table name, view or subquery, whatever you used as a parameter for the function in the original query. EVEN MORE IMPORTANT! This substitution only seems to take place upon returning the string from the function. If you were using the parameter P_INPUT_DATA elsewhere in the function, for instance to query some data from the table for some reason, this wouldn’t work. See an example in my post about dynamic pivot.
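To make the placeholder mechanism more tangible, here is roughly what the returned v_sql string would look like for a hypothetical two-column input (a NUMBER column EMPNO and a VARCHAR2 column ENAME) with default parameters. This is a hand-written sketch, not captured output; note that DBMS_TF delivers the column names already enclosed in double quotes:

     with d as (
       select '"EMPNO","ENAME"' as csv_row, 1 as row_order  from dual
       union all
       select to_char("EMPNO")||','||'"'||replace("ENAME",'"','\"')||'"' as csv_row , 2 as row_order from get_csv.p_input_data
       )
     select csv_row from d order by row_order

Only when this string is returned does the reference get_csv.p_input_data get replaced by the actual table, view or named subquery from the original query.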

Now we can call the function GET_CSV and produce the desired output:

SQL> select * from get_csv(empv);

CSV_ROW                                                                                             
----------------------------------------------------------------------------------------------------
"EMPVID","EMPNO","ENAME","JOBNO","MGR","HIREDATE","SAL","COMM","DEPTNO","VALID_FROM","VALID_TO"
2,7499,"ALLEN",2,7698,1981-02-20,1600,300,30,1981-02-20,1989-12-31
3,7521,"WARD",2,7698,1981-02-22,1250,500,30,1981-02-22,1989-12-31
4,7566,"JONES",4,7839,1981-04-02,2975,,20,1981-04-02,1989-12-31
5,7654,"MARTIN",2,7698,1981-09-28,1250,1400,30,1981-09-28,1989-12-31
6,7698,"BLAKE",4,7839,1981-05-01,2850,,30,1981-05-01,1989-12-31

Listing 4: the function works as designed

If you have already learned a little about SQL macros, you know that a rather different query is executed at runtime. Our function GET_CSV runs only at parse time and returns a piece of SQL that the original query will select from. How can you see what will actually be executed? In Oracle 19c, as well as in Oracle 20c, I have so far found only one way to do this: using DBMS_UTILITY.expand_sql_text. I have formatted the output a bit:

SQL> DECLARE
  2    l_clob CLOB;
  3  BEGIN
  4    DBMS_UTILITY.expand_sql_text (
  5      input_sql_text  => q'[select * from get_csv (empv)]',
  6      output_sql_text => l_clob  );
  7    DBMS_OUTPUT.put_line(l_clob);
  8  END;
  9  /

PL/SQL procedure successfully completed.

SELECT "A1"."CSV_ROW" "CSV_ROW" 
FROM  (
  SELECT "A4"."CSV_ROW" "CSV_ROW" 
  FROM  (
     SELECT "A5"."CSV_ROW" "CSV_ROW" 
     FROM  ( 
       (SELECT '"EMPVID","EMPNO","ENAME","JOBNO","MGR","HIREDATE","SAL","COMM","DEPTNO","VALID_FROM","VALID_TO"' "CSV_ROW",1 "ROW_ORDER" 
        FROM "SYS"."DUAL" "A7") 
        UNION ALL  
       (SELECT TO_CHAR("A6"."EMPVID")||','||
               TO_CHAR("A6"."EMPNO")||','||
               '"'||REPLACE("A6"."ENAME",'"','\"')||'"'||','||
               TO_CHAR("A6"."JOBNO")||','||
               TO_CHAR("A6"."MGR")||','||
               TO_CHAR("A6"."HIREDATE",'YYYY-MM-DD')||','||
               TO_CHAR("A6"."SAL")||','||
               TO_CHAR("A6"."COMM")||','||
               TO_CHAR("A6"."DEPTNO")||','||
               TO_CHAR("A6"."VALID_FROM",'YYYY-MM-DD')||','||
               TO_CHAR("A6"."VALID_TO",'YYYY-MM-DD') "CSV_ROW"
               , 2 "ROW_ORDER" 
        FROM  (
            SELECT "A3"."EMPVID" "EMPVID","A3"."EMPNO" "EMPNO"
                    ,"A3"."ENAME" "ENAME","A3"."JOBNO" "JOBNO"
                    ,"A3"."MGR" "MGR","A3"."HIREDATE" "HIREDATE"
                    ,"A3"."SAL" "SAL","A3"."COMM" "COMM"
                    ,"A3"."DEPTNO" "DEPTNO","A3"."VALID_FROM" "VALID_FROM"
                    ,"A3"."VALID_TO" "VALID_TO" 
            FROM "ONFTEST"."EMPV" "A3"
               ) "A6"
        )
     ) "A5" 
     ORDER BY "A5"."ROW_ORDER"
   ) "A4"
) "A1"

Listing 5: what happens in the background

As you can see, Oracle does a lot of work in the background rewriting the original query. Just by comparing the original and the actually executed query, the key advantage of SQL macros becomes obvious. We can implement complex business logic in a consistent way and provide a simple interface through a SQL macro function. This business logic is not encapsulated in a PL/SQL function at run time (causing context switches) but fully exposed to the database and the query optimizer. The SQL macro function is not tied to a particular table or view but can be used generically with different data sets, even subqueries.

Speaking of subqueries, you cannot pass them directly, as the following listing shows, but you can do it with named queries defined in the WITH clause.

SQL> select * from get_csv ((select empno, ename from empv));

SQL Error: ORA-00913: too many values

SQL> 
SQL> 
SQL> with subq as (select empno, ename from empv)
  2  select * from get_csv(subq);

CSV_ROW                       
------------------------------
"EMPNO","ENAME"
7499,"ALLEN"
7521,"WARD"
7566,"JONES"
7654,"MARTIN"
7698,"BLAKE"
...

Listing 6: using with a subquery

Looks good, but I also came across something that looks like a bug to me. When using a subquery as input for a SQL macro and generating some fields directly in this subquery, it throws ORA-62558, even for some supported data types. In the following listing we see no problem with the column HIREDATE coming from the table, but generating another date column using sysdate is not possible. Converting the generated date to a string solves the problem, though.

SQL> -- this works
SQL> with subq as (select empno, ename, hiredate 
  2                from empv 
  3                fetch first 5 rows only)
  4  select * from get_csv(subq);

CSV_ROW                                           
--------------------------------------------------
"EMPNO","ENAME","HIREDATE"
7654,"MARTIN",1981-09-28
7521,"WARD",1981-02-22
7566,"JONES",1981-04-02
7369,"SMITH",1980-12-17
7499,"ALLEN",1981-02-20

6 rows selected. 

SQL> 
SQL> -- doesn't work
SQL> with subq as (select empno, ename, hiredate, sysdate as extraction_dt 
  2                from empv
  3                fetch first 5 rows only)
  4  select * from get_csv(subq);

Error at Command Line : 235 Column : 15
Error report -
SQL Error: ORA-62558: Unsupported data types (DATE) for polymorphic table function
62558. 00000 -  "Unsupported data types (%s) for polymorphic table function"
*Cause:    Non-scalar data types were used in polymorphic table function.
           Only scalar data types are supported.
*Action:   Use only scalar data types in polymorphic table function.
SQL> 
SQL> -- works after casting to char datatype
SQL> with subq as (select empno, ename, hiredate
  2                     , to_char(sysdate,'yyyy-mm-dd') as extraction_dt 
  3                from empv
  4                fetch first 5 rows only)
  5  select * from get_csv(subq);

CSV_ROW                                           
--------------------------------------------------
"EMPNO","ENAME","HIREDATE","EXTRACTION_DT"
7654,"MARTIN",1981-09-28,"2020-10-03"
7521,"WARD",1981-02-22,"2020-10-03"
7566,"JONES",1981-04-02,"2020-10-03"
7369,"SMITH",1980-12-17,"2020-10-03"
7499,"ALLEN",1981-02-20,"2020-10-03"

6 rows selected. 

Listing 7: bug with subquery-generated columns

I observed the same behavior with other data types like TIMESTAMP or INTERVAL, and not only with SQL macros but with polymorphic table functions as well. I think I should open an SR for this. But I don’t really think it is a showstopper for using SQL macros.

UPDATE: I’ve investigated the reasons for this behavior in the follow-up post Using Subqueries and Views with PTF or SQL Macros. After all, I’m not sure whether it is a bug at all. A better workaround for the problem is to cast SYSDATE as DATE again.
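Applied to the failing query from Listing 7, that workaround would look like this (a sketch based on the statement above; I’ll leave the output and the explanation to the follow-up post):

SQL> with subq as (select empno, ename, hiredate
  2                     , cast(sysdate as date) as extraction_dt
  3                from empv
  4                fetch first 5 rows only)
  5  select * from get_csv(subq);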

As you can probably guess, I have become a big fan of SQL macros over the last few months. Okay, they are to be taken with a certain amount of caution, but they bring clear advantages. All in all, I am very excited to be able to work with SQL macros in 19c now. After all, 19c will be the release that I will have to work with for a few more years at most of my customers.
