Conditional Logic in SQL

A few days ago Sven Weller has published an excellent post about writing conditional logic in SQL and it reminded me of an example I used in one of my presentations. It is also about conditional logic in SQL but maybe some less obvious use cases in context of function calls and sorting.

Assume we have a table ORDER_ITEMS.

SQL> SELECT * FROM order_items;

 ORDER_ID LINEITEM_ID STATUS   SHIP_DT  
--------- ----------- -------- ---------
        1           1 shipped  01-FEB-16
        1           2 shipped  01-JAN-16
        1           3 open              
        1           4 returned 01-DEC-15

Listing 1

The slightly artificial requirement is to report items „as is“, without grouping, along with the number of shipped and returned items and the first (oldest) shipping date of an item in „shipped“ state. Per order, of course. So the result should be as follows:

ORDER_ID LINEITEM_ID STATUS   SHIP_DT     SHIP_CNT  RET_CNT FIRST_SHI
-------- ----------- -------- --------- ---------- -------- ---------
       1           2 shipped  01-JAN-16          2        1 01-JAN-16
       1           1 shipped  01-FEB-16          2        1 01-JAN-16
       1           4 returned 01-DEC-15          2        1 01-JAN-16
       1           3 open                        2        1 01-JAN-16

Listing 2

Well, since we don’t allowed to group but need to count, the first idea is to use analytic functions. But we need a kind of conditional count or two of them respectively: one for shipped and one for returned items. That is, we have to count rows only if the state has the particular value and skip all other rows. How we can do this?

Since the table we query can be huge, I assume we should avoid nesting and multiple scans of it. One of the possible solutions – and there are always a few – would be to use the described „conditional logic“ when calling analytic functions as shown in Listing 3.

What we are counting is the expression returned by the CASE-operator. We should return any NOT NULL column if the condition is met or NULL otherwise. Why NULL? Because of a special handling of NULL values by the COUNT aggregate function. COUNT(expr) returns the number of rows where expr is not null. See also Nulls and aggregate functions for more information. You can achieve the same effect using SUM and summarizing 1 and 0 values, if you find it to be more intuitive.

SQL> SELECT oi.order_id, oi.lineitem_id, oi.status, oi.ship_dt 
  2  ,      COUNT(CASE WHEN status='shipped' 
  3                   THEN lineitem_id 
  4                   ELSE NULL 
  5               END)  
  6              OVER (PARTITION BY order_id ) ship_cnt
  7  ,      SUM(CASE WHEN status='shipped' 
  8                  THEN 1 
  9                  ELSE 0 
 10             END)  
 11              OVER (PARTITION BY order_id ) ship_cnt2            
 12  FROM order_items oi;

  ORDER_ID LINEITEM_ID STATUS   SHIP_DT     SHIP_CNT  SHIP_CNT2
---------- ----------- -------- --------- ---------- ----------
         1           1 shipped  01-FEB-16          2          2
         1           4 returned 01-DEC-15          2          2
         1           3 open                        2          2
         1           2 shipped  01-JAN-16          2          2

Listing 3

Well, this was the first part of the question. How to find the first shipping date of the items in shipped state? Let us first see how to do it wrong (Listing 4). 😉

Simply using FIRST_VALUE analytical function doesn’t return the correct date: we have found the oldest date, but the lineitem it belongs to is not in shipped state. Somehow we have to take the state into account. How about using the same approach as with counting rows (lines 6-11)? This returns null – why? What happens? We are ordering by SHIP_DT, the oldest date is ’01-DEC-15′, so we found the row in question, that is the line_item 4. Now what should the function FIRST_VALUE return from this row? We have said, the CASE expression should be returned and this CASE expression just evaluates to NULL, since the state of the row is „returned“, not „shipped“.


SQL> SELECT oi.order_id, oi.lineitem_id, oi.status, oi.ship_dt 
  2  ,      FIRST_VALUE(ship_dt) 
  3              OVER (PARTITION BY order_id 
  4                    ORDER BY ship_dt) first_shipment_wrong
  5  
  6  ,      FIRST_VALUE(CASE WHEN status='shipped' 
  7                          THEN ship_dt 
  8                          ELSE NULL 
  9                     END) 
 10              OVER (PARTITION BY order_id 
 11                    ORDER BY ship_dt) also_wrong
 12  FROM order_items oi
 13  ORDER BY order_id, lineitem_id;

  ORDER_ID LINEITEM_ID STATUS   SHIP_DT   FIRST_SHI ALSO_WRON
---------- ----------- -------- --------- --------- ---------
         1           1 shipped  01-FEB-16 01-DEC-15          
         1           2 shipped  01-JAN-16 01-DEC-15          
         1           3 open               01-DEC-15          
         1           4 returned 01-DEC-15 01-DEC-15          
        

Listing 4

We can correct this at least in two ways. First, there is an option to find non-null values: IGNORE NULLS clause, which has been introduced in 11.2. But be aware, with FIRST_VALUE we will also get wrong results because of Oracle’s default windowing clause "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" (Listing 5 lines 2-6). We simply don’t allow the function to look ahead and find non-null values. Provide the correct windowing clause to fix this (lines 8-15). Another way is to use conditional expression while ordering. Doing so we only consider shipping dates with correct state (lines 17-23).


SQL> SELECT oi.order_id, oi.lineitem_id, oi.status, oi.ship_dt 
  2  ,       FIRST_VALUE(CASE WHEN status='shipped' 
  3                           THEN ship_dt 
  4                           ELSE NULL 
  5                      END ) IGNORE NULLS
  6          OVER (PARTITION BY order_id ORDER BY ship_dt) still_wrong
  7  
  8  ,       FIRST_VALUE(CASE WHEN status='shipped' 
  9                           THEN ship_dt 
 10                           ELSE NULL 
 11                      END ) IGNORE NULLS
 12              OVER (PARTITION BY order_id 
 13                    ORDER BY ship_dt 
 14                    ROWS BETWEEN UNBOUNDED PRECEDING 
 15                             AND UNBOUNDED FOLLOWING) now_correct
 16  
 17  ,       FIRST_VALUE(ship_dt) 
 18              OVER (PARTITION BY order_id 
 19                ORDER BY CASE WHEN status='shipped' 
 20                              THEN ship_dt 
 21                              ELSE NULL 
 22                         END 
 23                NULLS LAST) also_correct
 24  FROM order_items oi
 25  ORDER BY order_id, lineitem_id;

  ORDER_ID LINEITEM_ID STATUS   SHIP_DT   STILL_WRO NOW_CORRE ALSO_CORR
---------- ----------- -------- --------- --------- --------- ---------
         1           1 shipped  01-FEB-16 01-JAN-16 01-JAN-16 01-JAN-16
         1           2 shipped  01-JAN-16 01-JAN-16 01-JAN-16 01-JAN-16
         1           3 open               01-JAN-16 01-JAN-16 01-JAN-16
         1           4 returned 01-DEC-15           01-JAN-16 01-JAN-16
 

Listing 5

Putting it all together, we get the required result set (Listing 6). Putting the logic next to the function calls instead of the WHERE clause allows us to combine calculations in one query which otherwise would be spread across multiple queries with their own WHERE clauses.


SQL> SELECT oi.order_id, oi.lineitem_id, oi.status, oi.ship_dt 
  2  ,      COUNT(CASE WHEN status='shipped' 
  3                    THEN lineitem_id 
  4                    ELSE NULL 
  5               END)  
  6         OVER (PARTITION BY order_id ) ship_cnt
  7  
  8  ,      COUNT(CASE WHEN status='returned' 
  9                    THEN lineitem_id 
 10                    ELSE NULL 
 11               END)  
 12         OVER (PARTITION BY order_id ) ret_cnt
 13  
 14  ,      FIRST_VALUE(ship_dt) 
 15              OVER (PARTITION BY order_id 
 16                ORDER BY CASE WHEN status='shipped' 
 17                              THEN ship_dt 
 18                              ELSE NULL 
 19                         END 
 20                NULLS LAST) first_shipment
 21  FROM o12test.order_items oi
 22  ;


ORDER_ID LINEITEM_ID STATUS   SHIP_DT     SHIP_CNT    RET_CNT FIRST_SHI
-------- ----------- -------- --------- ---------- ---------- ---------
       1           2 shipped  01-JAN-16          2          1 01-JAN-16
       1           1 shipped  01-FEB-16          2          1 01-JAN-16
       1           4 returned 01-DEC-15          2          1 01-JAN-16
       1           3 open                        2          1 01-JAN-16

Listing 6

Well, I admit, SQL statements of this kind are not so easy to understand and are difficult to maintain. I don’t want to judge whether they should be used. Now and then the performance gain can clearly outweigh the maintenance headaches, in other situation you can have no performance gain at all. As always, it depends. And I think one should always critically question the requirements, whether the complex things are really need to be done. I just wanted to show the approach as it clearly has to do with conditional logic in SQL. If you nevertheless end up with such SQL, remember it should be extensively commented and documented from day one. Believe me, next day you will not grasp any more what exactly your SQL should do.

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert