Null Accepting Semi-Joins in 12c

One of the new query optimizations in 12c are the null-accepting semi-joins. They make subquery unnesting and a semi-join possible in a situation which led to FILTER operation in older versions.

One can see a semi-join with “IN” oder “EXISTS” subqueries, if the optimizer choosed to unnest the subquery.

SQL> CREATE TABLE t1 (a NUMBER, c number);

Table created.

SQL>
SQL> CREATE TABLE t2 (b NUMBER, c number);

Table created.

SQL>
SQL> INSERT INTO t1 VALUES (1, 1);

1 row created.

SQL> INSERT INTO t1 VALUES (NULL, 2);

1 row created.

SQL>
SQL> INSERT INTO t2 VALUES (1, 1);

1 row created.

SQL> INSERT INTO t2 VALUES (NULL, 2);

1 row created.

SQL>
SQL> SELECT * FROM t1
  2  WHERE a IN (SELECT B FROM t2);

         A          C
---------- ----------
         1          1

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

...
----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     6 (100)|
|*  1 |  HASH JOIN SEMI    |      |     1 |    39 |     6   (0)|
|   2 |   TABLE ACCESS FULL| T1   |     2 |    52 |     3   (0)|
|   3 |   TABLE ACCESS FULL| T2   |     2 |    26 |     3   (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"="B")

SQL>
SQL> SELECT * FROM t1
  2  WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b);

         A          C
---------- ----------
         1          1

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

...
----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     6 (100)|
|*  1 |  HASH JOIN SEMI    |      |     1 |    39 |     6   (0)|
|   2 |   TABLE ACCESS FULL| T1   |     2 |    52 |     3   (0)|
|   3 |   TABLE ACCESS FULL| T2   |     2 |    26 |     3   (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."A"="T2"."B")

Nothing really surprising here. Both subqueries lead to identical execution plans with HASH SEMI JOIN. As expected, the record with NULL-value in T1 was not selected. If one explicitly wants to select the records from T1 where column value of A is NULL, it is possible to do so defining extra WHERE clause with OR:

SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2';

Session altered.

SQL>
SQL> SELECT * FROM t1
  2  WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b)
  3         OR t1.a IS NULL);

         A          C
---------- ----------
         1          1
                    2

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...

-----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     3 (100)|
|*  1 |  FILTER            |      |       |       |            |
|   2 |   TABLE ACCESS FULL| T1   |     2 |    52 |     3   (0)|
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("T1"."A" IS NULL OR  IS NOT NULL))
   3 - filter("T2"."B"=:B1)

But before 12c we could see that subquery unnesting was not possible anymore. Hence, no hash semi-join but rather a FILTER-Operation, where our subquery will be evaluated once per row. Needless to say, this is not fast enough. And this is what was changed in 12c!

SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='12.1.0.1';

Session altered.

SQL>
SQL> SELECT * FROM t1
  2  WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b)
  3         OR t1.a IS NULL);

         A          C
---------- ----------
         1          1
                    2

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     6 (100)|
|*  1 |  HASH JOIN SEMI NA |      |     1 |    39 |     6   (0)|
|   2 |   TABLE ACCESS FULL| T1   |     2 |    52 |     3   (0)|
|   3 |   TABLE ACCESS FULL| T2   |     2 |    26 |     3   (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."A"="T2"."B")

You can see the corresponding step of the execution plan on line 22: Null Accepting Semi Join. First, a check would be done, whether null values are there in the join column of the left table. The corresponding rows will be returned if found. Then the semi-join will be performed with the rest of the rows. This approach is not limited to Hash-Join: null-accepting Nested Loops or Merge Semi-Joins are also possible.

SQL> SELECT  * FROM t1
  2  WHERE (EXISTS (SELECT /*+ NL_SJ */ 1 FROM t2 WHERE t1.a=t2.b)
  3         OR t1.a IS NULL);

         A          C
---------- ----------
         1          1
                    2

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     7 (100)|
|   1 |  NESTED LOOPS SEMI NA|      |     1 |    39 |     7   (0)|
|   2 |   TABLE ACCESS FULL  | T1   |     2 |    52 |     3   (0)|
|*  3 |   TABLE ACCESS FULL  | T2   |     1 |    13 |     2   (0)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T1"."A"="T2"."B")

But what I don’t understand yet: why is it only for subqueries with EXISTS? We’ve seen that IN subqueries are being transformed to EXISTS correlated subqueries resulting in identical execution plans with semi-joins. But not if we combine the subquery with IS NULL clause:

SQL> SELECT  * FROM t1
  2  WHERE (t1.a IN (SELECT t2.b FROM t2)
  3         OR t1.a IS NULL);

         A          C
---------- ----------
         1          1
                    2

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     3 (100)|
|*  1 |  FILTER            |      |       |       |            |
|   2 |   TABLE ACCESS FULL| T1   |     2 |    52 |     3   (0)|
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("T1"."A" IS NULL OR  IS NOT NULL))
   3 - filter("T2"."B"=:B1)

Source: White Paper “Optimizer with Oracle Database 12c”

Leave a Reply

Your email address will not be published. Required fields are marked *