One of the new query optimizations in 12c are the null-accepting semi-joins. They make subquery unnesting and a semi-join possible in a situation which led to FILTER operation in older versions.
One can see a semi-join with “IN” oder “EXISTS” subqueries, if the optimizer choosed to unnest the subquery.
SQL> CREATE TABLE t1 (a NUMBER, c number);
Table created.
SQL>
SQL> CREATE TABLE t2 (b NUMBER, c number);
Table created.
SQL>
SQL> INSERT INTO t1 VALUES (1, 1);
1 row created.
SQL> INSERT INTO t1 VALUES (NULL, 2);
1 row created.
SQL>
SQL> INSERT INTO t2 VALUES (1, 1);
1 row created.
SQL> INSERT INTO t2 VALUES (NULL, 2);
1 row created.
SQL>
SQL> SELECT * FROM t1
2 WHERE a IN (SELECT B FROM t2);
A C
---------- ----------
1 1
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)|
|* 1 | HASH JOIN SEMI | | 1 | 39 | 6 (0)|
| 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)|
| 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)|
----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"="B")
SQL>
SQL> SELECT * FROM t1
2 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b);
A C
---------- ----------
1 1
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)|
|* 1 | HASH JOIN SEMI | | 1 | 39 | 6 (0)|
| 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)|
| 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)|
----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."A"="T2"."B")
Nothing really surprising here. Both subqueries lead to identical execution plans with HASH SEMI JOIN. As expected, the record with NULL-value in T1 was not selected. If one explicitly wants to select the records from T1 where column value of A is NULL, it is possible to do so defining extra WHERE clause with OR:
SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2';
Session altered.
SQL>
SQL> SELECT * FROM t1
2 WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b)
3 OR t1.a IS NULL);
A C
---------- ----------
1 1
2
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)|
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)|
|* 3 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("T1"."A" IS NULL OR IS NOT NULL))
3 - filter("T2"."B"=:B1)
But before 12c we could see that subquery unnesting was not possible anymore. Hence, no hash semi-join but rather a FILTER-Operation, where our subquery will be evaluated once per row. Needless to say, this is not fast enough. And this is what was changed in 12c!
SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='12.1.0.1';
Session altered.
SQL>
SQL> SELECT * FROM t1
2 WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b)
3 OR t1.a IS NULL);
A C
---------- ----------
1 1
2
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)|
|* 1 | HASH JOIN SEMI NA | | 1 | 39 | 6 (0)|
| 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)|
| 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)|
----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."A"="T2"."B")
You can see the corresponding step of the execution plan on line 22: Null Accepting Semi Join. First, a check would be done, whether null values are there in the join column of the left table. The corresponding rows will be returned if found. Then the semi-join will be performed with the rest of the rows. This approach is not limited to Hash-Join: null-accepting Nested Loops or Merge Semi-Joins are also possible.
SQL> SELECT * FROM t1
2 WHERE (EXISTS (SELECT /*+ NL_SJ */ 1 FROM t2 WHERE t1.a=t2.b)
3 OR t1.a IS NULL);
A C
---------- ----------
1 1
2
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 7 (100)|
| 1 | NESTED LOOPS SEMI NA| | 1 | 39 | 7 (0)|
| 2 | TABLE ACCESS FULL | T1 | 2 | 52 | 3 (0)|
|* 3 | TABLE ACCESS FULL | T2 | 1 | 13 | 2 (0)|
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T1"."A"="T2"."B")
But what I don’t understand yet: why is it only for subqueries with EXISTS? We’ve seen that IN subqueries are being transformed to EXISTS correlated subqueries resulting in identical execution plans with semi-joins. But not if we combine the subquery with IS NULL clause:
SQL> SELECT * FROM t1
2 WHERE (t1.a IN (SELECT t2.b FROM t2)
3 OR t1.a IS NULL);
A C
---------- ----------
1 1
2
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
...
----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)|
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)|
|* 3 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("T1"."A" IS NULL OR IS NOT NULL))
3 - filter("T2"."B"=:B1)
