One of the new query optimizations in 12c are the null-accepting semi-joins. They make subquery unnesting and a semi-join possible in a situation which led to FILTER operation in older versions.
One can see a semi-join with “IN” oder “EXISTS” subqueries, if the optimizer choosed to unnest the subquery.
SQL> CREATE TABLE t1 (a NUMBER, c number); Table created. SQL> SQL> CREATE TABLE t2 (b NUMBER, c number); Table created. SQL> SQL> INSERT INTO t1 VALUES (1, 1); 1 row created. SQL> INSERT INTO t1 VALUES (NULL, 2); 1 row created. SQL> SQL> INSERT INTO t2 VALUES (1, 1); 1 row created. SQL> INSERT INTO t2 VALUES (NULL, 2); 1 row created. SQL> SQL> SELECT * FROM t1 2 WHERE a IN (SELECT B FROM t2); A C ---------- ---------- 1 1 SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ---------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 6 (100)| |* 1 | HASH JOIN SEMI | | 1 | 39 | 6 (0)| | 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)| | 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)| ---------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("A"="B") SQL> SQL> SELECT * FROM t1 2 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b); A C ---------- ---------- 1 1 SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ---------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 6 (100)| |* 1 | HASH JOIN SEMI | | 1 | 39 | 6 (0)| | 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)| | 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)| ---------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."A"="T2"."B")
Nothing really surprising here. Both subqueries lead to identical execution plans with HASH SEMI JOIN. As expected, the record with NULL-value in T1 was not selected. If one explicitly wants to select the records from T1 where column value of A is NULL, it is possible to do so defining extra WHERE clause with OR:
SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2'; Session altered. SQL> SQL> SELECT * FROM t1 2 WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b) 3 OR t1.a IS NULL); A C ---------- ---------- 1 1 2 SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ----------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3 (100)| |* 1 | FILTER | | | | | | 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)| |* 3 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)| ----------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("T1"."A" IS NULL OR IS NOT NULL)) 3 - filter("T2"."B"=:B1)
But before 12c we could see that subquery unnesting was not possible anymore. Hence, no hash semi-join but rather a FILTER-Operation, where our subquery will be evaluated once per row. Needless to say, this is not fast enough. And this is what was changed in 12c!
SQL> ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='12.1.0.1'; Session altered. SQL> SQL> SELECT * FROM t1 2 WHERE (EXISTS (SELECT 1 FROM t2 WHERE t1.a=t2.b) 3 OR t1.a IS NULL); A C ---------- ---------- 1 1 2 SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ---------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 6 (100)| |* 1 | HASH JOIN SEMI NA | | 1 | 39 | 6 (0)| | 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)| | 3 | TABLE ACCESS FULL| T2 | 2 | 26 | 3 (0)| ---------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."A"="T2"."B")
You can see the corresponding step of the execution plan on line 22: Null Accepting Semi Join. First, a check would be done, whether null values are there in the join column of the left table. The corresponding rows will be returned if found. Then the semi-join will be performed with the rest of the rows. This approach is not limited to Hash-Join: null-accepting Nested Loops or Merge Semi-Joins are also possible.
SQL> SELECT * FROM t1 2 WHERE (EXISTS (SELECT /*+ NL_SJ */ 1 FROM t2 WHERE t1.a=t2.b) 3 OR t1.a IS NULL); A C ---------- ---------- 1 1 2 SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 7 (100)| | 1 | NESTED LOOPS SEMI NA| | 1 | 39 | 7 (0)| | 2 | TABLE ACCESS FULL | T1 | 2 | 52 | 3 (0)| |* 3 | TABLE ACCESS FULL | T2 | 1 | 13 | 2 (0)| ------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T1"."A"="T2"."B")
But what I don’t understand yet: why is it only for subqueries with EXISTS? We’ve seen that IN subqueries are being transformed to EXISTS correlated subqueries resulting in identical execution plans with semi-joins. But not if we combine the subquery with IS NULL clause:
SQL> SELECT * FROM t1 2 WHERE (t1.a IN (SELECT t2.b FROM t2) 3 OR t1.a IS NULL); A C ---------- ---------- 1 1 2 SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR); ... ---------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3 (100)| |* 1 | FILTER | | | | | | 2 | TABLE ACCESS FULL| T1 | 2 | 52 | 3 (0)| |* 3 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)| ---------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("T1"."A" IS NULL OR IS NOT NULL)) 3 - filter("T2"."B"=:B1)