In the first part we looked at general preconditions for online statistics gathering to work and some restrictions. In this part we’ll take a look at what happens with direct path loads into partitioned tables.
Partitioning
There are also some points to consider while working with partitioned tables. First, inserting into an empty partitioned table will gather only global table statistics. Listing 1 shows that. Also as expected, if we then issue subsequent INSERT statement, which bulk inserts into another (yet empty) partition, nothing will happen.
SQL> DROP table t_part purge; Table T_PART dropped. SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20)) partition by range (pkey) (partition P0 values less than (0), partition P1 values less than (1), partition P2 values less than (2), partition P3 values less than (3)); Table T_PART created. SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','FALSE') PL/SQL procedure successfully completed. SQL> INSERT /*+ append */ INTO t_part SELECT -1 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTI GLOBA LAST_ANALYZED NUM_ROWS ---------- ----- ----- ------------------- ---------- T_PART NO 21-05-2018 05:10:46 10000 T_PART P0 NO T_PART P1 NO T_PART P3 NO T_PART P2 NO SQL> -- Now insert into another (empty) partition SQL> INSERT /*+ append */ INTO t_part SELECT 0 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> -- No statistics have been gathered this time SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTI GLOBA LAST_ANALYZED NUM_ROWS ---------- ----- ----- ------------------- ---------- T_PART NO 21-05-2018 05:10:46 10000 T_PART P0 NO T_PART P1 NO T_PART P3 NO T_PART P2 NO
Listing 1 – inserting into an empty partitioned table gathers only global stats
But you can use an extended syntax and explicitly provide the partition to insert into (Listing 2). What happens then? If this partition was empty, then partition level statistics will be gathered online. No global statistics that time:
SQL> DROP table t_part purge; Table T_PART dropped. SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20)) partition by range (pkey) (partition P0 values less than (0), partition P1 values less than (1), partition P2 values less than (2), partition P3 values less than (3)); Table T_PART created. SQL> INSERT /*+ append */ INTO t_part PARTITION (P2) SELECT 1 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTI GLOBA LAST_ANALYZED NUM_ROWS ---------- ----- ----- ------------------- ---------- T_PART NO T_PART P1 NO T_PART P2 NO 21-05-2018 05:30:24 10000 T_PART P0 NO T_PART P3 NO
Listing 2 – partition level stats if explicitly specifying partition
Even better, if we change the preference for gathering global table statistics to incremental, the synopsis for the changed partition will be created altogether. The prerequisite is using the extended syntax explicitly specifying the partition.
SQL> DROP table t_part purge; Table T_PART dropped. SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20)) partition by range (pkey) (partition P0 values less than (0), partition P1 values less than (1), partition P2 values less than (2), partition P3 values less than (3)); Table T_PART created. SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE') PL/SQL procedure successfully completed. SQL> INSERT /*+ append */ INTO t_part PARTITION (P3) SELECT 2 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTI GLOBA LAST_ANALYZED NUM_ROWS ---------- ----- ----- ------------------- ---------- T_PART NO T_PART P0 NO T_PART P1 NO T_PART P2 NO T_PART P3 NO 21-05-2018 12:06:57 10000 SQL> -- Check Synopsis SQL> SELECT o.name Table_Name, p.subname Partition_name, c.name "Column", h.analyzetime "Synopsis Creation Time" FROM sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, sys.OBJ$ o, sys.COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM sys.TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM sys.TABCOMPART$ tabcompart$ ) ) tp, sys.OBJ$ p WHERE o.name = 'T_PART' AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND h.bo# = o.obj# ORDER BY 4,1,2,3; TABLE_NAME PARTI Colum Synopsis Creation Time ---------- ----- ----- ---------------------- T_PART P3 ID 21-05-2018 12:06:57 T_PART P3 PKEY 21-05-2018 12:06:57 T_PART P3 S 21-05-2018 12:06:57
Listing 3 – partition level plus synopsis if explicitly specifying partition
But what happens if we don’t use the extended syntax leaving the incremental preference to be true? Nothing will happen, no online statistics at all!
SQL> DROP table t_part purge; Table T_PART dropped. SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20)) partition by range (pkey) (partition P0 values less than (0), partition P1 values less than (1), partition P2 values less than (2), partition P3 values less than (3)); Table T_PART created. SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE') PL/SQL procedure successfully completed. SQL> -- without partition-syntax SQL> INSERT /*+ append */ INTO t_part SELECT 0 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTI GLOBA LAST_ANALYZED NUM_ROWS ---------- ----- ----- ------------------- ---------- T_PART NO T_PART P1 NO T_PART P2 NO T_PART P3 NO T_PART P0 NO SQL> -- Check Synopsis SQL> SELECT o.name Table_Name, p.subname Partition_name, c.name "Column", h.analyzetime "Synopsis Creation Time" FROM sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, sys.OBJ$ o, sys.COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM sys.TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM sys.TABCOMPART$ tabcompart$ ) ) tp, sys.OBJ$ p WHERE o.name = 'T_PART' AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND h.bo# = o.obj# ORDER BY 4,1,2,3; no rows selected
Listing 4 – no stats with incremental = true and not explicitly specifying partition
We can show the different cases in the table:
Incremental Preference TRUE | Incremental Preference FALSE | |
---|---|---|
Partition name specified | partition-level stats + synopsis | partition-level stats |
No partition name specified | No stats at all! | global stats |
Well, I have one more question, which I think can simply be a bug. If we don’t reference partitions explicitly and thus get only table-level statistics, why is the column GLOBAL_STATS in USER_TAB_STATISTICS shown as NO? The meaning of NO is that table-level statistics have been aggregated from partition-level statistics and YES means they have been explicitly gathered or set. Well, in our case they are definitely not aggregated from partition-level but are gathered on the fly
Interval Partitioning
There are two reasons to explicitly mention interval partitioning. The first reason is that in 12.1 no online statistics gathering takes place with bulk INSERT in the case of interval partitioned tables. It is a known bug (Doc ID 20562261.8) and it is fixed in 12.2. The second reason is the question that may arise with interval partitioning: how can we get partition-level statistics? Without specifying a partition we’ll get only global statistics. On the other hand, we don’t create empty partitions beforehand, instead they are automatically created by an INSERT. But how can I specify a partition which doesn’t exist yet? Well, there is a way to do this: the extended partition syntax allows not only to specify the partition name, but also the value. The database automatically determines the right partition. This is shown in Listing 5. Interesting fact, if the table preference is to use incremental statistics, then a synopsis for affected partition is also created but we cannot see its creation time.
SQL> DROP table t_part purge; Table T_PART dropped. SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20)) partition by range (pkey) interval (1) (partition P0 values less than (0) ); Table T_PART created. SQL> -- without partition-syntax SQL> INSERT /*+ append */ INTO t_part SELECT 0 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED NUM_ROWS ---------- ---------- ----- ------------------- ---------- T_PART NO 23-05-2018 03:45:10 10000 T_PART SYS_P961 NO T_PART P0 NO SQL> -- With partition for SQL> INSERT /*+ append */ INTO t_part partition for (1) tp SELECT 1 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED NUM_ROWS ---------- ---------- ----- ------------------- ---------- T_PART NO 23-05-2018 03:45:10 10000 T_PART SYS_P961 NO T_PART P0 NO T_PART SYS_P962 NO 23-05-2018 03:45:10 10000 SQL> -- With partition for + incremental SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE') PL/SQL procedure successfully completed. SQL> INSERT /*+ append */ INTO t_part partition for (2) tp SELECT 2 pkey, level AS id , lpad (level, 20, '0') s FROM dual CONNECT BY level <= 10000; 10,000 rows inserted. SQL> COMMIT; Commit complete. SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows FROM user_tab_statistics s WHERE table_name IN ('T_PART'); TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED NUM_ROWS ---------- ---------- ----- ------------------- ---------- T_PART NO 23-05-2018 03:45:10 10000 T_PART SYS_P961 NO T_PART P0 NO T_PART SYS_P962 NO 23-05-2018 03:45:10 10000 T_PART SYS_P963 NO 23-05-2018 03:45:11 10000 SQL> -- Check Synopsis SQL> SELECT o.name Table_Name, p.subname Partition_name, c.name "Column", h.analyzetime "Synopsis Creation Time" FROM sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h, sys.OBJ$ o, sys.COL$ c, ( ( SELECT TABPART$.bo# BO#, TABPART$.obj# OBJ# FROM sys.TABPART$ tabpart$ ) UNION ALL ( SELECT TABCOMPART$.bo# BO#, TABCOMPART$.obj# OBJ# FROM sys.TABCOMPART$ tabcompart$ ) ) tp, sys.OBJ$ p WHERE o.name = 'T_PART' AND tp.obj# = p.obj# AND h.bo# = tp.bo# AND h.group# = tp.obj# * 2 AND h.bo# = c.obj#(+) AND h.intcol# = c.intcol#(+) AND h.bo# = o.obj# ORDER BY 4,1,2,3; TABLE_NAME PARTITION_ Colum Synopsis Creation Time ---------- ---------- ----- ---------------------- T_PART SYS_P963 ID T_PART SYS_P963 PKEY T_PART SYS_P963 S
Listing 5 – explicitly specifying partition in case of interval partitioning
Conclusion
Having representative statistics (which doesn’t automatically mean they always have to be fresh though) is essential for ETL Performance. Having looked at some use cases in part 1 and part 2, we have seen that there are some specifics to consider. If you plan to rely on online statistics gathering for some of your ETL steps, better double check that the feature is really working each time you run your ETL process.