Online Statistics Gathering for ETL – Part 2

In the first part we looked at general preconditions for online statistics gathering to work and some restrictions. In this part we’ll take a look at what happens with direct path loads into partitioned tables.

Partitioning

There are some additional points to consider when working with partitioned tables. First, inserting into an empty partitioned table gathers only global table statistics, as Listing 1 shows. Also, as expected, if we then issue a subsequent INSERT that bulk-loads into another (still empty) partition, nothing happens, since the table as a whole is no longer empty.

SQL> DROP table t_part purge;

Table T_PART dropped.

SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20))
partition by range (pkey)
(partition P0 values less than (0),
partition P1 values less than (1),
partition P2 values less than (2),
partition P3 values less than (3));

Table T_PART created.

SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','FALSE')

PL/SQL procedure successfully completed.

SQL> INSERT /*+ append */ INTO   t_part
SELECT -1 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTI GLOBA LAST_ANALYZED         NUM_ROWS
---------- ----- ----- ------------------- ----------
T_PART           NO    21-05-2018 05:10:46      10000
T_PART     P0    NO                                  
T_PART     P1    NO                                  
T_PART     P3    NO                                  
T_PART     P2    NO                                  

SQL> -- Now insert into another (empty) partition
SQL> INSERT /*+ append */ INTO   t_part
SELECT 0 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> -- No statistics have been gathered this time
SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTI GLOBA LAST_ANALYZED         NUM_ROWS
---------- ----- ----- ------------------- ----------
T_PART           NO    21-05-2018 05:10:46      10000
T_PART     P0    NO                                  
T_PART     P1    NO                                  
T_PART     P3    NO                                  
T_PART     P2    NO                   

Listing 1 – inserting into an empty partitioned table gathers only global stats

But you can use the extended syntax and explicitly name the partition to insert into (Listing 2). What happens then? If that partition was empty, partition-level statistics are gathered online, but no global statistics this time:

SQL> DROP table t_part purge;

Table T_PART dropped.

SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20))
partition by range (pkey)
(partition P0 values less than (0),
partition P1 values less than (1),
partition P2 values less than (2),
partition P3 values less than (3));

Table T_PART created.

SQL> INSERT /*+ append */ INTO   t_part PARTITION (P2)
SELECT 1 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTI GLOBA LAST_ANALYZED         NUM_ROWS
---------- ----- ----- ------------------- ----------
T_PART           NO                                  
T_PART     P1    NO                                  
T_PART     P2    NO    21-05-2018 05:30:24      10000
T_PART     P0    NO                                  
T_PART     P3    NO                                  
 

Listing 2 – partition-level stats when explicitly specifying the partition

Even better, if we set the table preference INCREMENTAL to TRUE, so that global statistics are maintained incrementally, a synopsis for the loaded partition is created as well. The prerequisite is again using the extended syntax and explicitly specifying the partition.

SQL> DROP table t_part purge;

Table T_PART dropped.

SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20))
partition by range (pkey)
(partition P0 values less than (0),
partition P1 values less than (1),
partition P2 values less than (2),
partition P3 values less than (3));

Table T_PART created.

SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE')

PL/SQL procedure successfully completed.

SQL> INSERT /*+ append */ INTO   t_part PARTITION (P3)
SELECT 2 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTI GLOBA LAST_ANALYZED         NUM_ROWS
---------- ----- ----- ------------------- ----------
T_PART           NO                                  
T_PART     P0    NO                                  
T_PART     P1    NO                                  
T_PART     P2    NO                                  
T_PART     P3    NO    21-05-2018 12:06:57      10000

SQL> -- Check Synopsis
SQL> SELECT o.name         Table_Name,
       p.subname      Partition_name,
       c.name         "Column",
       h.analyzetime  "Synopsis Creation Time"
FROM   sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h,
       sys.OBJ$ o,
       sys.COL$ c,
       ( ( SELECT TABPART$.bo#  BO#,
                  TABPART$.obj# OBJ#
           FROM   sys.TABPART$ tabpart$ )
         UNION ALL
         ( SELECT TABCOMPART$.bo#  BO#,
                  TABCOMPART$.obj# OBJ#
           FROM   sys.TABCOMPART$ tabcompart$ ) ) tp,
       sys.OBJ$ p
WHERE  o.name = 'T_PART' AND
       tp.obj# = p.obj# AND
       h.bo# = tp.bo# AND
       h.group# = tp.obj# * 2 AND
       h.bo# = c.obj#(+) AND
       h.intcol# = c.intcol#(+) AND
       h.bo# = o.obj#
ORDER  BY 4,1,2,3;

TABLE_NAME PARTI Colum Synopsis Creation Time
---------- ----- ----- ----------------------
T_PART     P3    ID    21-05-2018 12:06:57   
T_PART     P3    PKEY  21-05-2018 12:06:57   
T_PART     P3    S     21-05-2018 12:06:57   
 

Listing 3 – partition-level stats plus synopsis when explicitly specifying the partition

But what happens if we keep the incremental preference set to TRUE and don’t use the extended syntax? Nothing at all: no online statistics are gathered!

SQL> DROP table t_part purge;

Table T_PART dropped.

SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20))
partition by range (pkey)
(partition P0 values less than (0),
partition P1 values less than (1),
partition P2 values less than (2),
partition P3 values less than (3));

Table T_PART created.

SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE')

PL/SQL procedure successfully completed.

SQL> -- without partition-syntax
SQL> INSERT /*+ append */ INTO   t_part
SELECT 0 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTI GLOBA LAST_ANALYZED         NUM_ROWS
---------- ----- ----- ------------------- ----------
T_PART           NO                                  
T_PART     P1    NO                                  
T_PART     P2    NO                                  
T_PART     P3    NO                                  
T_PART     P0    NO                                  

SQL> -- Check Synopsis
SQL> SELECT o.name         Table_Name,
       p.subname      Partition_name,
       c.name         "Column",
       h.analyzetime  "Synopsis Creation Time"
FROM   sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h,
       sys.OBJ$ o,
       sys.COL$ c,
       ( ( SELECT TABPART$.bo#  BO#,
                  TABPART$.obj# OBJ#
           FROM   sys.TABPART$ tabpart$ )
         UNION ALL
         ( SELECT TABCOMPART$.bo#  BO#,
                  TABCOMPART$.obj# OBJ#
           FROM   sys.TABCOMPART$ tabcompart$ ) ) tp,
       sys.OBJ$ p
WHERE  o.name = 'T_PART' AND
       tp.obj# = p.obj# AND
       h.bo# = tp.bo# AND
       h.group# = tp.obj# * 2 AND
       h.bo# = c.obj#(+) AND
       h.intcol# = c.intcol#(+) AND
       h.bo# = o.obj#
ORDER  BY 4,1,2,3;

no rows selected
 

Listing 4 – no stats with INCREMENTAL = TRUE and no explicitly specified partition

We can summarize the different cases in the following table:

                             | Incremental Preference TRUE      | Incremental Preference FALSE
Partition name specified     | partition-level stats + synopsis | partition-level stats
No partition name specified  | no stats at all!                 | global stats

Well, I have one more question, and I think the answer may simply be a bug. If we don’t reference the partition explicitly and thus get only table-level statistics, why is the column GLOBAL_STATS in USER_TAB_STATISTICS shown as NO? NO means that the table-level statistics have been aggregated from partition-level statistics, whereas YES means they have been explicitly gathered or set. In our case they are definitely not aggregated from partition-level statistics but gathered on the fly, so one would expect YES.
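For comparison, explicitly gathering the statistics with DBMS_STATS should flip GLOBAL_STATS to YES for the table-level row. The following is only a sketch (it was not run as part of the listings above) against the same T_PART table, using default parameters:

SQL> -- Sketch only: gather the global statistics explicitly via DBMS_STATS
SQL> exec dbms_stats.gather_table_stats(null, 'T_PART', granularity => 'GLOBAL')

SQL> -- The table-level row should now show GLOBAL_STATS = YES
SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics
WHERE  table_name = 'T_PART';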

Interval Partitioning

There are two reasons to mention interval partitioning explicitly. The first is that in 12.1 no online statistics gathering takes place for a bulk INSERT into an interval partitioned table. This is a known bug (Doc ID 20562261.8) and is fixed in 12.2. The second is a question that naturally arises with interval partitioning: how can we get partition-level statistics? Without specifying a partition we only get global statistics, but with interval partitioning we don’t create empty partitions beforehand; they are created automatically by the INSERT. So how can we specify a partition that doesn’t exist yet? There is a way: the extended partition syntax allows us to specify not only the partition name but also a value, and the database automatically determines the right partition. This is shown in Listing 5. Interestingly, if the table preference is set to incremental statistics, a synopsis for the affected partition is also created, but we cannot see its creation time.

SQL> DROP table t_part purge;

Table T_PART dropped.

SQL> CREATE TABLE t_part (pkey number, id number, s varchar2(20))
partition by range (pkey) interval (1)
(partition P0 values less than (0)
);

Table T_PART created.

SQL> -- without partition-syntax
SQL> INSERT /*+ append */ INTO   t_part 
SELECT 0 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED         NUM_ROWS
---------- ---------- ----- ------------------- ----------
T_PART                NO    23-05-2018 03:45:10      10000
T_PART     SYS_P961   NO                                  
T_PART     P0         NO                                  

SQL> -- With partition for
SQL> INSERT /*+ append */ INTO   t_part partition for (1) tp
SELECT 1 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED         NUM_ROWS
---------- ---------- ----- ------------------- ----------
T_PART                NO    23-05-2018 03:45:10      10000
T_PART     SYS_P961   NO                                  
T_PART     P0         NO                                  
T_PART     SYS_P962   NO    23-05-2018 03:45:10      10000

SQL> -- With partition for + incremental
SQL> exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE')

PL/SQL procedure successfully completed.

SQL> INSERT /*+ append */ INTO   t_part partition for (2) tp
SELECT 2 pkey, level AS id , lpad (level, 20, '0') s
FROM   dual
CONNECT BY level <= 10000;

10,000 rows inserted.

SQL> COMMIT;

Commit complete.

SQL> SELECT table_name, partition_name, global_stats, last_analyzed, num_rows
FROM   user_tab_statistics s
WHERE  table_name IN ('T_PART');

TABLE_NAME PARTITION_ GLOBA LAST_ANALYZED         NUM_ROWS
---------- ---------- ----- ------------------- ----------
T_PART                NO    23-05-2018 03:45:10      10000
T_PART     SYS_P961   NO                                  
T_PART     P0         NO                                  
T_PART     SYS_P962   NO    23-05-2018 03:45:10      10000
T_PART     SYS_P963   NO    23-05-2018 03:45:11      10000

SQL> -- Check Synopsis
SQL> SELECT o.name         Table_Name,
       p.subname      Partition_name,
       c.name         "Column",
       h.analyzetime  "Synopsis Creation Time"
FROM   sys.WRI$_OPTSTAT_SYNOPSIS_HEAD$ h,
       sys.OBJ$ o,
       sys.COL$ c,
       ( ( SELECT TABPART$.bo#  BO#,
                  TABPART$.obj# OBJ#
           FROM   sys.TABPART$ tabpart$ )
         UNION ALL
         ( SELECT TABCOMPART$.bo#  BO#,
                  TABCOMPART$.obj# OBJ#
           FROM   sys.TABCOMPART$ tabcompart$ ) ) tp,
       sys.OBJ$ p
WHERE  o.name = 'T_PART' AND
       tp.obj# = p.obj# AND
       h.bo# = tp.bo# AND
       h.group# = tp.obj# * 2 AND
       h.bo# = c.obj#(+) AND
       h.intcol# = c.intcol#(+) AND
       h.bo# = o.obj#
ORDER  BY 4,1,2,3;

TABLE_NAME PARTITION_ Colum Synopsis Creation Time
---------- ---------- ----- ----------------------
T_PART     SYS_P963   ID                          
T_PART     SYS_P963   PKEY                        
T_PART     SYS_P963   S                           
 

Listing 5 – explicitly specifying the partition with interval partitioning

Conclusion

Having representative statistics (which does not automatically mean they always have to be fresh) is essential for ETL performance. Having looked at a number of use cases in part 1 and part 2, we have seen that there are some specifics to consider. If you plan to rely on online statistics gathering for some of your ETL steps, you should double-check that the feature is really working each time you run your ETL process.
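One way to do that, sketched below under the assumption that the cursor of the INSERT is still the last statement in the session and that the table name and time threshold are adjusted to your own load, is to look for the OPTIMIZER STATISTICS GATHERING row source in the execution plan and to compare LAST_ANALYZED with the load time:

SQL> -- Sketch: show the plan of the last statement executed in this session;
SQL> -- online statistics gathering appears as the OPTIMIZER STATISTICS GATHERING row source
SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor());

SQL> -- Sketch: flag segments whose statistics were not refreshed by the load
SQL> -- (threshold of one hour chosen arbitrarily)
SQL> SELECT table_name, partition_name, last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'T_PART'
AND    (last_analyzed IS NULL OR last_analyzed < sysdate - 1/24);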
