{"id":454,"date":"2018-05-29T21:27:53","date_gmt":"2018-05-29T19:27:53","guid":{"rendered":"http:\/\/blog.sqlora.com\/de\/?p=454"},"modified":"2018-11-04T20:00:57","modified_gmt":"2018-11-04T18:00:57","slug":"online-statistics-gathering-for-etl-part-1","status":"publish","type":"post","link":"https:\/\/blog.sqlora.com\/de\/online-statistics-gathering-for-etl-part-1\/","title":{"rendered":"Online Statistics Gathering for ETL &#8211; Part 1"},"content":{"rendered":"<p>Online Statistics Gathering has been introduced in 12c and is a very handy feature for ETL and batch jobs developers. However the devil is in the detail. There are some points to remember. Let&#8217;s take a closer look.<!--more--><\/p>\n<p><strong>Introduction <\/strong><\/p>\n<p>If you are not familiar with the concept of this feature yet, here are some useful links:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/tgsql\/optimizer-statistics-concepts.html#TGSQL-GUID-E920301F-FFC4-4AB7-8DEB-11BA56088E0B\" rel=\"noopener\" target=\"_blank\">Documentation <\/a><\/li>\n<li><a href=\"https:\/\/sqlmaria.com\/2017\/01\/03\/online-statistics-gathering\/\" rel=\"noopener\" target=\"_blank\">Maria Colgan<\/a><\/li>\n<li><a href=\"https:\/\/oracle-base.com\/articles\/12c\/online-statistics-gathering-for-bulk-loads-12cr1\" rel=\"noopener\" target=\"_blank\">Tim Hall Oracle-Base<\/a><\/li>\n<li><a href=\"https:\/\/danischnider.wordpress.com\/2015\/12\/23\/online-statistics-gathering-in-oracle-12c\/\" rel=\"noopener\" target=\"_blank\">Dani Schnider<\/a><\/li>\n<\/ul>\n<p>But I&#8217;ll try to give a short introduction too. The official name of the feature is <strong>Online Statistics Gathering for Bulk Loads<\/strong>. So when exactly would the table statistics be automatically gathered  and when not? 
<\/p>\n<p>It&#8217;s quite clear with <strong>C<\/strong>reate <strong>T<\/strong>able <strong>A<\/strong>s <strong>S<\/strong>elect (CTAS): statistics will be gathered (unless restrictions apply). The next type of supported bulk load scenario is <code><strong>INSERT INTO ... SELECT<\/strong><\/code> into an <strong>empty table<\/strong> using a direct path insert. That means your INSERT must use the APPEND hint or be executed in parallel. Don&#8217;t forget to enable parallel DML at the session level or with the <em><strong>enable_parallel_dml<\/strong><\/em> hint &#8211; the <strong><em>parallel<\/em><\/strong> hint alone is not enough. <\/p>\n<p><strong>Parameters, Hints and Execution Plans<\/strong><\/p>\n<p>There is an underscore parameter <em><strong>_optimizer_gather_stats_on_load<\/strong><\/em>, which is TRUE by default. Should you want to disable the feature, this parameter can be set to FALSE. There are also two optimizer hints: <code><strong>GATHER_OPTIMIZER_STATISTICS<\/strong><\/code> and <code><strong>NO_GATHER_OPTIMIZER_STATISTICS<\/strong><\/code>. Don&#8217;t misunderstand them! You cannot tell the optimizer to gather (piggyback) statistics online if the main condition (a bulk insert into an empty segment) is not met. These hints are only useful in conjunction with the global setting via <em><strong>_optimizer_gather_stats_on_load<\/strong><\/em>: if it is set to TRUE, you can disable the feature for individual statements via <code><strong>NO_GATHER_OPTIMIZER_STATISTICS<\/strong><\/code>; and vice versa, if it is globally set to FALSE, you can selectively enable piggybacking using the <code><strong>GATHER_OPTIMIZER_STATISTICS<\/strong><\/code> hint.<\/p>\n<p>How do you know whether online statistics will be gathered? 
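<\/p>\n<p>As an example, consider a minimal direct path load into an empty table (a sketch; the table names <code>t<\/code> and <code>t2<\/code> are illustrative):<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nCREATE TABLE t  (id NUMBER, s VARCHAR2(20));\r\nCREATE TABLE t2 (id NUMBER, s VARCHAR2(20));\r\n\r\n-- direct path insert into the empty table t2\r\nINSERT \/*+ append *\/ INTO t2\r\nSELECT id, s\r\nFROM   t;\r\n<\/pre>\n<p>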
Doing an explain plan, you will see an extra step <code><strong>OPTIMIZER STATISTICS GATHERING<\/strong><\/code>.<\/p>\n<pre class=\"brush: sql; highlight: [6]; title: ; notranslate\" title=\"\">\r\n-------------------------------------------------\r\n| Id  | Operation                        | Name |\r\n-------------------------------------------------\r\n|   0 | INSERT STATEMENT                 |      |\r\n|   1 |  LOAD AS SELECT                  | T2   |\r\n|   2 |   OPTIMIZER STATISTICS GATHERING |      |\r\n|   3 |    TABLE ACCESS FULL             | T    |\r\n-------------------------------------------------\r\n<\/pre>\n<p>But in fact you cannot know for sure. This step only means that online statistics collection will be considered at runtime, because the optimizer doesn&#8217;t check at parse time whether the segment is really empty.<\/p>\n<p><strong>What about MERGE?<\/strong><\/p>\n<p>This question comes to mind because the INSERT part of a MERGE is also done using a direct path write if we use an APPEND hint or run it in parallel. In ETL scenarios it is not uncommon to load only records that do not already exist in the target. And since you don&#8217;t need to provide both WHEN MATCHED and WHEN NOT MATCHED (since 10g), one could construct a MERGE doing only an insert, which, from my point of view, is semantically the same as an INSERT INTO &#8230; SELECT with a NOT IN or NOT EXISTS subquery. The latter will gather statistics online. But what can we expect from MERGE? MERGE is not mentioned in the documentation, but in my opinion it&#8217;s worth checking. 
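<\/p>\n<p>Side by side, the two semantically equivalent formulations would look like this (a sketch; <code>t<\/code> and <code>src<\/code> are made-up names):<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- insert-only MERGE\r\nMERGE \/*+ append *\/ INTO t\r\nUSING src q ON (q.id = t.id)\r\nWHEN NOT MATCHED THEN INSERT (id, s) VALUES (q.id, q.s);\r\n\r\n-- equivalent direct path INSERT with NOT EXISTS\r\nINSERT \/*+ append *\/ INTO t\r\nSELECT q.id, q.s\r\nFROM   src q\r\nWHERE  NOT EXISTS (SELECT 1 FROM t WHERE t.id = q.id);\r\n<\/pre>\n<p>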
<\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\r\nSQL&gt; CREATE TABLE t_merge (id number, s varchar2(20))\r\nTable created.\r\n\r\nSQL&gt; --- MERGE\r\nSQL&gt; MERGE \/*+ append *\/ \r\nINTO   t_merge t\r\nUSING (\r\n       SELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\n       FROM   dual\r\n       CONNECT BY level &lt;= 10000) q\r\nON   (q.id = t.id)\r\nWHEN NOT MATCHED THEN INSERT VALUES (q.id, q.s)\r\n10000 rows merged.\r\n\r\n-- This error message means we were doing a bulk insert (direct path write) into T_MERGE\r\nSQL&gt; SELECT count(*) FROM t_merge\r\n&gt;&gt; SELECT count(*) FROM t_merge\r\n                     *\r\nError at line 1\r\nORA-12838: cannot read\/modify an object after modifying it in parallel \r\n\r\nSQL&gt; COMMIT\r\nCommit complete.\r\n\r\nSQL&gt; -- but we don&#039;t have any statistics\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_MERGE&#039;)\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_MERGE                                                \r\n1 row selected.\r\n\r\n<\/pre>\n<p><em><strong>Listing 1 &#8211; No Online Statistics Gathering with MERGE <\/strong><\/em><\/p>\n<p>Unfortunately no. Despite of doing a direct path write into an empty table, MERGE doesn&#8217;t piggyback optimizer statistics. Of course it&#8217;s not a big problem, you should just be aware of it. In one of my recent projects I was wondering, why don&#8217;t my dimensions have any statistics after initial load? These dimensions were quite stable: once initially loaded they rarely got few new records. So I decided not to gather optimizer statistics after each load, but rely on standard Oracle statistics gathering job instead. And for initial load the Online Statistics Gathering will do the job, right? NO! 
Because the load process was implemented using MERGE. So I had two options: re-implement the mappings using an INSERT, or explicitly gather optimizer statistics after the initial load (or even after each load). I opted for the latter.<\/p>\n<p><strong>What exactly does &#8220;empty&#8221; mean?<\/strong><\/p>\n<p>An interesting question, quite relevant for designing ETL processes. If we plan to implement some intermediate tables and rely on online statistics gathering, how can we ensure this works each time the load job is repeated? Let&#8217;s test it.<\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\nSQL&gt; CREATE TABLE t_delete (id NUMBER, s VARCHAR2(20))\r\nTable created.\r\nSQL&gt; CREATE TABLE t_truncate_reuse (id NUMBER, s VARCHAR2(20))\r\nTable created.\r\nSQL&gt; CREATE TABLE t_truncate_drop (id NUMBER, s VARCHAR2(20))\r\nTable created.\r\nSQL&gt; CREATE TABLE t_rollback (id NUMBER, s VARCHAR2(20))\r\nTable created.\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_delete\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_truncate_drop \r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_truncate_reuse\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\nSQL&gt; COMMIT\r\nCommit complete.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_rollback\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\n\r\nSQL&gt; -- Statistics are published without COMMIT\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name = &#039;T_ROLLBACK&#039;\r\n\r\nTABLE_NAME                     
LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_ROLLBACK                     14.04.18           10000\r\n1 row selected.\r\n\r\nSQL&gt; ROLLBACK\r\nRollback complete.\r\n\r\nSQL&gt; -- but after ROLLBACK they are gone\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_DELETE&#039;, &#039;T_ROLLBACK&#039;, \r\n                      &#039;T_TRUNCATE_DROP&#039;, &#039;T_TRUNCATE_REUSE&#039;)\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_TRUNCATE_REUSE               14.04.18           10000\r\nT_TRUNCATE_DROP                14.04.18           10000\r\nT_ROLLBACK                                             \r\nT_DELETE                       14.04.18           10000\r\n\r\n4 rows selected.\r\n\r\nSQL&gt; -- now delete rows and truncate tables\r\nSQL&gt; DELETE FROM t_delete\r\n10000 rows deleted.\r\nSQL&gt; TRUNCATE TABLE t_truncate_drop DROP STORAGE\r\nTable truncated.\r\nSQL&gt; TRUNCATE TABLE t_truncate_reuse REUSE STORAGE\r\nTable truncated.\r\n\r\nSQL&gt; -- INSERT more data and check if statistics had been updated\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_delete\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 20000\r\n20000 rows created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_truncate_drop \r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 20000\r\n20000 rows created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_truncate_reuse\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 20000\r\n20000 rows created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_rollback\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 20000\r\n20000 rows created.\r\n\r\nSQL&gt; COMMIT\r\nCommit 
complete.\r\n\r\nSQL&gt; -- do we have the new statistics?\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name in (&#039;T_DELETE&#039;, &#039;T_ROLLBACK&#039;, \r\n                      &#039;T_TRUNCATE_DROP&#039;, &#039;T_TRUNCATE_REUSE&#039;)\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_ROLLBACK                                             \r\nT_DELETE                       14.04.18           10000\r\nT_TRUNCATE_REUSE               14.04.18           10000\r\nT_TRUNCATE_DROP                14.04.18           20000\r\n\r\n4 rows selected.\r\n\r\n<\/pre>\n<p><em><strong>Listing 2 &#8211; when will the segment be considered empty?<\/strong><\/em><\/p>\n<p>This example provides some quite interesting findings:<\/p>\n<ul>\n<li>After <strong>deleting<\/strong> all rows, the segment is not considered empty. The next <code>INSERT \/*+ append *\/<\/code> will not gather statistics.<\/li>\n<li>A <strong>rollback<\/strong> of a direct path INSERT leaves the segment in a state that is not considered empty. The next INSERT&nbsp;\/*+ append *\/ will not gather statistics.<\/li>\n<li>After <strong>truncating<\/strong> the table with the <strong>DROP STORAGE<\/strong> option (the default), the next INSERT&nbsp;\/*+ append *\/ will gather statistics. But again, using <strong>TRUNCATE TABLE &#8230; REUSE STORAGE<\/strong> prevents online statistics gathering afterwards.<\/li>\n<li>INSERT \/*+ append *\/ publishes the optimizer statistics even before committing. Other sessions can see them. 
A ROLLBACK takes them away, though.<\/li>\n<\/ul>\n<p>The only way we can rely on online statistics gathering in a repeatable process without dropping the table is to use TRUNCATE TABLE without any option or with the explicit DROP STORAGE option before inserting new data.<\/p>\n<p>The fact that the optimizer statistics are immediately available after the INSERT statement is indeed documented. But the placement of this sentence at the end of <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/tgsql\/optimizer-statistics-concepts.html#GUID-0B027A47-087D-4D7B-ABD4-2210D4C37ECB\" rel=\"noopener\" target=\"_blank\">10.3.3.2 Global Statistics During Inserts into Empty Partitioned Tables <\/a> is IMHO wrong or at least understated. <\/p>\n<p><strong>Cursor invalidation<\/strong><br \/>\nInitially I wanted to investigate the behavior concerning cursor invalidation, but I didn&#8217;t manage to get consistent test results. If you are gathering object statistics via DBMS_STATS, you have control over cursor invalidation using the parameter NO_INVALIDATE. With Online Statistics Gathering you have no such control. With the following script I believed I could prove that the cursor was invalidated immediately after the INSERT that gathered stats, even without committing. But running it a few more times, I saw that it was not invalidated every time. On the other hand, if I turned the INSERT into a conventional load, the cursor was sometimes invalidated too, and sometimes not. 
<\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\r\nSQL&gt; set linesize 300 pages 10000\r\nSQL&gt; column table_name format a30\r\nSQL&gt; DROP table t_rollback purge\r\nTable dropped.\r\nSQL&gt; CREATE TABLE t_rollback (id NUMBER, s VARCHAR2(20))\r\nTable created.\r\n\r\nSQL&gt; -- we will check if this cursor will be invalidated\r\nSQL&gt; SELECT \/* TEST_CURSOR_INVALIDATION *\/* FROM t_rollback\r\nno rows selected.\r\n\r\nSQL&gt; COLUMN sql_id new_value sql_id\r\n\r\nSQL&gt; -- find the cursor in the shared pool\r\nSQL&gt; SELECT sql_id, LOADED_VERSIONS,EXECUTIONS,INVALIDATIONS\r\nFROM   v$sql s\r\nWHERE  sql_text LIKE &#039;%TEST_CURSOR_INVALIDATION%&#039; \r\nAND    UPPER(sql_text) NOT LIKE &#039;%V$SQL%&#039;\r\n\r\nSQL_ID        LOADED_VERSIONS EXECUTIONS INVALIDATIONS\r\n------------- --------------- ---------- -------------\r\n027kac8gfsud8               1          1             0\r\n1 row selected.\r\n\r\nSQL&gt; -- Insert data but not commit\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_rollback\r\nSELECT level AS id , lpad (level, 20, &#039;0&#039;) s\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\n\r\nSQL&gt; -- the statistics are already published\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name = &#039;T_ROLLBACK&#039;\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_ROLLBACK                     14.04.18           10000\r\n1 row selected.\r\n\r\nSQL&gt; -- and the Cursor has been invalidated\r\nSQL&gt; SELECT sql_id, LOADED_VERSIONS,EXECUTIONS,INVALIDATIONS\r\nFROM   v$sql s\r\nWHERE  sql_id = &#039;027kac8gfsud8&#039;\r\n\r\nSQL_ID        LOADED_VERSIONS EXECUTIONS INVALIDATIONS\r\n------------- --------------- ---------- -------------\r\n027kac8gfsud8               1          1             1\r\n1 row selected.\r\n\r\nSQL&gt; 
ROLLBACK\r\nRollback complete.\r\n\r\nSQL&gt; -- statistics are gone after rollback\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name = &#039;T_ROLLBACK&#039;\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_ROLLBACK                                             \r\n1 row selected.\r\n\r\n<\/pre>\n<p><em><strong>Listing 3 &#8211; Cursor invalidation (the results are not consistent, though!)<\/strong><\/em><\/p>\n<p>At this point I had strong doubts whether further investigation was worthwhile. I still believe the cursor will be invalidated, but is it a problem within an ETL process? In my opinion, not really. Online statistics gathering only kicks in when loading into an empty segment. <\/p>\n<p>The following cases are possible: <\/p>\n<ul>\n<li>With CTAS there can be no cursors using THIS new segment to invalidate;<\/li>\n<li>With a direct path INSERT there could be stats in place that were gathered on the empty segment, so it is highly preferable to invalidate all cursors after loading data; <\/li>\n<li>With a direct path INSERT there are &#8220;old&#8221; stats in place that were gathered before the truncate. Well, in that case your cursors have to be re-optimized, but parse time is generally not a problem in data warehouse environments. If you are loading data that can differ significantly from load to load, then again, invalidation will be desirable. If the data is pretty much the same and cursor invalidation really is a problem for you, you could prevent re-gathering already representative statistics by locking them or using the NO_GATHER_OPTIMIZER_STATISTICS hint. 
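Both options could be sketched like this (<code>t_stage<\/code> is a made-up staging table):\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- lock the statistics once they are representative\r\nEXEC DBMS_STATS.LOCK_TABLE_STATS(user, &#039;T_STAGE&#039;)\r\n\r\n-- or suppress online gathering for a single load\r\nINSERT \/*+ append no_gather_optimizer_statistics *\/ INTO t_stage\r\nSELECT level, lpad(level, 20, &#039;0&#039;)\r\nFROM   dual\r\nCONNECT BY level &lt;= 10000;\r\n<\/pre>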
<\/li>\n<\/ul>\n<p><strong>Some pitfalls and restrictions<\/strong><\/p>\n<p>There are some <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/tgsql\/optimizer-statistics-concepts.html#GUID-CDDB5A54-0991-4E68-A9D7-2305777B608B\" rel=\"noopener\" target=\"_blank\">documented<\/a> restrictions, but should you have doubts, remember to re-read that section (and the relevant MOS notes) for your version. The explanations are better in 12.2. Some restrictions were lifted; some of them were lifted even in 12.1, but this doesn&#8217;t seem to be reflected in the documentation.<\/p>\n<p><em>What kind of statistics?<\/em><\/p>\n<p>You get only table and basic column statistics. No histograms. No index statistics will be gathered. Recently I observed a nasty side effect. There was an index on a big table that occasionally happened to be empty at the time the standard statistics maintenance job was active. After that, both table and index statistics reflected that fact (number of distinct keys = 0). And while the table stats were re-gathered online, the index stats were not. And guess what? The optimizer chose to use the index statistics for its cardinality estimates, resulting in a nested loop. 
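<\/p>\n<p>In such a case, a quick remedy is to gather the index statistics explicitly after the load, for example (a sketch; the index name is made up):<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n-- refresh the stale index statistics left over from the maintenance job\r\nEXEC DBMS_STATS.GATHER_INDEX_STATS(user, &#039;MY_BIG_TABLE_IDX&#039;)\r\n<\/pre>\n<p>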
So you should cater for index statistics yourself.<\/p>\n<p><em>Columns<\/em><\/p>\n<p>Such a simple thing as not specifying all columns in an INSERT prevents online statistics gathering:<\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\r\nSQL&gt; CREATE TABLE t_not_all_columns (id number, s varchar2(20))\r\nTable created.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns (ID)\r\nSELECT level AS id \r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10000 rows created.\r\n\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS&#039;)\r\n\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_NOT_ALL_COLUMNS                                      \r\n1 row selected.\r\n\r\n<\/pre>\n<p><em><strong>Listing 4 &#8211; not all columns specified &#8211; no statistics gathering<\/strong><\/em><\/p>\n<p>This was not mentioned at all in the 12.1 documentation, but 12.2 has a very good explanation: if all missing columns have default values, online statistics gathering will kick in. Now I asked myself: if this works, why should virtual columns be a problem, as the 12.1 documentation says? 
Tested it and indeed, it works with virtual columns even in 12.1 (Listing 5) <\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\r\n&gt; SELECT * FROM v$version\r\nBANNER                                                                               CON_ID\r\n-------------------------------------------------------------------------------- ----------\r\nOracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0\r\nPL\/SQL Release 12.1.0.2.0 - Production                                                    0\r\nCORE\t12.1.0.2.0\tProduction                                                                  0\r\nTNS for Linux: Version 12.1.0.2.0 - Production                                            0\r\nNLSRTL Version 12.1.0.2.0 - Production                                                    0\r\n\r\n&gt; ----------------\r\n&gt; --not all columns\r\n&gt; DROP table t_not_all_columns purge\r\n\r\ntable T_NOT_ALL_COLUMNS dropped.\r\n\r\n&gt; CREATE TABLE t_not_all_columns \r\n( id number\r\n, s varchar2(20))\r\n\r\ntable T_NOT_ALL_COLUMNS created.\r\n\r\n&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns (ID)\r\nSELECT level AS id \r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10,000 rows inserted.\r\n\r\n&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS&#039;)\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_NOT_ALL_COLUMNS                                      \r\n\r\n&gt; -----------\r\n&gt; --default\r\n&gt; DROP table t_not_all_columns_default purge\r\n\r\ntable T_NOT_ALL_COLUMNS_DEFAULT dropped.\r\n\r\n&gt; CREATE TABLE t_not_all_columns_default \r\n( id number\r\n, s varchar2(20) default &#039;ABC&#039;)\r\n\r\ntable T_NOT_ALL_COLUMNS_DEFAULT created.\r\n\r\n&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns_default 
(ID)\r\nSELECT level AS id \r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10,000 rows inserted.\r\n\r\n&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS_DEFAULT&#039;)\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_NOT_ALL_COLUMNS_DEFAULT      29-MAY-18          10000\r\n\r\n&gt; ----------\r\n&gt; --virtual\r\n&gt; DROP table t_not_all_columns_virtual purge\r\n\r\ntable T_NOT_ALL_COLUMNS_VIRTUAL dropped.\r\n\r\n&gt; CREATE TABLE t_not_all_columns_virtual \r\n( id number\r\n, s varchar2(20) generated always as (&#039;ABC&#039;) virtual)\r\n\r\ntable T_NOT_ALL_COLUMNS_VIRTUAL created.\r\n\r\n&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns_virtual (ID)\r\nSELECT level AS id \r\nFROM   dual\r\nCONNECT BY level &lt;= 10000\r\n10,000 rows inserted.\r\n\r\n&gt; SELECT table_name, last_analyzed, num_rows\r\nFROM   user_tables \r\nWHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS_VIRTUAL&#039;)\r\nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\r\n------------------------------ ------------- ----------\r\nT_NOT_ALL_COLUMNS_VIRTUAL      29-MAY-18          10000\r\n \r\n<\/pre>\n<p><em><strong>Listing 5 &#8211; columns with default values and virtual columns<\/strong><\/em><\/p>\n<p>I should say, I don&#8217;t really understand, why not specifying a column which has no default value should be a problem. After all, if I gather statistics afterwards and some columns are completely empty, I&#8217;ll get statistics on them reflecting exactly that fact. <\/p>\n<p><em>Extended Stats<\/em><\/p>\n<p>There is a bug 18425876 &#8211; if you have created extended stats before filling the table, this prevents online statistics gathering. I could reproduce this bug only in 12.1.0.1 though. 
MOS states it was fixed in 12.2, but it seems it has been backported to 12.1.0.2 too.<\/p>\n<pre class=\"brush: sql; collapse: true; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\nSQL&gt; SELECT * FROM v$version;\r\n\r\nBANNER                                                                               CON_ID\r\n-------------------------------------------------------------------------------- ----------\r\nOracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0\r\n...\r\nSQL&gt; DROP table t_ext_stats purge;\r\n\r\nTable dropped.\r\n\r\nSQL&gt; CREATE TABLE t_ext_stats\r\n  2  ( id number\r\n  3  , s varchar2(20));\r\n\r\nTable created.\r\n\r\nSQL&gt; variable x varchar2(50)\r\nSQL&gt; exec :x:= DBMS_STATS.CREATE_EXTENDED_STATS(user, &#039;T_EXT_STATS&#039;,&#039;(ID,S)&#039;)\r\n\r\nPL\/SQL procedure successfully completed.\r\n\r\nSQL&gt; INSERT \/*+ append *\/ INTO t_ext_stats (ID, S)\r\n  2  SELECT level AS id, &#039;ABC&#039;\r\n  3  FROM   dual\r\n  4  CONNECT BY level &lt;= 10000;\r\n\r\n10000 rows created.\r\n\r\nSQL&gt; SELECT table_name, last_analyzed, num_rows\r\n  2  FROM   user_tables\r\n  3  WHERE  table_name IN (&#039;T_EXT_STATS&#039;);\r\n\r\nTABLE_NAME                     LAST_ANAL   NUM_ROWS\r\n------------------------------ --------- ----------\r\nT_EXT_STATS\r\n<\/pre>\n<p><em><strong>Listing 6 &#8211; extended statistics prevent online statistics gathering<\/strong><\/em><\/p>\n<p>In the <a href=\"http:\/\/blog.sqlora.com\/en\/online-statistic-gathering-for-etl-part-2\/\" rel=\"noopener\" target=\"_blank\">second part<\/a> we&#8217;ll take a look at partitioning.<\/p>\n<div class=\"crp-list-container\"><h3 class=\"crp-list-title\">Related Posts<\/h3><ul class=\"crp-list\"><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/de\/online-statistics-gathering-for-etl-part-2\/\" target=\"_blank\">Online Statistics Gathering for ETL - Part 2<\/a><\/div><\/li><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/de\/online-statistics-gathering-for-etl-part-3\/\" target=\"_blank\">Online Statistics Gathering for ETL \u2013 Part 3<\/a><\/div><\/li><\/ul><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Online Statistics Gathering was introduced in 12c and is a very handy feature for ETL and batch job developers. 
However, the devil is in the details. There are some points to remember. Let&#8217;s take a closer look.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,11,39,68,3,22],"tags":[71,10,70,69],"class_list":["post-454","post","type-post","status-publish","format-standard","hentry","category-12c","category-cbo","category-data-warehouse","category-etl","category-oracle","category-trivadis","tag-bulk-insert","tag-merge","tag-on-the-fly","tag-statistiken"],"_links":{"self":[{"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/posts\/454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":5,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/posts\/454\/revisions"}],"predecessor-version":[{"id":475,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/posts\/454\/revisions\/475"}],"wp:attachment":[{"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/media?parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/categories?post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.sqlora.com\/de\/wp-json\/wp\/v2\/tags?post=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}