{"id":2214,"date":"2024-08-27T13:24:41","date_gmt":"2024-08-27T11:24:41","guid":{"rendered":"https:\/\/blog.sqlora.com\/en\/?p=2214"},"modified":"2024-08-28T20:26:07","modified_gmt":"2024-08-28T18:26:07","slug":"online-statistics-gathering-update-2024","status":"publish","type":"post","link":"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-update-2024\/","title":{"rendered":"Online Statistics Gathering: Update 2024"},"content":{"rendered":"\n<p>Online Statistics Gathering has been introduced in Oracle 12c. I think it is a very important building block to establish a consistent approach in dealing with optimizer statistics in your ETL jobs. I have already written some posts about it a while ago (<a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-1\/\" target=\"_blank\" rel=\"noreferrer noopener\">first<\/a>, <a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">second<\/a> and <a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-3\/\" target=\"_blank\" rel=\"noreferrer noopener\">third<\/a>). When I was preparing a talk about optimizer statistics for data engineers recently, I took a closer look at it again. Oracle has now continuously improved the feature over several versions and has lifted many restrictions. Let&#8217;s see what I mean.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">TL;DR<\/h2>\n\n\n\n<p>There are a couple of changes in the behavior and\/or documentation since the introduction of the feature (Oracle 19c to 23ai):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Index statistics are gathered too. This is only valid for unique indexes in 19c, but also for non-unique in 23ai. There are no stats for bitmap indexes though.<\/li>\n\n\n\n<li>Omitting columns without default values is no restriction anymore.<\/li>\n\n\n\n<li>If using MERGE and writing into an empty segment using direct path writes, online statistics will be  gathered too. But this behavior is not documented.<\/li>\n\n\n\n<li>If loading into a partitioned table with its incremental statistics preference set to TRUE but not using an extended syntax to explicitly specify a partition, still no statistics are gathered. This was a known and documented restriction, but it has now disappeared from the documentation <\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Recap<\/h2>\n\n\n\n<p>The idea behind this feature is that if you are loading data using direct path write into an empty segment, optimizer statistics will be gathered on the fly as a part of your loading SQL statement. What statistics exactly will be gathered, what statements are eligible, some limitations &#8211; this is what has changed  <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Index Statistics<\/h2>\n\n\n\n<p>Initially, index statistics were not gathered at all. That was last mentioned in the documentation of Oracle 18c: <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>While gathering online statistics, the database does not gather index statistics or create histograms.<\/p>\n<cite><a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/18\/tgsql\/optimizer-statistics-concepts.html#GUID-E920301F-FFC4-4AB7-8DEB-11BA56088E0B\" target=\"_blank\" rel=\"noreferrer noopener\">Optimizer Statistics Concepts 18c<\/a><\/cite><\/blockquote>\n\n\n\n<p>Looking at the same paragraph in the <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/19\/tgsql\/optimizer-statistics-concepts.html#GUID-E920301F-FFC4-4AB7-8DEB-11BA56088E0B\" target=\"_blank\" rel=\"noreferrer noopener\">19c docs<\/a>, only histograms are mentioned<\/p>\n\n\n\n<p>So, what does it mean? Are index statistics gathered as well in 19c? Let&#8217;s test it.<\/p>\n\n\n\n<p><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_with_ind (id number, s varchar2(20));\n\nTable T_WITH_IND created.\n\nSQL&gt; CREATE UNIQUE INDEX i1 on t_with_ind(id);\n\nINDEX I1 created.\n\nSQL&gt; CREATE INDEX i2 on t_with_ind(s);\n\nIndex I2 created.\n\nSQL&gt; SELECT table_name, last_analyzed, num_rows\n  2  FROM   user_tables \n  3  WHERE  table_name IN (&#039;T_WITH_IND&#039;);\n\nTABLE_NAME LAST_ANA   NUM_ROWS\n---------- -------- ----------\nT_WITH_IND                    \n\n\nSQL&gt; SELECT table_name, index_name, last_analyzed, num_rows,  blevel, leaf_blocks, distinct_keys, clustering_factor\n  2  FROM   user_indexes \n  3  WHERE  table_name IN (&#039;T_WITH_IND&#039;);\n\nTABLE_NAME INDEX_NAME LAST_ANA   NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR\n---------- ---------- -------- ---------- ---------- ----------- ------------- -----------------\nT_WITH_IND I1         17\/08\/24          0          0           0             0                 0\nT_WITH_IND I2         17\/08\/24          0          0           0             0                 0\n\n \nSQL&gt; INSERT \/*+ append *\/ INTO   T_WITH_IND (ID,S)\n  2  SELECT level AS id, LPAD(level,20,&#039;0&#039;) \n  3  FROM   dual\n  4  CONNECT BY level &lt;= 10000;\n\n10.000 rows inserted.\n\nSQL&gt; commit;\n\nCommit complete.\n\nSQL&gt; SELECT table_name, last_analyzed, num_rows\n  2  FROM   user_tables \n  3  WHERE  table_name IN (&#039;T_WITH_IND&#039;);\n\nTABLE_NAME LAST_ANA   NUM_ROWS\n---------- -------- ----------\nT_WITH_IND 17\/08\/24      10000\n\nSQL&gt; SELECT table_name, index_name, last_analyzed, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor\n  2  FROM   user_indexes i\n  3  WHERE  table_name IN (&#039;T_WITH_IND&#039;);\n\nTABLE_NAME INDEX_NAME LAST_ANA   NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR\n---------- ---------- -------- ---------- ---------- ----------- ------------- -----------------\nT_WITH_IND I1         17\/08\/24      10000          1          22         10000              4127\nT_WITH_IND I2         17\/08\/24          0          0           0             0                 0\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 1<\/em><\/strong>: <strong><em>Only unique indexes in 19c<\/em><\/strong><\/p>\n\n\n\n<p>And indeed, we can see that statistics for the first (unique) index were gathered but not for the non-unique one. I haven&#8217;t found a bug for this behavior but I have found a post by <a href=\"https:\/\/itoug.it\/blog\/2019\/11\/online-statistics-gathering-doesnt-work-on-indexes-with-duplicate-values-on-oracle-19c\/\" target=\"_blank\" rel=\"noreferrer noopener\">Donatello Settembrino<\/a>, describing exactly this observation. As Christian Antognini commented, in Oracle 20c (which was available only as a preview) the statistics for non-unique indexes are gathered too. I will skip 20c and 21c altogether, but we can check this in Oracle 23ai:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_with_ind (id number, s varchar2(20))\nTable created.\nSQL&gt; CREATE INDEX i1 on t_with_ind(id)\nIndex created.\nSQL&gt; CREATE INDEX i2 on t_with_ind(s)\nIndex created.\nSQL&gt; CREATE INDEX i3 on t_with_ind(id,s)\nIndex created.\nSQL&gt; CREATE BITMAP INDEX bi1 on t_with_ind(id) invisible\nIndex created.\nSQL&gt; SELECT table_name, last_analyzed, num_rows\nFROM   user_tables \nWHERE  table_name IN (&#039;T_WITH_IND&#039;)\n\nTABLE_NAME LAST_ANALYZED           NUM_ROWS\n---------- --------------------- ----------\nT_WITH_IND                                 \n1 row selected.\nSQL&gt; SELECT table_name, index_name, last_analyzed, num_rows,  blevel, leaf_blocks, distinct_keys, clustering_factor\nFROM   user_indexes \nWHERE  table_name IN (&#039;T_WITH_IND&#039;)\n\nTABLE_NAME INDEX_NAME LAST_ANALYZED           NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR\n---------- ---------- --------------------- ---------- ---------- ----------- ------------- -----------------\nT_WITH_IND I1         18.08.2024 09:39:31            0          0           0             0                 0\nT_WITH_IND I2         18.08.2024 09:39:31            0          0           0             0                 0\nT_WITH_IND I3         18.08.2024 09:39:31            0          0           0             0                 0\nT_WITH_IND BI1        18.08.2024 09:39:31            0          0           0             0                 0\n\n4 rows selected.\n\nSQL&gt; INSERT \/*+ append *\/  INTO   T_WITH_IND (ID,S)\nSELECT level AS id , LPAD(level,20,&#039;0&#039;) \nFROM   dual\nCONNECT BY level &lt;= 100000\n100000 rows created.\nSQL&gt; commit\nCommit complete.\n\nSQL&gt; SELECT table_name, last_analyzed, num_rows\nFROM   user_tables \nWHERE  table_name IN (&#039;T_WITH_IND&#039;)\n\nTABLE_NAME LAST_ANALYZED           NUM_ROWS\n---------- --------------------- ----------\nT_WITH_IND 18.08.2024 09:39:39       100000\n1 row selected.\nSQL&gt; SELECT table_name, index_name, last_analyzed, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor\nFROM   user_indexes i\nWHERE  table_name IN (&#039;T_WITH_IND&#039;)\n\nTABLE_NAME INDEX_NAME LAST_ANALYZED           NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR\n---------- ---------- --------------------- ---------- ---------- ----------- ------------- -----------------\nT_WITH_IND I1         18.08.2024 09:39:39       100000          1         231        100000             41508\nT_WITH_IND I2         18.08.2024 09:39:39       100000          2         448        100000             41508\nT_WITH_IND I3         18.08.2024 09:39:39       100000          2         543        100000             41508\nT_WITH_IND BI1        18.08.2024 09:39:31            0          0           0             0                 0\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 2: Unique and non-unique B-tree indexes got statistics but not a bitmap index<\/em><\/strong><\/p>\n\n\n\n<p>As we can see, statistics for B-Tree indexes are collected online both for unique and non-unique indexes, but we still get no statistics for bitmaps indexes. Having seen that, I still believe the best practice for bulk loading large amounts of data would be to disable\/drop the indexes first, then load the data and rebuild\/recreate the indexes afterwards.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Not all columns specified<\/h2>\n\n\n\n<p>Up to Oracle 18c we could find the following <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/18\/tgsql\/optimizer-statistics-concepts.html#GUID-E920301F-FFC4-4AB7-8DEB-11BA56088E0B\" target=\"_blank\" rel=\"noreferrer noopener\">restriction<\/a>: <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>It is loaded using an&nbsp;<code>INSERT INTO ... SELECT<\/code>, and&nbsp;<em>neither<\/em>&nbsp;of the following conditions is true: all columns of the target table are specified, or a subset of the target columns are specified and the unspecified columns have default values.<\/p>\n\n\n\n<p>Put differently, the database&nbsp;<em>only<\/em>&nbsp;gathers statistics automatically for bulk loads when either all columns of the target table are specified, or a subset of the target columns are specified and the unspecified columns have default values.&nbsp;<\/p>\n<\/blockquote>\n\n\n\n<p>If you omit the column <strong><em>col_no_def<\/em><\/strong> with no default value in the INSERT SQL, online statistics gathering does not take place  in Oracle 12.1 to 18c as shown in Listing 3.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_not_all_columns (id number, col_no_def varchar2(20))\nTable created.\n \nSQL&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns (ID)\nSELECT level AS id \nFROM   dual\nCONNECT BY level &lt;= 10000\n10000 rows created.\n \nSQL&gt; SELECT table_name, last_analyzed, num_rows\nFROM   user_tables \nWHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS&#039;)\n \nTABLE_NAME                     LAST_ANALYZED   NUM_ROWS\n------------------------------ ------------- ----------\nT_NOT_ALL_COLUMNS                                      \n1 row selected.\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 3: The behavior up to 18c <\/em><\/strong><\/p>\n\n\n\n<p>Is the restriction now gone? Yes, the same code works from 19c: <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_not_all_columns (id number, col_no_def varchar2(20));\n\nTable T_NOT_ALL_COLUMNS created.\n\nSQL&gt; INSERT \/*+ append *\/ INTO   t_not_all_columns (ID)\n  2  SELECT level AS id \n  3  FROM   dual\n  4  CONNECT BY level &lt;= 10000;\n\n10.000 rows inserted.\n\nSQL&gt; commit;\n\nCommit complete.\n\nSQL&gt; SELECT table_name, last_analyzed, num_rows\n  2  FROM   user_tables \n  3  WHERE  table_name IN (&#039;T_NOT_ALL_COLUMNS&#039;);\n\nTABLE_NAME                                 LAST_ANA   NUM_ROWS\n------------------------------------------ -------- ----------\nT_NOT_ALL_COLUMNS                          18\/08\/24      10000\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 4: omitting columns with no default values is no problem from 19c<\/em><\/strong><\/p>\n\n\n<script async src=\"\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js?client=ca-pub-2925154690547867\" crossorigin=\"anonymous\"><\/script><ins class=\"adsbygoogle\" style=\"display:block; text-align:center;\" data-ad-client=\"ca-pub-2925154690547867\" \ndata-ad-slot=\"2727403138\" \ndata-ad-layout=\"in-article\"\ndata-ad-format=\"fluid\"><\/ins>\n<script> \n(adsbygoogle = window.adsbygoogle || []).push({}); \n<\/script>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Online Statistics also using MERGE<\/h2>\n\n\n\n<p>MERGE statement was never mentioned in the documentation as eligible for online statistics gathering. In fact, if you are loading data into an empty segment and specify then APPEND hint, it is doing the same as pure INSERT, right? Listing 5 shows the check in Oracle 19c. ORA-12838 indicates that direct path write has happened, but, nevertheless,  we got no statistics in this case.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_merge (id number, s varchar2(20));\n\nTable T_MERGE created.\n\nSQL&gt; MERGE \/*+ append *\/\n  2  INTO   t_merge t\n  3  USING (\n  4         SELECT level AS id , lpad (level, 20, &#039;0&#039;) s\n  5         FROM   dual\n  6         CONNECT BY level &lt;= 10000) q\n  7  ON   (q.id = t.id)\n  8  WHEN NOT MATCHED THEN INSERT VALUES (q.id, q.s);\n\n10.000 rows merged.\n\nSQL&gt; SELECT count(*) FROM t_merge;\n\nSQL Error: ORA-12838: cannot read\/modify an object after modifying it in parallel\n12838. 00000 -  &quot;cannot read\/modify an object after modifying it in parallel&quot;\n\nSQL&gt; COMMIT;\n\nCommit complete.\n\nSQL&gt; SELECT table_name, last_analyzed, num_rows\n  2  FROM   user_tables \n  3  WHERE  table_name IN (&#039;T_MERGE&#039;);\n\nTABLE_NAME                    LAST_ANA   NUM_ROWS\n------------------------------ -------- ----------\nT_MERGE                                                                                                                                             \n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 5: MERGE still not gathering stats in 19c<\/em><\/strong><\/p>\n\n\n\n<p>Let&#8217;s check it in Oracle 23ai! First, this is new to Oracle 23ai, we got no error trying to read segments immediately after they were written via direct path. No ORA-12838 &#8211; nice! But the statics were gathered online. I&#8217;ve changed the check query a bit to show the column <strong>NOTES<\/strong> from <strong>user_tab_col_statistics<\/strong>. The text STATS_ON_LOAD indicates that Online Statistics Gathering has kicked in. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_merge (id number, s varchar2(20));\n\nTable T_MERGE created.\n\nSQL&gt; MERGE \/*+ append *\/\n  2  INTO   t_merge t\n  3  USING (\n  4         SELECT level AS id , lpad (level, 20, &#039;0&#039;) s\n  5         FROM   dual\n  6         CONNECT BY level &lt;= 10000) q\n  7  ON   (q.id = t.id)\n  8  WHEN NOT MATCHED THEN INSERT VALUES (q.id, q.s);\n\n10.000 rows merged.\n\nSQL&gt; SELECT count(*) FROM t_merge;\n\n  COUNT(*)\n----------\n     10000\n\nSQL&gt; COMMIT;\n\nCommit complete.\n\nSQL&gt; SELECT table_name, column_name, last_analyzed, notes\n  2  FROM   user_tab_col_statistics\n  3  WHERE  table_name IN (&#039;T_MERGE&#039;);\n\nTABLE_NAME           COLUMN_NAM LAST_ANA NOTES                                                                                              \n-------------------- ---------- -------- ---------------------\nT_MERGE              ID         18\/08\/24 STATS_ON_LOAD                                                                                      \nT_MERGE              S          18\/08\/24 STATS_ON_LOAD \n<\/pre><\/div>\n\n\n<p><strong><em>Listing 6: MERGE in Oracle 23ai &#8211; statistics are gathered<\/em><\/strong><\/p>\n\n\n\n<p>Well, good to know that it works now, but I would also like to see the documentation updated at this point. Otherwise, if we will want to benefit from it, we&#8217;ll be using undocumented behavior, right?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Partitioning <\/h2>\n\n\n\n<p>Bulk loading data into partitioned tables, there are some point to consider, what statistics will be gathered depending on syntax you are using and whether incremental statistics preference is set for the table. I wrote about it in <a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">my post in 2018<\/a> and I will repeat the matrix here again. The interesting point is that if you set the incremental preference to TRUE and don&#8217;t use extended syntax (i.e. explicitly specifying the target partition) then you&#8217;ll get no statistics at all.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th class=\"has-text-align-center\" data-align=\"center\">Incremental Preference TRUE<\/th><th class=\"has-text-align-center\" data-align=\"center\">Incremental Preference FALSE<\/th><\/tr><\/thead><tbody><tr><td><strong>Partition name specified<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">partition-level stats + synopsis<\/td><td class=\"has-text-align-center\" data-align=\"center\">partition-level stats<\/td><\/tr><tr><td><strong>No partition name specified<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">NO STATS AT ALL<\/td><td class=\"has-text-align-center\" data-align=\"center\">global stats<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>That is explicitly mentioned under <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/18\/tgsql\/optimizer-statistics-concepts.html#GUID-E920301F-FFC4-4AB7-8DEB-11BA56088E0B\" target=\"_blank\" rel=\"noreferrer noopener\">restrictions<\/a> in the documentation of Oracle 18c:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>It is partitioned,&nbsp;<code>INCREMENTAL<\/code>&nbsp;is set to&nbsp;<code>true<\/code>, and partition-extended syntax is&nbsp;<em>not<\/em>&nbsp;used.For example, assume that you execute&nbsp;<code>DBMS_STATS.SET_TABLE_PREFS(null, 'sales', incremental', 'true')<\/code>. In this case, the database does not gather statistics for&nbsp;<code>INSERT INTO sales SELECT<\/code>, even when&nbsp;<code>sales<\/code>&nbsp;is empty. However, the database does gather statistics automatically for&nbsp;<code>INSERT INTO sales PARTITION<\/code>&nbsp;<code>(sales_q4_2000) SELECT<\/code>.<\/p>\n<\/blockquote>\n\n\n\n<p>And again, from 19c onwards the restriction is not on the list anymore. What does it mean? What statistics will be gathered online in this case? I have tested it in Oracle 23ai: <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; CREATE TABLE t_part (pkey number, id number, s varchar2(20))\n  2  partition by range (pkey)\n  3  (partition P0 values less than (0),\n  4  partition P1 values less than (1),\n  5  partition P2 values less than (2),\n  6  partition P3 values less than (3));\n\nTable T_PART created.\n\nSQL&gt; exec dbms_stats.set_table_prefs(null,&#039;T_PART&#039;,&#039;INCREMENTAL&#039;,&#039;TRUE&#039;)\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt; INSERT \/*+ append *\/ INTO   t_part --PARTITION (P2)\n  2  SELECT 1 pkey, level AS id , lpad (level, 20, &#039;0&#039;) s\n  3  FROM   dual\n  4  CONNECT BY level &lt;= 10000;\n\n10.000 rows inserted.\n\nSQL&gt; COMMIT;\n\nCommit complete.\n\nSQL&gt; select table_name, partition_name, object_type, last_analyzed, num_rows\n  2  from sys.user_tab_statistics cs\n  3  where table_name IN (&#039;T_PART&#039;);\n\nTABLE_NAME           PARTI OBJECT_TYPE     LAST_ANA   NUM_ROWS\n-------------------- ----- --------------- -------- ----------\nT_PART                     TABLE                              \nT_PART               P0    PARTITION                          \nT_PART               P1    PARTITION                          \nT_PART               P2    PARTITION                          \nT_PART               P3    PARTITION                          \n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 7: Oracle 23ai &#8211; still no stats for incremental=true and no extended syntax used<\/em><\/strong><\/p>\n\n\n\n<p>As we can see, no statistics are collected online in case I don&#8217;t specify the partition name (or the key using <strong>PARTITION FOR<\/strong>) explicitly. Obviously, we have a bug in the documentation or a bug in the implementation if it is really supposed to work now.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>I think it&#8217;s great that Oracle continues to improve this valuable feature over several major releases. <\/p>\n\n\n\n<p>Although I don&#8217;t think you should always rely on the online gathering of index statistics during bulk loads, it could be useful in some load scenarios, mainly for one-off loads. For loading large amount of data on a regular basis it can be better to set the indexes unusable or drop them and then recreate\/rebuild after loading data.<\/p>\n\n\n\n<p>I think it is very important that the restriction about incomplete column lists is lifted, because it could lead to hard-to-find bugs. It is a best practice to always provide an explicit column list doing an INSERT, both in INSERT as well as in the SELECT part of it. So, if you were to extend the table but not yet the INSERT statements, everything would still work but the statistics would no longer be collected online. This has now been resolved.<\/p>\n\n\n\n<p>I was pleasantly surprised that MERGE with APPEND-Hint will also gather statistics online if inserting into an empty segment. It is somehow consistent and what one would expect, I think. Nevertheless, you can only benefit from it once &#8211; only if the segment is still empty. After that no online statistics gathering will take place. On the other hand, if you yourself ensure that the target table is emptied each time beforehand, than you don&#8217;t really need MERGE since WHEN MATCHED THEN UPDATE doesn&#8217;t make sense on empty table and simple INSERT will do the work.<\/p>\n\n\n\n<p>To be honest, i think that lifting of the restriction concerning loading data into partitions with incremental property set to TRUE was just removing it from the docs. But it seems still to be there.<\/p>\n\n\n<script async src=\"\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js?client=ca-pub-2925154690547867\" crossorigin=\"anonymous\"><\/script><ins class=\"adsbygoogle\" style=\"display:block; text-align:center;\" data-ad-client=\"ca-pub-2925154690547867\" \ndata-ad-slot=\"2727403138\" \ndata-ad-layout=\"in-article\"\ndata-ad-format=\"fluid\"><\/ins>\n<script> \n(adsbygoogle = window.adsbygoogle || []).push({}); \n<\/script>\n\n\n<div class=\"crp-list-container\"><h3 class=\"crp-list-title\">Related Posts<\/h3><ul class=\"crp-list\"><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-1\/\" target=\"_blank\">Online Statistics Gathering for ETL &#8211; Part 1<\/a><\/div><\/li><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-2\/\" target=\"_blank\">Online Statistics Gathering for ETL &#8211; Part 2<\/a><\/div><\/li><li class=\"crp-list-item crp-list-item-image-none\"><div class=\"crp-list-item-title\"><a href=\"https:\/\/blog.sqlora.com\/en\/online-statistics-gathering-for-etl-part-3\/\" target=\"_blank\">Online Statistics Gathering for ETL \u2013 Part 3<\/a><\/div><\/li><\/ul><\/div>","protected":false},"excerpt":{"rendered":"<p>Online Statistics Gathering has been introduced in Oracle 12c. I think it is a very important building block to establish a consistent approach in dealing with optimizer statistics in your ETL jobs. I have already written some posts about it a while ago (first, second and third). When I was preparing a talk about optimizer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2317,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,50,71,2],"tags":[81,98,149],"class_list":["post-2214","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cbo","category-data-warehouse","category-etl","category-oracle","tag-bulk-load","tag-etl","tag-optimizer-statistics"],"_links":{"self":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2214","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/comments?post=2214"}],"version-history":[{"count":32,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2214\/revisions"}],"predecessor-version":[{"id":2324,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2214\/revisions\/2324"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/media\/2317"}],"wp:attachment":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/media?parent=2214"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/categories?post=2214"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/tags?post=2214"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}