{"id":2398,"date":"2025-01-05T18:44:20","date_gmt":"2025-01-05T16:44:20","guid":{"rendered":"https:\/\/blog.sqlora.com\/en\/?p=2398"},"modified":"2025-01-06T16:45:20","modified_gmt":"2025-01-06T14:45:20","slug":"when-compression-expands-the-hidden-pitfalls-of-hcc","status":"publish","type":"post","link":"https:\/\/blog.sqlora.com\/en\/when-compression-expands-the-hidden-pitfalls-of-hcc\/","title":{"rendered":"When Compression Expands: The Hidden Pitfalls of HCC"},"content":{"rendered":"\n<p>Hybrid Columnar Compression (HCC) in Oracle Exadata is a prime example of how physical infrastructure can significantly enhance the efficiency of data management and analysis\u2014when used correctly. This is an area that developers and data engineers need to understand as well: what it is, how it works, and what to expect. If HCC is used in your project, everyone on the team should be aware of it, including the specifics of what, where, and how it is implemented. Otherwise, there may be a few surprises, as in the example I encountered recently: instead of the expected storage savings and improved performance, there was a huge increase in space consumption, along with negative performance effects. Read on to find out what happened.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>There are a couple of good sources to start learning about Hybrid Columnar Compression. 
Just to name a few:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.oracle.com\/a\/ocom\/docs\/database\/hybrid-columnar-compression-brief.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Oracle White Paper<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/jonathanlewis.wordpress.com\/category\/oracle\/exadata\/hcc\/\" target=\"_blank\" rel=\"noreferrer noopener\">A couple of excellent notes by Jonathan Lewis<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/link.springer.com\/book\/10.1007\/978-1-4302-6242-8\" target=\"_blank\" rel=\"noreferrer noopener\">Book &#8220;Expert Oracle Exadata&#8221;<\/a><\/li>\n<\/ul>\n\n\n\n<p>I assume you already have a basic understanding of how HCC works. To follow along, you should at least understand that multiple consecutive data blocks make up a Compression Unit (CU). Multiple rows are stored compressed in a CU, and the storage is by column (column-major), in contrast to the row-major storage otherwise typical in the Oracle database. All columns belonging to one row are always completely contained in a single CU.<\/p>\n\n\n\n<p>Let&#8217;s start by creating an uncompressed table from a sample data set. 
<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; create table hcc_test_raw nocompress as\nselect lpad(rownum,10, &#039;0&#039;) id\n,      lpad(dbms_random.value,1000, &#039;x&#039;) text\nfrom    dual connect by level &lt;= 100000\n\nTable created.\n\nSQL&gt; select  t.num_rows\n,       t.avg_row_len\n,       t.blocks\n,       round(SUM(bytes)\/1024\/1024) AS actual_size_mb\n,       sum(s.extents)\nfrom    user_tables t\njoin    user_segments s on s.segment_name = t.table_name\nwhere   table_name = &#039;HCC_TEST_RAW&#039;\ngroup by t.num_rows\n,        t.avg_row_len\n,        t.compress_for\n,        t.blocks\n\n  NUM_ROWS AVG_ROW_LEN     BLOCKS  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ----------  -------------- --------------\n    100000        1012      14436             116             29\n1 row selected.\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 1: Introducing the test dataset and an uncompressed table for comparison<\/em><\/strong><\/p>\n\n\n\n<p>Now let&#8217;s take the same data set but create the table using <strong>COMPRESS FOR QUERY HIGH<\/strong> this time (Listing 2).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; create table hcc_test compress for query high as\nselect lpad(rownum,10, &#039;0&#039;) id\n,      lpad(dbms_random.value,1000, &#039;x&#039;) text\nfrom    dual connect by level &lt;= 100000\n\nTable created.\n\nSQL&gt; select  t.num_rows\n,       t.avg_row_len\n,       t.blocks\n,       t.compress_for\n,       round(SUM(bytes)\/1024\/1024) AS actual_size_mb\n,       sum(s.extents)\nfrom user_tables t\njoin    user_segments s on s.segment_name = t.table_name\nwhere table_name = &#039;HCC_TEST&#039;\ngroup by t.num_rows\n,       t.avg_row_len\n,       t.compress_for\n,       t.blocks\n\n  NUM_ROWS AVG_ROW_LEN     BLOCKS COMPRESS_FOR ACTUAL_SIZE_MB 
SUM(S.EXTENTS)\n---------- ----------- ---------- ------------ -------------- --------------\n    100000        1012        344 QUERY HIGH                4              1\n1 row selected.\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 2: The same data set in a table which is compressed FOR QUERY HIGH<\/em><\/strong><\/p>\n\n\n\n<p>A very good space saving, from 29 extents down to just one, right? Using a <a href=\"https:\/\/jonathanlewis.wordpress.com\/2012\/08\/07\/compression-units-4\/\" target=\"_blank\" rel=\"noreferrer noopener\">script provided by Jonathan Lewis<\/a> we can show that we have 84 Compression Units (CUs), mostly consisting of four consecutive blocks, most of them containing 1200+ rows. Take into account that <strong>dbms_rowid.rowid_block_number<\/strong> reports the header block of the CU in the case of HCC, and the gap to the next BLOCK_NO covers the blocks belonging to the same CU.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nbreak on report\ncompute sum of count(*) on report\ncompute count of block_no on report\n\nselect\n   dbms_rowid.rowid_relative_fno(rowid) file_no,\n   dbms_rowid.rowid_block_number(rowid) block_no,\ncount(*)\nfrom hcc_test\ngroup by\n   dbms_rowid.rowid_relative_fno(rowid),\n   dbms_rowid.rowid_block_number(rowid)\norder by 1,2;\n\n   FILE_NO   BLOCK_NO   COUNT(*)\n---------- ---------- ----------\n       120       1034       1035\n       120       1037       1210\n       120       1041       1209\n       120       1045       1208\n       120       1049       1208\n       120       1053       1207\n\n   ...\n\n       120       1360       1208\n       120       1364        793\n           ----------           \ncount              84           \nsum                       100000\n<\/pre><\/div>\n\n\n<p><em><strong>Listing 3: Distribution of the rows to the Compression Units<\/strong><\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">HCC using conventional INSERT <\/h2>\n\n\n\n<p>When HCC was introduced in Oracle 11.2 and Exadata V2, we learned that rows only get compressed when written via direct path writes. Later, however, this changed: starting with Oracle Database 12.2, a conventional <strong>INSERT SELECT<\/strong> without the <strong>APPEND<\/strong> hint <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/newft\/new-features.html#GUID-62E7B06E-3010-48E0-BACC-1DFABE346213\" target=\"_blank\" rel=\"noreferrer noopener\">will also use HCC<\/a>. This sounds great if the use case is to add some data later to a segment that is compressed using HCC, as shown in Listing 4. 
Using the function <strong>get_compression_type<\/strong> from the package <strong>DBMS_COMPRESSION<\/strong> provided by Oracle we can prove that all rows in the table are compressed using FOR QUERY HIGH setting even though we have not used direct path insert.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; insert into hcc_test (id, text)    \nselect lpad(rownum,10, &#039;0&#039;) id\n,      lpad(dbms_random.value,1000, &#039;x&#039;) text\nfrom    dual connect by level &lt;= 10000\n10000 rows created.\nSQL&gt; commit\nCommit complete.\nSQL&gt; select  count(*),\n        CASE DBMS_COMPRESSION.get_compression_type (user, &#039;HCC_TEST&#039;, rowid)\n            WHEN 1  THEN &#039;COMP_NOCOMPRESS&#039;\n            WHEN 2  THEN &#039;COMP_FOR_OLTP&#039;\n            WHEN 4  THEN &#039;COMP_FOR_QUERY_HIGH&#039;\n            WHEN 8  THEN &#039;COMP_FOR_QUERY_LOW&#039;\n            WHEN 16 THEN &#039;COMP_FOR_ARCHIVE_HIGH&#039;\n            WHEN 32 THEN &#039;COMP_FOR_ARCHIVE_LOW&#039;\n            WHEN 64 THEN &#039;COMP_BLOCK&#039;\n       END AS compression_type\n  from hcc_test \n  group by CASE DBMS_COMPRESSION.get_compression_type (user, &#039;HCC_TEST&#039;, rowid)\n            WHEN 1  THEN &#039;COMP_NOCOMPRESS&#039;\n            WHEN 2  THEN &#039;COMP_FOR_OLTP&#039;\n            WHEN 4  THEN &#039;COMP_FOR_QUERY_HIGH&#039;\n            WHEN 8  THEN &#039;COMP_FOR_QUERY_LOW&#039;\n            WHEN 16 THEN &#039;COMP_FOR_ARCHIVE_HIGH&#039;\n            WHEN 32 THEN &#039;COMP_FOR_ARCHIVE_LOW&#039;\n            WHEN 64 THEN &#039;COMP_BLOCK&#039;\n       END\n\n  COUNT(*) COMPRESSION_TYPE     \n---------- ---------------------\n    110000 COMP_FOR_QUERY_HIGH  \n1 row selected.\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 4: Adding more data via conventional INSERT: also new data is compressed<\/em><\/strong><\/p>\n\n\n\n<p>This looks fine so far, but what if our use case involves not 
just adding new data, but completely replacing the table&#8217;s data? And for some reason we would not use <strong>TRUNCATE (DROP STORAGE)<\/strong> but a simple <strong>DELETE<\/strong> for that? Would HCC reuse the existing empty blocks? Let&#8217;s test.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Recreate the table from scratch first as in Listing 2\nSQL&gt; delete from hcc_test\n100000 rows deleted.\nSQL&gt; insert into hcc_test (id, text)    \nselect lpad(rownum,10, &#039;0&#039;) id\n,      lpad(dbms_random.value,1000, &#039;x&#039;) text\nfrom    dual connect by level &lt;= 100000\n100000 rows created.\nSQL&gt; commit\nCommit complete.\nSQL&gt; begin\n  dbms_stats.gather_table_stats(user,&#039;HCC_TEST&#039;);\nend;\n PL\/SQL procedure successfully completed.\n\nSQL&gt; select  t.num_rows\n,       t.avg_row_len\n,       t.blocks\n,       t.compress_for\n,       round(SUM(bytes)\/1024\/1024) AS actual_size_mb\n,       sum(s.extents)\nfrom user_tables t\njoin    user_segments s on s.segment_name = t.table_name\nwhere table_name = &#039;HCC_TEST&#039;\ngroup by t.num_rows\n,       t.avg_row_len\n,       t.compress_for\n,       t.blocks\n\n  NUM_ROWS AVG_ROW_LEN     BLOCKS COMPRESS_FOR  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ---------- ------------- -------------- --------------\n    100000        1012       1016 QUERY HIGH                 8              2\n1 row selected.\n\n-- Repeat and check the segment size:\n...\n  NUM_ROWS AVG_ROW_LEN     BLOCKS COMPRESS_FOR  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ---------- ------------- -------------- --------------\n    100000        1012       1520 QUERY HIGH                12              3\n\n-- Repeat and check the segment size:\n...\n  NUM_ROWS AVG_ROW_LEN     BLOCKS COMPRESS_FOR  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ---------- ------------- -------------- --------------\n    100000        
1012       2024 QUERY HIGH                16              4\n\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 5: Conventional INSERT populates new extents each time<\/em><\/strong><\/p>\n\n\n\n<p>Each time we repeated the <strong>DELETE<\/strong> and <strong>INSERT SELECT<\/strong> pair, the segment grew by one extent (or 4 MB). Repeating the exercise from Listing 3, we see a different picture:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n   FILE_NO   BLOCK_NO   COUNT(*)\n---------- ---------- ----------\n       610      37120          1\n       610      37121       1208\n       610      37125       1086\n       610      37127          1\n       610      37129        123\n       610      37130       1208\n       610      37135          1\n       610      37136        862\n       610      37159       1208\n       610      37164       1208\n       610      37167          1\n       610      37168       1086\n      \n...\n\n       625      38839          1\n       625      38847          1\n       649      21865       1208\n       649      21871          1\n       649      21924       1084\n       649      21950        125\n           ----------           \ncount             165           \nsum                       100000\n\n165 rows selected.\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 6: The distribution of rows to CUs is far less uniform after conventional INSERT<\/em><\/strong><\/p>\n\n\n\n<p>The way the CUs are created by a conventional INSERT SELECT differs noticeably: we get significantly more CUs (about twice as many), some of which consist of only one block and contain far fewer rows (some even only one). But that&#8217;s not the main problem here. After each repetition we see completely different block numbers. The old blocks are not reused!<br><br>Okay, you may of course ask: why would one implement an approach like this? 
Well, using DELETE instead of TRUNCATE before the INSERT &#8211; leaving the HCC aspect aside &#8211; can be inefficient, but it is not the biggest sin. It can even be desirable from the data availability point of view. Only through the interaction with HCC does it become a disaster. And what if this is done by two people: one (a data engineer) implements the ETL this way, and the other (a DBA) enables compression later? That&#8217;s why it&#8217;s important to talk to each other.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Materialized View Trap<\/h2>\n\n\n\n<p>And now to the real-world example I mentioned at the beginning. Another situation where this behavior can easily be overlooked is the refresh of materialized views. If the parameter <strong>atomic_refresh=true<\/strong> is used (this is the default), this is exactly the case: the materialized view is refreshed by a combination of <strong>DELETE<\/strong> and <strong>INSERT<\/strong> in the same transaction. In my experience, this parameter is often left at its default even when there is no requirement for uninterrupted data visibility. 
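<\/p>\n\n\n\n<p>If the visibility requirement does not apply, the atomicity can be switched off explicitly in the refresh call; a minimal sketch using <strong>DBMS_MVIEW.REFRESH<\/strong> (assuming a materialized view named <strong>MV_HCC_TEST<\/strong>, as in Listing 7 below):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Sketch: a non-atomic complete refresh truncates the segment and\n-- repopulates it via direct path, so the HCC space is reused, at the\n-- price of the data being invisible while the refresh runs.\nbegin\n  dbms_mview.refresh(&#039;MV_HCC_TEST&#039;, method =&gt; &#039;C&#039;, atomic_refresh =&gt; false);\nend;\n\/\n<\/pre><\/div>\n\n\n\n<p>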
Consider the following example:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSQL&gt; create materialized view mv_hcc_test as \nselect * from hcc_test\nMaterialized View created.\n\nSQL&gt; select  t.num_rows, t.avg_row_len, t.compress_for\n,       round(SUM(bytes)\/1024\/1024) AS actual_size_mb\n,       sum(s.extents)\nfrom    user_tables t\njoin    user_segments s on s.segment_name = t.table_name\nwhere table_name = &#039;MV_HCC_TEST&#039;\ngroup by t.num_rows, t.avg_row_len, t.compress_for\n\n  NUM_ROWS AVG_ROW_LEN COMPRESS_FOR  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ------------- -------------- --------------\n    100000        1012 QUERY HIGH                 4              1\n1 row selected.\n\nSQL&gt; begin\n  dbms_mview.refresh(&#039;MV_HCC_TEST&#039;,method=&gt;&#039;C&#039;);\nend;\n PL\/SQL procedure successfully completed.\n\nSQL&gt; select  t.num_rows, t.avg_row_len, t.compress_for\n,       round(SUM(bytes)\/1024\/1024) AS actual_size_mb\n,       sum(s.extents)\nfrom    user_tables t\njoin    user_segments s on s.segment_name = t.table_name\nwhere table_name = &#039;MV_HCC_TEST&#039;\ngroup by t.num_rows, t.avg_row_len, t.compress_for\n\n  NUM_ROWS AVG_ROW_LEN COMPRESS_FOR  ACTUAL_SIZE_MB SUM(S.EXTENTS)\n---------- ----------- ------------- -------------- --------------\n    100000        1012 QUERY HIGH                 8              2\n1 row selected.\n<\/pre><\/div>\n\n\n<p><strong><em>Listing 7: After a refresh with the same data, the segment size has doubled<\/em><\/strong><\/p>\n\n\n\n<p>Consider the following scenario: As a data engineer, you have created a new materialized view with atomic refresh that is refreshed on a daily basis. You didn&#8217;t even think about any compression at first. But what you didn&#8217;t know: your DBA colleague has set COMPRESS FOR QUERY HIGH for the entire tablespace. 
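<\/p>\n\n\n\n<p>Whether such a tablespace-wide default is in place can be checked in the data dictionary; a minimal sketch (assuming you have access to <strong>DBA_TABLESPACES<\/strong>):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Sketch: list tablespaces that define a default table compression.\n-- New segments inherit this setting unless the CREATE statement\n-- specifies its own compression clause.\nselect tablespace_name\n,      def_tab_compression\n,      compress_for\nfrom   dba_tablespaces\nwhere  def_tab_compression = &#039;ENABLED&#039;;\n<\/pre><\/div>\n\n\n\n<p>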
After a year of daily refreshes, your materialized view will consume roughly 365 times the space it actually needs! This also has an impact on performance: every full table scan has to deal with a segment 365 times larger than necessary.<\/p>\n\n\n\n<p>What can you do? Check whether atomic refresh is actually required, consider other methods such as out-of-place MV refresh, schedule a regular re-compression of the table, and so on. Know the implications and talk to each other!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hybrid Columnar Compression (HCC) in Oracle Exadata is a prime example of how physical infrastructure can significantly enhance the efficiency of data management and analysis\u2014when used correctly. This is an area that also affects developers and data engineers: what it is, how it works, and what to expect. 
If HCC is used in your project, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2432,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50,71,151,2],"tags":[152,153,154,155,156],"class_list":["post-2398","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-warehouse","category-etl","category-exadata","category-oracle","tag-exadata","tag-hcc","tag-hybrid-columnar-compression","tag-materialized-view","tag-mv"],"_links":{"self":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/comments?post=2398"}],"version-history":[{"count":40,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2398\/revisions"}],"predecessor-version":[{"id":2443,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/posts\/2398\/revisions\/2443"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/media\/2432"}],"wp:attachment":[{"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/media?parent=2398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/categories?post=2398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.sqlora.com\/en\/wp-json\/wp\/v2\/tags?post=2398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}