MERGE and ORA-30926 – Part 2 or differences in write consistency between update and merge

In the first part we’ve seen how Oracle actually executes the Merge Statement three times if it hits ORA-30926. To understand what’s going on, we should refresh the concept behind so called “update restarts” sometimes also called “mini rollbacks”. This concept is very well explained by Tom Kyte: Part I, Part II and Part III. If you are not familiar with this “write consistency” concept, I strongly encourage you to follow this links.

Many thanks also to Sergey Markelenkov for his excellent paper on write consistency and statement restarts, unfortunately only available in russian. It is definitely worth a translation but it is 40 pages long. I will briefly present some of his observations and conclusions and how they are related to our MERGE.

UPDATE restart

We can first show how statement restart happens with UPDATE. I assumed this must work the same with UPDATE part of a MERGE statement, but it doesn’t, as we will see! Let’s go!

First we create our test table and a row trigger on it. It references no :OLD or :NEW values and will be used only to show how often it fires. We then update the row with ID=2 to ID=3 without committing this change.


--Session 1
DROP TABLE t_target;

CREATE TABLE t_target  (id NUMBER, val NUMBER);

INSERT INTO t_target (id, val) VALUES  (1,1);
INSERT INTO t_target (id, val) VALUES  (2,1);

COMMIT;

CREATE OR REPLACE TRIGGER test_trg
BEFORE UPDATE ON t_target
FOR EACH ROW
BEGIN
  DBMS_OUTPUT.PUT_LINE('Fired');
END;
/

UPDATE t_target SET id = 3 WHERE id = 2;

In another session we try to update all rows with ID less than 3. Of course, this session will block, waiting for session 1 to commit or rollback.


--Session 2

UPDATE t_target t SET  t.val = t.val + 10 WHERE t.id < 3;

But what outcome can we expect from the UPDATE in Session 2 in case we commit in Session 1? Would it update two records or just one? As we have started the UPDATE in Session 2 there were two rows with ID < 3. But as we have committed in the session 1, the second row happens not to fulfill the WHERE condition anymore.


UPDATE t_target t
SET    t.val = t.val + 10 
WHERE  t.id < 3;

Fired
Fired
Fired

1 row updated.
...
-------------------------------------------------------------------
| Id  | Operation          | Name     | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |          |      3 |        |      0 |
|   1 |  UPDATE            | T_TARGET |      3 |        |      0 |
|*  2 |   TABLE ACCESS FULL| T_TARGET |      3 |      2 |      4 |
-------------------------------------------------------------------

Here we can see the restart of our update. It has first identified the rows to be updated using consistent mode reads with respect to the start time of UPDATE. Then it try to place a lock on these rows and update them. It then gets the data block to be updated in current mode, because we want to change the most recent copy of the data. And then it checks, whether the search condition is still satisfied using the current and consistent copies of the block. If it is not satisfied anymore, the whole statement will be restarted. All changes made up to this time will be rolled back. We get new SCN and select the rows to be updated in consistent mode with respect to this new SCN, placing locks on these rows (aka SELECT FOR UPDATE). If it was not possible to lock all these rows, it will attempt it again, up to 5000 times. After all rows were locked it can then safely update them.

Back to our example: the UPDATE in session 2 has updated the first row with ID=1, the trigger fired first time. It then had to wait for the session 1 not being able to place a lock on the row with ID=2. After commit in session 1 it was possible. The trigger fired the second time. But then the check has identified that this row doesn’t satisfy the WHERE condition anymore, because the session 1 has updated the column ID to 3. A rollback was performed. A new attempt of the UPDATE starts with new SCN and can see the effects of the Session 1 in consistent mode, thus selecting only the row with ID=1 for updating. It then updates this row, the trigger fires the third time now.

We can see how Oracle shows 3 Starts in explain plan output. Also A-Rows is 4, although we have updated only one row. I think, it was incremented to two at first attempt, then to three making SELECT FOR UPDATE and then to four while actually updating the row.

We can also verify a restart is taking place by looking at session statistics before and after executing the UPDATE:

  • calls to kcmgas – we can see requests of a new SCN
  • rollback changes – undo records applied – we can see rollback without explicitly asking it
  • table scan rows gotten – will be higher as expected

For me it’s a kind of optimistic locking implemented directly in the database. Just as an application prevents lost updates comparing the queried and actual data in the database just before updating it, Oracle ensures that updates are really deterministic and are only changing the data seen at their start time. And if it was not possible, the “start time” will be redefined 😉   Joking apart, it is a very reasonable feature.

Some general notes: not only UPDATE Statements, but also DELETE, SELECT FOR UPDATE and MERGE are subject to statement restarts. The restart can take place if the transaction operates in READ COMMITTED isolation level, which is default in Oracle. If we are using SERIALIZABLE and meet the conditions for a restart then ORA-08177 can’t serialize access error will be thrown.

There is another situation, which can lead to the restart: if we have a BEFORE row trigger referencing :OLD values and these were changed since the statement began. We don’t cover that case in this post.

And what about MERGE?

But wait a minute! We actually wanted to investigate the behavior of a MERGE statement, don’t we? Does an UPDATE-part of a MERGE behave the same? I would say, it should definitely do so, but…

We’ll do the same in session 1. In session 2 we will replace the update with the merge and see what happens after the session 1 commits:

MERGE INTO t_target t
USING (select 1 from dual) q
ON    (t.id < 3) 
WHEN MATCHED THEN UPDATE SET  t.val = 0; 

Fired 
Fired 

2 rows merged. 

--------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | MERGE STATEMENT      |          |      1 |        |      0 |
|   1 |  MERGE               | T_TARGET |      1 |        |      0 |
|   2 |   VIEW               |          |      1 |        |      2 |
|   3 |    NESTED LOOPS      |          |      1 |      2 |      2 |
|   4 |     FAST DUAL        |          |      1 |      1 |      1 |
|*  5 |     TABLE ACCESS FULL| T_TARGET |      1 |      2 |      2 |
--------------------------------------------------------------------

SELECT * FROM t_target;

 ID   VAL
---------- ----------
  1     0
  3     0

Obviously no restart here! Our MERGE has just updated the column VAL to 0 on a row that is not satisfying a condition ID < 3 anymore!

But what happens, if we modify our example so that session 1 will be updating the column VAL instead of ID?

--Session 1

UPDATE t_target SET val = 3 WHERE id = 2;

--Session 2

MERGE /*+ gather_plan_statistics */ INTO t_target t
USING (SELECT 10 val FROM dual) q
ON    (t.id < 3)  
WHEN MATCHED THEN UPDATE SET  t.val = t.val+q.val; 

Fired 
Fired 
Fired 
Fired 

2 rows merged. 
...
--------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | MERGE STATEMENT      |          |      3 |        |      0 |
|   1 |  MERGE               | T_TARGET |      3 |        |      0 |
|   2 |   VIEW               |          |      3 |        |      6 |
|   3 |    NESTED LOOPS      |          |      3 |      2 |      6 |
|   4 |     FAST DUAL        |          |      3 |      1 |      3 |
|*  5 |     TABLE ACCESS FULL| T_TARGET |      3 |      2 |      6 |
--------------------------------------------------------------------

SELECT * FROM t_target;

 ID VAL
--- ---
  1  11
  2  13

It can be clearly seen that a restart took place. Our MERGE has updated the column VAL adding 10 to the value of 3 which in turn was the effect of the session 1 that we had committed AFTER(!) starting our merge.

And what if our session will be updating the same row twice? No update in Session 1, just

MERGE /*+ gather_plan_statistics */ INTO t_target t
USING (SELECT 2 id, 10 val FROM dual
       UNION ALL
       SELECT 2 id, 10 val FROM dual) q
ON (t.id = q.id)
WHEN MATCHED THEN UPDATE SET  t.val = t.val + q.val;
Fired
Fired
Fired

MERGE /*+ gather_plan_statistics */ INTO t_target t
                                         *
ERROR at line 1:
ORA-30926: unable to get a stable set of rows in the source tables

--------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | MERGE STATEMENT      |          |      3 |        |      0 |
|   1 |  MERGE               | T_TARGET |      3 |        |      0 |
|   2 |   VIEW               |          |      3 |        |      5 |
|*  3 |    HASH JOIN         |          |      3 |      2 |      5 |
|   4 |     VIEW             |          |      3 |      2 |      6 |
|   5 |      UNION-ALL       |          |      3 |        |      6 |
|   6 |       FAST DUAL      |          |      3 |      1 |      3 |
|   7 |       FAST DUAL      |          |      3 |      1 |      3 |
|   8 |     TABLE ACCESS FULL| T_TARGET |      3 |      2 |      6 |
--------------------------------------------------------------------

First of all, the MERGE was restarted. But then ORA-30926 was thrown. A trace file is generated in which we can see what Oracle complains:

Compare value:
  cmpp            : 0x6da03ff1
  cmpl            : 2
  cmpf            : 0
  Data: < c1 02 >
Current value:
  u->kduudcol[(piececol)]    : 0x70c9bff1
  u->kduudlen[(piececol)]    : 2
  u->kduudflag[(piececol)]   : 0
  Data: < c1 0c >

So the first update changes the value of the column VAL to 11, which is “C1 0C” in internal number format. The second update realizes, that the current value “C1 0C” is now different from the value from the consistent read mode “C1 02” (internal representation of 1). My understanding is: at this point in time it is clear that ORA-30926 will be thrown. Why does Oracle attempt the same procedure with restart in this case?

Conclusions

There are two aspects we may consider while updating rows in a table, regardless whether we are using UPDATE or MERGE. The first one is: what rows will be updated? Can other DML activity, which takes place after the start of the update, affect this amount of rows? And the second one is: can we suffer from “lost updates”?

As far as my observations are correct, we can say:

  • A long running  UPDATE can see and respect changes made and committed after its start, if these will influence the WHERE condition and thus the amount of rows to update. An UPDATE will change only the rows consistently identified with respect to the SCN of update start or newly requested SCN in case it comes to restart. Or maybe it will fail at all, if it was not possible to lock all the rows after the restart. But there is no mechanism that protect us from “lost updates”.
  • A long running MERGE will update only the rows consistently identified with respect to the SCN of MERGE start. The outcome is predefined on this time point. If some transaction changes the data used in ON condition and commits, it will have no effect – the rows will be updated anyway  with the MERGE started prior to that. On the other hand, MERGE will be restarted on mismatch of the data to be updated in SET clause, thus protecting us from lost updates for the duration of a MERGE  statement.

This difference is somewhat confusing for me. Maybe it is a bug. I’ve found a discussion an AskTom, where Tom Kyte agrees and has filed a bug (2006!), which he mentions here. This seems to be a non-public bug, since I couldn’t find it on MOS.

Hdr: 5214824 Abstract: READ CONSISTENCY NOT WORKING CORRECTLY ON MERGE STATEMENT

According to the discussion on Ask Tom this bug complains the absence of a restart with ON-columns. But the behavior with restart taking place with changed SET-columns is not documented as well, as far as I can see.  The only thing I could find in the documentation was:

MERGE is a deterministic statement. You cannot update the same row of the target table multiple times in the same MERGE statement.

But this cannot explain the behavior.

Another bug of MERGE in my opinion is that Oracle is doing a statement restart even if the mismatch on data in SET-columns is caused by the same session. It will hit ORA-30926 in this case but just doing the whole overhead before throwing the error. I just can’t find any reasonable explanation why one must do this. Can you?

The whole investigation shows just one more time, how dangerous it can be to batch update the data while another DML activity is present, whether using UPDATE or MERGE. The effect can vary from consuming much more resources as expected to unpredictable and wrong results.

Also note that the mentioned feature of a MERGE, seeing the changes on columns in SET and restarting a MERGE, can in no case be seen as as substitution for some optimistic locking  concept in your application. It is about the changes which were made after your MERGE began and not for the time between you queried the data to show and edit it in your application  and then executed the MERGE.

Useful Links

Discussion on Asktom on write consistency

Another Discussion on Merge

Oracle Documentation – Merge Statement

Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *