Monday, September 28, 2009

CRUD with the Oracle MERGE statement

Most of this post came about after a lengthy search of the clicker-web for information on Oracle materialized views. There are three or fifty situations at work where such a construct might be helpful to performance and overall health of the system (like when five different folks are running the same query over and over and over again).

Most situations that could benefit from a materialized view are "complex materialized views" according to Oracle manuals and guide (i.e. I am doing something more complicated than "hello world" in SQL). Others have documented that the materialized view update is really a MERGE behind the scenes. I am quite familiar with a MERGE and that was actually my first choice before I thought to myself "there is probably something already created for this complicated crap". Why recreate the square wheel, right? In this case the square wheel might be more efficient in the long run.

First, I will create a quick table and add some constraints to it:

CREATE TABLE table_full_of_crud AS
SELECT
LEVEL AS id
, TRUNC(SYSDATE,'YEAR')+(LEVEL-1) AS some_stupid_date
, CASE MOD(LEVEL,7) WHEN 0 THEN NULL ELSE TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'DAY"," MONTH D YYYY') END AS varchar_crud
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 120;
ALTER TABLE table_full_of_crud
ADD (CONSTRAINT pk_table_full_of_crud PRIMARY KEY(id))
MODIFY (some_stupid_date NOT NULL);

Here is what I consider to be a well-formed, structured MERGE that will handle new rows, updated rows and rows that need to be deleted:

MERGE /*+ append */ INTO table_full_of_crud  m
USING (
SELECT
NVL(v1.id,t1.id) AS id
, NVL(v1.some_stupid_date,TRUNC(SYSDATE)-12345) AS some_stupid_date
, v1.varchar_crud
FROM table_full_of_crud t1
FULL JOIN (
SELECT * FROM (
SELECT
LEVEL AS id
, TRUNC(SYSDATE,'YEAR')+(LEVEL-1) AS some_stupid_date
, CASE MOD(LEVEL,20)
WHEN 0 THEN NULL
WHEN 7 THEN TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'DAY"," MONTH D YYYY')
ELSE TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'MM/DD/YYYY')
END AS varchar_crud
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 160
)
WHERE MOD(id,69) > 0
) v1 ON v1.id = t1.id
) u
ON (m.id = u.id)
WHEN MATCHED THEN
UPDATE SET
m.some_stupid_date = u.some_stupid_date
, m.varchar_crud = u.varchar_crud
WHERE NOT(
(m.some_stupid_date = u.some_stupid_date)
AND DECODE(m.varchar_crud,u.varchar_crud,'EQ') IS NOT NULL
)
DELETE
WHERE m.some_stupid_date = (TRUNC(SYSDATE)-12345)
AND m.varchar_crud IS NULL
WHEN NOT MATCHED THEN
INSERT (m.id, m.some_stupid_date, m.varchar_crud)
VALUES (u.id, u.some_stupid_date, u.varchar_crud);

Here is a line by line explanation of all this crap:

  1. The start of the MERGE statement, with append hint (direct path, etc.) INTO table; I use a table alias of "m" to indicate this is the table having data merged into it (makes sense)
  2. USING; we will be using a subquery
  3. SELECT (our subquery)
  4. The NVL of the primary key is required due to the FULL [OUTER] JOIN; if the row needs deleted then the merge data primary key will be NULL, if inserted the original primary key will be NULL, if update neither will be NULL
  5. The NVL of this column is due to the NOT NULL constraint and to use as our "delete trigger"; "delete triggers" should be a value that will never appear in the column
  6. VARCHAR2 column
  7. FROM the table that will be the target of the merge;
  8. FULL [OUTER] JOIN is used to return the rows that exist in either table
  9. SELECT the merge data; this subquery will contain new rows and rows that might need updating in the merge target table
  10. data
  11. data
  12. data
  13. data
  14. data modify some rows to null values
  15. data keep some rows the same
  16. data modify the rest
  17. data
  18. data
  19. data
  20. data
  21. data delete the 69th key value (remove it from the subquery)
  22. data
  23. End of using subquery; assign table alias of "u" (again, makes sense)
  24. ON here is where (usually) primary key values are matched back to each other and influence the next two sections; what to do when things do or do not match
  25. If the condition above evaluates to TRUE then the "WHEN MATCHED THEN" section is performed
  26. UPDATE the target table
  27. SET column value
  28. SET column value
  29. I use the WHERE clause of the merge update clause to determine if the row columns have changed or not; if they have NOT changed then why have the undo and redo overhead (???)
  30. If a column cannot be NULL then a simple comparison will work
  31. If a column can be NULL then DECODE can be used on both columns in the comparison
  32. End of NOT(
  33. DELETE from the target table
  34. The WHERE clause check for the "delete trigger(s)" in (TRUNC(SYSDATE)-12345)
  35. AND check for null columns (the FULL JOIN will produce NULL values in columns without corresponding rows in the merge data)
  36. The "WHEN NOT MATCHED THEN" is performed when merge data cannot be matched to rows in the merge target
  37. So INSERT (columns)
  38. With new VALUES (from the merge subquery data)

The USING sub-query will almost always contain a complex piece of SQL to collect new and updated rows. The handle deletes of rows in the merge target table it needs to be joined to the query data with a FULL OUTER JOIN. If deletes are not needed, then the FULL JOIN is not needed. Here is a visual representation of the FULL OUTER JOIN:

ID1 X1   Y1     ID2 X2   Y2
1 data data 1 data data
2 data data 2 data data
3 data data 4 data data
4 data data 5 data data
5 data data 6 data data

After FULL JOIN

ID1  ID2  X2   Y2
1 1 data data
2 2 data data
3 NULL NULL NULL
4 4 data data
5 5 data data
NULL 6 data data

The non-primary key columns in rows that need to be deleted will be NULL. This can be used in the topmost section of the USING sub-query (via NVL) to create delete trigger data, which brings me to the DELETE mechanism.

The hardest part of the above statement was getting the DELETE clause to work. Reading the Oracle 10g Database SQL Reference helped somewhat; especially this point:

The only rows affected by this clause are those rows in the destination table that are updated by the merge operation. That is, the DELETE WHERE condition evaluates the updated value, not the original value that was evaluated by the UPDATE SET ... WHERE condition. If a row of the destination table meets the DELETE condition but is not included in the join defined by the ON clause, then it is not deleted.

This means the DELETE does not use column data in the USING clause sub-query; very important. Some column in the UPDATE must be updated for the DELETE to work. To handle this set one or more columns to values that would never appear (in combination) in the data and use those values as a delete trigger.

I always add a WHERE clause to the UPDATE to determine if anything has changed. If nothing has changed then why generate the undo and redo data? Simply compare non-null columns and then null-evaluated columns to determine if data have changed.

The INSERT clause is the same as the INSERT statement we all know and love.

One discussion I came across on the Internet was about using a UNION ALL instead of the FULL OUTER JOIN. The UNION ALL performs better (one full index scan) than the FULL JOIN (two index scans and then a UNION-ALL). UNION ALL is a set operator, however, and updated rows would be duplicated (one unique row from the merge table and one from the query data). Something could probably be written but the FULL JOIN is simple, easy to maintain and worth whatever fractional overhead (compared to the actual merge) incurred.

No comments: