Thursday, October 29, 2009

Oracle Data Pump for data refreshes part 3

Part one and part two.

This part gives you the UNIX scripts and SQL that seem to work best in the environments I work in. The standard disclaimers apply: this likely is not the best way of going about this (or it might be), and I provide zero support for it. If it works for you, fantastic; send me some beer. If it frustrates you to no end because most of it works but you constantly find little annoyances and the Oracle manuals are absolutely no help, welcome to my world.

The following UNIX steps assume the user on the server has appropriate privileges to execute Oracle binaries and has the environment set correctly.
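
Data Pump reads and writes its dump and log files through a database directory object, so test_dp_folder must already exist (on both the production and test databases) and the schema must have access to it. A minimal sketch, assuming a DBA connection and a made-up filesystem path:

CREATE OR REPLACE DIRECTORY test_dp_folder AS '/u01/app/oracle/dpdump';  -- path is an example only
GRANT READ, WRITE ON DIRECTORY test_dp_folder TO test_schema;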

prod_expdp.sh

This shell script sets the ORACLE_SID environment variable and then executes the data pump export utility twice: once for data and once for sequence metadata.

#!/bin/sh
# prod_expdp.sh - run datapump exports for a complete refresh of PROD to TEST
#
ORACLE_SID=prod
export ORACLE_SID
expdp test_schema/testing PARFILE=prod_expdata.par
expdp test_schema/testing PARFILE=prod_expmeta.par

prod_expdata.par

The next two files are parameter files used by the export. The contents of the first file, for the data extract, are mostly self-explanatory. The PARALLEL parameter was not used in my runs but is included here because you will want some level of parallelism for optimal extract performance. The %U on the DUMPFILE parameter assigns a unique number to each extract file for parallel processing (4 files in this example). This parameter file should also contain any EXCLUDE statements for tables that do not need to be exported (an example follows the parameter file).

DIRECTORY=test_dp_folder
DUMPFILE=prod_data_%U.dmp
LOGFILE=prod_data_exp.log
CONTENT=DATA_ONLY
SCHEMAS=test_schema
PARALLEL=4
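
For example, an EXCLUDE line to skip a couple of tables might look like the following (the table names here are made up; substitute your own):

EXCLUDE=TABLE:"IN ('AUDIT_LOG','STAGING_TEMP')"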

prod_expmeta.par

The second parameter file is for sequence metadata. Why two exports? I have two reasons. The main reason was to separate the metadata (small) from the table data (massive): you cannot do a metadata and a data import in the same import job without doing all objects, and I did not want the second step processing a gigantic data extract file just to get the microscopic slice of metadata. The second reason is that INCLUDE and EXCLUDE are mutually exclusive; only one can be used per execution, and most data extracts will want EXCLUDE for tables that do not need to be exported.

DIRECTORY=test_dp_folder
DUMPFILE=prod_meta_%U.dmp
LOGFILE=prod_meta_exp.log
CONTENT=METADATA_ONLY
SCHEMAS=test_schema
INCLUDE=SEQUENCE

Extracted files will need to be relocated to the test server.
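
How you move them is up to you; a minimal example using scp, assuming the dump files sit in /u01/app/oracle/dpdump on both servers (the path and host name are made up):

scp /u01/app/oracle/dpdump/prod_data_*.dmp /u01/app/oracle/dpdump/prod_meta_*.dmp testserver:/u01/app/oracle/dpdump/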

test_impdp.sh

This shell script sets the ORACLE_SID environment variable, executes SQL*Plus to build some SQL, executes the data pump import utility twice (once for data and once for sequence metadata), and then executes SQL*Plus to run the post-import SQL and the invalid object compiles.

#!/bin/sh
# test_impdp.sh - build pre/post-import SQL and run datapump imports for a complete refresh of PROD to TEST
#
ORACLE_SID=test
export ORACLE_SID
sqlplus -s test_schema/testing @test_impsql.sql
impdp test_schema/testing PARFILE=test_impdata.par
impdp test_schema/testing PARFILE=test_impmeta.par
sqlplus -s test_schema/testing @test_imp_post.sql
sqlplus -s test_schema/testing @compile_invalid.sql
sqlplus -s test_schema/testing @compile_invalid.sql
sqlplus -s test_schema/testing @compile_invalid.sql

test_impdata.par

The data import parameter file is similar to the data export parameter file but has the extra TABLE_EXISTS_ACTION parameter, which instructs Data Pump to truncate (empty) each table if it already exists.

DIRECTORY=test_dp_folder
DUMPFILE=prod_data_%U.dmp
LOGFILE=prod_data_imp.log
CONTENT=DATA_ONLY
SCHEMAS=test_schema
PARALLEL=4
TABLE_EXISTS_ACTION=TRUNCATE

test_impmeta.par

The metadata import parameter file is self-explanatory.

DIRECTORY=test_dp_folder
DUMPFILE=prod_meta_%U.dmp
LOGFILE=prod_meta_imp.log
CONTENT=METADATA_ONLY
SCHEMAS=test_schema

test_impsql.sql

This SQL script is where all the magic happens. This is where all of the pain-in-the-ass, damned annoying, why-oh-why-Oracle-do-you-punish-us stuff gets taken care of. The script dynamically builds two SQL scripts: a pre-import script and a post-import script. All enabled triggers and referential constraints (foreign keys) must be queried so they can be disabled before the import and re-enabled afterwards. Do not enable a constraint or trigger that was already disabled; it was probably disabled for a reason, likely by some developer behind your back. The script must also query all sequences so they can be dropped before the import, because the data pump import has no parameter or option to replace an existing sequence. Dropping a sequence causes two problems: privileges granted on the sequence are lost, and all source objects referencing the sequence become invalid. The post-import SQL must re-grant those privileges, so this script also queries existing privileges (the schema owner must be able to query the all_tab_privs view).

SET FEEDBACK OFF TERMOUT OFF TRIMSPOOL ON VERIFY OFF PAGESIZE 0 LINESIZE 200
-- build the pre-import script: disable triggers and foreign keys, drop sequences
SPOOL test_imp_pre.sql
SELECT 'ALTER TRIGGER '||trigger_name||' DISABLE;'
FROM user_triggers
WHERE base_object_type = 'TABLE' AND status = 'ENABLED'
ORDER BY trigger_name;
SELECT 'ALTER TABLE '||table_name||' DISABLE CONSTRAINT '||constraint_name||';'
FROM user_constraints
WHERE constraint_type = 'R' AND status = 'ENABLED'
ORDER BY table_name;
SELECT 'DROP SEQUENCE '||sequence_name||';'
FROM user_sequences
ORDER BY sequence_name;
SPOOL OFF
-- build the post-import script: re-enable triggers and foreign keys, re-grant sequence privileges
SPOOL test_imp_post.sql
SELECT 'ALTER TRIGGER '||trigger_name||' ENABLE;'
FROM user_triggers
WHERE base_object_type = 'TABLE' AND status = 'ENABLED'
ORDER BY trigger_name;
SELECT 'ALTER TABLE '||table_name||' ENABLE CONSTRAINT '||constraint_name||';'
FROM user_constraints
WHERE constraint_type = 'R' AND status = 'ENABLED'
ORDER BY table_name;
SELECT 'GRANT '||privilege||' ON '||object_name||' TO '||grantee||';'
FROM user_objects
JOIN all_tab_privs ON table_schema = USER AND table_name = object_name
WHERE object_type = 'SEQUENCE'
ORDER BY object_name;
-- spool a QUIT so the generated post-import script exits SQL*Plus cleanly when run standalone
SELECT 'QUIT' FROM dual;
SPOOL OFF
-- run the pre-import script now; test_impdp.sh runs the post-import script after the imports
@test_imp_pre.sql
QUIT
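
For reference, the generated test_imp_pre.sql ends up looking something like this (the object names here are hypothetical):

ALTER TRIGGER ORDERS_BI_TRG DISABLE;
ALTER TABLE ORDER_LINES DISABLE CONSTRAINT FK_ORDER_LINES_ORDERS;
DROP SEQUENCE ORDERS_SEQ;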

compile_invalid.sql

This generic SQL script queries invalid source objects (made invalid by the sequence drops), builds SQL to compile those invalid objects, and then executes that SQL. The script will need to be executed at least once but up to six times, because compiling an object invalidates the objects that depend on it. For example: if procedure A calls procedure B, and B uses sequence C, both A and B become invalid when the sequence is dropped. If A is compiled first and then B, A is valid only until B is compiled, at which point A becomes invalid again. The second execution of the compile script recompiles A, and since nothing depends on A, we are done.

SET FEEDBACK OFF TERMOUT OFF TRIMSPOOL ON VERIFY OFF PAGESIZE 0 LINESIZE 200
SPOOL compile_invalid_pass.sql
SELECT 'ALTER '||DECODE(SUBSTR(object_type,1,7),'PACKAGE','PACKAGE',object_type)||' '||object_name||' COMPILE'||DECODE(object_type,'PACKAGE BODY',' BODY;','PACKAGE',' PACKAGE;',';')
FROM user_objects
WHERE object_type IN ('TRIGGER','PACKAGE BODY','PACKAGE','FUNCTION','PROCEDURE') AND status = 'INVALID'
ORDER BY object_name, object_type;
SPOOL OFF
@compile_invalid_pass.sql
QUIT
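
Once the shell script has made its compile passes, it is worth checking that nothing is still invalid. A quick check, run as the schema owner:

SELECT object_name, object_type
FROM user_objects
WHERE status = 'INVALID';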

Sequence of Events

  1. run export script
    1. export table data
    2. export metadata for sequences
  2. move exported data to test server (if not on the production server)
  3. run import script
    1. run SQL to create pre and post-import SQL
      1. query referential constraints that need to be disabled
      2. query triggers that need to be disabled
      3. query sequences that need to be dropped
      4. query referential constraints that need to be enabled
      5. query triggers that need to be enabled
      6. query privileges that need to be re-granted on sequences that were dropped and then imported
      7. run the pre-import SQL
        1. disable referential constraints
        2. disable triggers
        3. drop sequences
    2. import table data
    3. import metadata for sequences
    4. run the post-import SQL
      1. enable referential constraints
      2. enable triggers
      3. grant privileges on sequences to users
    5. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the sequence drop
    6. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the last compile
    7. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the last compile

You are probably thinking to yourself “holy fudging fudge, all of that bullfrog for a stupid little sequence?” I know I am. Hundreds of other individuals are thinking the same thing. There are other ways of handling sequences. You could generate DDL for the sequence and then drop and recreate it, but that has the same side effects as above (loss of privileges and invalid objects). You could select MAX() from the dependent column and use that value to build SQL to drop and recreate the sequence; same problems again. Or you could select MAX(), alter the INCREMENT BY property of the sequence, fetch the next value, and then restore the INCREMENT BY property. The last two approaches require prior knowledge of which table(s) and column(s) use the sequence. One of our production databases is a vendor package with over forty sequences, so some up-front work would be required to match each sequence to its column and table. The last solution is the cleanest because it does not require recompiling source objects or restoring privileges. The whole issue could be avoided if the value of a sequence were treated as a data object instead of a metadata object.
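
As a rough sketch of that last approach (the sequence, table, and column names are hypothetical, and it assumes the sequence normally increments by 1):

-- hypothetical names: ORDERS.ID is populated from ORDERS_SEQ
SELECT MAX(id) FROM orders;                    -- say this returns 50000
SELECT orders_seq.NEXTVAL FROM dual;           -- say this returns 1200
ALTER SEQUENCE orders_seq INCREMENT BY 48800;  -- 50000 - 1200
SELECT orders_seq.NEXTVAL FROM dual;           -- sequence is now at 50000
ALTER SEQUENCE orders_seq INCREMENT BY 1;      -- restore the original increment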

The next part will hopefully deal with the dirty details like how much undo/redo is generated, what about performance, etc.
