ABIS Infor - 2011-10

Oracle: Play, Rewind, Replay

Kris Van Thillo (ABIS) - 1 September 2011

Abstract

Never change a winning team, right?

But change is inevitable - change is what keeps us going, keeps us evolving as human beings! So we have to prepare for it. Make sure that when change is needed, you are ready for it.

Be prepared!

Oracle DBAs need to be prepared for many changes: to application code, to SQL statements, to database schemas, hardware environment, instance configuration, ... Oracle 11g offers a vast set of tools and techniques allowing DBAs to effectively predict and monitor the impact of 'change'.

Time to rewind ....

We all remember the ACID properties (Atomicity, Consistency, Isolation, Durability) from when we were first introduced to relational databases and database changes through 'transactions' or 'logical units of work': Atomicity, Consistency and Isolation of database actions are key. Furthermore, all modifications need to be Durable and persistent once committed. Durability is typically ensured through the use of database backups and transaction redo logs that enable the restoration of committed transactions in spite of any subsequent software and/or hardware failures. If, however, we want to test the impact of change, we should be able to undo and redo transactions over and over again in controlled, yet changed, environments.

Enter 'flashback operations'.

In a nutshell, Oracle flashback operations allow for the restoration of database content relative to a specific moment in history. Crucial to flashback is the idea that no backups are used; indeed, the current, consistent, database is used to 'subtract', 'remove', 'undo' committed transactions! The following operations are supported 'out-of-the-box' without any special configuration whatsoever - granted, some system configuration might be advisable:

  • the 'flashback query' feature allows for queries to retrieve the state of tables as of a given timestamp or 'moment of change' (SCN).
	select *
	from tbaccad.tutcourses
	as of timestamp systimestamp - interval '120' minute;
  • the 'DBMS_FLASHBACK' package allows an entire session to query the database as of a given timestamp or SCN - remember 'Back to the Future'? However, you cannot change 'the past'!
	execute dbms_flashback.enable_at_time(
	systimestamp - interval '120' minute);
  • the 'flashback table' feature restores the state of rows in a table as of a point in time in the past, but also restores the table's indexes, triggers, and constraints! Note that it requires row movement to be enabled on the tables involved.
	flashback table tutcourses, tutsessions, tutenrolments
	to timestamp systimestamp - interval '120' minute;
  • other 'simple' flashback operations include 'flashback drop', 'flashback transaction backout', 'flashback version query', 'flashback transaction query', and 'flashback data archive'.
  • A more complex flashback feature - perhaps - is 'flashback database'. Indeed, once a database has been put in flashback mode, an entire database can be 'flashed back'.
	startup mount;
	alter database archivelog;     -- retain redo logs as archives
	alter database flashback on;   -- retain flashback logs
	alter database open;
	>> do something! <<
	shutdown;
	startup mount;
	flashback database 
	to timestamp sysdate-1/24;     -- let's go back one hour to get rid of something!
	alter database open resetlogs; -- oops, what a pity!
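The 'simple' operations listed above are equally terse in SQL. As a minimal sketch - using the course table from the earlier examples - 'flashback drop' and 'flashback version query' look like this (flashback drop assumes the recycle bin has not been disabled):

	-- flashback drop: undo an accidental drop via the recycle bin
	drop table tutcourses;
	flashback table tutcourses to before drop;

	-- flashback version query: list the row versions still kept in undo
	select versions_starttime, versions_operation, c.*
	from tbaccad.tutcourses
	versions between scn minvalue and maxvalue c;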

For a thorough and complete theoretical discussion of temporal data, versioning, and version data management - time travel - the interested reader is referred to 'De tuf-tuf-club', Suske en Wiske, Studio Vandersteen (1952).

... and start all over again!

Oracle's 'Database Replay' feature allows for realistic and efficient testing of the impact of changes by recreating the production workload on a test system while maintaining the characteristics of that same workload. The idea is as follows:

  1. capture the workload on the production system

    Capture 'files' are generated, containing all relevant information needed for replay, including SQL text, bind variables, wall clock time, system change numbers, ... Capture 'filters' can be applied (at different levels: user, program, module, action, service, session). The capture 'result' is platform, and to some extent version, independent. It is a good idea to coordinate the 'start of capture' moment with a known, restorable state of the database on which the replay will be performed!

    Make sure to have summary 'status' reports available, generated on your production system when the capture process was terminated. These reports are essential when evaluating the effects of the replay!

    Some actions against the data are NOT captured - refer to version-specific Oracle documentation.

  2. process the captured workload

    Captured files have to be processed for replay. Based on the characteristics of the system on which the replay will be executed, metadata needs to be generated (once per 'replay system'). The processed workload can afterwards be replayed repeatedly on that same system.

  3. replay the workload

    Replay clients are started to process and execute the captured workload. Replay clients submit calls to the test system with the exact same timing and concurrency as on the capture system - the default replay 'synchronization' feature - and thus put the same load on the system as seen in the production environment.

    During replay, runtime re-mapping is performed. Basically, this means that Oracle is aware that some bind values referring to ROWIDs, LOB locators, or ref cursors are context-dependent, and hence should be re-interpreted (corrected) based on the characteristics of the system on which the replay is performed. If application logic depends on wall-clock time, it might be required to reset the system clock at the start of the replay operation.

  4. analyse the results of the replay

    An evaluation is required of the results of the replay compared to the workload capture. Topics to consider include, for example:

  • 'error divergence': did the replay result in unknown, possibly new, errors? Have expected errors occurred?
  • 'data divergence': did the replay result in different row counts, different aggregated values, ...? Compare with the 'status' reports generated at the end of the capture phase!
  • 'session failure': did a replay session crash?

The primary interface to configure - and use - the replay feature is Oracle Enterprise Manager; however, a command-line interface is provided through a number of PL/SQL packages.
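As a sketch of that command-line route, the main calls live in the DBMS_WORKLOAD_CAPTURE and DBMS_WORKLOAD_REPLAY packages; the directory objects and names below are hypothetical, and exact signatures should be checked against your version's PL/SQL reference:

	-- on production: start and stop the capture
	execute dbms_workload_capture.start_capture(
	name => 'peak_load', dir => 'CAPTURE_DIR');
	-- ... production workload runs ...
	execute dbms_workload_capture.finish_capture;

	-- on the test system: process the files, then initialise the replay
	execute dbms_workload_replay.process_capture(capture_dir => 'REPLAY_DIR');
	execute dbms_workload_replay.initialize_replay(
	replay_name => 'peak_load_replay', replay_dir => 'REPLAY_DIR');
	execute dbms_workload_replay.prepare_replay;
	-- start one or more replay clients from the OS prompt, e.g.:
	--   wrc system/password mode=replay replaydir=/tmp/replay
	execute dbms_workload_replay.start_replay;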

Let's Play!

So - ready for some 'what if' analysis? Because that is basically what is possible today.

  • provide for a stable 'test' environment - a reference baseline; use 'flashback database' to restore the required baseline over and over again, whenever required;
  • apply the required modifications - the 'change' we need to manage or evaluate;
  • run the replay operation, and verify what happens.
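That loop can be sketched with a guaranteed restore point as the reference baseline (the restore point name is hypothetical):

	-- before applying the change: pin the baseline
	create restore point before_change guarantee flashback database;

	-- apply the change, run the replay, analyse the results ...

	-- ... then rewind to the baseline and start all over again
	shutdown immediate;
	startup mount;
	flashback database to restore point before_change;
	alter database open resetlogs;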

When combining the flashback and replay features provided by the Oracle database, the DBA can quickly and effectively evaluate the consequences of 'change', and can decide on corrective actions that need to be applied (and tested again) before enabling that same change in the production environment!