Thursday 22 May 2008

Oracle Data Guard and High Availability Physical Standby Database configuration using Oracle 10g R2 and Ubuntu 7.10

In this post I would like to share my experience of setting up an Oracle Data Guard (DG) Fast-Start Failover High Availability (HA) environment with a Physical Standby Database and Data Guard Broker enabled, using 2 PCs with Ubuntu 7.10 Desktop and Oracle 10g R2.


Data Guard and High Availability is not much of DBA work really, there is no tuning, data modelling or core DBA skills like SQL involved.

This is all about the chit-chat between two servers, lots of networking magic-do, and one server informing the other when it is down. Is like SysAdmin disguised as a DBA. You can also call it a "poor man's RAC", I suppose.

It took me about 2-3 hours, depending on the size of your database, to set up and successfully implement the following.

Summary of steps

My instructions will be as brief and as neat as possible. Here we go:


1. Configure the primary database

1.1 Enable Forced Logging on the primary database

SQL> conn / as sysdba
SQL> alter database force logging

1,2 Create a password file with orapwd on then primary database
$ orapwd file=/usr/local/oracle/product/10.2.0.1/dbs/orapwtest01 password=kubi entries=2

1.3 Configure standby redo Logs on the primary database, these will be needed when the primary changes role and becomes standby

SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 50M;

1.4 Get a better than the default standby_archive_dest location
SQL> alter system set standby_archive_dest='/u01/oradata/test01/standby_archive_dest';

1.5 Create pfile from the existing database spfile, if you have used dbca to create your db you probably have an spfile
SQL> create pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora' from spfile;

1.6 Then Edit the pfile and add those Data Guard specific parameters

On the primary database box edit the pfile for the primary database like this, add the following at the end of the file, change your host names and database SID accordingly.

# DG Config PRIMARY ROLE initialization parameters
*.db_unique_name=host_istanbul
*.db_domain='mediterranean'
*.log_archive_config='DG_CONFIG=(host_istanbul,host_london)'
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=host_istanbul'
*.log_archive_dest_2='SERVICE=TO_HOST_london LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=host_london'
*.log_archive_dest_state_1=ENABLE
*.log_archive_dest_state_2=ENABLE
*.log_archive_max_processes=4
*.db_flashback_retention_target=4320
*.undo_retention=3600
#
# DG Config STANDBY ROLE initialization parameters
*.fal_server=host_london
*.fal_client=host_istanbul
*.standby_file_management=auto
#
# Flashback
*.db_recovery_file_dest=/u01/oradata/test01/fra
*.db_recovery_file_dest_size=524288000

1.7 Then startup the primary database instance using the pfile with the new Data Guard parameters

SQL> startup force pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';

1.8 After the instance startup, recreate the spfile to include the new added Data Guard parameters

SQ> create spfile from pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';

1.9 Bounce the primary database so that it starts up using the spfile and the new Data Guard parameters

SQL> shutdown immediate;

SQL> startup mount;

1.10 Put the primary database in archivelog mode

SQL> alter database archivelog;


1.11 Enable flashback on the primary database, flashback will be necessary for fast-start failovers

SQL> alter database flashback on;
SQL> alter database open;

Now you have a database ready to be used as primary database with all Data Guard configuration parameters in place. It is now time to create the physical standby database, lets move on.



2. Create a physical standby database from your primary database using RMAN


2.1 Create a backup copy of the primary database data files with RMAN (Oracle 10g R2) ready to be used for standby duplication.


# !/bin/bash

# Unix controls
#
trap cleanup 1 2 3 15
cleanup()
{
echo "Caught CTRL-C Signal ... exiting script."
exit 1
}


#!/bin/bash
# Oracle Environemt Variables
#
export ORACLE_SID=test01
export ORACLE_BASE=/usr/local/oracle
export ORACLE_HOME=/usr/local/oracle/product/10.2.0.1
export PATH=$PATH:${ORACLE_HOME}/bin
#
rman target=/ <<
EOF
configure controlfile autobackup on;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/backup/test01/standby_backups/autobkpcontrolfile_%F';
run {
change archivelog all crosscheck;
allocate channel rman_back_ch01 type disk;
allocate channel rman_back_ch02 type disk;
backup as compressed backupset incremental level 0 database
format '/u01/backup/test01/standby_backups/sbybk_inc0_%s_%p' include current controlfile for standby;
sql "alter system archive log current";
backup as compressed backupset archivelog all format '/u01/backup/test01/standby_backups/archlog_%s_%p';
release channel rman_back_ch01;
release channel rman_back_ch02;
}
EOF





2.2 Prepare an Initialization Parameter File for the standby database



oracle@istanbul:~$ scp /usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora
oracle@london:/usr/local/oracle/product/10.2.0.1/dbs/

On the standby box after you copy a pfile from the primary database default location edit like this:

# DG Config PRIMARY ROLE initialization parameters
*.db_unique_name=host_london
*.db_domain='mediterranean'
*.log_archive_config='DG_CONFIG=(host_istanbul,host_london)'
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=host_london'
*.log_archive_dest_2='SERVICE=TO_HOST_istanbul LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=host_istanbul'
*.log_archive_dest_state_1=ENABLE
*.log_archive_dest_state_2=ENABLE
*.log_archive_max_processes=4
*.db_flashback_retention_target=4320
*.undo_retention=3600
#
# DG Config STANDBY ROLE initialization parameters
*.fal_server=host_istanbul
*.fal_client=host_london
*.standby_file_management=auto

# Flashback
*.db_recovery_file_dest=/u01/oradata/test01/fra
*.db_recovery_file_dest_size=524288000
#

2.3 Startup the standby database in NOMOUNT state and create it

SQL> startup pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora' nomount;

Before you duplicate the database for standby with RMAN you have to configure connectivity between the 2 boxes with tnsnames.ora.

On the Primary System your tnsnames.ora file should look like this:

TO_HOST_london =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)

TO_HOST_istanbul =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)

On the Standby System your tnsnames.ora file should look like this:

TO_HOST_istanbul =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)

TO_HOST_london =
( DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(CONNECT_DATA =
(SERVICE_NAME = test01))
)

This is so that the boxes can communicate with each other via those service names. Then you can duplicate the primary database on the standby box using RMAN like this

Make sure orapwd is run on london as well

orapwd file=/usr/local/oracle/product/10.2.0.1/dbs/orapwtest01 password=kubi entries=2

Then connect to RMAN on the primary database and start the creation of the standby database via RMAN.

oracle@istanbul$ rman target /
RMAN>
connect auxiliary sys/kubi@to_host_london
run
{
allocate auxiliary channel ch1 type disk;
duplicate target database for standby dorecover nofilenamecheck;
release channel ch1;
}

2.4 Add the standby log files on the standby database

If you watch the backup in the alert.log file RMAN gives the recommendation of adding the standby log files anyway.

ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 52428800;

2.5 Create SPFILE for the standby database

SQL> create spfile from pfile='/usr/local/oracle/product/10.2.0.1/dbs/inittest01.ora';

2.6 Bounce the standby database to pickup the changes and start with an spfile.

SQL> startup force mount;

2.7 Put the standby database in ARCHIVELOG mode.

SQL> alter database archivelog;

2.8 Put the standby database inf FLASHBACK mode

SQL> alter database flashback on;

2.9 Put the standby database in constant recovery mode receiving archived logs from the primary database

To start Redo Apply, issue the following statement:


SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;


3. Oracle Data Guard and Data Broker (DGMGRL) configuration

3.1. On both Primary and Standby Database set the parameter DG_BROKER_START to TRUE

SQL> ALTER SYSTEM SET DG_BROKER_START=TRUE SCOPE=BOTH;


3.2. Add global_db_name parameters and other parameters to the listener.ora file on both primary and standby as follows, on Primary Database your listener.ora

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
(PROGRAM = extproc)
)
(SID_DESC =
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
(SID_DESC =
(SID_NAME=pdb)
(GLOBAL_DBNAME=host_istanbul_DGMGRL.mediterranean)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
)
LISTENER = (DESCRIPTION =
(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=istanbul)
(PORT=1521))))
SQLNET.EXPIRE_TIME=2

on Standby Database your listener.ora

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
(PROGRAM = extproc)
)
(SID_DESC =
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
(SID_DESC =
(SID_NAME=pdb)
(GLOBAL_DBNAME=host_london_DGMGRL.mediterranean)
(ORACLE_HOME = /usr/local/oracle/product/10.2.0.1)
)
)
LISTENER = (DESCRIPTION =
(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=london)
(PORT=1521))))
SQLNET.EXPIRE_TIME=2


3.3 Create the DGMGRL broker Configuration

$dgmgrl
DGMGRL> connect sys/kubi@to_host_istanbul;

DGMGRL> create configuration mediterranean_dg as primary database is host_istanbul connect identifier is to_host_istanbul;

DGMGRL> add database host_london as connect identifier is to_host_london maintained as physical;

DGMGRL> enable configuration;

3.4. Enabling Fast-Start Failover and the Observer


DGMGRL> EDIT DATABASE 'host_istanbul' SET PROPERTY 'LogXptMode'='SYNC';
DGMGRL> EDIT DATABASE 'host_london' SET PROPERTY 'LogXptMode'='SYNC';

3.5 Specify the FastStartFailoverTarget property

DGMGRL> EDIT DATABASE 'host_istanbul' SET PROPERTY FastStartFailoverTarget='host_london';
DGMGRL> EDIT DATABASE 'host_london' SET PROPERTY FastStartFailoverTarget='host_istanbul';

3.6. Upgrade the protection mode to MAXAVAILABILITY, if necessary.

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;

3.7. Enable fast start failover

DGMGRL> ENABLE FAST_START FAILOVER;

3.8. Start the observer.

DGMGRL> CONNECT sys/kubi@to_host_istanbul;
DGMGRL> START OBSERVER;
Observer started

The above command will just hang, will not return you back to the prompt, this is how the observer is started, it is normal. Start anothter dgmgrl prompt for the rest of the operations.

3.8 Check the configuration so far

DGMGRL> connect sys/kubi@to_host_istanbul
Connected.
DGMGRL> show configuration;

Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target

Current status for "mediterranean_dg":
SUCCESS

Also see the verbose output where you can see how long it will take for a box to failover to the other.

DGMGRL> SHOW CONFIGURATION VERBOSE;

Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target

Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul

Current status for "mediterranean_dg":
SUCCESS

4. Do the failover

Test by killing PMON of database instance test01 on the primary database box istanbul, this automatically will trigger a failover within 30 seconds to the standby database as configured. You can watch this exciting event happening by looking at the alert_tst01.log on both boxes simultaneously.Once you crashed test01 on host_istanbul, then on host_london go to dgmgrl and check out what it says.


oracle@london$ dgmgrl

DGMGRL> connnect sys/kubi@to_host_london

DGMGRL> show configuration;

Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database

Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings

That is, host_istanbul database is down!

Be careful, the database is down and NOT the box. As I am running observer on host_istanbul as well, if you unplug the box, probably nothing nice will happen as the observer will be incapable of detecting anything. In a real life situation I suppose the observer runs on a 3rd piece of hardware. I haven't tested the 'unplugging' of the box, I don't know what it will do. I just 'kill -9' the PMON backgroud process for the primary database instance test01.

Anyway, your primary database now is host_london. You have failed over to it successfully in 30 seconds.

If you followed the configuration steps above you will be able to REINSTATE a disabled standby database as you have flashback enabled and put it back in the High Availability environment after you mount it. This is what we will do next.



5. Reinstate the failed primary database as a physical standby database after the failover

Fast-Start failover is a very good configuration for High Availability (HA) as it requires no intervention from the DBA, almost, you wish!

Say the primary database fails, say at 04:00 am, when you are sleeping, you get no phonecall from your boss, and the standby database becomes primary. Oleey! business continues. The next day when you realise what happened, you can with Data Guard Observer and using flash back logs reinstate the failed primary to be a standby to the new primary.

Basically Data Guard 'rolls forward' the SCN of the old Primary to match that of the now new primary (ex Standby). That is what Flashback Database does and it is all about. Moving the SCN to catch up with the primary.

In all situations, you will have to manually mount the failed ex-Primary database, probably the next day, if it has no media failure and can be restarted without any problems.

Oracle Data Guard Broker can then reinstate the failed primary database as the new standby database. Changing of roles. But, before you reinstate the failed primary database host_istanbul, first see the state of both databases by logging in to the DGMGRL from the new primary database host_london.


5.1 Login to DGMGML from the new primary database host and checkout your configuration

oracle@london:~$ dgmgrl
DGMGRL for Linux: Version 10.2.0.1.0 - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/kubi@to_host_london;
DGMGRL> show configuration verbose;

Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database

Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul

Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings

5.2 After this you are sure you have failed over and the new primary database is working properly go to the failed ex-primary database box and mount the instance test01.

oracle@istanbul:~$ sqlplus /nolog

SQL*Plus: Release 10.2.0.1.0 - Production on Thu May 24 23:35:58 2008

Copyright (c) 1982, 2005, Oracle. All rights reserved.

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 608174080 bytes
Fixed Size 1220844 bytes
Variable Size 167776020 bytes
Database Buffers 436207616 bytes
Redo Buffers 2969600 bytes
Database mounted.
SQL>

5.3 Go back to DGMGRL on new primary database host_london and run the reinstate database command

DGMGRL> reinstate database 'host_istanbul';

You have to give it some time, the configuration will not immediately show the reinstated status, but after a few minutes you will get the following.

DGMGRL> show configuration;

Configuration
Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
host_istanbul - Physical standby database
- Fast-Start Failover target
host_london - Primary database

Current status for "mediterranean_dg":
SUCCESS

Congratulations! You just had a successfully failed over and now your primary database is host_london and the standby database is host_istanbul and the obsesrver is watching them.

NOTES

Be patient when you are working with this configuration use a test system, you can use VMWare to have multiple nodes in one box and make sure you have:

  • password files with the same password on both machines.
  • you have tnsnames.ora set up on both machines and you can communicate between both machines via service names
  • you have flashback enabled on both databases
  • you have configured a db_domain and is the same in both init.ora files
References:

  1. Oracle® Data Guard Concepts and Administration 10g Release 2 (10.2) Part Number B14239-01 link: http://youngcow.net/doc/oracle10g/server.102/b14239/rcmbackp.htm


  2. Using Data Guard to do fail-over and other cool features here:
    http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14230/cli.htm#i1005573