Saturday 20 December 2008

sp_spaceused and other MS SQL Server Dictionary views

Many times I am asked to look at Microsoft SQL Servers to investigate things, and I find the following T-SQL procedures and Microsoft SQL Server dictionary views very handy.

To find out how much disk space you Microsoft SQL Server database consumes and other things, you can run the following T-SQL procedures inside the Query Analyzer.

sp_spaceused @updateusage = 'TRUE'

To see quickly how many stored procedure, views, tables and other objects your database has you can use this query here

See table disk space usage

sp_spaceused 'Tablename'

See all tables

select *

See all foreign key constraints

select *

See foreign keys, primary keys of a particular table


See the foreign key of a particular table.

select table_name, column_name "foreign key", constraint_name
where TABLE_NAME='put_table_name_here'
and constraint_name not like 'PK%'

Find which MS SQL Server Stored procedure is using a particular table, or in other words dependencies between stored procedures and tables.

FROM syscomments sc
INNER JOIN sysobjects so ON
WHERE sc.TEXT LIKE '%put_table_name_here%' -- enter table name here.

For more info go to:

Monday 1 December 2008

Service Level Agreement (SLA) figures

How does downtime affect your Service Level Agreement (SLA) with your customer?

What does 99.99999 % uptime per year mean?

If you want answers to these questions you have too look at the availability levels below. They show how many minutes, seconds downtime/system outage you are allowed to have in a year, to meet certain SLA figures and your contractual obligations.

Level of availability Total downtime per year
===================== ======================
99% 3.6 days
99.9% 8.76 hours
99.99% 52 minutes
99.999% 5 minutes
99.9999% 30 seconds
99.99999% 3 seconds

Saturday 22 November 2008

Data Integration and the Cloud

Recently I read a Special Report from the magazine Economist about the computing "Cloud". The 1 year old, yet still buzzy and exciting new phenomena in the IT world.

The article is quite analytical and it is definitely worth reading. Here is the full link to the article called "Let it rise!":

I have to warn you the article is quite lengthy but is worth every page of it.

After you read the article you get a feeling like returning back to the past in the supercomputers era of mainframes. That is what The Cloud is. A "thing" which lives on the internet, and where all the IT infrastructure lives and it is remote and you connect to it via the internet.

No more hardware and server rooms in the office, just smart PCs with browsers and the browser actually becoming the operating system, full of internet based applications for you to work with.

The Cloud is getting rid of all the dusty servers in the server room. All your applications are sucked by the Cloud. If you need word-processing to be done, you will just login to the internet and you use Google Notes or something. If you need spreadsheets, you can use Google Docs. Do you need to design and run a database use Oracle Apex. These are already existing and mature Cloud technologies. Do you need a ready-made CRM application without the complexity of installing it and maintaining it yourself on your PC or your servers, use Salesforce or SugarCRM.

All these and many more are available for you on the internet, in the form of SaaS which stands for Software as a Service. All you need to have is an internet browser and little money to pay your subscription to these SaaS services as you would pay for your mobile phone bill. Software is becoming a Pay-As-You-Go thing! There is no more licenses, product installation keys and certificates to worry about. You don't have to download anything and you don't have to install anything. Even better you don't need to upgrade to the new version each time, as this is done for you automatically. Sounds nice doesn't it?

The Cloud seems a heaven of applications, but how about data? What if I want to store gigabytes-terrabytes of data in the Cloud? How about backups?

Yep it is possible. Check Amazon AWS and S3 you pay pennies for gigabytes of storage per month. But how practical is this? Will all companies suddenly decide to part from their valuable and competitive data and leave it to the misty-airy hands of The Cloud? Will it happen quickly? I think it won't happen as quickly as it happened for the applications. You can guess reasons why. Usually we don't sell data but we sell applications, applications come and go but data stays, we don't like to pass the control of our data to someone else or somewhere else, are just a few reasons.

But what is the alternative? The alternative is to connect the cloud to your database or legacy data. Keep the data, but get The Cloud connect to it! Do we have to care where the data is when we want to analyse it? Do we have to know if it is on Oracle, MS SQL or MySQL or in an MS Excel spreadsheet? Do we have to be looking for the right tool to connect to the right database? In tools issue, I like what Oracle did with Oracle SQL Developer, you can connect it to any database.

To carry on, we should just be able to get to the data with no gimmicks. Also would be nice to have the data presented in an application in the Cloud. Right? If these are sort of your requirements then probably you are looking at one of those Data Integration solutions.

An integration solution which is not just buying Crystal Reports or Business Objects or Oracle OBIEE and installing it on one of your new servers and firing up more and more instances and doing lots of more configurations etc. But a solution where you can have all this in the Cloud, done by others at a Pay-As-You-Go cost, and all you get is a Login to start developing your reports and web applications.

Who wouldn't prefer not to own the complexity of maintaining a reporting server infrastructure but would like to own the data?

At the end all you need is a web page which connects to your database and you develop it as easy as writing an email. A web page which is an application, developed on the internet and stays on the internet but connects to your database in your data centre or your PC. Wouldn't that be cool?

Apatar a Data integration tool

Yes, there are independent tools out there for the integration of database systems with the Cloud. There are ways to get the cloud to connect to your whichever database system. I have been looking at one of them recently called Apatar You can find out more on

Apatar is an Open Source data integration and ETL tool. It is FREE and it lives in Available both for Linux and Windows. It works a treat. It is just 22MB of software. You can downoad Apatar from:

Once you installed it you just drag and drop objects to its screens. These objects could be databases and maybe applications you have in the Cloud. You configure the objects by double-clicking on them and providing database login and Cloud login info. At no time, you start pumping and synchronizing data between your database and your Cloud application (i.e. Salesforce). Look at some screen-shots from the Apatar website to understand how it works. You can even schedule this operations to happen at times of your choice with the tool.

It is great to see Cloud enabled great software like Apatar for FREE and this doesn't stop here. I found even more tools like apatar for free here:



Wednesday 12 November 2008

Entity Relationship diagrams using Oracle SQL Developer Data Modeling

Finally, the early adopter release of Oracle SQL Developer Data Modelling tool is out. It is easy to use and I quickly managed to reverse engineer a database schema from an Oracle 10g database.

All you have to do is to connect to a database, specify which schema and which tables you want to include in your ER Model, and then just press the button. It even connects to Microsoft SQL Server 2000 database and extracts the ER Models. Pretty impressive!

I use ubuntu 8.04 and the install was easy, just unzip the downloaded file to a directory and run the script. See instructions below.

It was about time for Oracle to come with a solution like this and for FREE, well almost. Still not clear! But I downloaded the early adopter version to try it.

You can download Oracle SQL Developer Data Modeling from:

Here are the instructions to reverse engineer a schema using the tool.

1. Go to directory where you unziped the download (in my case the download like this:

cd /opt/osdm1/osdm/bin

2. Run the file like this


3. When the application starts from the menu, go to:

File > Import > DB Catalog

4. The "DB Metadata Extraction Wizard" starts.

Just follow the instructions and provide your login credentials like you would do with any other tool connecting to a database and choose the tables you want to model.

Enjoy modeling in your
universe of discourse!

You can also read my other post on this topic which shows you how to create a database with OSDM here

Tuesday 11 November 2008

Microsoft SQL Server 2000 Data Dictionary view to view all objects in the database

See all objects in your Microsoft SQL database and their types.

This is one of my favorite data dictionary queries in MS SQL Server. It shows me quickly what makes up a database, how many tables, triggers, views etc.

I have been using it on MS SQL Server 2000 and it works!

select owner, cnt, object_type =
CASE xtype
when 'C' then 'CHECK constraint'
when 'D' then 'Default or DEFAULT constraint'
when 'F' then 'FOREIGN KEY constraint'
when 'L' then 'Log'
when 'FN' then 'Scalar function'
when 'IF' then 'Inlined table-function'
when 'P' then 'Stored procedure'
when 'PK' then 'PRIMARY KEY constraint (type is K)'
when 'RF' then 'Replication filter stored procedure'
when 'S' then 'System table'
when 'TF' then 'Table function'
when 'TR' then 'Trigger'
when 'U' then 'User table'
when 'Q' then 'UNIQUE constraint (type is K)'
when 'V' then 'View'
else 'unknown'
select owner, so.uid uid, so.xtype xtype,count(*) cnt from sysobjects so, sysusers su
where so.uid = su.uid
group by, so.xtype, so.uid
) sysobjects_info

Owner Count Object Type
---------- ------- ----------------------------------
dbo 12 CHECK constraint
dbo 363 Default or DEFAULT constraint
dbo 127 FOREIGN KEY constraint
xanthi 34 Stored procedure
dbo 241 Stored procedure
dbo 234 PRIMARY KEY constraint (type is K)
dbo 19 System table
dbo 10 Trigger
dbo 399 User table
dbo 6 unknown
dbo 11 View

Monday 10 November 2008

ORA-01000: maximum open cursors exceeded

When you get the ORA-01000: maximum open cursors exceeded did you ever want to see which SQL is causing this error? That is, try to find where the leak is?

Then try running the following sql statements on the dynamic dictionary views V$SESSION and V_$OPEN_CURSOR. This statement shows which SQL statement, which username and which machine are using up your cursors.

SQL> select s.machine, oc.user_name, count(1) as count, oc.sql_text
from sys.v_$open_cursor oc, v$session s
where s.sid = oc.sid
group by s.machine, user_name, sql_text
order by count desc;

----------------------- --------------- ------- ------------------------------------------------------------ Athena 127 SELECT NAME, EAR_NAME, MASTER_NODE_ONLY, ENABLED, DEFINITION Athena 120 SELECT NAME, EAR_NAME, MASTER_NODE_ONLY, ENABLED, DEFINITION Athena 43 SELECT NAME, EAR_NAME, MASTER_NODE_ONLY, ENABLED, DEFINITION Athena 39 select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.fla Athena 35 INSERT /*+ IDX(0) */ INTO "Athena_OWNER"."MLOG$_USERS" (dmlt
You must run this as SYSDBA. The output is from an Oracle 10g instance.

Tuesday 28 October 2008

Cloud computing

These two buzz words "cloud computing" have been in my attention lately and I really wanted to find out what they stand for. I got a classic and short definition from Wikipedia:

"Cloud computing is Internet-based ("cloud") development and use of computer technology ("computing"). "
see full definition....

It seems like it is time for the internet to prove what it stands for, the ultimate computer!?

Yes, from the look of things, all of the applications and databases of humankind are going to be on the internet. At the office we will not have "server rooms" any more, but just PCs with Internet browsers. Developers, will be the only kind of IT people that will stay back in the office, they will sit very near to the boss together with Business Analysts and IT Managers. Modern day developers of the future will be using the browser, and rapidly building applications, maybe with "declarative programming". Is it already hapenning? If you look at the business model of companies like Salesforce where they market concepts like "No Software!" and Software as Service (SaaS) and Oracle's hosted rapid application development environment called Oracle Apex one can easily start seeing The Clouds gathering in the sky. How about the Greenplum the Mega-Giga Titanic Datawarehouse for everyone on the Internet. A datawarehouse internet appliance where everyone can put their datawarehouse and access it from anywhere on the earth on a browser and do analysis. No Servers, No hardware, No Software just a login to a workspace on the internet and that's it. Everything you need, tools, spreadsheets will be there on the internet wating for you.

I wonder what will happen to all those other IT guys, SysAdmins, DBAs. Will they lock them up in huge data centres somewhere in the countryside? Will they be needed at all?

Tom Kyte an Oracle Expert was asked a smilar question on his website and here is Thomas Kyte's comment on Cloud Computing.

Or is it going to be as Tom Kyte says that with 'Cloud Computing', databases will just get larger and larger.

Monday 27 October 2008

Monitor the time an RMAN backup takes

To see what RMAN is doing now, and see what SID is doing what sort of work, and how much it has got left to do, use the following SQL. This script is good when you are trying to see how much work an RMAN Channels have got left to do. It is good to watch with the RMAN backup script log (tail -f) as the backup is hapenning. For both scripts you have to lonig as SYSDBA on the instance where the BACKUP
is or RESTORE hapenning.

SELECT sid, 
       ( sofar / totalwork ) * 100 pct_done 
FROM   v$session_longops 
WHERE  totalwork > sofar 
       AND opname NOT LIKE '%aggregate%' 
       AND opname LIKE 'RMAN%' 


---------- --------- ---------- ---------- ----------
       100 27-OCT-08    1554952    1364978 87.7826454

To watch the success or failure of an RMAN job in the past, or even when it is hapenning, you can use the dynamic v$ view v$rman_status. The following query will show you a history of your BACKUP and RESTORE operations. By changing the where start_time > sysdate -1 clause you control how much in the past you want to look at. I am using this on Oracle 10g, I don't know if it is available on Oracle 9i and before.

SELECT To_char(start_time, 'dd-mon-yyyy@hh24:mi:ss') "Date", 
FROM   v$rman_status vs 
WHERE  start_time > SYSDATE - 1 
ORDER  BY start_time 


Date                 STATUS                  OPERATION                         MBYTES_PROCESSED
-------------------- ----------------------- --------------------------------- ----------------
27-oct-2008@11:40:11 FAILED                  RMAN                                             0
27-oct-2008@11:40:29 COMPLETED               BACKUP                                       11812
27-oct-2008@12:06:30 COMPLETED               BACKUP                                       23112
27-oct-2008@12:41:45 COMPLETED               BACKUP                                         160
27-oct-2008@12:42:28 FAILED                  CONTROL FILE AUTOBACKUP                          0
27-oct-2008@17:24:28 RUNNING                 RMAN                                             0
27-oct-2008@17:24:43 COMPLETED               DELETE                                           0
27-oct-2008@17:24:51 COMPLETED               CATALOG                                          0
27-oct-2008@17:25:16 RUNNING                 RESTORE                                  22082.875

Thursday 25 September 2008

Partitioning using CREATE TABLE AS (CTAS) and Column Default Values

This is about the CTAS (Create Table As...) operations during the creation of partitioned tables from normal tables in an Oracle database.

The CTAS operation although copies column constraints such as NULL, NOT NULL from the normal table to the partitioned table during the partitioned table creation, it does not copy DEFAULT VALUEs of the columns. This might lead to a nasty surprise if you are doing RANGE PARTITIONING and the partition key DATE column has a DEFAULT VALUE of SYSDATE in the normal table. This DEFAULT VALUE setting is NOT copied to the partitioned table!

What will happen is, if you do not specify a date explicitly in your INSERT statements you will get an 'ORA-14400: inserted partition key does not map to any partition' error. As the partition key value passed in will be NULL and the partitioned table will NOT know about the DEFAULT value.

Things can be worse if you are using a combination of CTAS to create partitioned tables from normal tables, and then you use an ALTER TABLE ... RENAME TO .. to replace your normal production tables with the new partitioned tables. Let me tell you how. If your application's INSERT statements into these new partitioned tables, do not explicitly specify the date value of the partition key, you will pass in NULLs and hit ORA-14400. Watch out you will suffer production outage!

To fix this problem, you will have to either explicitly change your code to pass in a date value for the partition key column, or alter the partitioned table after CTAS and modify the column to have a default value.

Here is a demonstration on Oracle 10g R2 of how easily this thing can happen.

Create the normal table

conn scott/tiger

drop table big_table;

create table big_table
(id number primary key,
subject varchar2(500),
created_date date default sysdate

insert into big_table (id, subject) values (4,'tset3')

1 row created.


Commit complete.

Create the partitioned table with CTAS from the normal table above, consider using NOLOGGING table creation option to avoid trashing the logs if you think this data is recoverable from elsewhere. This will also create the table faster.

drop table par_big_table
-- change dates below appropriately to include the SYSDATE
-- at the time you run this example.

create table par_big_table
partition by range (created_date)
partition p200809 values less than (to_date('01-10-2008', 'DD-MM-YYYY')),
partition p200810 values less than (to_date('01-11-2008', 'DD-MM-YYYY'))
select * from big_table

Now try to insert into the new partitioned table without passing the CREATED_DATE value, as you will wrongly assume the new partitioned table will have a DEFAULT VALUE. See how you get the error.

insert into par_big_table (id, subject) values (5,'test4')

insert into par_big_table (id, subject) values (5,'test4')
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition

This happens because the value the INSERT statement is passing for the CREATED_DATE column is NULL and the partitioned table doesn't have a DEFAULT VALUE for this column.

To fix this error and stop the production outage you might have caused :-) you have two choices:

* ALTER the new partitioned table and make the column to have a DEFAULT VALUE
* Change the application code to always include a value for the CREATED_DATE

Let's ALTER the table.

alter table par_big_table modify ( created_date date default sysdate);

insert into par_big_table (id, subject) values (5,'test4');

1 row created.


Commit complete.

Again watch for those DEFAULT VALUES in columns when you are creating partitioned tables with CTAS.

Friday 15 August 2008

Capture bind variables in SQL with Oracle FGA

In Oracle 10g with fine-grained auditing (FGA) it is possible to track the bind variables of your application's SQL statements. I know, Oracle 11g is out but how amazing, I am still discovering new things in Oracle 10g!

Oracle FGA is good for bind variables, if that is what you want after all. It is lightweight and more easy to use than its alternatives, for example the full SQL TRACE with the option '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12'.

Another benefit of FGA is that you can choose which object you want to audit and what kind of statements, SELECT, INSERT...etc you want to audit. Whereas with the SQL TRACE you just have to accept the trace dump and the performance implications.

More good news is that in Oracle 10g you don't have to bounce the database anymore to set the initialization parameter AUDIT_TRAIL and enable auditing for your system. FGA does not require a database shutdown/restart.

Object or user auditing can be done ad-hoc and on the spot, thanks to FGA. You can audit INSERT, DELETE or just SELECT statements. Moreover, you can be object specific or user specific. For example if you wanted to audit the DML happening on a table all you have to do is to create an Audit Policy for that table. You can use the DBMS_FGA.ADD_POLICY procedure to create the policy. The modifications on the table, together with the bind variables, are captured and logged in a SYS owned log table called FGA_LOG$. How amazing!

Here is an example:

We set up an object auditing policy, which will monitor all INSERT, UPDATE and DELETE operations on the table EMP on Scott's schema.

SQL> begin
dbms_fga.add_policy (
object_schema => 'SCOTT',
object_name => 'EMP',
policy_name => 'EMP_DETECTIVES',
audit_column => 'ENAME',
statement_types => 'INSERT, UPDATE, DELETE',
audit_trail => DBMS_FGA.DB_EXTENDED

PL/SQL procedure successfully completed.

Then connect as SCOTT the owner of the object you are auditing and do some changes on the object.

SQL> conn scott/tiger


SQL> variable myname varchar2(50);
SQL> exec :myname :='ROBIN';

PL/SQL procedure successfully completed.

SQL> insert into emp values (9999, :myname, null, null, null, null, null,null);

1 row created.


Commit complete.

SQL> delete emp where ename = :myname;

1 row deleted.

SQL> commit;

Now look at the FGA_LOG$ to see the audit entries.

SQL> conn sys/***** as sysdba


SQL> column ntimestamp# format a30
SQL> column lsqltext format a15
SQL> column dbuid format a15
SQL> column obj$name format a10
SQL> column lsqlbind format a15
SQL> column lsqltext format a20

SQL> select ntimestamp#, dbuid, obj$name, lsqlbind, lsqltext from sys.fga_log$;

------------------------------ --------------- ---------- --------------- --------------------
14-AUG-08 PM SCOTT EMP #1(5):ROBIN insert into emp valu
es (9999, :myname, n
ull, null, null, nul
l, null,null)

14-AUG-08 PM SCOTT EMP #1(5):ROBIN delete emp where ena
me = :myname

WARNING : Don't forget to drop the policy once you are done, cause the auditing will go for infinite time on the object. The FGA_LOG$ table will fill up and you will waste space. You can drop the policy once you have done with it like this:

SQL> conn sys/***** as sysdba
dbms_fga.drop_policy ( 'SCOTT', 'EMP','EMP_DETECTIVES');

Monday 16 June 2008

Oracle OBIEE Install on Unbreakable Linux

After visiting a seminar given by Oracle on Data Analysis and Oracle BI, I have decided to explore the Oracle OBIEE product and see it first hand. In this post I would like to share my experience of installing Oracle OBIEE on Oracle's Unbreakable Linux.

I am in the opinion that all things should be in the database and I was reluctant and skeptical about Oracle BI. Why on earth do we need another app server kind of server, a server which is half webserver and half app server and all it does is things like daily office automation tasks. After reading some marketing text about the tool I found it to claim to do things like:

  • Mail merge
  • Cheque printing
  • PDF reports
  • Flash graphs
  • Any format Reports
  • Interactive dashboards
  • Ad-hoc analysis
  • Market analysis

A true mix of Siebel, JD Edwards and Peoplesoft one can say. Especially what is the point of having a tool like OBIEE, when you can have all this (maybe not all, but a significant portion) directly from the database with Oracle APEX in Oracle 11g served to the user via a web browser and for FREE. Having Google offering MS Word like applications on the web what is the point really of OBIEE. Is OBIEE already old technology?

Installing Oracle OBIEE on Unbreakable Linux

So, I have downloaded Unbreakable Linux from the Oracle Store here. Installed the OS on a Desktop Machine and then, downloaded Oracle Business Intelligence Enterprise Edition OBIEE and particularly the Linux version file biee_linux_x86_redhat_101333_disk1.cpio from Oracle Downloads Website.

One thing I noticed when I was reading about OBIEE is that it is "Database Agnostic", that is it doesn't need a database to operate, although you can put it in XMLDB. But this is not necessary as OBIEE can live on a file system.

That is what I did and I run the command on the downloaded file like this from my Linux prompt

$ cpio -idmv < biee_linux_x86_redhat_101333_disk1.cpiowhen the file extracted gave me following directory structure


then all you need to do to start installilng the OBIEE is run the script like this


During the installation the trickiest part was to provide your JDK software's home. That is you have to have Java installed on the box where you will run OBIEE as it runs on OC4J (Oracle Containers 4 Java) platform. Basically a J2EE platform.

Once I have correctly indicated the directory where my Java JDK 1.5 (or greater) was installed the installation was smooth and finished quickly. I have chosen the full complete installation of OBIEE from the install options and it put software in the designated Oracle BI homes.


As this is not MS Windows I didn't get lots of menu items at the end of the installation. Instead I had to go to a directory and start certain scripts in order to get the whole of my OBIEE infrastructure up and running. If you want to know which servers and services to start, you will need to read stuff on the Installation manuals and other BI blogs. But as a starter page after the installation you will get the welcome page: file:///home/oracle/OracleBI/index_bi_ee.html

This page tells it all. You get links to the 3 main components of your OBIEE installation on the right top corner of this page and these are:

  • Application Server Control
  • Oracle BI Interactive Dashboards
  • Oracle BI Publisher

Theoretically all 3 should work when you click them. In my case I found only the 1st link to be working. After some Google whacking and troubleshooting search I have realised that some services or some '.sh' scripts required didn't automatically run and required services weren't started properly. Tsk, tsk tsk... Remember? This is a linux platform. Probably OBIEE is best run on Windows, then!?

Anyway I found the /home/oracle/OracleBI/setup directory to be full of cool scripts which you can use to start stuff in OBIEE.

After successfully starting Oracle BI Server, Oracle BI Presentation Services (SAW server) I have finally managed the get all 3 links to work and started exploring the Oracle BI Infrastructure.

In another post I hope to write about my thoughts on OBIEE.


OBIEE Defaut User is : Administrator
OBIEE Default User password is : Administrator

Resources I used for the OBIEE installation.

Thursday 12 June 2008

Data Analysis: A Quick & Cost Effective Approach by Oracle Cornerstone Briefings

I have attended yesterday the Data Analysis: A Quick & Cost Effective Approach by Oracle Cornerstone Briefings for customers in TVP, Reading, Oracle UK Headquarters.

I must say working in UK for the last decade I have never visited this part of the country which is considered the "Silicon Valley" of UK. I was impressed by the buildings Oracle's training rooms and facilities. Woow!

The topic was about Data Analysis and the OBIEE Server. The presentation was in two parts and was Given by Aidan and Stuart of Oracle UK.

Part one focused in the current situation and challenges in Data Management today. The reference architecture was a 3 component chart described as "process data" , "manage data" and then "access data".

The highlights of the first part can be summarized as:

  • The data management process is still difficult, there are still a questions about the "single version of truth" in data management still people chop and spread data all over the place and analyse it in tools such as Excel and etc.
  • There is still complexities even the best of the tools out there can not address.
  • The sources, where we gather data from are still evolving, and new sources appear from which we are not able to harvest the data quickly and efficiently.

The second part of the briefing was, if you haven't guessed already, about how Oracle with its Business Intelligence Server Enterprise Edition (OBIEE), ex Siebel a company Oracle has purchased recently, is able to meet these challenges and complexities on data management.

It seems to me like OBIEE, sweet name for a product, they call it just "obiee", is an application which is "Database Agnostic" that is, it doesn't need to belong or be installed on or work with a particular database. It is a system which is installed on a file system, windows, linux whatever and then it runs all your reporting, report publishing, dashboarding and metadata in one place. So it is not a database! It is a kind of application which manages your reporting and dashboarding metadata storage and publication processes. A kind of webserver and appserver together which schedules and runs queries on any database, the extract of reports via email, pdf, Excel, XML, Flash Charts, dashboards you name it, and it is possible kind of tool.

When I was listening and seeing demonstrations about all this I was thinking how different is OBIEE from Oracle APEX. Then I asked them.

Their answer was that they were not similar at all. APEX is more like a rapid application development environment on the web and is all stored in the database in XMLDB. Whereas although OBIEE can be stored in XMLDB like Apex, Obiee is not an application development tool but a data analysis environment. They are different. Are they?

Thursday 22 May 2008

Oracle Data Guard and High Availability Physical Standby Database configuration using Oracle 10g R2 and Ubuntu 7.10

In this post I would like to share my experience of setting up an Oracle Data Guard (DG) Fast-Start Failover High Availability (HA) environment with a Physical Standby Database and Data Guard Broker enabled, using 2 PCs with Ubuntu 7.10 Desktop and Oracle 10g R2.

Data Guard and High Availability is not much of DBA work really, there is no tuning, data modelling or core DBA skills like SQL involved.

This is all about the chit-chat between two servers, lots of networking magic-do, and one server informing the other when it is down. Is like SysAdmin disguised as a DBA. You can also call it a "poor man's RAC", I suppose.

It took me about 2-3 hours, depending on the size of your database, to set up and successfully implement the following.

Summary of steps

My instructions will be as brief and as neat as possible. Here we go:

1. Configure the primary database

1.1 Enable Forced Logging on the primary database

SQL> conn / as sysdba
SQL> alter database force logging

1,2 Create a password file with orapwd on then primary database
$ orapwd file=/usr/local/oracle/product/ password=kubi entries=2

1.3 Configure standby redo Logs on the primary database, these will be needed when the primary changes role and becomes standby

SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 50M;
SQL>ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 50M;

1.4 Get a better than the default standby_archive_dest location
SQL> alter system set standby_archive_dest='/u01/oradata/test01/standby_archive_dest';

1.5 Create pfile from the existing database spfile, if you have used dbca to create your db you probably have an spfile
SQL> create pfile='/usr/local/oracle/product/' from spfile;

1.6 Then Edit the pfile and add those Data Guard specific parameters

On the primary database box edit the pfile for the primary database like this, add the following at the end of the file, change your host names and database SID accordingly.

# DG Config PRIMARY ROLE initialization parameters
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
# DG Config STANDBY ROLE initialization parameters
# Flashback

1.7 Then startup the primary database instance using the pfile with the new Data Guard parameters

SQL> startup force pfile='/usr/local/oracle/product/';

1.8 After the instance startup, recreate the spfile to include the new added Data Guard parameters

SQ> create spfile from pfile='/usr/local/oracle/product/';

1.9 Bounce the primary database so that it starts up using the spfile and the new Data Guard parameters

SQL> shutdown immediate;

SQL> startup mount;

1.10 Put the primary database in archivelog mode

SQL> alter database archivelog;

1.11 Enable flashback on the primary database, flashback will be necessary for fast-start failovers

SQL> alter database flashback on;
SQL> alter database open;

Now you have a database ready to be used as primary database with all Data Guard configuration parameters in place. It is now time to create the physical standby database, lets move on.

2. Create a physical standby database from your primary database using RMAN

2.1 Create a backup copy of the primary database data files with RMAN (Oracle 10g R2) ready to be used for standby duplication.

# !/bin/bash

# Unix controls
trap cleanup 1 2 3 15
echo "Caught CTRL-C Signal ... exiting script."
exit 1

# Oracle Environemt Variables
export ORACLE_SID=test01
export ORACLE_BASE=/usr/local/oracle
export ORACLE_HOME=/usr/local/oracle/product/
export PATH=$PATH:${ORACLE_HOME}/bin
rman target=/ <<
configure controlfile autobackup on;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/backup/test01/standby_backups/autobkpcontrolfile_%F';
run {
change archivelog all crosscheck;
allocate channel rman_back_ch01 type disk;
allocate channel rman_back_ch02 type disk;
backup as compressed backupset incremental level 0 database
format '/u01/backup/test01/standby_backups/sbybk_inc0_%s_%p' include current controlfile for standby;
sql "alter system archive log current";
backup as compressed backupset archivelog all format '/u01/backup/test01/standby_backups/archlog_%s_%p';
release channel rman_back_ch01;
release channel rman_back_ch02;

2.2 Prepare an Initialization Parameter File for the standby database

oracle@istanbul:~$ scp /usr/local/oracle/product/

On the standby box after you copy a pfile from the primary database default location edit like this:

# DG Config PRIMARY ROLE initialization parameters
*.log_archive_dest_1='LOCATION=/u01/oradata/test01/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
# DG Config STANDBY ROLE initialization parameters

# Flashback

2.3 Startup the standby database in NOMOUNT state and create it

SQL> startup pfile='/usr/local/oracle/product/' nomount;

Before you duplicate the database for standby with RMAN you have to configure connectivity between the 2 boxes with tnsnames.ora.

On the Primary System your tnsnames.ora file should look like this:

TO_HOST_london =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(SERVICE_NAME = test01))

TO_HOST_istanbul =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(SERVICE_NAME = test01))

On the Standby System your tnsnames.ora file should look like this:

TO_HOST_istanbul =
(ADDRESS = (PROTOCOL = TCP)(HOST = istanbul)(PORT = 1521)))
(SERVICE_NAME = test01))

TO_HOST_london =
(ADDRESS = (PROTOCOL = TCP)(HOST = london)(PORT = 1521)))
(SERVICE_NAME = test01))

This is so that the boxes can communicate with each other via those service names. Then you can duplicate the primary database on the standby box using RMAN like this

Make sure orapwd is run on london as well

orapwd file=/usr/local/oracle/product/ password=kubi entries=2

Then connect to RMAN on the primary database and start the creation of the standby database via RMAN.

oracle@istanbul$ rman target /
connect auxiliary sys/kubi@to_host_london
allocate auxiliary channel ch1 type disk;
duplicate target database for standby dorecover nofilenamecheck;
release channel ch1;

2.4 Add the standby log files on the standby database

If you watch the backup in the alert.log file RMAN gives the recommendation of adding the standby log files anyway.

ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo03.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo02.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo01.log' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE '/u01/oradata/test01/sbyredo04.log' SIZE 52428800;

2.5 Create SPFILE for the standby database

SQL> create spfile from pfile='/usr/local/oracle/product/';

2.6 Bounce the standby database to pickup the changes and start with an spfile.

SQL> startup force mount;

2.7 Put the standby database in ARCHIVELOG mode.

SQL> alter database archivelog;

2.8 Put the standby database inf FLASHBACK mode

SQL> alter database flashback on;

2.9 Put the standby database in constant recovery mode receiving archived logs from the primary database

To start Redo Apply, issue the following statement:


3. Oracle Data Guard and Data Broker (DGMGRL) configuration

3.1. On both Primary and Standby Database set the parameter DG_BROKER_START to TRUE


3.2. Add global_db_name parameters and other parameters to the listener.ora file on both primary and standby as follows, on Primary Database your listener.ora

(ORACLE_HOME = /usr/local/oracle/product/
(PROGRAM = extproc)
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/
(ORACLE_HOME = /usr/local/oracle/product/

on Standby Database your listener.ora

(ORACLE_HOME = /usr/local/oracle/product/
(PROGRAM = extproc)
(SID_NAME = test01)
(ORACLE_HOME = /usr/local/oracle/product/
(ORACLE_HOME = /usr/local/oracle/product/

3.3 Create the DGMGRL broker Configuration

DGMGRL> connect sys/kubi@to_host_istanbul;

DGMGRL> create configuration mediterranean_dg as primary database is host_istanbul connect identifier is to_host_istanbul;

DGMGRL> add database host_london as connect identifier is to_host_london maintained as physical;

DGMGRL> enable configuration;

3.4. Enabling Fast-Start Failover and the Observer


3.5 Specify the FastStartFailoverTarget property

DGMGRL> EDIT DATABASE 'host_istanbul' SET PROPERTY FastStartFailoverTarget='host_london';
DGMGRL> EDIT DATABASE 'host_london' SET PROPERTY FastStartFailoverTarget='host_istanbul';

3.6. Upgrade the protection mode to MAXAVAILABILITY, if necessary.


3.7. Enable fast start failover


3.8. Start the observer.

DGMGRL> CONNECT sys/kubi@to_host_istanbul;
Observer started

The above command will just hang, will not return you back to the prompt, this is how the observer is started, it is normal. Start anothter dgmgrl prompt for the rest of the operations.

3.8 Check the configuration so far

DGMGRL> connect sys/kubi@to_host_istanbul
DGMGRL> show configuration;

Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target

Current status for "mediterranean_dg":

Also see the verbose output where you can see how long it will take for a box to failover to the other.


Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
host_istanbul - Primary database
host_london - Physical standby database
- Fast-Start Failover target

Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul

Current status for "mediterranean_dg":

4. Do the failover

Test by killing PMON of database instance test01 on the primary database box istanbul, this automatically will trigger a failover within 30 seconds to the standby database as configured. You can watch this exciting event happening by looking at the alert_tst01.log on both boxes simultaneously.Once you crashed test01 on host_istanbul, then on host_london go to dgmgrl and check out what it says.

oracle@london$ dgmgrl

DGMGRL> connnect sys/kubi@to_host_london

DGMGRL> show configuration;

Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database

Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings

That is, host_istanbul database is down!

Be careful, the database is down and NOT the box. As I am running observer on host_istanbul as well, if you unplug the box, probably nothing nice will happen as the observer will be incapable of detecting anything. In a real life situation I suppose the observer runs on a 3rd piece of hardware. I haven't tested the 'unplugging' of the box, I don't know what it will do. I just 'kill -9' the PMON backgroud process for the primary database instance test01.

Anyway, your primary database now is host_london. You have failed over to it successfully in 30 seconds.

If you followed the configuration steps above you will be able to REINSTATE a disabled standby database as you have flashback enabled and put it back in the High Availability environment after you mount it. This is what we will do next.

5. Reinstate the failed primary database as a physical standby database after the failover

Fast-Start failover is a very good configuration for High Availability (HA) as it requires no intervention from the DBA, almost, you wish!

Say the primary database fails, say at 04:00 am, when you are sleeping, you get no phonecall from your boss, and the standby database becomes primary. Oleey! business continues. The next day when you realise what happened, you can with Data Guard Observer and using flash back logs reinstate the failed primary to be a standby to the new primary.

Basically Data Guard 'rolls forward' the SCN of the old Primary to match that of the now new primary (ex Standby). That is what Flashback Database does and it is all about. Moving the SCN to catch up with the primary.

In all situations, you will have to manually mount the failed ex-Primary database, probably the next day, if it has no media failure and can be restarted without any problems.

Oracle Data Guard Broker can then reinstate the failed primary database as the new standby database. Changing of roles. But, before you reinstate the failed primary database host_istanbul, first see the state of both databases by logging in to the DGMGRL from the new primary database host_london.

5.1 Login to DGMGML from the new primary database host and checkout your configuration

oracle@london:~$ dgmgrl
DGMGRL for Linux: Version - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/kubi@to_host_london;
DGMGRL> show configuration verbose;

Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
host_istanbul - Physical standby database (disabled)
- Fast-Start Failover target
host_london - Primary database

Fast-Start Failover
Threshold: 30 seconds
Observer: istanbul

Current status for "mediterranean_dg":
Warning: ORA-16608: one or more databases have warnings

5.2 After this you are sure you have failed over and the new primary database is working properly go to the failed ex-primary database box and mount the instance test01.

oracle@istanbul:~$ sqlplus /nolog

SQL*Plus: Release - Production on Thu May 24 23:35:58 2008

Copyright (c) 1982, 2005, Oracle. All rights reserved.

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 608174080 bytes
Fixed Size 1220844 bytes
Variable Size 167776020 bytes
Database Buffers 436207616 bytes
Redo Buffers 2969600 bytes
Database mounted.

5.3 Go back to DGMGRL on new primary database host_london and run the reinstate database command

DGMGRL> reinstate database 'host_istanbul';

You have to give it some time, the configuration will not immediately show the reinstated status, but after a few minutes you will get the following.

DGMGRL> show configuration;

Name: mediterranean_dg
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
host_istanbul - Physical standby database
- Fast-Start Failover target
host_london - Primary database

Current status for "mediterranean_dg":

Congratulations! You just had a successfully failed over and now your primary database is host_london and the standby database is host_istanbul and the obsesrver is watching them.


Be patient when you are working with this configuration use a test system, you can use VMWare to have multiple nodes in one box and make sure you have:

  • password files with the same password on both machines.
  • you have tnsnames.ora set up on both machines and you can communicate between both machines via service names
  • you have flashback enabled on both databases
  • you have configured a db_domain and is the same in both init.ora files

  1. Oracle® Data Guard Concepts and Administration 10g Release 2 (10.2) Part Number B14239-01 link:

  2. Using Data Guard to do fail-over and other cool features here:

Friday 18 April 2008

LAG, the Analytic Function

LAG is a nice analytic function in Oracle SQL which lets you access previous row of a row at the same time you access the current row.

The Oracle documentation says:

LAG is an analytic function. It provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. Found here:

Lets suppose you want to profile your customer by looking in the orders table and trying to figure out when was the last time the customer made an order. Even more you want to see the frequency of his/her ordering. How often he/she is making orders. That is, when was the last time you took an order and when was the last time before the last time ...etc.

Suppose your ORDERS table is this:

---------- --------- --------------
1 25-FEB-08 1
2 25-MAY-06 1
3 25-JAN-08 1
5 29-JAN-07 3
6 25-JAN-08 3
88 20-JAN-04 2

Then if you wanted to look at FK_CUSTOMER_ID=1 and see the history, you can write something like this with the LAG analytic function:

select FK_CUSTOMER_ID, order_date, lag(order_date, 1) over (order by order_date) "PREVIOUS ORDER DATE",
order_date - (lag(order_date, 1) over (order by order_date)) "DAYS AGO"
from orders
order by 2 desc

Which returns the history of the customer orders in a way which gives you an idea on what sort of the customer the customer is:

-------------- --------- --------- ----------
1 25-FEB-08 25-JAN-08 31
1 25-JAN-08 25-MAY-06 610
1 25-MAY-06

Analytics are cool!

Friday 11 April 2008

scheduling jobs in the database with DBMS_JOB

Quickly submitting jobs with the Oracle DBMS_JOB package. The example below is an oracle schema user who wants to schedule the execution of a PL/SQL procedure at certain times during the day.

From SQL*Plus :

SQL> variable n number
SQL> exec dbms_job.submit( :n, 'YOURPLSQLPROCEDUREHERE;', sysdate, 'trunc(sysdate)+1+1/288' );
SQL> commit;

From PL/SQL :

l_job number;
dbms_job.submit( l_job,
'trunc(sysdate)+1+20/24' );

Some Scheduling time semantics
# 1/288 means every 5 minutes
# 1/24 means every hour
# trunc(sysdate)+1+11/24 means at 11am every day

Tuesday 8 April 2008

SQL CASE statement and Aggregation

I find the CASE statement in SQL very good when it comes to classify data within ranges. Suppose you have a table called DEVICE which logs errors from some kind of machines and you wanted to classify the errors into categories according to the frequency of their occurance. Here is how you could use the CASE statement

Create the test table



insert into device values (1,'Engine 5','09-MAR-08','leak');
insert into device values (1,'Engine 5','10-MAR-08','leak');
insert into device values (3,'Cam Belt','10-MAR-08','broken');
insert into device values (3,'Cam Belt','11-MAR-08','broken');
insert into device values (3,'Cam Belt','12-MAR-08','broken');
insert into device values (3,'Cam Belt','13-MAR-08','broken');
insert into device values (3,'Cam Belt','14-MAR-08','broken');
insert into device values (5,'Cockpit','24-MAR-08','lights out');
insert into device values (5,'Cockpit','25-MAR-08','lights out');
insert into device values (5,'Cockpit','23-MAR-08','lights out');
insert into device values (7,'Deck 34','29-MAR-08','starboard light green');
insert into device values (7,'Deck 34','28-MAR-08','starboard light green');
insert into device values (7,'Deck 34','28-MAR-08','starboard light green');
insert into device values (7,'Deck 34','31-MAR-08','starboard light green');
insert into device values (7,'Deck 34','30-MAR-08','starboard light green');
insert into device values (7,'Deck 34','05-APR-08','starboard light green');
insert into device values (7,'Deck 34','04-APR-08','starboard light green');


And this is a table which would normally aggregate like this:

-------- --------
Engine 5 2
Cockpit 3
Deck 34 7
Cam Belt 5

Now let's use the CASE statement

select dname,
sum(case when cnt between 1 and 5 then cnt else 0 end) "NORMAL LEVEL",
sum(case when cnt between 6 and 11 then cnt else 0 end) "TOLERABLE LEVEL",
sum(case when cnt between 12 and (select count(*) from device) then cnt else 0 end) "DANGEROUS LEVEL"
select dname, count(*) cnt from device
group by dname
group by dname;

And here is the resultset with the CASE statement categorizing the aggregation by range.

-------- ------------ --------------- ---------------
Engine 5 2 0 0
Cockpit 3 0 0
Deck 34 0 7 0
Cam Belt 5 0 0

Saturday 15 March 2008

Relational Model to Dimensional Model

In this post I will demonstrate how a normalized relational data model which is in 3rd Normal Form (3NF) evolves to a dimensional data model (star schema) in a data warehouse.

You can use this post to practice your dimensional modeling and find tips and clues on how to transform the solid Relational Data model of your operational OLTP environment into a dimensional data model suitable for your data warehouse.

There is lots of literature on benefits and uses of dimensional modeling and design in data warehouses. Primary author in this area is Ralph Kimball and his book called The Data Warehouse Toolkit. After I read this book I have been tempted to investigate his approaches hands-on and describe them in a post.

Why do we have to use Dimensional Modeling in the Data Warehouse?

According to Ralph Kimball the necessity for Dimensional Modeling in the warehouse is the complexity and the performance problems of Relational Models. He believes that relational models are complex and hard to understand when presented to business people and the granularity of information in these models is too much. Business people are after slicing and dicing the data and expect quick response times to their ad hoc queries. But business people, in order to be able to pose the query, first they must understand the data. Kimball says that dimensional modeling is able to do that and make business managers better understand data in their organisations.

His second argument is on performance. The relational models are too detailed he says, they contain to many tables related to each other, which make them difficult to join. He argues that the relational models describe in one big picture, business processes which usually do not happen at the same time (ie, sales and delivery). Data normalization and the goal to avoid redundancy and inconsistency in OLTP systems is what drives relational models. But, in the data warehouse the drive is different. Data warehouses are all about providing historical information for general use in reporting and decision making. It is not about capturing and ensuring that the information is consistent. He does not by any means suggest that Data Warehouse data is not consistent. He only makes the point that their driving idea is different. Operational OLTP systems have little data, maybe the last week of operations, whereas Data Warehouses have data for the last 5-10 years. The large amount of data is another factor effecting performance in data warehouses.

All this is fine. Now lets look on how Kimball suggests that we change our Relational Model to a Dimensional model.

When attacking a corporate data model to transform it to a dimensional data model to be used in the data warehouse, the designer must know very well the definitions of the fact table this table is also called cube or multidimensional hypercube and the dimension table. Because these are the only 2 different kinds of tables you have in a dimensional model. Loosely speaking you have to distinguish and blend these 2 kinds of tables from your relational model and at the end you will have a dimensional model!

Definitions of the Fact Table and Dimension Table from Ralph Kimball's book are below.

Fact Table: A fact table is the primary table in a dimensional model where the numeric performance measurements of the business are stored.

Dimensions Table: Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business.

Equipped with the above knowledge let's look at our sample relational model of a business:

In redesigning this relational model to a dimensional model, first thing you will have to do is to find the fact table. When looking for the fact table, as Kimball says, you will have to look for the numeric measurements tables, or the tables which count things, the tables which record transactions and the tables which are constantly changing and are very big with lots of rows. Another way to look for them is to find out the many-to-many relationship tables or intersection tables of the relational model. Those tables are the best candidates for the fact table. In our sample model above, best candidates are the ORDERS and ORDER_ITEMS tables as these are the tables which record transactions, constantly changing information and are the largest tables in the model. CUSTOMERS, SALES_CHANNEL and PRODUCTS tables are not measurement tables or tables with constant activity on them. They are more likely to be dimension tables rather than fact tables.

Nothing stops you from introducing new dimensions in a dimensional model, even dimensions which can not translate or do not exist in your relational models. See the TIME_DIMENSION table below which is a table of dates. If you find it appropriate to have a table just to record dates and date attributes, such as holidays, weeks, months, quarters, you can go ahead and create one. It is a standard practice for such dimensions to exist in data warehouses.

So as you can see later in the dimensional model, what happens when re-designing a relation model as a dimensional model is that the primary keys of the dimension tables become the foreign keys in the fact tables. That is, the fact table which is the measurement table is full of foreign keys coming from the dimension tables. Initially if you visualize such a table you might realize that the fact table is not in 3NF, as redundant data will exist in such a table and a non-prime attribute from such a table can depend functionally to another attribute and not directly to the primary key. In many cases the fact tables might not even have a primary key but might have composite primary keys. So redundancy is permitted and normalization laws of data are relaxed in fact tables.

All this for the sake of performance and the ability to analyze data and for a better representation of data to business people, as Kimball points out, Data Warehouses are to serve business people, departmental chiefs and CEOs and Data Warehouse Administration is somewhere between a DBA and an MBA.

In this case our relational model above could become a dimensional model, star schema, like the one below.

Here as you see the ORDERS and ORDER_ITEMS tables, the tables which record day to day measurements for the business are integrated into the SALES_FACTS table, which is the fact table of this dimensional model.

Now the next step after this design would be to find a way to do ETL (Extract, Transform, Load), that is, to extract the information from the OLTP system (relational model) periodically and to insert it in the data warehouse (dimensional model). Then your dimensional model would be ready to be used in your OLAP environment analytic workspace for reporting applications or with your Business Intelligence software for any kind of analytics and performance magic these systems can offer in combination with your dimensional model.

Resources: The Data Warehouse Toolkit - Ralph Kimball, Margy Ross

Wednesday 5 March 2008

Data Warehouse, Data Mart, Data Mining and OLAP Definitions

Have you ever heard industry jargon like Data Warehouse, Data Mart, Data Mining, OLAP and more. Did you ever wanted to know in one sentence what are these words all about? Then keep on reading I have put together some quick definitions from Oracle docs.

Data Warehouse: (An Oracle Docs Definition)

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources.

Data Mart: (An Oracle Docs Definition)

A data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, the data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources. Is the categorization of your datawarehouse data to a particular line of your business. For example BIG DATAWAREHOUSE DATA becomes PURCHASING DATA and SALES DATA and INVENTORY DATA. So a Data Mart is, loosely speaking, a piece of the big thing.

Data Mining: (An Oracle Docs Definition)

Too much data and not enough information — this is a problem facing many businesses and industries. Most businesses have an enormous amount of data, with a great deal of information hiding within it, but "hiding" is usually exactly what it is doing: So much data exists that it overwhelms traditional methods of data analysis.

Data mining provides a way to get at the information buried in the data. Data mining creates models to find hidden patterns in large, complex collections of data, patterns that sometimes elude traditional statistical approaches to analysis because of the large number of attributes, the complexity of patterns, or the difficulty in performing the analysis.

OLAP: (An Oracle Docs Definition)

OLAP functionality is characterized by dynamic, multidimensional analysis of historical data, which supports activities such as the following:

  • Calculating across dimensions and through hierarchies

  • Analyzing trends

  • Drilling up and down through hierarchies

  • Rotating to change the dimensional orientation

OLAP tools can run against a multidimensional database or interact directly with a relational database.


Oracle® Database Data Warehousing Guide 10g Release 2 (10.2)

Oracle® Data Mining Concepts 10g Release 2 (10.2)

Thursday 28 February 2008

RMAN Hot Backup Script

The following unix shell script will do a full RMAN hot backup to your database and will copy the backup files compressed to the directory you will specify. This is a hot-backup, and the database must to be in ARCHIVELOG mode for this to work.

A backup retention policy with a recovery window of 2 days is defined in this script. With this retention policy RMAN will keep archive logs and backup files necessary to recover to any point of time within those 2 days in the recovery window. Older backups and archivelogs not needed to satisfy this retention policy will be automatically deleted by this script. No RECOVERY CATALOG is being used with this script, instead database controlfiles are used to record the RMAN repository information.

Make sure you set the rman CONFIGURE CONTROLFILE AUTOBACKUP parameter to ON in in RMAN in order to take extra backups of the controlfile and spfile (RMAN will not backup init.ora files) as extra protection.

This script will delete obsolete backups and not needed archive logs from the disks only after a successful backup. This means you don’t need to set up cronjobs or manually delete not needed backups. This is the beauty of using RMAN. It does this automatically with the commands: “…DELETE NOPROMPT OBSOLETE;…” and “…DELETE NOPROMPT EXPIRED BACKUP…” See last lines in the script.

The RMAN hot backup script

# Declare your ORACLE environment variables
export ORACLE_SID= (put your SID here)
export ORACLE_BASE= (put your ORACLE BASE here)
export ORACLE_HOME= (put your ORACLE_HOME here)
export PATH=$PATH:${ORACLE_HOME}/bin

# Start the rman commands
rman target=/ << EOF
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO ‘/(put backup directory here)/autobackup_control_file%F’;
run {
BACKUP AS COMPRESSED BACKUPSET DATABASE FORMAT ‘/(put backup directory here)/databasefiles_%d_%u_%s_%T’;
BACKUP AS COMPRESSED BACKUPSET ARCHIVELOG ALL FORMAT ‘/(put backup directory here)/archivelogs_%d_%u_%s_%T’ DELETE INPUT;
BACKUP AS COMPRESSED BACKUPSET CURRENT CONTROLFILE FORMAT ‘/(put backup directory here)/controlfile_%d_%u_%s_%T’;

Note: If you run this script on a database which was being backed up using OS methods and user managed backups, initially it will fail as it will look to satisfy the ‘… ARCHIVELOG ALL…” clause. That is it will try to backup archivelogs since day 1. Well, unless you have lots of money to allocate for storage, we know that is not practical to keep all archivelogs and with OS backups usually we delete them manually according to the chosen backup retention pollicy. The script once run initially will fail like this:

Starting backup at 24-MAR-08
current log archived
using channel ORA_DISK_1
archived log /u00/arch/1_2_650282994.arc not found or out of sync with catalog
trying alternate file for archivelog thread 1, sequence 2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 03/24/2008 10:43:29
RMAN-06059: expected archived log not found, lost of archived log compromises recoverability
ORA-19625: error identifying file /u01/arch/1_2_650282994.arc
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3

As you can see the script assumes that you have all archivelogs, if that is not the case (most likely), and you have been periodically deleting them ie. crontab, or manually, you can always sync the existing RMAN catalog (or controlfile) by crosschecking what is there on the disk before you run it with the following command from RMAN:

For before Oracle 9i

RMAN> change archivelog all crosscheck;

or in Oracle 9i,10g

RMAN> crosscheck archivelog all;

This will crosscheck existing archivelogs and will show you the ones which are EXPIRED and need to be deleted from the RMAN catalog, that is, they are not on the disks anymore, cause you have deleted them not using RMAN. RMAN doesn’t know this, it expects them! To delete these misleading entries from your control file you will have to run the command:

RMAN> delete expired archivelog all;

After you delete the expired archivelog records from the control file you can run the script and take your backups.

To see how a recovery is done with a backup taken with this script on a New Host with the same directory structure search this blog for RMAN Recovery.