Sunday, 4 March 2012

Big Data


The other day I watched the Oracle Big Data forum, now available here: a half-day event with various speakers on the subject of Big Data, including Tom Kyte, a mentor whom I admire!

In the forum they went over Oracle's approach to Big Data; allow me to summarise it below:
  1. Acquire - Collect Big Data: identify it and where it lives, then store it in Oracle NoSQL, a key-value database

  2. Organise - Stage Big Data in a transient elastic database. Using Oracle Data Integrator and the Oracle Hadoop connector, reduce and distil it.

  3. Analyse - Run analytics on the now acquired (reduced/distilled) and organised Big Data, using a variety of Oracle tools, such as the R language, pattern matching, etc.

  4. Decide - Present your data to the decision makers with dashboards, feed it back into a relational database, etc.
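The four steps above can be sketched in miniature, with plain Python dictionaries standing in for Oracle NoSQL and the Hadoop tooling (the sensor names and readings are made up for illustration):

```python
# Toy sketch of the Acquire/Organise/Analyse/Decide pipeline.
# Illustrative only -- the real stack uses Oracle NoSQL, ODI and
# the Oracle Hadoop connector.

# 1. Acquire: raw readings land in a key-value store as they arrive.
kv_store = {
    "sensor1:2012-03-04T10:00": 21.5,
    "sensor1:2012-03-04T10:01": 21.7,
    "sensor2:2012-03-04T10:00": 19.2,
}

# 2. Organise: reduce/distil -- group the raw values by sensor.
staged = {}
for key, value in kv_store.items():
    sensor = key.split(":", 1)[0]
    staged.setdefault(sensor, []).append(value)

# 3. Analyse: run a simple aggregate per sensor.
averages = {sensor: sum(vals) / len(vals) for sensor, vals in staged.items()}

# 4. Decide: present the distilled result to the decision makers.
for sensor, avg in sorted(averages.items()):
    print(f"{sensor}: average {avg:.1f}")
```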
Looking at the summary above, it really describes Big Data as a distillation of a massive amount of data. Two questions come to my mind:
  • Where is Big Data?

  • Why do we want Big Data?
Answering the second question is easy. We want Big Data because it is all about having detailed information that allows us to make better decisions. The classic "your boss's decision making" use of data, if you work in business. If you work in astronomy, the question will probably be: "Is there life in space?"

The first question, "Where is Big Data?", is, I think, the one which will take more time and effort to answer. Or perhaps it is more correct to rewrite it slightly as "Where can I find Big Data useful for my business?", as Big Data must make money, too. In summary, this was what Oracle said, or at least what I understood from it. Very interesting indeed. Oracle looks at Big Data as another data source, which is cool too.

Below are my ramblings on the topic of Big Data.

Let me start by saying that there were many interesting examples used to describe Big Data in the Oracle forum. For some, Big Data is the collected heartbeats of patients; for others, it is data collected from petroleum pipe sensors on oil rigs. Other descriptions say that Big Data is sensor data accumulated during flights (apparently 7 TB on a flight between London and New York), that Big Data is machine data, or even that Big Data is dark matter!

Whatever we call it, Big Data is massive! A beast which is usually truncated due to lack of space in a relational database (the rolling window). A beast which we cannot collect in a relational database. A beast which has to be written quickly, as it is generated too fast for a relational database table to catch up, or whose schema is too constraining. A thing which cannot fit through the I/O of a single piece of kit, no matter how expensive and grand that kit is. Big Data, it seems, can only live in a cluster of machines spread far and wide, from Australia to the North Pole.

Wait a minute, does this beast fit in a NoSQL key-value database? Bob's your uncle! It fits somewhere, then. But where? In NoSQL: a loose database with no constraints, where the constraints and the whole database logic are in the hands of the web app developer. I am not defending relational databases here, just probing the new NoSQL theory. NoSQL: a database which is not based on the sound and proven mathematics of relational theory, calculus and algebra; a database which is hardly transactional, not consistent (only eventually consistent) and not integral. Hmmm... Whatever; as long as Big Data fits in a NoSQL database and we can capture it, fine. Capture now, analyse later. Just store it and spread it out, then bring in all the metal and CPU and memory in the world (in parallel) to crunch it. What a great idea!
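The "capture now, analyse later" idea is essentially the MapReduce pattern that Hadoop popularised: append schema-less records as they arrive, and only at analysis time map and reduce over them. A toy sketch, with plain Python standing in for the cluster (the event records are invented):

```python
# Toy MapReduce-style "capture now, analyse later" sketch -- not Hadoop,
# just the shape of the idea on one machine.
from collections import defaultdict

# Capture: no schema, no constraints -- just append whatever arrives.
captured = [
    {"event": "click", "page": "home"},
    {"event": "click", "page": "pricing"},
    {"event": "view", "page": "home"},
]

# Map: emit a (key, 1) pair per record.
mapped = [(rec["event"], 1) for rec in captured]

# Shuffle + reduce: group by key and sum the counts.
counts = defaultdict(int)
for key, n in mapped:
    counts[key] += n

print(dict(counts))  # {'click': 2, 'view': 1}
```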

I am persuaded by Big Data. Not all data needs to be stored relationally, and ACID is a theory of transactions, not a must-have database property.

What bugs me is this: now that we are taking away all constraints and logic from database data and allowing it to grow into Big Data, when we look back at it, how quickly will we be able to relate other data to it, and how easily will it make sense? I am just looking forward to any sort of easy-to-use prompt (like Pig and Hive) where I can write something like:

 Select * from BIGDATA;

but in the cloud, as I will never be able to afford the metal to host Big Data anyway. Should be fun.

Monday, 27 February 2012

All Watched Over by Machines of Loving Grace



I like to think (and
the sooner the better!)
of a cybernetic meadow
where mammals and computers
live together in mutually
programming harmony
like pure water
touching clear sky.

I like to think
(right now please!)
of a cybernetic forest
filled with pines and electronics
where deer stroll peacefully
past computers
as if they were flowers
with spinning blossoms.

I like to think
(it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace.



By Richard Brautigan





Thursday, 2 February 2012

Oracle Apex Mobile App

I have built this mobile Oracle Apex app, on a hosting provider, showing temperatures on Greek islands. I used jQuery libraries and Oracle Apex templates. This is a learning prototype.

Features include:

1. Hourly calls to the Yahoo Weather API
2. Static Google Maps API call to show the island on the map
3. More than one year of daily temperature records to compare with the current temperature, per island, on the Google Charts API
4. Learn the locations of 100+ Greek islands on the map
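As a rough sketch of features 2 and 3, here is how a static map URL and the temperature comparison might be put together. The coordinates, temperatures and helper function are illustrative assumptions, not the app's actual code:

```python
# Sketch: build a Google Static Maps URL for an island and compare
# today's reading against the stored daily history (sample data only).
from urllib.parse import urlencode

def static_map_url(lat, lon, zoom=9, size="300x200"):
    # The Static Maps API takes the centre, zoom and size as query params.
    params = {"center": f"{lat},{lon}", "zoom": zoom,
              "size": size, "sensor": "false"}
    return "http://maps.googleapis.com/maps/api/staticmap?" + urlencode(params)

# A few (hypothetical) daily temperatures for one island, in Celsius.
history = [14.0, 15.5, 13.8, 16.1]
current = 18.2

url = static_map_url(37.4467, 25.3289)   # roughly Mykonos
delta = current - sum(history) / len(history)
print(f"{delta:+.1f}C versus the daily average")
```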

I am thinking of enriching the database behind this app with other relevant info.

To reach the app, point your iPhone to the URL below:

http://apex-outsource.com/pls/apex/f?p=533:greekisland




Comments are welcome!



Location:London UK

Friday, 16 December 2011

Salesforce to Oracle Real Time Integration

In this blog post I will show you how to make Web Service-like calls from Salesforce to Oracle in real time, using an Informatica Cloud Endpoint URL.

In the Winter '12 release of Informatica Cloud we will have the ability to make Salesforce outbound message calls to an Informatica Cloud task and so enable real-time integration.

The idea is simple

Use a Salesforce Workflow Rule (akin to a database trigger) to make an outbound message call whenever an Account record is created or edited.

Step by step instructions

1) You create a vanilla Data Synchronisation task (I called mine 'test ws'), where you read the Account fields from Salesforce (left) and UPSERT them into an Oracle table called FROMSFDC (right), like this:


2) Then, at the '6. Schedule' step of the Informatica Cloud task, you will see a new option called:

Run this task in real-time upon receiving an outbound message from Salesforce






When you tick this option, as seen above, an Endpoint URL is generated for the 'test ws' Informatica Cloud task. This task can now be called whenever you save a record in Salesforce, forcing it to execute the UPSERT and update the Oracle table FROMSFDC. Next is to put the above Endpoint URL in a Salesforce Workflow Action (basically an AFTER INSERT/UPDATE trigger).
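To make the flow concrete, here is a rough sketch of what an endpoint like this conceptually does when Salesforce posts an outbound message: parse the notification and UPSERT the Account fields into FROMSFDC. Informatica Cloud does all of this for you; the XML below is a simplified stand-in for the real SOAP envelope, and SQLite stands in for Oracle:

```python
# Conceptual sketch of an outbound-message receiver: parse the
# notification XML, then UPSERT the fields into FROMSFDC.
import sqlite3
import xml.etree.ElementTree as ET

# Simplified stand-in for the SOAP outbound message Salesforce sends.
notification = """<notification>
  <sObject><Id>001xx000003DGb1</Id><Name>Acme Ltd</Name></sObject>
</notification>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE FROMSFDC (ID TEXT PRIMARY KEY, NAME TEXT)")

for obj in ET.fromstring(notification).iter("sObject"):
    # UPSERT: replace the row if the Id already exists, insert otherwise.
    db.execute("INSERT OR REPLACE INTO FROMSFDC (ID, NAME) VALUES (?, ?)",
               (obj.findtext("Id"), obj.findtext("Name")))

print(db.execute("SELECT ID, NAME FROM FROMSFDC").fetchall())
```

The point of the Endpoint URL feature is precisely that you never write this plumbing yourself.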

3) Next thing to do is to open up Salesforce and go to Workflow & Approvals > Outbound Messages and paste the above Endpoint URL like this:






4) The last thing to do is to create a Workflow Rule to fire the Outbound Message you have just created and call the Informatica Cloud task. Here I created something like this:





 
That's it! Now, don't forget to Activate the rule. 

Whenever a new Salesforce Account is created, it is added to Oracle as well. Whenever a Salesforce Account is edited, it is UPSERTed to Oracle, too.

Below you see the test records inserted in Salesforce and then Upserted to Oracle (Oracle Apex in this case).

Here is the record being created in Salesforce; look at the timestamp.

And here you can see the same record pushed in real time to my Oracle Apex account via the Informatica Cloud Endpoint URL option in my 'test ws' task.

Conclusion:

It is easy to connect Salesforce and Oracle, or any other database, in real time. The benefits are tremendous: to do something similar with Web Services you would have to write code to send SOAP messages to the database, and do lots of configuration work in the database to receive those messages. The beauty of the Informatica Cloud Endpoint URL task schedule is that you can call Informatica Cloud tasks in real time from Salesforce with zero coding, just configuration.


Note: The Data Synchronisation Endpoint URL functionality of Informatica Cloud will be available in January 2012. I have a beta test org which I have used for this blog post. You can find out more about Informatica Cloud Winter 2012 release here.

Thursday, 1 September 2011

Update data in the same Salesforce Object with a 'Self-join' Informatica Cloud Task


When you use Informatica Cloud you usually use it to do Migration, Integration and Synchronisation. That's about what it is used for, right?

It is all about moving data from point A to point B, or syncing data between A and B. That is what Informatica Cloud is built for: to move/sync data from A to B and, while moving it, maybe transform it as well. A is the 'source' and B is the 'target'. Correct! A can be an Oracle or ERP database and B can be a Salesforce org. A classic use case of Informatica Cloud.

Well, not exactly. How about if you think a little bit differently.

How about if A is the source and A is the target, as well!

Yes, that's it. Read from A and write to A via an Informatica Cloud synchronisation task. A kind of Informatica Cloud Data Synchronisation task which will 'self-join' an object or table, to be used to 'self-cleanse' or 'self-update' the Salesforce custom object data or an Oracle database table. This is possible: we can use an Informatica Cloud task like a 'self-join' SQL command to edit our data 'in place' in one single object/table.

I will try to explain this concept with an example. Suppose you have data in your Salesforce standard object called 'Account' which you want to change. A standard approach would be to export the data to a CSV file via a report, change the data manually in Excel or similar, and then use Apex Data Loader to put the changed data back into the Account object in your Salesforce org, right?

The screenshots below show how to do this type of operation without actually exporting the data, by doing it in place with an Informatica Cloud synchronisation task.

What this synchronisation task does is use the same Salesforce org connection and the Account object as both 'Source' and 'Target'. That is, the Account standard object is this task's source connection and target connection at the same time.

This way the Informatica Cloud task reads the data from the Account object and, after transforming it, writes it back into the same Account object.


Create a Source on the Account object




Create a Target for the same Account object in the same org. Just choose the same object as the target.




Field Mapping, a kind of 'self-join'; see Id = Id




Use the DECODE function in the same Field Mapping to manipulate data



The Informatica Cloud string function DECODE does the trick. Observe how it changes occurrences of the string 'Direct Employer' to 'Employer' in the Type field of the standard Salesforce Account object, which is the very object it reads from.
decode(Type,'Direct Employer','Employer')
When you save and run this task, it reads from the Account object and writes the changed data back to the same Account object. All within one Informatica Cloud Data Synchronisation 'self-join' task! No file downloads, no Excel, just one task. Given the plethora of Informatica Cloud string, arithmetic, date, etc. functions available, just imagine what you can do to your data!
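DECODE behaves like a lookup with an optional default. Here is a small Python equivalent of the read-transform-write-back 'self-join', over an in-memory stand-in for the Account object. One caveat: Informatica's DECODE returns NULL when nothing matches and no default is given; this sketch falls back to the input value instead, so unmapped Types pass through unchanged:

```python
def decode(value, *args):
    # decode(x, search1, result1, search2, result2, ..., [default]).
    # Unlike Informatica's DECODE (which returns NULL on no match),
    # this sketch falls back to the input value when no default is given.
    args = list(args)
    default = args.pop() if len(args) % 2 == 1 else value
    for search, result in zip(args[::2], args[1::2]):
        if value == search:
            return result
    return default

# In-memory stand-in for the Account object (sample rows).
accounts = [{"Id": "001A", "Type": "Direct Employer"},
            {"Id": "001B", "Type": "Agency"}]

# Read from Account and write back to Account, transformed in place.
for row in accounts:
    row["Type"] = decode(row["Type"], "Direct Employer", "Employer")

print([row["Type"] for row in accounts])  # ['Employer', 'Agency']
```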

This is a wonderful and simple example of how Informatica Cloud can be used to cleanse data, I think.