SQL support for Data Warehousing
Information systems must support the day to day business, as well as the long term, strategic goals of an organisation. The problems for most businesses is that supporting day to day business operations and supporting long term strategic planning require different sets of data and tools.
Operational data is used to support the day to day business. Typical operational data would be line items on a till receipt. This type of data is atomic, detailing that Customer X bought Product Y for Price Z at a certain time. This is the type of data which is traditionally stored in highly normalised tables within a relational database system.
Long term planning on the other hand isn’t interested in individual sales for example, but is more interested in how sales trends have changed over periods in the past. The decision support data required to answer these types of questions is typically based on summarisations of operational data over various time periods, locations, etc. While operational data is characterised as atomic, decision support data could be characterised under the headings of, timespan, granularity and dimensionality.
Because of the differing natures of the data, and the differing questions which the data must answer, operational data and decision support data are typically held in separate database systems. The highly normalised and relational nature of operational data makes querying it for decision support quite difficult. Multi-table joins, unions and intersections have to be used, which are difficult to program, and require lots of resources to run on the database. As a result, operational data is periodically extracted, filtered and converted from its database and added to a business’s decision support system (DSS).
The physical layout of a DSS is quite different from an operational database. DSS systems often make use of ‘star schema’ as a layout pattern. In this scheme, individual line items are related to dimension tables. This allows an individual product for example to be categorised under several categories. For example a the location of a sale can be associated with a department, store, city, state, country, region etc. These hierarchies of attributes allow management to ‘drill down’, or ‘roll up’ summarised data to analyse changing trends, or do comparative analysis of region against region, or month to month, etc.
Oracle 10g DSS SQL Support.
Oracle have extended SQL to add functionality to support DSS type operations on operational type databases.
If you think about the operations which have to be done to operational data when it is being extracted from a database and inserted into a DSS system, then you can ask yourself why these operations cannot be done at query time automatically by an extension to the SELECT statement. This is essentially what the ‘Model clause’ does. The Model clause was introduced as an extension to a SELECT statement in Oracle 10g. It’s use allows a SELECT query to format the results of a query into a multi-dimensional array suitable for answering DSS type questions.
An example of the use of the model clause is below. Here sales data is queried from an ordinary view representing sale line items. The model clause partitions the sales by country, dimensions them by product and year, and summarises the sales data.
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
WHERE country IN ('Italy','Japan')
MODEL RETURN UPDATED ROWS
PARTITION BY (country)
DIMENSION BY (prod, year)
MEASURES (sale sales)
sales['Bounce', 2002] = sales['Bounce', 2001] +
sales['Y Box', 2002] = sales['Y Box', 2001],
sales['2_Products', 2002] = sales['Bounce', 2002] +
sales['Y Box', 2002])
ORDER BY country, prod, year;
The results are:
COUNTRY PROD YEAR SALES
-------------------- --------------- ---------- ----------
Italy 2_Products 2002 92613.16
Italy Bounce 2002 9299.08
Italy Y Box 2002 83314.08
Japan 2_Products 2002 103816.6
Japan Bounce 2002 11631.13
Japan Y Box 2002 92185.47
A characteristic of DSS data mentioned above is that it is dimensioned in a hierarchical way. For example a single sale could be classified by day, week, month, quarter, year, etc. Oracle 10g has SQL extensions which help in dealing with hierarchical data also.
The CONNECT BY clause is also an extension to SELECT which allows Oracle to relate rows returned by the select query in a hierarchically linked way. For example, take a bill_of_materials table which details items which are related in a hierarchy. The assembly_id field is used as an ID of the parent of a particular item. In order to reconstitute the hierarchy of items you would issue:
SELECT assembly_id, assembly_name, PRIOR assembly_id parent
START WITH parent_assembly IS NULL
CONNECT BY parent_assembly = PRIOR assembly_id;
This query uses the “START WITH” and “CONNECT BY” extensions to SELECT to start with the item which has no parent (parent_assembly IS NULL) and recursively joins the table to itself to list the items in order of their place in the parent-child hierarchy.
These examples are two of many extensions to SQL for the support of Decision Support and Business Intelligence applications. The use of such extensions to SQL can have benefits to a business as they don’t have to maintain two separate databases for operational and DSS data. An operational database, given enough resources, could be used to support both day to day, and long term strategic planning, through the use of the SQL extensions detailed above.
Coronel. C. Rob. P. Database Systems: Design Implementation & Management. Sixth Ed. 2004. Thomson Course Technology.
Gennick. J. Querying Hierarchies: Top-of-the-line Support. 2001. Retrieved on June 21, 2007 from http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby.html
Oracle Corp. The SQL Model Clause of Oracle Database 10g. 2003. Retrieved on June 21, 2007 from http://www.oracle.com/technology/products/bi/pdf/10gr1_twp_bi_dw_sqlmodel.pdf.