SQL: This is your life.
History of SQL
In 1970 Dr. E. F. Codd published his paper,“A Relational Model of Data for Large Shared Data Banks”. This became the foundation for the relation database system. This paper described a new way to structure data within a database, and led to the relational database systems we use today.
While Dr. Codd’s paper defined the structure, his colleagues Donald D. Chamberlin and Raymond F. Boyce at IBM had been developing a query language known as SQUARE, which used set theory and predicate mathematics to manipulate data in the database. This language had a terse mathematical syntax, but became the proving ground for concepts which are important to database manipulation.
By 1974, Chamberlin and Boyce published “SEQUEL: A Structured English Query Language” which detailed their refinements to SQUARE and introduced us to the data retrieval aspects of SEQUEL. This new SEQUEL language was as powerful as SQUARE, but Chamberlin and Boyce “kept in mind the notions of top down structured programming, the need for linear notation, and the need for readable programs that are easy to maintain and modify” (Chamberlin & Boyce, 1974). The resulting syntax could be described as block-structured English keyword syntax.
This type of syntax had many advantages, such as being easily learned by non-experts. Both SQUARE and SEQUEL had the advantage of being declarative languages. This allowed users to specify ‘what’ to do rather than ‘how’ to do it as in imperative, or procedural languages.
SEQUEL was later renamed to SQL by dropping the vowels, because SEQUEL was a trade mark registered by the Hawker Siddeley aircraft company. Thus SQL was born.
The success of the relational database model, along with the Structured Query Language (SQL), meant that by 1986, over 20 independent vendors, and practically every computing platform was supported. This is when the American National Standards Institute (ANSI) published the first SQL standard, ANSI X3.135-1986. A few months later, the International Organisation for Standardisation (ISO) published an identical standard, ISO 9075-1987, which is better known as the SQL-87 standard. By this time, the numerous vendors had differing implementations of SQL, and ANSI X3.135 was generally seen as the least common denominator of existing implementations.
ANSI X3.135 failed to standardise many popular and necessary features of the language. Though this standard provided the ability to invoke SQL capabilities from COBOL, FORTRAN, Pascal and PL/I.
The abilities of SQL can broadly be categorised into three groups, Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DCL).
Data Definition Language
Data Definition Language (DDL) encompasses SQL statements such as CREATE …, ALTER …, and DROP … These are statements which an operator would use to define and manipulate objects within the database, such as tables, views, constraints,indexes, sequences, etc. Data Definition Language is a subset of the available SQL commands which directly manipulate the underlying database schema.
An example of creating a simple table would be:
CREATE TABLE EMPLOYEE
( NAME VARCHAR2(20),
Data Manipulation Language
Data Manipulation Language is the subset of SQL’s statements such as INSERT…, UPDATE…, DELETE… and SELECT…. These statements operate directly on the data contained within the structures of the database. Typically these are the statements most often used on a day to day basis for loading and unloading data, and generating reports based on that data.
Inserting data into the table we created earlier with DML would involve the use of an INSERT statement:
INSERT INTO EMPLOYEE
(NAME, SSN, BDATE)
VALUES ('Chris Collins', '123456789', '20-AUG-78');
Reading data from the database makes use of the SELECT statement. Given our example table, displaying all employees born in 1980 would be:
SELECT ssn, name
WHERE bdate > '01-JAN-80'
AND bdate < '31-DEC-80';
While INSERT adds new data, UPDATE is used to alter existing data. For example to change the date of birth of ‘Chris Collins’ which was added to the employee table earlier:
SET bdate = '19-APR-1980'
WHERE name = 'Chris Collins';
Data Control Language
Examples of Data Control Language are GRANT… and REVOKE… statements. These control who has access to the various objects within the database. Using these permissions on objects can be granted or revoked from users or roles within the database. This is primarily used as a security mechanism as not all users of a database should have unlimited access to all the data contained within the database.
Ensuring that only members of the ‘HR’ role have access to the employee table makes use of the grant command:
GRANT select, insert, update, delete on EMPLOYEE to HR;
SQL also has support for embedding in several third generation languages, such as COBOL, FORTAN, C, C++, etc. An example of embedded SQL syntax is given here:
EXEC SQL DECLARE c1 CURSOR FOR
EXEC SQL OPEN c1;
EXEC SQL FETCH c1 into :video_title;
This shows the creating of a cursor, and the retrieval of a data item from it into a local variable.
Embedded SQL made use of a preprocessor, which would parse all source files containing embedded SQL and replace the SQL syntax with appropriate subroutine calls. These pure third generation language files were then compiled normally producing an application which could connect to and manipulate the database as specified by the embedded SQL.
Application Programming Interfaces
In addition to the abilities afforded through the use of embedded SQL, application developers began to make more use of SQL application programming interfaces (APIs) as time went on. Chief among these was Open DataBase Connectivity (ODBC).
The SQL standard specified the statements that an application can use to store, retrieve or otherwise manipulate a database and its content. This standard though, doesn’t specify how an application should deliver the statements to a database server, or how to retrieve the database responses and content from the server.
This meant that while SQL statements could be standardised, and written to be vendor neutral, getting them in and out of the database still required the use of vendor specific interfaces. This effectively coupled an application to the database vendor, meaning that the application would have to have portions rewritten if another vendor’s database was to be used.
With the development of the ODBC middleware, applications have a database neutral means of delivering SQL to the database server. The main goal of ODBC was to provide a standardised way of providing a pipeline between applications and the databases they depend on.
An application is written to make use of the ODBC driver application programming interface (API). The ODBC driver translates those application commands into the vendor specific commands for the database type being accessed, it also translates the database responses, and returns standardised ODBC responses.
If the application is moved to a different database, then the ODBC driver and database are changed. No changes are required to the application. The new ODBC driver will ensure that the same ODBC commands are translated between the application and the database using appropriate protocols.
ANSI SQL-86 was the first attempt at standardising SQL, and effectively became a standard which documented the common ground amongst database vendors at the time. The US government realised the importance of portability and standardisation, so the Federal Information Processing Standards (FIPS) body mandated standards that must be adhered to in order for technology to be bought by the US government and its agencies. FIPS 127-2 was the database standard mandated by the US government, which was effectively ANSI X3.135-1992. As a result of this, many database vendors adhered to the ANSI SQL-92 standard.
Despite this ideal situation, SQL portability remains elusive. Several explanations exist for why this is so:
- The standard does not specify database behaviour in several important areas (e.g. indexes), leaving it up to implementations of the database to decide how to behave.
- The SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard’s specification of the semantics of language constructs is less well-defined, leading to areas of ambiguity.
- The complexity and size of the SQL standard means that most databases do not implement the entire standard.
- The database standard does not prohibit a vendor from providing custom extensions.
The best that can be said of SQL standardisation efforts so far, is that they have been mediocre. ANSI and other standardisation bodies have been playing catchup with database vendors, resulting in specifications which specify the common ground between vendors rather than mandating the important aspects of database operation in a complete and detailed way. The result of which is a standard which has been too easily abused by database vendors.
SQL allows databases to adhere to the ACID constraints. ACID stands for Atomicity, Consistency, Isolation and Durability, and are four important attributes of a database system if data integrity is to be assured.
SQL supports the isolation constraint through the use of transactions. A transaction is begun when a user connects to a database and ends when they quit, or execute a COMMIT statement. A new transaction is then created.
If a user makes changes to the data in a database, only that user sees those changes until they are committed with the COMMIT statement. This ensures that users are isolated from each other, and only committed changes are shared amongst all users.
Transactions also ensure atomicity and consistency. Atomicity refers to the feature of a transaction that all elements of the transaction should succeed or none should. This all or nothing aspect of transactions ensure that transactions cannot make partial changes to the database. For example, to transfer money between bank accounts requires a deduction from one account before an addition to another. If one of these operations fails, then the whole transaction should fail, or the data in the database will be left in an inconsistent state.
Consistency is the requirement that the database be in a legal state before and after a transaction fails or succeeds. That is, a transaction will fail if it would break primary key, foreign key, or other constraints placed on the database.
Future of SQL
With regard to the future of SQL, it is sure to grow and change in line with the database industry as a whole. Corporations are moving away from single centralised databases, preferring databases distributed across a multitude of network and Interent connected devices.
While Sun Microsystems may have coined the line, “The network is the computer”, Oracle or other database vendors could easily coin the line “The network is the database”. Enterprise data is now spread widely across the enterprise, contained on devices from PDAs to large servers and everything in between. This will require better database support for distributed data, data caching, synchronisation and data staging. This will eventually lead to extensions to SQL to support these types of operations.
Data warehousing operations are no longer seen as end of the month, or end of quarter activities. Enterprises are now attempting to operate “business on demand” by recognising the asset that their operational databases provide. They no longer want to wait while data is manipulated, massaged and transformed into separate decision support systems (DSS) before trying to analyse trends and correlations. Ideally enterprises would like to execute business strategy changes in real time while analysing operational data as it occurs. This will also put pressure on SQL to provide such support.
In conclusion, the history of SQL closely follows the history of the database industry since the relational revolution in the 1970s. As SQL now has many applications in the field, throwing it away when the next database revolution occurs will be difficult for many. Until then, SQL will continue to evolve as the demands on databases and the supported applications changes.
D D Chamberlin, R. F. Boyce. (1974) SEQUEL: A Structured English Query Language. Retrieved May 16, 2007 from http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf.
Melton. J. (1994) SQL: The Standard and the Language. Retrieved May 25, 2007 from http://archive.opengroup.org/public/tech/datam/sql.htm.
Nance, B.(1999) Data Access Via ODBC and JDBC. Retrieved on May 30, 2007 from http://www.networkcomputing.com/netdesign/odbc1.html.
National Institute of Standards and Technology. (1993) FIPS PUB 127-2 Federal Information Processing Standard. Retrieved June 5, 2007 from http://www.itl.nist.gov/fipspubs/fip127-2.htm.
Wikipedia Contributors, (2007). ACID. Retrieved June 28, 2007 from http://en.wikipedia.org/wiki/ACID.
Wikipedia Contributors. (2007) Structured Query Language. Retrieved June 5, 2007 from http://en.wikipedia.org/wiki/Sql.
Vaswani V. (2004). The Future of SQL. Retrieved June 28, 2007 from http://www.devshed.com/c/a/MySQL/The-Future-of-SQL/.