This paper evaluates and compares storage devices and technologies that can be used for database storage. We identify the benefits of each type and assess the best use for each.
The need for storage
Data is a business's greatest asset. Taking simple sales transactions as an example, a primary use of this data could be inventory control, where a report shows how much of a product has been sold and how much remains in storage. Secondary uses of this data are total monthly, quarterly, and yearly sales. Beyond this, those simple sales transactions can also support applications such as fraud detection, inventory logistics and even focused hiring (Wu, 2002).
Recently, technology choices have been influenced by the introduction of the Sarbanes-Oxley Act (SOX) and the Health Insurance Portability and Accountability Act (HIPAA). These laws have specific requirements for how data is stored and how long it is retained.
The combination of data as a business asset and the legal requirements on its storage and handling has ensured that data storage is taken extremely seriously by the majority of organizations.
The basis of the majority of enterprise storage is the humble hard disk. Hard disks are random-access storage mechanisms that record data on spinning platters (a.k.a. disks) coated with extremely sensitive magnetic media. Advances in hard disk technology have made hard disks the cheapest storage medium for large amounts of data, while at the same time offering acceptable data access times.
Hard disks are electromechanical devices and their operational life is finite. Mechanical wear, electronic failures and media faults can cause issues that render the contents of the drive inaccessible. To overcome this, disks are often arranged into groups of disks usually referred to as Redundant Arrays of Inexpensive Disks, or RAID.
The goal of RAID is to ensure data availability, but it often has the beneficial side effects of improved data access times and throughput. There are several different RAID configurations (a.k.a. levels) available to a storage administrator. The most common configurations are (Bigelow, 2007):
· RAID-0 — disk striping is used to improve storage performance, but there is no redundancy.
· RAID-1 — disk mirroring offers disk-to-disk redundancy, but capacity is reduced and performance is only marginally enhanced.
· RAID-5 — parity information is spread throughout the disk group, improving read performance and allowing data for a failed drive to be reconstructed once the failed drive is replaced.
· RAID-6 — multiple parity schemes are spread throughout the disk group, allowing data for up to two simultaneously failed drives to be reconstructed once the failed drive(s) are replaced.
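To illustrate how parity allows the data of a failed drive to be reconstructed, the following minimal Python sketch uses byte-wise XOR, the parity function conventionally used for RAID-5. The block contents and the names xor_blocks and rebuild_missing are illustrative assumptions, not part of any RAID product; a real controller also rotates the parity block across the disks and works on far larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, parity, missing_index):
    """Rebuild one lost block from the surviving blocks plus the parity block."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors + [parity])


# One stripe spread over three data disks (hypothetical contents).
stripe = [b"SALE", b"0042", b"WIDG"]
parity = xor_blocks(stripe)  # held on a fourth disk in this sketch

# Simulate the failure of disk 1 and reconstruct its block.
rebuilt = rebuild_missing(stripe, parity, missing_index=1)
print(rebuilt == stripe[1])
```

The same XOR that produces the parity block also recovers any single missing block, which is why RAID-5 tolerates exactly one failed drive. RAID-6 adds a second, independent parity calculation, which is not shown here.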
Once RAID arrays are configured, how the server(s) connect to the data stored in them is also important. The simplest form of connection is Direct Attached Storage (DAS). In this configuration there is a one-to-one connection between a server and its supporting RAID array.
Configurations also exist in which multiple servers are attached to the RAID array. If a DAS server makes its storage available through a network connection to the LAN or WAN, it can be considered a Network Attached Storage (NAS) device.
The most robust, flexible and expensive configuration allows for many-to-many connections between RAID arrays and servers. A network of storage can be built, typically using Fibre Channel (FC) switch gear. These Storage Area Networks (SANs) have been growing in popularity in recent years as organizations begin to store data in decentralized networks.
RAID forms the basis of the majority of enterprise storage configurations. Depending on the RAID level chosen, its advantages are high speed, high storage capacity, high data availability, high reliability, security and fault tolerance. Its disadvantages are that recovery from failure can be difficult in some systems, that optimum systems are costly, and that users can sometimes have a false sense of security from RAID. Typical applications are redundant storage, internet service providers, and supporting database file storage.
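The trade-off between capacity and fault tolerance across RAID levels can be made concrete with a little arithmetic. The sketch below is a simplified model that assumes identical disks and a single array with no hot spares; the function name raid_summary and the example disk sizes are hypothetical.

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable capacity in TB, drive failures tolerated).

    Simplified model: identical disks, one array, no hot spares.
    """
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum_disks[level]} disks")

    if level == 0:
        return disks * disk_tb, 0        # striping only, no redundancy
    if level == 1:
        return disk_tb, disks - 1        # every disk holds a full copy
    if level == 5:
        return (disks - 1) * disk_tb, 1  # one disk's worth of parity
    return (disks - 2) * disk_tb, 2      # RAID-6: two disks' worth of parity


# Compare the levels for arrays built from 2 TB disks (illustrative sizes).
for level, disks in [(0, 4), (1, 2), (5, 4), (6, 6)]:
    usable, tolerated = raid_summary(level, disks, disk_tb=2)
    print(f"RAID-{level}: {disks} disks, {usable} TB usable, survives {tolerated} failure(s)")
```

Under these assumptions, four 2 TB disks give 8 TB with no protection in RAID-0 but 6 TB with single-drive protection in RAID-5, which shows the capacity cost of redundancy.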
Direct Attached Storage offers simplicity, low initial cost and ease of management for individual servers. This comes at a price, as each server must be administered separately, and DAS is inconvenient for data transfer in network environments. Typically DAS is used for data and application sharing, and data archival.
Network Attached Storage has the advantages of being accessible from multiple clients, easing data sharing, offering high storage capacity and redundancy, and allowing storage resources to be consolidated. On the other hand, it is less convenient than SAN for moving large blocks of data. Redundant storage, data sharing and data archival are common uses of NAS storage.
As SAN makes use of high-speed Fibre Channel links, it is excellent for moving large blocks of data quickly. It offers exceptional reliability, wide availability, fault tolerance and scalability. To date, only the largest enterprises make use of SAN, due to its high cost and management complexity. This area also suffers from a lack of standardization, as SAN vendors offer competing but incompatible solutions. That said, SAN is ideal for supporting large databases and bandwidth-intensive, mission-critical applications (TechTarget, 2006).
The type of storage technology chosen is in large part dependent on the database application it is supporting. The type of storage used to support a stock trading database, where the number of transactions is large, would be dramatically different from that used for a video library, where the number of transactions is smaller but the amount of data moved is much larger.
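The difference between these two workloads can be sketched with simple arithmetic: sustained bandwidth is roughly the number of operations per second multiplied by the data moved per operation. The figures below are invented purely for illustration and are not measurements of any real system; the function name bandwidth_mb_per_s is likewise hypothetical.

```python
def bandwidth_mb_per_s(ops_per_second, kb_per_op):
    """Approximate sustained bandwidth in MB/s for a uniform workload."""
    return ops_per_second * kb_per_op / 1024


# Hypothetical stock trading database: many small transactions.
trading = bandwidth_mb_per_s(ops_per_second=20_000, kb_per_op=8)

# Hypothetical video library: few requests, each moving a large block.
video = bandwidth_mb_per_s(ops_per_second=50, kb_per_op=4096)

print(f"trading: {trading} MB/s at 20000 ops/s")
print(f"video:   {video} MB/s at 50 ops/s")
```

With these assumed numbers, the trading workload issues 400 times as many operations yet moves less data per second, so it is limited mainly by access time, while the video workload is limited mainly by bandwidth.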
RAID is ubiquitous in enterprise settings, as redundancy and resiliency to drive failure are must-haves for organizations. Beyond this, the technology choice between DAS, NAS and SAN depends on the application, the amount of data to be stored, the amount of fault tolerance required, and the cost in terms of price and management overhead that an organization can support.
Bigelow, S. J. (2007) Data Storage Components Overview. SearchStorage.com. Retrieved 2 September, 2007 from http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1164054_tax302775,00.html
Wu, J. (2002) Business Intelligence: The value of data mining. DMReview.com. Retrieved 2 September, 2007 from http://www.dmreview.com/article_sub.cfm?articleId=4618
TechTarget. (2006) Fast Guide to Storage Technologies. Retrieved 2 September, 2007 from http://whatis.techtarget.com/definition/0,,sid9_gci1088267,00.html