Adapting ITIL to Distributed Web Applications
Introduction to ITIL
The Information Technology Infrastructure Library (ITIL) is a broad framework of best practices that enterprises use to manage their IT operations. It was first published in the late 1980s by the UK government's Central Computer and Telecommunications Agency, whose work later passed to the Office of Government Commerce (OGC). The library quickly grew to over 30 volumes, so when ITIL version 2 was released, beginning in 2000, a concerted effort was made to consolidate the processes described into logical sets. ITIL v3 continues in this vein by consolidating into five core titles:
- Service Strategy
- Service Design
- Service Transition
- Service Operation
- Continual Service Improvement.
ITIL has become the de facto standard for IT service delivery and support, and businesses report several benefits from using it (OGC, 2000). These include improved quality of service provision, cost-justifiable service quality, services that meet business, customer and user demands, integrated and centralised processes, clearly understood roles and responsibilities in service provision, learning from previous experience, and demonstrable performance indicators.
ITIL is the first management framework of this type to shift the focus of IT service management from being IT bound to being business bound. It is perhaps the most widely used of the new Business Driven IT Management (BDIM) tools.
ITIL version 2, whose core titles include Service Delivery (OGC, 2000a) and Service Support (OGC, 2000b), describes the following processes:
- Service Level Management
- Financial Management for IT Services
- Capacity Management
- IT Service Continuity Management
- Availability Management
- Configuration Management
- Change Management
- Release Management
- Incident Management
- Problem Management
- Service Desk
- ICT Infrastructure Management
- Application Management
- Security Management
- Environmental infrastructure processes
- Project Management.
Introduction to Web Services
A Web Service is a software system designed to support interoperable machine-to-machine interaction over a network. Web services are frequently Application Programming Interfaces (APIs) that are accessed over a network and executed on the remote system hosting the requested service.
In many ways Web Services fulfil the same need that Remote Procedure Calls (RPCs) did when middleware was first being developed. RPCs required both the requester and the service to know a great deal about each other. Web services remove this tight coupling by replacing the mechanics of the service request with open standards, based on XML requests carried over HTTP (Vinoski, 2002). Using these technologies, a client and a service can communicate without either needing to know how the other operates internally.
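The XML exchange described above can be illustrated with a minimal sketch. The element names (`GetWeightRequest`, `WeightKg`), the product catalogue and the function names are all hypothetical, and the HTTP transport is left out so that the example runs on its own; a real deployment would post the request body to the service's URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue held by the service; the client never sees it.
WEIGHTS_KG = {"widget-1": "2.5"}


def build_request(product_id: str) -> bytes:
    """Client side: serialise a weight lookup as an XML document."""
    root = ET.Element("GetWeightRequest")
    ET.SubElement(root, "ProductId").text = product_id
    return ET.tostring(root, encoding="utf-8")


def handle_request(body: bytes) -> bytes:
    """Service side: parse the XML request and answer with XML."""
    product_id = ET.fromstring(body).findtext("ProductId")
    response = ET.Element("GetWeightResponse")
    ET.SubElement(response, "ProductId").text = product_id
    ET.SubElement(response, "WeightKg").text = WEIGHTS_KG[product_id]
    return ET.tostring(response, encoding="utf-8")


def parse_response(body: bytes) -> float:
    """Client side: read the weight back out of the XML response."""
    return float(ET.fromstring(body).findtext("WeightKg"))


reply = handle_request(build_request("widget-1"))
print(parse_response(reply))  # prints 2.5
```

The client and the service share only the XML message format; neither depends on the other's implementation.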
Since ITIL is such a wide ranging set of IT service delivery procedures and practices, we will narrow our discussion to those processes where management and technology are closest.
Capacity Management
“Capacity Management is responsible for ensuring adequate capacity is available at all times to meet the requirements of the business. It is directly related to the business requirements and not simply about the performance of the system’s components, individually or collectively.” (OGC, 2000a)
Capacity management and web services meet most visibly in the volatile nature of the internet. Sudden spikes and surges of internet traffic can overwhelm a web service or otherwise degrade its performance. This effect goes by many names, though the “Slashdot Effect” (Adler, 1999) was perhaps the first term used to describe it.
Capacity management also integrates with Service Level Management to ensure that all appropriate Service Level Agreements (SLAs) and Operational Level Agreements (OLAs) are being met, and that any adverse impacts on service quality are minimised.
A business must decide where the happy medium lies with regard to capacity. Building ever bigger systems costs ever bigger amounts of money, so at some point a business must decide that the supply capacity is large enough. In order to deal with situations when this is not enough, processes and other tactics must be applied, perhaps to manage demand.
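One common demand management tactic is to limit the rate at which requests are admitted. The sketch below uses a token bucket, which is our choice of illustration and not something ITIL prescribes; the class name and parameters are hypothetical, and timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
class TokenBucket:
    """Admit requests at a steady rate while allowing short bursts."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens that can be stored
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous call

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=2, burst=3)
# Five requests arrive at the same instant: three are admitted, two rejected.
print([bucket.allow(0.0) for _ in range(5)])
```

Rejected requests can be queued, redirected to a cache, or answered with a "try again later" response, depending on what the service level agreement permits.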
Web services can be adapted to this process by deploying many distributed application servers, web servers, databases and web caches so that adequate capacity is available. Through capacity planning, a business can be assured that it can meet today’s demand, continue to meet demand as the business grows, and deal with the occasional traffic surge.
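The arithmetic behind such a plan can be sketched in a few lines. The function name, the headroom fraction and the figures below are illustrative assumptions and not values taken from ITIL.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   headroom: float = 0.25, spares: int = 1) -> int:
    """Estimate how many servers are needed to carry a peak load.

    headroom is the fraction of each server kept in reserve for surges;
    spares are extra servers held back to cover failures.
    """
    if per_server_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity parameters")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps) + spares


# Today's peak: 1200 requests/s, each server handles 200 requests/s.
print(servers_needed(1200, 200))        # prints 9
# The same service after 50% growth in demand.
print(servers_needed(1200 * 1.5, 200))  # prints 13
```

Re-running the estimate with projected growth figures shows when new capacity must be ordered, which is the ongoing part of the planning process.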
Change Management
As McLaughlin & Damiano (2007) state, “ITIL defines Change Management as the process responsible for controlling the life cycle of all changes. The main goal of Change management is to facilitate valuable changes while not disrupting the other services being provided.”
Changes occur when new versions of system software are deployed, when routine maintenance is scheduled, or when the system is expanded with new elements added to the network. Although the IT infrastructure is changing, service level agreements still need to be maintained, so a considered approach to any system or subsystem change is prudent.
The change management process interacts with the release management, incident management and problem management processes. Change management is responsible for giving a go signal for a release, while the change management process is often started in response to a required system change identified by incident or problem management.
This is especially important given that the services within a distributed web services application have interdependencies. Although web services are self-contained and self-describing, there remains the possibility of errors creeping in when one service changes.
Take a service which computes the cost of shipping a product based on the weight of that product. If the weight lookup service were to suddenly change from reporting weights in pounds to reporting them in kilograms, the shipping cost service may start to return incorrect values.
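One way to make such a change visible is for the weight service to state its unit explicitly in every response, so that the consumer converts rather than assumes. The following is a minimal sketch of that idea; the `Weight` type, the `shipping_cost` function and the rate are hypothetical.

```python
from dataclasses import dataclass

KG_PER_LB = 0.45359237  # exact definition of the international pound


@dataclass(frozen=True)
class Weight:
    """A weight as reported by the (hypothetical) weight lookup service."""
    value: float
    unit: str  # "kg" or "lb"

    def to_kg(self) -> float:
        if self.unit == "kg":
            return self.value
        if self.unit == "lb":
            return self.value * KG_PER_LB
        # An unknown unit fails loudly instead of producing a wrong price.
        raise ValueError(f"unsupported unit: {self.unit!r}")


def shipping_cost(weight: Weight, rate_per_kg: float = 4.0) -> float:
    """Price a shipment from a weight that carries its own unit."""
    return round(weight.to_kg() * rate_per_kg, 2)


print(shipping_cost(Weight(10, "kg")))  # prints 40.0
print(shipping_cost(Weight(10, "lb")))  # prints 18.14
```

With the unit carried in the message, a switch from pounds to kilograms yields either a correct conversion or an explicit error, both of which are easier to handle through change management than silently wrong prices.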
Incident and Problem Management
ITIL states that the goal of Incident Management “is to restore normal service operation as quickly as possible and minimise the adverse impact on business operations” (OGC, 2000b). Incident management is responsible for tracking incidents such as failed hard disks and power interruptions, and for ensuring that the effects on SLAs and OLAs are minimised and that normal operational levels are recovered as quickly as possible.
The goal of problem management on the other hand is “to minimise the adverse impact of Incidents and Problems on the business that are caused by errors within the IT infrastructure, and to prevent recurrence of Incidents related to these errors” (OGC, 2000b). This is achieved by analysing historical incident data to find correlations, trends, etc., which can provide insight into the root cause of incidents. Problem management interfaces with the Change Management process when fixes for problem root causes are found, such that the IT infrastructure is reconfigured to become more robust.
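The trend analysis described above can be sketched very simply: count incidents per component and flag those that recur. The incident records, field names and threshold below are invented for illustration only.

```python
from collections import Counter

# Hypothetical incident log entries.
incidents = [
    {"id": 1, "component": "db-01", "category": "disk"},
    {"id": 2, "component": "web-03", "category": "power"},
    {"id": 3, "component": "db-01", "category": "disk"},
    {"id": 4, "component": "db-01", "category": "timeout"},
    {"id": 5, "component": "web-03", "category": "power"},
]


def recurring_components(log, threshold=3):
    """Return (component, count) pairs seen at least `threshold` times."""
    counts = Counter(entry["component"] for entry in log)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# Components that keep failing are candidates for root cause analysis.
print(recurring_components(incidents))  # prints [('db-01', 3)]
```

A component that appears repeatedly is a candidate problem record, and any fix identified for it would then be raised through change management.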
Incident and problem management have the biggest impact on the operation of distributed web services. Within IT circles, there has been a shift in focus from device oriented management to service oriented management, as evidenced by Brown & Keller (2006), Jantti & Eerola (2006), Barash, Bartolini & Wu (2007), Hanemann, Sailer & Schmitz (2004), Kajko-Mattsson et al. (2001) and Kajko-Mattsson (2002).
IT service management has moved away from taking care of the hardware. As load balancing and failover technologies have become more sophisticated, hardware failures have had less of an impact on service levels. Continuing in this vein, virtualisation changes the rules of IT service management (VMware, 2007).
In a virtualised environment, operating systems and the applications they support are separated from their hardware. Without virtualisation, a failed hard disk could render a server inoperable, with a loss of function for its operating system and applications, and a subsequent loss of quality of service to the business. In a virtualised environment, that operating system and its applications can be transparently moved from the failing server to an operational one without any loss of service quality to the business. Now that services and applications are transparently mobile between hardware servers, hardware failures have less impact on a business than previously.
The distributed nature of web services, coupled with a service oriented architecture and virtualisation, means that a business application can continue to operate without loss of service level given that whole operating systems, applications and services are transparently mobile throughout the application network. Services will continue to find each other to satisfy requests regardless of where they physically reside in hardware.
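How services "find each other" is not specified here; one common mechanism is a registry that maps service names to their current locations. The sketch below assumes such a registry, and the class, service and host names are all hypothetical.

```python
class ServiceRegistry:
    """Map service names to the hosts currently running them."""

    def __init__(self) -> None:
        self._locations: dict[str, set[str]] = {}

    def register(self, name: str, host: str) -> None:
        self._locations.setdefault(name, set()).add(host)

    def deregister(self, name: str, host: str) -> None:
        self._locations.get(name, set()).discard(host)

    def lookup(self, name: str) -> str:
        """Return one host for the service, or fail if none is running."""
        hosts = sorted(self._locations.get(name, ()))
        if not hosts:
            raise LookupError(f"no host is running {name!r}")
        return hosts[0]


registry = ServiceRegistry()
registry.register("shipping-cost", "host-a")
print(registry.lookup("shipping-cost"))  # prints host-a

# host-a starts to fail, so the service is moved to host-b.
registry.register("shipping-cost", "host-b")
registry.deregister("shipping-cost", "host-a")
print(registry.lookup("shipping-cost"))  # prints host-b
```

Callers only ever ask for the service by name, so moving it between physical servers requires no change on their side.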
Release Management
Release management is responsible for the planning and oversight of the successful roll out of software and related hardware (OGC, 2000b). Periodically, changes in hardware and software are required. These can be instigated as a result of the incident or problem management processes, or because a business is deploying a new application to its network.
Given the distributed nature of web services, a considered approach to release management is prudent. For example, a change in the definition of a database table could require corresponding changes in the applications which refer to it.
To mitigate any dangers to operational applications, ITIL recommends sandboxing new applications for a period in order to investigate any knock-on effects and to ensure that the changes are fit for purpose before general release. This is achieved by configuring a development environment where applications and application changes are developed. A staging environment is then used to test the effectiveness of the application or changes. Finally, when all tests are completed and all planning processes have been considered, the changes can be rolled out to the live environment.
Changes to an application do not only affect hardware and software elements, but can also affect the end users, as existing functions are changed or new functions are introduced. Careful communication, preparation and training of the users, managers, administrators and other application stakeholders should also be considered when releases are being made. An example of how this was done badly was a change to the way facebook.com operated, and the resulting backlash from its user community (Associated Press, 2006).
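The progression from development through staging to live can be sketched as a small state machine. The class, method and stage names are illustrative assumptions; the approval flag stands in for the go signal that change management gives for a release.

```python
STAGES = ["development", "staging", "live"]


class Release:
    """Track a release as it is promoted through the environments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = STAGES[0]

    def promote(self, tests_passed: bool, change_approved: bool = False) -> str:
        """Move to the next environment if the gates for it are satisfied."""
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise ValueError(f"{self.name} is already live")
        if not tests_passed:
            raise RuntimeError(f"{self.name} failed tests in {self.stage}")
        next_stage = STAGES[index + 1]
        # Going live additionally needs approval from change management.
        if next_stage == "live" and not change_approved:
            raise PermissionError(f"{self.name} has no approval to go live")
        self.stage = next_stage
        return self.stage


release = Release("shipping-cost 2.0")
print(release.promote(tests_passed=True))                        # prints staging
print(release.promote(tests_passed=True, change_approved=True))  # prints live
```

Each gate corresponds to a check in the process described above: no release leaves an environment with failing tests, and none reaches live without sign-off.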
Conclusion
ITIL is a generic and adaptable framework of IT service management processes. ITIL is wide ranging, providing processes dealing with high level business and financial concerns down to the day to day running of IT applications.
In many ways ITIL doesn’t care about the nature of the application it is managing, so whether that application is a mainframe, or a deployment of web services, ITIL provides generic processes which give businesses just enough process to manage any type of application. ITIL leaves the low level details to the discretion of the business, and to the specifics of the application being managed. In this way ITIL can easily be adapted to distributed web services, to provide all the benefits which ITIL management brings.
References
- Associated Press. (2006) User Backlash Prompts Facebook Changes. Retrieved on November 24, 2007 from http://www.msnbc.msn.com/id/14774919/.
- Adler, S. (1999) The Slashdot Effect. Retrieved on November 22, 2007 from http://ssadler.phy.bnl.gov/adler/SDE/SlashDotEffect.html.
- Barash, G., Bartolini, C. & Wu, L. (2007) Measuring and Improving the Performance of an IT Support Organization in Managing Service Incidents. Proceedings of the 2nd IEEE/IFIP International Workshop on Business-Driven IT Management. 11 – 18.
- Brown, A. B. & Keller, A. (2006) A Best Practice Approach for Automating IT Management Processes. Proceedings of the 10th IEEE/IFIP Symposium on Network Operations and Management. 33 – 44.
- Hanemann, A., Sailer, M. & Schmitz, D. (2004) Assured service quality by improved fault management. Proceedings of the 2nd International Conference on Service Oriented Computing. 183 – 192.
- Jantti, M. & Eerola, A. (2006) A Conceptual Model of IT Service Problem Management. Proceedings of the 2006 International Conference on Service Systems and Service Management. 798 – 803.
- Kajko-Mattsson, M., Westblom, U., Forssander, S., Andersson, G., Medin, M., Ebarasi, S. et al. (2001) Taxonomy of problem management activities. Proceedings of the 5th European Conference on Software Maintenance and Reengineering. 1 – 10.
- Kajko-Mattsson, M. (2002) Corrective Maintenance Maturity Model: Problem Management. Proceedings of the 2002 International Conference on Software Maintenance. 486 – 490.
- McLaughlin, K. A., & Damiano, F. (2007) American ITIL. Proceedings of the 35th annual ACM SIGUCCS Conference on User Services. 251 – 254.
- OGC. (2000a) Best Practice for Service Delivery. London: TSO.
- OGC. (2000b) Best Practice for Service Support. London: TSO.
- Vinoski, S. (2002) Web Services Interaction Models. Current Practice. IEEE Internet Computing. Vol. 6, Issue 3. 89 – 91.
- VMware Inc. (2007) An Introduction to Virtualization. Retrieved on November 24, 2007 from http://www.vmware.com/virtualization/.