A discussion of the various options for cloud computing, avoiding the marketing hype and focusing on the potential advantages and disadvantages for your laboratory
There is a lot of talk about cloud computing and its benefits. Surprisingly, there is little discussion about the downside of this technology. And, what exactly is cloud computing? How is it delivered? Are there any benefits for an analytical laboratory? What is the impact of any ISO requirements or GXP (that is, good laboratory practice [GLP], good clinical practice [GCP], or good manufacturing practice [GMP]) regulations for a laboratory that is using or considering using the cloud? In this installment, I discuss the various options for cloud computing, avoiding the marketing hype and focusing on the potential advantages and disadvantages for your laboratory. This column is not intended to be a definitive discussion of the cloud, but rather an introduction to the topic and what some of the implications could be for analytical data. This topic also allows me to write some dubious subheadings.
R.D. McDowall
In the beginning, the information technology (IT) department of any company bought and managed its own IT infrastructure (servers, cables, workstations, and switches). Then the company purchased, installed, and managed the operating systems and the applications that ran on them: general office applications such as word processing and spreadsheet programs, plus laboratory applications such as laboratory information management systems (LIMS), statistical software (like SAS or Minitab), and the spectrometry software applications used to drive the instruments. This is how most laboratories still operate today. For application software, however, this approach often is costly because there is the upfront cost of purchasing the application plus the purchase of an adequate number of user licenses, and often an ongoing annual maintenance contract, which can be up to 20% of the purchase price of the application. Because the majority of IT departments report to the finance department, it is difficult for these costs to be "massaged" because they are directly visible. Hence the trend, beginning in the mid-1990s, to move to outsourced or offshore IT operations, a trend that has accelerated in recent years as companies attempt to reduce overall IT costs.
As the Internet became more widely accepted and greater bandwidth was coupled with greater global coverage and reliability, the options offered by the Internet started to be exploited by service providers and software vendors. Hosting services for some business applications, such as enterprise resource planning and scientific software like LIMS, became available as a service provided by some software vendors. In such offerings, hosting would be at a single location, and thus this approach is not considered to be cloud computing.
Cloud computing is large-scale distributed computing that can use virtual servers, hosted or leased applications, and computer services to offer a different way for companies to set up and run IT resources on a sliding scale from small to large. E-mail is a simple example of cloud computing: The e-mail service provider has the application software and the storage space for the mail; users only need to type in a URL into a browser and off they go. The best example of cloud computing taken to its ultimate extreme is the Chromebook device, which only contains the operating system, a web browser, and a small amount of solid-state memory to run the two; all user applications and data storage are located in the cloud.
Just as in weather, cloud computing can be classified into various types. Let's start with some definitions and terms to get a better idea of what's going on. First, let's see what is available from authoritative sources to make sense of the marketing hype around cloud computing. There is a white paper from the Software Engineering Institute at Carnegie Mellon University that looks at the basics of cloud computing (1). There also is a special publication from the National Institute of Standards and Technology (NIST) that focuses on the definitions of the services and modes of delivery (2). Additionally, Peter Boogaard has provided a review of cloud computing in the pharmaceutical research and development environment that readers will also find interesting (3). Furthermore, there is a draft NIST publication that goes into detailed discussions of the various cloud options and discusses their pros and cons (4); this is an invaluable source of information for anybody considering the use of cloud computing for laboratory and business data.
In essence, cloud computing is the distribution of computing infrastructure, applications, or services from within an organization to outside service providers on the Internet. As such, it aims to move the costs of computing (both hardware and application software) from an upfront capital cost to one that is based on leasing the services and facilities from a service or application provider. NIST defines cloud computing as follows (2):
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.
This model is described in more detail in Figure 1 and in the discussion below.
Figure 1: The main elements of cloud computing derived from NIST SP800-145 (adapted from reference 2).
NIST SP800-145 (2) is a very short document, about seven pages long, that describes the main elements and services available with cloud computing. From this document I derived Figure 1 to show all of the elements that are possible with the cloud. Moving from left to right across the top of Figure 1, we can look at the elements of cloud computing in detail to make more sense of the technology and approaches used. Essentially, the cloud comprises three elements: the essential characteristics of the service, the service models, and the deployment (delivery) models.
All three elements need to be considered, but in my opinion, for the analytical laboratory, we can narrow down many of the options to consider just one or two.
The process begins with defining your requirements: Why do you want to move to cloud computing? Is it really what you need, or are you being seduced by technology? Just as with the acquisition of any spectrometry or laboratory software, you must get a good understanding of the business reasons for using the cloud. According to NIST (2) and Figure 1, one or more of the following essential characteristics can be the requirements for starting this journey: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured (pay-per-use) service.
Just as weather clouds are classified into three main types (stratus, cumulus, and cirrus), cloud computing has three service models, as shown in Figure 1. If you want more information on any of these models, NIST has a draft special publication available (4), as I mentioned earlier, that goes into much more detail than I can here.
Software as a Service
Perhaps the most common cloud computing service model is software as a service (SaaS), which is the provision of one or more applications to a customer or laboratory to meet its business needs. This can vary from e-mail or office applications such as word processing to GXP (GLP, GCP, and GMP) applications. Typically, the applications will be delivered through a web browser (thin client architecture) to reduce additional software installation costs for the laboratory, although some programs may be accessed via a program interface (thick client architecture) installed on each workstation or via terminal emulation at a user facility. The important point to note with the SaaS service model is that overall management of the application and environment remains with the service provider and not the laboratory. This fact will raise concerns with some quality and regulatory requirements, as we will discuss later in this column. However, the laboratory will be responsible for user account management and possibly application configuration, depending on the delivery model used, which we will discuss in the next section.
Infrastructure as a Service
The infrastructure as a service (IaaS) model provides computer infrastructure via the Internet, allowing a laboratory or its parent organization to expand computer infrastructure on demand. Typically, the provider has large servers on which virtual machines are created; each customer or user then installs its own operating systems and applications on those virtual machines. If required, data storage facilities can be added. IaaS includes providing infrastructure elements such as desktop as a service (DaaS), as noted by Boogaard (3). In this service model the consumer has control over the operating systems, applications, and data storage but does not control the underlying cloud infrastructure.
Platform as a Service
The third cloud computing service model, platform as a service (PaaS), is really for software developers. PaaS is the provision of infrastructure and a development environment to create, test, and deploy applications. The provider's development environment can include programming languages, libraries, services, and utilities that are all supported by the provider. The customer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over its own developed and deployed applications. One type of PaaS customer could be somebody developing and running a web site. I mention this only for completeness; PaaS will not be discussed further here in relation to the analytical laboratory.
Having discussed the service models that can be found under the banner of cloud computing, we need to consider how it will be delivered. According to NIST (2) there are four deployment models: a private cloud (provisioned for the exclusive use of a single organization), a community cloud (shared by several organizations with common requirements), a public cloud (open for use by the general public), and a hybrid cloud (a combination of two or more of the other models).
Combination of Services and Delivery Options: Because the cloud is very flexible, you can have combinations of services and delivery options. For example, you could have a multitenant option using a community cloud incorporating PaaS with tens or even hundreds of companies sharing the same software and architecture. The client companies would have no ability to control the application configuration with the sole exceptions of look and feel and user views of their data. The challenge with such a model is that the service provider can force all tenants to be upgraded en masse on software, hardware, and architecture, leaving clients with no ability to control the environment. In a normal business environment, this could be acceptable, but in a quality and especially in a regulated environment with GXP data, such a situation would be totally unacceptable, because validation would be impossible.
For the purposes of this column, we will limit our discussions to the SaaS service model with delivery through a private cloud. The rationale is twofold. If your laboratory is a research laboratory, you will want to protect your intellectual property. If your focus is development or manufacturing, the data and information generated from your activities must be protected, so you do not want to run the risk of compromising your data with data from another company (as could happen through a community, public, or hybrid cloud), regardless of which industry it is in.
Because of the nature of the cloud, the users are physically and logically separated from the application and the data center where the computing resources are located. The problem with this separation is that when real-time control is required, such as for a spectrometer system, then the SaaS model is really not applicable, because the delay in controlling the instrument over the Internet is not acceptable. In addition, the size of some spectrometry files will be large — for example, many high-resolution spectra from nuclear magnetic resonance (NMR) or mass spectrometry (MS) instruments can exceed 1 gigabyte — and there will be problems transmitting a file this large to the cloud resources as well as retrieving it.
On the other hand, business applications such as enterprise resource planning, a quality management system, or a LIMS could be operated using cloud computing, because the latency of the Internet (the delay between entering data and receiving a response from the software) usually is acceptable for those applications. However, because the Internet is a public service and out of the laboratory's control, there can be no guarantee of acceptable levels of service unless the company has a dedicated line to the cloud computing site.
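To make the idea of latency concrete, here is a minimal Python sketch that times a series of round trips to a hosted application and reports the mean and worst response times. The URL is purely hypothetical, and this is a rough check only, not a substitute for a formal service-level measurement agreed with the provider.

```python
# A rough, illustrative latency check for a hosted (SaaS) application.
# The URL is hypothetical -- substitute the address of the service you
# are actually evaluating.
import time
import urllib.request

URL = "https://saas.example.com/status"  # hypothetical endpoint
TRIALS = 10

round_trips = []
for _ in range(TRIALS):
    start = time.perf_counter()
    try:
        urllib.request.urlopen(URL, timeout=10).read()
    except OSError:
        continue  # skip failed attempts in this rough test
    round_trips.append(time.perf_counter() - start)

if round_trips:
    mean_ms = 1000 * sum(round_trips) / len(round_trips)
    worst_ms = 1000 * max(round_trips)
    print(f"Mean round trip: {mean_ms:.0f} ms; worst: {worst_ms:.0f} ms "
          f"over {len(round_trips)} successful requests")
else:
    print("No successful responses; check the URL or the connection")
```

A few minutes spent running a check like this from the laboratory's own network, at different times of day, gives a far better feel for whether response times will be acceptable than any brochure figure.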
The great advantage of SaaS from the perspective of the customer is that the cost moves from a capital cost model (purchase of the software and associated licenses) to a revenue cost model (hire of user accounts to use the software). However, we need to go into more detail about how a SaaS service can be delivered. We have two main options to consider.
Figure 2: A SaaS cloud architecture illustrating isolated installations of an application.
The first is shown in Figure 2. Here, the cloud service provider has the computing resources and each customer has its own version of the application with a separate database in separate virtual machines or even on separate physical computers (not shown in Figure 2). Separate running instances of an application have a number of advantages from my perspective:
The potential cost savings may not be as great as you think, however, because this approach is similar to running the system in-house on your own servers, and therefore a supplier may require that you purchase the software rather than lease it.
In contrast, in the second version of SaaS (shown in Figure 3), the service provider offers a single instance of the application with a single database. Here, each laboratory's operations and data are separated logically within a single database (company-specific user groups are set up) and there is the logical separation of each company's data.
With this approach, costs should be lower than with single-company instances of the application. However, the application will usually be "one-size-fits-all." Because there is only a single instance, it will be difficult to configure the software to an individual laboratory's business processes. Therefore, it will be a take-it-or-leave-it option: There is no configuration other than user account management. This means that your business process will have to conform to the application's mode of operation.
Figure 3: An alternative SaaS cloud architecture with a single application with database instance.
Validation of some elements of data integrity (such as a shared application and database) also can be difficult, because accessing another user's portion of the system will not be allowed. There could, however, be a good case for a basic validation that is then confirmed by each regulated company. Because that basic validation may not meet every company's computer system validation policies and procedures, some additional work would still be needed.
Each laboratory's data are separated logically in the database. But how will you convince an inspector or auditor that one laboratory cannot change the data in a second laboratory's portion of the database? Also, change control will be difficult from my perspective. Does each company delegate change control to the cloud company? This would be very unlikely, because the service provider could then issue service packs and application software updates without consulting the client companies. Simple security patching of the operating system could perhaps be delegated to a service provider that has an appropriate procedure in place. However, no changes could be made to the application without the agreement of all parties involved.
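To illustrate why logical separation in a shared database is only as strong as the software that enforces it, here is a minimal Python and SQLite sketch. The table, column, and company names are illustrative assumptions, not taken from any real LIMS or SaaS product.

```python
# A minimal sketch of "logical separation" in a shared, multitenant database,
# assuming separation is enforced by a company identifier on every row.
# Table, column, and company names are illustrative, not from any real LIMS.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE results (
                  company_id TEXT NOT NULL,
                  sample_id  TEXT NOT NULL,
                  assay      TEXT,
                  value      REAL)""")
db.executemany("INSERT INTO results VALUES (?, ?, ?, ?)",
               [("lab_A", "S-001", "assay_1", 99.2),
                ("lab_B", "S-001", "assay_1", 101.4)])

def results_for(company_id):
    # Every query the application issues must carry this filter; the
    # separation is only as strong as the code that enforces it.
    return db.execute("SELECT sample_id, assay, value FROM results "
                      "WHERE company_id = ?", (company_id,)).fetchall()

print(results_for("lab_A"))  # lab A sees only its own rows

# An unfiltered query -- or a defect in the application layer -- sees every
# company's rows, which is exactly the point an inspector will probe.
print(db.execute("SELECT * FROM results").fetchall())
```

The sketch shows the crux of the auditor's question: nothing in the data itself prevents cross-company access; only the application code and the provider's controls do.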
So, from my perspective and for the reasons given above, the SaaS approach shown in Figure 2 is far preferable to the one illustrated in Figure 3.
So far, we have looked at the architecture of the cloud and the options. But we also need to look at technical feasibility. Running software applications over the web is not new and has been done for a number of years. I know of LIMS that operate on a global basis, with a central server and services made available to any laboratory within an organization. Those are not part of a cloud architecture, but there are now options where a LIMS is offered via SaaS.
But what about spectroscopic instruments and their accompanying software applications? Can these be made to operate via the cloud? Not in the short term, at least in my opinion. Have a look around your laboratory and observe the nature of the various spectrometers. Typically, they are standalone instruments with data held locally on the workstation hard drive, although sometimes there is an option to store data on a network drive. In the latter case, either the data are acquired locally and then transferred to the network, or the files are written to the network directly. Regardless of the approach, the application software needs to be next to the instrument to enable real-time control. If there is a time delay via the cloud, what will this do to your data acquisition?
Consider also the size of files generated by each system. Low-resolution instruments may have relatively small file sizes; for example, bioanalytical data files from liquid chromatography (LC)–MS-MS analysis are likely to be about 1 megabyte in size. But high-resolution NMR data may be up to 1 gigabyte in size. The latter is enough to cause heart failure among local IT staff if a few of these are moving around an internal network. However, if you are storing files this large in the cloud, there is the time to store them and, more importantly, the time to retrieve them from wherever they may be held. You could be overdoing it on caffeine with all those cups of coffee you'll be drinking while waiting for files to be retrieved from the cloud.
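As a back-of-the-envelope illustration of the problem, the short Python sketch below estimates ideal transfer times for the file sizes quoted above over two assumed link speeds. The link speeds are assumptions for the example, and real transfers will be slower once protocol overhead and network contention are included.

```python
# Back-of-the-envelope transfer times for spectral data files moving to or
# from the cloud. File sizes follow the figures quoted in the text; the link
# speeds are assumptions, and the results are ideal values that ignore
# protocol overhead and network contention.
FILE_SIZES_MB = {
    "LC-MS-MS bioanalytical data file": 1,
    "High-resolution NMR data set": 1024,
}
LINK_SPEEDS_MBPS = {"10 Mbit/s": 10, "100 Mbit/s": 100}

for name, size_mb in FILE_SIZES_MB.items():
    for link, mbps in LINK_SPEEDS_MBPS.items():
        seconds = size_mb * 8 / mbps  # megabits divided by megabits per second
        label = f"{seconds:.1f} s" if seconds < 60 else f"{seconds / 60:.1f} min"
        print(f"{name} ({size_mb} MB) over {link}: about {label}")
```

Even under these ideal assumptions, a 1-gigabyte data set takes on the order of a quarter of an hour over a 10-Mbit/s link, which is a long wait every time an analyst wants to reprocess a spectrum.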
One aspect of cloud computing that is usually ignored is the contract. With the cloud, we are dealing with a service that you will be accessing remotely, so there will be a number of potential concerns that you need to consider before deciding if it is suitable for you (4):
The points named above are just an overview of some of the issues you need to consider in a cloud computing contract; if you want more information, please read the NIST draft SP800-146 report (4). Also, do not assume that, because you have a contract, suing the service provider will put things right if they go wrong. The key to success is to spend time reviewing the contract and asking questions before you sign on the dotted line and move all your data to the cloud. We will return to the contract later in this column when we consider the regulatory compliance aspects of the cloud.
So, having discussed the cloud, where are we in the technology cycle? According to a Gartner Group estimate, by the end of 2012, up to 20% of companies will not own their own IT assets. However, there is a lot of hype about cloud computing that makes the approach appear more mature than it actually is; indeed, Gartner estimates that it is two to five years from being adopted as a mainstream technology (1). On the Gartner Group's hype cycle, cloud computing is currently at the "peak of inflated expectations" and still needs to migrate through the "trough of disillusionment" and up the "slope of enlightenment" before reaching nirvana, otherwise known as the "plateau of productivity" (1).
Any technology has advantages and disadvantages. The pros and cons of SaaS cloud computing are summarized in Table I. I will not go into these in detail with the exception of the potential disadvantage of regulatory compliance. We will discuss this in some detail in the next section.
Table I: Some potential advantages and disadvantages of cloud computing
If you are in a regulated laboratory working to one of the "good practice" disciplines, you need to know what impact GXP regulations will have on the cloud and vice versa. You need to know this up front rather than wait for an inspector to start writing citations because you could not be bothered to read regulations or guidance. You know this makes sense!
So, do you want the good news or the bad news?
The good news (perhaps this is bad news if you work in quality assurance) is that the GLP and GCP regulations make no mention of IT systems. Where the GLP regulations mention equipment (5) or apparatus (6), it could be interpreted as including IT infrastructure. Similarly, US GMP regulations refer to equipment being of adequate size, properly installed, and fit for intended use (7). However, there is an ongoing program from the US Food and Drug Administration that puts increased emphasis on data integrity under Compliance Program Guidance (CPG) manual 7346.832 (8) for pre-approval inspections.
The bad news is that we have new regulations that should serve as a benchmark for all laboratories that are governed by GXP regulations and are also considering cloud computing. It is when we turn to Europe and the new version of Annex 11 for computerized systems (9), which I reviewed in an earlier column (10), that we find the most modern and explicit regulatory requirements that can be applied to cloud computing. The first point to make is that in the glossary to Annex 11, IT infrastructure is defined as follows:
The hardware and software such as networking software and operation systems, which makes it possible for the application to function.
So the infrastructure provides the building blocks on which the regulated applications will be installed and validated.
For your reading pleasure, Table II summarizes the Annex 11 regulations that are most applicable to IT infrastructure and, in my view, also applicable to cloud computing. The most important of these is from the "Principle" section of Annex 11, which simply states that IT infrastructure should be qualified. Therefore, in a regulated environment, there must be documented evidence that the server and operating system, together with supporting network services (such as IP addressing), on which an application runs have been installed and configured correctly. In addition, if the application is installed on a virtual machine, that virtual machine also needs to have been installed and qualified. Evidence of this work needs to be available to you as well as to inspectors and auditors. Some cloud service providers specialize in this area and will supply qualified infrastructure along with copies of the qualification work performed for their customers.
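As a small illustration of the kind of supporting evidence such documentation might draw on, the Python sketch below captures a snapshot of basic platform facts (hostname, operating system, component versions). It is a sketch only: the output file name and the fields recorded are assumptions for the example, and an automated snapshot supplements, but never replaces, a documented and reviewed qualification protocol.

```python
# A minimal sketch of capturing basic platform facts as supporting evidence
# for an installation qualification record. It supplements, but never
# replaces, a documented and reviewed qualification protocol. The output
# file name and the fields recorded are assumptions for this example.
import json
import platform
import socket
from datetime import datetime, timezone

snapshot = {
    "recorded_at": datetime.now(timezone.utc).isoformat(),
    "hostname": socket.gethostname(),
    "operating_system": platform.platform(),
    "processor": platform.processor(),
    "python_version": platform.python_version(),  # example of a component version
}

with open("iq_snapshot.json", "w") as fh:
    json.dump(snapshot, fh, indent=2)

print(json.dumps(snapshot, indent=2))
```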
Table II: EU GMP Annex 11 regulations applicable to cloud computing (9)
However, we also need to look at the requirements of clause 3 of Annex 11, which considers service providers. There are two key requirements here. First, we need an agreement between the laboratory and the service provider, and second, we must consider whether or not we need to perform an audit. Taking the second point first, section 3.3 in Table II notes that an audit of a service provider should be based on a documented risk assessment. So how critical is the system and the data held in it? If the application and data are critical, you must conduct an audit to ensure that the center where the computer is housed is acceptable to host your system. If you don't know where the data center is, the process stops here; just find another service provider.
In the audit, you should find out about the systems the provider has in place to protect your data, such as security (access to the site and the computer room), antivirus and intruder protection, alternative power supplies, standby electricity generation, fire suppression, and data backup (if part of your service). These elements also form some of the requirements of an effective business continuity plan to meet the requirements of clause 16.
Two elements in Table II that are closely related are clauses 10 (change control and configuration management) and 2 (personnel). After a system is up and running, it is the change control processes and the personnel at the service provider that will make or break any compliant operation. The service provider's staff need to be aware of the GXP regulations applicable to their role and the impact they can have on the laboratory's data. This is not a "nice to have" element; it is the law. Furthermore, any uncontrolled changes to a system by the staff will destroy its validation status, so this is a critical area to consider. Before committing to either SaaS option, it is vital to know how patches and service packs for the operating system and database, and changes to the application, are controlled, installed, documented and, where necessary, validated.
I am not trying to put you off using the cloud — there are benefits to be obtained from it. But if you are working under GXP regulations, you need to know the requirements before you zoom off into the sky.
The most common variant of cloud computing that could be used in a spectroscopy laboratory is SaaS. It will not be used for controlling spectrometers or other applications that need real-time control, but it offers advantages for laboratories using applications where response time is not critical. The technology is still maturing, however, and therefore care should be exercised in evaluating and deploying it, especially in a regulated environment. The contract between the laboratory and the service provider must be analyzed carefully to ensure that the laboratory is protected and gets the service it is paying for. If you are working under GXP regulations, further requirements apply, and the relevant clauses of EU GMP Annex 11 should be used to guide your evaluation of cloud services.
R.D. McDowall is the principal of McDowall Consulting and the director of R.D. McDowall Limited, and the editor of the "Questions of Quality" column for LCGC Europe, Spectroscopy's sister magazine. Direct correspondence to: spectroscopyedit@advanstar.com
(1) G. Lewis, "Basics About Cloud Computing" (white paper), Software Engineering Institute, Carnegie Mellon University, 2010.
(2) P. Mell and T. Grance, The NIST Definition of Cloud Computing, NIST Special Publication 800-145 (National Institute of Standards and Technology, Gaithersburg, Maryland 2011).
(3) P. Boogaard, Drug Disc. World, Fall, 85–90 (2011).
(4) L. Badger, T. Grance, R. Patt-Corner, and J. Voas, DRAFT Cloud Computing Synopsis and Recommendations, NIST Special Publication 800-146 (National Institute of Standards and Technology, Gaithersburg, Maryland 2011).
(5) U.S. Food and Drug Administration, 21 CFR 58, Good Laboratory Practice (GLP) regulations (Rockville, Maryland).
(6) Principles of Good Laboratory Practice (Organisation for Economic Co-operation and Development, Paris, France).
(7) U.S. FDA, 21 CFR 211, Current Good Manufacturing Practice (GMP) regulations (Rockville, Maryland).
(8) U.S. FDA, Compliance Program Guidance Manual 7346.832, Pre-Approval Inspections, May 2010.
(9) European Commission, Health and Consumers Directorate-General, GMP Annex 11, Computerised Systems (Brussels, Belgium, 2010).
(10) R.D. McDowall, Spectroscopy 26(4), 24–33 (2011).