This is the first in a four-part blog series on cloud computing for BI professionals.
There is a lot of confusion about cloud computing, even among professionals in the field. But that’s true of any new, fast-moving field that spawns a raft of new technologies and methods. After reading a few definitions of cloud computing that caused me to nod off at my keyboard, I created a simpler one:
Shared, online compute resources that you rent from a service provider and dynamically configure yourself.
Let’s unpack this definition a bit:
- Shared: You share compute resources with other groups or companies, even your direct competitors! Obviously, this raises security and privacy concerns.
- Online: You access the compute resources via a Web browser or a programmatic Web application programming interface. In this respect, cloud computing delivers online “services”.
- Compute resources: Compute resources consist of the infrastructure (servers, storage, and networks), development tools, and applications. Basically, the whole stack, accessible via a Web browser or service call.
- Rent: You pay only for what you use, and you can terminate the service at any time (although there may be some exit fees). This is value-based pricing. Cloud infrastructure vendors generally charge by the hour, while cloud software providers generally charge per user per month.
- Service provider: A service provider could be your internal IT department (private cloud) or an external company (public cloud).
- Dynamically configure: Unlike traditional hardware and software, you don’t purchase, install, test, tune, and maintain cloud-based resources. With cloud-based infrastructure, you simply configure a virtual image of your compute environment (hardware, storage, network) using a Web browser. With cloud-based software, you simply configure your application using a Web browser to conform with your branding and workflow requirements.
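The rent model lends itself to a quick back-of-the-envelope comparison. The sketch below contrasts the two pricing styles described above; all rates are hypothetical, not any vendor’s actual prices:

```python
def iaas_monthly_cost(hourly_rate, hours_used):
    """IaaS: pay only for the hours a server actually runs."""
    return hourly_rate * hours_used

def saas_monthly_cost(per_user_rate, num_users):
    """SaaS: pay a flat fee per user, per month."""
    return per_user_rate * num_users

# A BI sandbox run 8 hours a day for 22 business days at a hypothetical $0.50/hour...
print(iaas_monthly_cost(0.50, 8 * 22))   # 88.0
# ...versus 10 users of a SaaS BI tool at a hypothetical $30 per user per month.
print(saas_monthly_cost(30, 10))         # 300
```

The point of the exercise: because you rent rather than buy, the cost conversation shifts from capital outlay to hours used or seats occupied.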
As you probably have already surmised, cloud computing is divided into three classes of services, each of which can be applied to the business intelligence market: 1) software-as-a-service (applications), 2) platform-as-a-service (application development), and 3) infrastructure-as-a-service (compute resources). (See figure 1.)
Figure 1. Three Types of Cloud Services with BI Examples
- Software-as-a-Service (SaaS) delivers applications. SaaS was first popularized by Salesforce.com, which was founded in 1999 to deliver online sales applications to small- and medium-sized businesses with few IT resources and little capital. Salesforce.com now has 92,000 customers of all sizes and has spawned a multitude of imitators. Within the BI market, many startups and established BI players offer SaaS BI services, although the uptake of such services has been slower than expected. (See “Expectations Versus Reality in the Cloud: Understanding the Dynamics of the SaaS BI Market.”) SaaS BI vendors include Birst, PivotLink, GoodData, Indicee, Rosslyn Analytics, and SAP, among others.
- Platform-as-a-Service (PaaS) enables developers to build applications online. PaaS services provide development environments, such as programming languages and databases, so developers can create and deliver applications without having to purchase and install hardware. In the BI market, the SaaS BI vendors (above) are actually PaaS BI vendors, which is the primary reason why growth of SaaS BI is slow. Before you can consume a SaaS BI application, you have to build a data mart, which is often tedious and highly customized work since it involves integrating data from multiple, unique sources, cleaning and standardizing the data, and modeling and transforming the data. SaaS BI vendors are peddling a finished product when they are actually selling a custom PaaS development effort.
- Infrastructure-as-a-Service (IaaS) provides online computing resources (servers, storage, and networking) that customers use to augment or replace their existing compute resources. Amazon popularized IaaS in 2006 when it began renting virtualized capacity in its own data centers to outside parties. Some BI vendors are beginning to offer software components within public cloud or hosted environments. For example, analytic database vendors Vertica and Teradata are now available as services within Amazon EC2, while Kognitio offers a hosted service. ETL vendors Informatica and SnapLogic also offer services in the cloud.
Key Characteristics of the Cloud
Virtualization. Virtualization is the foundation of cloud computing. You can’t do cloud computing without virtualization; but virtualization by itself doesn’t constitute cloud computing.
Virtualization abstracts, or virtualizes, the underlying compute infrastructure using a piece of software called a hypervisor. With virtualization, you create virtual servers (or virtual machines) to run your applications. A virtual server can run a different operating system than the physical hardware it sits on. For the most part, users no longer have to worry about whether they have the right operating system, hardware, and networking to support an application. Virtualization shields users from the underlying complexity (as long as the IT department has created appropriate virtual machines for them to use).
With virtualization, organizations can run multiple, heterogeneous virtual servers on a single physical server to maximize utilization, or they can run a single virtual server on multiple physical servers to increase scalability. Because virtualization decouples applications from the underlying hardware, IT administrators can migrate applications to new hardware without having to reinstall software. They also can spawn multiple instances of a single application using virtual servers and run them in parallel on a single physical server to improve application performance and throughput. (See figure 2.)
Figure 2. Virtualization Use Cases
Left: heterogeneous system images and applications run on a single server, maximizing server utilization. Middle: a single image runs across multiple physical machines, increasing scalability. Right: multiple instances of an application run in parallel on a single machine, increasing efficiency.
In short, virtualization increases the flexibility, scalability, efficiency, and availability of data center resources, and it dramatically lowers data center costs by enabling the IT department to consolidate servers and reduce power, cooling, space, and staffing overhead.
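The consolidation use case (the left panel of figure 2) can be pictured as a simple capacity model, with the hypervisor’s placement decision reduced to a CPU-and-RAM check. This is only a sketch; the server sizes, VM names, and guest operating systems are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class VirtualMachine:
    name: str
    os: str          # the guest OS need not match the host's
    cpu_cores: int
    ram_gb: int

@dataclass
class PhysicalServer:
    cpu_cores: int
    ram_gb: int
    vms: list

    def can_host(self, vm):
        # Simplified placement rule: total demand must fit physical capacity.
        used_cpu = sum(v.cpu_cores for v in self.vms)
        used_ram = sum(v.ram_gb for v in self.vms)
        return (used_cpu + vm.cpu_cores <= self.cpu_cores
                and used_ram + vm.ram_gb <= self.ram_gb)

    def place(self, vm):
        if self.can_host(vm):
            self.vms.append(vm)
            return True
        return False

# Two heterogeneous virtual servers consolidated onto one physical box.
host = PhysicalServer(cpu_cores=16, ram_gb=64, vms=[])
host.place(VirtualMachine("etl-box", "Linux", 4, 16))
host.place(VirtualMachine("report-server", "Windows", 4, 16))
utilization = sum(v.cpu_cores for v in host.vms) / host.cpu_cores
print(f"CPU utilization: {utilization:.0%}")   # CPU utilization: 50%
```

In the real world the hypervisor also virtualizes I/O, memory ballooning, and scheduling, but the consolidation payoff is the same: shared capacity and higher utilization per physical server.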
To the Cloud: Dynamic Provisioning
Browser Interface. To turn virtualization into cloud computing, you need to add software that enables business users to dynamically provision their own virtual servers and use them for as long as they need.
For instance, developers using a Web browser can configure a custom virtual server to support a new development and test bed. Or, they can select a virtual image (i.e., server and applications) from a library of virtual images created in advance by the IT department. Once the developers are finished using the virtual images, they “release” them. Thus, developers no longer need to submit requests to the IT department for servers, storage, and networking capacity. They either configure their own virtual machine or select one from a library that meets their application’s processing requirements. They no longer have to wait for purchasing and legal to execute a purchase order or the IT department to install, tune, test, and deploy the systems.
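A self-service image library of this kind can be thought of as a catalog that developers check images out of and later release. Here is a minimal sketch with the catalog reduced to dictionary operations; the image names and specs are hypothetical:

```python
# IT builds these images in advance; developers pick one rather than
# requesting hardware through purchasing and legal.
library = {
    "dev-small": {"cpu": 2, "ram_gb": 8,  "software": ["python", "postgres"]},
    "bi-report": {"cpu": 8, "ram_gb": 32, "software": ["bi-server"]},
}

checked_out = {}

def check_out(image_name, user):
    image = library[image_name]
    instance_id = f"{image_name}-{len(checked_out) + 1}"
    checked_out[instance_id] = (user, image)
    return instance_id

def release(instance_id):
    # "Releasing" an image frees its resources for other users.
    checked_out.pop(instance_id, None)

vm = check_out("dev-small", "developer-1")
print(vm)          # dev-small-1
release(vm)
```

The essential point is the workflow, not the data structure: provision, use, release, with no ticket to the IT department in between.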
Services Interface. To make the leap to cloud computing, you also need a services interface so administrators can programmatically provision servers based on a schedule or events (e.g., an ETL job that begins). Administrators use Web services interfaces to support auto-scaling, failover, and backups.
With auto-scaling, a BI administrator uses a cloud services interface to automatically provision and release virtual BI servers during the course of a day to efficiently allocate processing power among servers to support various BI workloads. For example, at 2 a.m. in a typical BI environment, the system fires up an ETL server and database server to run nightly ETL jobs, while at 4 a.m. it releases the ETL server and provisions a BI server to process and burst daily reports. At 10 a.m. it provisions an additional BI server and database server to handle peak usage. Failovers and backups work much the same way.
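The schedule-driven auto-scaling described above can be sketched with the cloud services interface reduced to provision/release stubs. In practice these calls would go to a provider’s programmatic API; the server names below simply follow the example:

```python
active = set()

def provision(server):
    active.add(server)

def release(server):
    active.discard(server)

# (hour, action, server) — the nightly BI schedule from the example above.
schedule = [
    (2, provision, "etl-server"),
    (2, provision, "database-server"),
    (4, release, "etl-server"),
    (4, provision, "bi-server"),
    (10, provision, "bi-server-2"),
    (10, provision, "database-server-2"),
]

for hour, action, server in schedule:   # entries already in time order
    action(server)
    print(f"{hour:02d}:00 active: {sorted(active)}")
```

By the end of the run the ETL server is gone and two BI servers and two database servers remain, matching the peak-usage window in the example.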
Cloud Management Software. Cloud computing also requires management software to help IT administrators keep track of all the moving parts in a virtualized environment. Cloud management software enables IT administrators to define systems-level policies (e.g., security and usage), create and manage virtual images which enforce the policies, manage virtual server versions, monitor servers and performance, manage user roles and access, track usage, and manage chargebacks or accounting, among other things. There are a variety of vendors that offer cloud management software, including cloud data center providers, such as Amazon.com and Rackspace, and independent software vendors, such as Eucalyptus and RightScale.
Another key characteristic of cloud computing (in particular, Software-as-a-Service) is that applications are multi-tenant, which means multiple users from different organizations share the same application code running on the same hardware. This is different from a traditional hosting or outsourcing environment, in which each customer owns or rents a dedicated set of hardware and software in the service provider’s data center. The hosted model wastes compute resources, since each customer is confined to its own machines even when other machines in the data center sit idle. In contrast, multi-tenancy makes much more efficient use of hardware and software resources, delivering economies of scale that make cloud computing an attractive business model for service providers, as long as they can attract enough customers.
One problem with multi-tenancy is that applications must be designed from scratch to support it. Multi-tenancy creates virtual partitions within the application and database for each distinct customer. Customers usually configure the application to match their unique branding and workflow requirements. On the data side, customer data is either interleaved by row and separated using unique identifiers or partitioned into separate tables or database instances.
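The “interleaved by row” approach can be sketched in a few lines: every row carries a tenant identifier, and the application appends a mandatory tenant filter to every query so each customer sees only its own partition. The table and tenant names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tenant_id TEXT, amount REAL)")
# Rows from two different customers interleaved in the same table.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("acme", 100.0), ("acme", 250.0), ("globex", 75.0)],
)

def tenant_total(tenant_id):
    # The tenant filter is supplied by the application, never by the end user.
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()
    return row[0]

print(tenant_total("acme"))    # 350.0
print(tenant_total("globex"))  # 75.0
```

The alternative mentioned above, separate tables or database instances per customer, trades this query-time discipline for stronger physical isolation at higher operational cost.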
Legacy applications not designed for multi-tenancy have to fudge it: either the service provider creates dedicated environments for each customer, which is highly inefficient (e.g., the old application service provider model), or it uses virtualization software to run parallel instances of each application (e.g., a virtual appliance). In some respects, the virtual appliance approach is more flexible than multi-tenancy because virtual appliances can be ported to run on almost any hardware. (See figure 3.)
Figure 3. Application Architectures
Traditional on-premise software (far left) tightly couples logic and data to hardware in a LAN environment. A hosted environment (second from left) gives each customer dedicated hardware and software resources in a third-party data center, accessed via a virtual private network. A true multi-tenant environment (second from right) partitions a single application and database so different customers get their own unique views while sharing the same application, database, hardware, and network connection. A virtual appliance model (far right) enables legacy software not written for multi-tenancy to run parallel instances, essentially virtualizing multi-tenancy.
SaaS BI vendors have long waged battles over whether their respective software is truly multi-tenant or not. The virtual appliance model gives legacy software vendors venturing into SaaS a more equal footing on which to compete.
This blog defined cloud computing and discussed some of its more salient attributes. However, there are several ways to deploy the cloud, and these deployment options have significant implications on costs, security, and staffing. The next blog in this series will discuss the differences between public clouds, private clouds, and hybrid clouds and show how an organization might architect its BI environment to leverage public cloud offerings.
By the way, I’m once again speaking at CFO Magazine’s Corporate Performance Management Conference, which is being held September 11-12 in Dallas, Texas. I’ll be delivering a presentation on Monday about the future of business intelligence, using my BI Delivery Framework 2020 as the basis for the presentation. On Tuesday afternoon, I’ll be delivering a half-day seminar on performance dashboards. If you are interested in registering for the all-access pass, use the code LF1000 to get a $1,000 discount. Cool!
Article source: http://www.b-eye-network.com/blogs/eckerson/archives/2011/07/what_is_cloud_c.php