This is the second in a four-part blog series on cloud computing for BI professionals.
Cloud computing offers a compelling new way for organizations to manage and consume compute resources. Rather than purchase, install, and maintain hardware and software, organizations rent shared resources from an online service provider and dynamically configure the services themselves. This model of computing dramatically speeds deployment times and lowers costs. (See prior article "What is Cloud Computing?")
Although cloud computing shares the above attributes, it can be deployed in several different ways. The key factor is whether the cloud service provider is an external vendor or an internal IT department. There are three deployment options for cloud computing:
- Public Cloud. Application and compute resources are managed by a third party services provider.
- Private Cloud. Application and compute resources are managed by an internal data center team.
- Hybrid Cloud. A private cloud that leverages the public cloud to handle peak capacity; a reserved “private” space within a public cloud; or a hybrid architecture in which some components run in a data center and others in the public cloud.
Most of the discussion about cloud computing in the press refers to public cloud offerings. The public cloud offers the most potential benefits and the greatest potential risks. With a public cloud, organizations can obtain application and computing resources without having to make an upfront capital expenditure or use internal IT resources. Moreover, customers only pay for what they use on a usage or monthly subscription basis, and they can terminate at any time. Thus, public clouds accelerate deployments and reduce costs, at least in the short run. This is sweet news to BI teams that often must spend millions of dollars and months of development time before they can deliver their first application.
In addition, a public cloud also obviates the need for customers to maintain and upgrade application code and infrastructure. Many public cloud customers are astonished to see new software features automatically appear in their software without notice or additional expense. And the public cloud frees up IT departments to focus on more value-added activities rather than hardware and software upgrades and maintenance. In short, there is something for everyone to like about the public cloud.
Security and Privacy. But the public cloud also comes with risks. Security and privacy are the biggest bugaboos. Some executives fear that moving data and processing beyond their own firewalls exposes them to security and privacy risks. They fear that moving data across public networks and comingling it with other companies' data in a public cloud might make it easier for sensitive corporate data to get into the wrong hands.
While security and privacy are always an issue, the fact is that most corporate resources are more secure in the public cloud than in a corporate data center. Public cloud providers, after all, specialize in data center operations and must meet the most stringent requirements for security and privacy. However, compliance regulations legally require some organizations to maintain data within corporate firewalls or to pinpoint the exact location of their data, which is generally impossible in a public cloud that virtualizes data and processing across a grid of national or international computers.
Other Challenges. The public cloud poses other challenges:
- Reliability. Executives may question the reliability of public cloud resources. For example, Amazon EC2 has suffered two short but high-profile outages, leaving companies that ran mission-critical parts of their business there stranded without much visibility into the nature or duration of the outage.
- Costs. It can be extremely difficult to estimate public cloud costs because pricing is complex and companies often can't accurately estimate their usage (which is why they want to migrate workloads to the cloud in the first place).
- Blank Slate. Administrators must redefine corporate policies and application workflows from scratch in the public cloud, which generally provides plain vanilla services.
- Vendor and Technology Viability. The public cloud market is evolving fast so it’s difficult to know which vendors and technologies will be around in the future.
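The cost-estimation challenge above is easy to illustrate with a rough calculator. The sketch below uses hypothetical per-unit rates (not any provider's actual pricing) to show how a modest 50% error in a usage forecast compounds into a very different monthly bill:

```python
# Rough, illustrative public-cloud cost estimator. The rates below are
# hypothetical placeholders, not any provider's actual pricing.
INSTANCE_RATE = 0.10      # $ per instance-hour (assumed)
STORAGE_RATE = 0.10       # $ per GB-month (assumed)
TRANSFER_OUT_RATE = 0.12  # $ per GB transferred out (assumed)

def monthly_cost(instance_hours, storage_gb, transfer_out_gb):
    """Estimate a month's bill from three common usage drivers."""
    return (instance_hours * INSTANCE_RATE
            + storage_gb * STORAGE_RATE
            + transfer_out_gb * TRANSFER_OUT_RATE)

# Two usage guesses that differ by 50% produce noticeably different bills,
# which is why budgeting is hard when usage itself is the unknown.
low = monthly_cost(instance_hours=720, storage_gb=500, transfer_out_gb=200)
high = monthly_cost(instance_hours=1080, storage_gb=750, transfer_out_gb=300)
print(low, high)
```

The point is not the specific numbers but the shape of the problem: every term in the formula depends on usage that the organization, by its own admission, cannot predict.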
For these reasons, many organizations are beginning their journey into the cloud with private clouds. This is especially true in the infrastructure-as-a-service arena, where IT administrators are implementing virtualization software to consolidate servers and increase overall server utilization, flexibility, and efficiency. In addition, a private cloud gives an organization greater control over its processing and data resources, providing peace of mind for worried executives, if not greater security and privacy for sensitive data. And since a private cloud runs in an existing data center, IT administrators don't have to recreate security and other policies from scratch in a new environment.
But the private cloud has its own challenges. IT administrators have to learn and install new software (hypervisors and cloud management utilities). They need to manage two compute environments side by side and keep IT policies aligned in both. This adds to complexity and staff workload. And it goes without saying that a private cloud runs in an existing corporate data center, which carries high fixed costs to maintain.
Companies are increasingly pursuing a two-pronged strategy that uses the private cloud for the bulk of processing and the public cloud to handle peak loads. The key to a hybrid cloud is obtaining cloud management software that spans both private and public cloud environments. The software supports the same hypervisors used in each environment (ideally it’s the same hypervisor) and has built-in interfaces to the public cloud provider so internal IT policies and virtual images can be transferred to the public cloud environment.
In addition, many public cloud vendors allow customers to carve out private clouds within the public cloud domain. For example, Amazon.com offers a virtual private cloud within its Elastic Compute Cloud (EC2) environment that lets customers reserve dedicated machines and static IP addresses, which they can link to their internal data centers via virtual private networks. Hybrid clouds are obviously more complex and challenging to manage. Currently, few people have experience blending private and public clouds in a seamless way.
Adding Public Cloud Components to a BI Architecture
Another form of hybrid cloud uses public cloud facilities to enhance an existing architecture. In a BI environment, there are several ways that organizations can mix and match public cloud offerings with their on-premises software (which may or may not be running in a private cloud).
Scenario #1 – Analytic Sandbox. When a data warehouse is running at full capacity, administrators might consider offloading complex ad hoc queries submitted by a handful of business analysts to a public cloud replica. In this scenario, complex queries submitted by the analysts are bogging down performance of the data warehouse. Since it’s difficult to estimate ad hoc processing requirements and the costs of replicating a data warehouse are high, the IT staff decides it’s faster and cheaper to create a new data mart in the public cloud and point the business analysts to it. The IT staff (or analysts) can increase or decrease capacity on demand using self-provisioning capabilities of the public cloud. (See figure 1.)
Figure 1. Analytic Sandbox Using a Public Cloud
The primary challenge in this scenario is the cost and time required to move data across the internet from an internal data center to the cloud. Since the initial load may take days or weeks depending on data volumes, the IT staff will usually ship a disk to the cloud provider to load manually. Thereafter, the IT staff needs to figure out whether it can move daily deltas across the internet within the allotted batch window. Considering that it takes six days to move 100GB across a T-1 line, organizations may need to skip batch loads and instead trickle feed data into the data warehouse replica. In addition, it is often difficult to estimate pricing for such data transfers, and charges may add up quickly. Cloud providers generally charge for transferring data into and out of the cloud as well as storing it. (Amazon, however, recently discontinued fees for transferring data into EC2.)
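The "six days for 100GB over a T-1" figure is simple back-of-the-envelope arithmetic worth keeping handy when sizing a batch window. The sketch below assumes full line utilization and no protocol overhead, so real transfers will take longer:

```python
# Back-of-the-envelope transfer time: a T-1 line carries 1.544 Mbps.
# Assumes 100% line utilization and no protocol overhead (optimistic).
def transfer_days(gigabytes, link_mbps):
    megabits = gigabytes * 8 * 1000   # GB -> megabits (decimal units)
    seconds = megabits / link_mbps    # ideal transfer time in seconds
    return seconds / 86_400           # 86,400 seconds per day

print(round(transfer_days(100, 1.544), 1))  # 100GB over a T-1: ~6 days
print(round(transfer_days(5, 1.544), 1))    # a 5GB nightly delta: still days, not hours
```

Running the same arithmetic on the nightly delta, not just the initial load, is what tells you whether a batch window is feasible or whether trickle feeding is the only option.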
Also, depending on the speed of network connections, the business analysts might experience delays in query response times due to internet latency. Invariably, internet speeds won’t match internal LAN speeds so users might notice a difference. Finally, there are security and privacy issues discussed in the previous article. (See “What is Cloud Computing?”)
Scenario #2. Cloud-based Departmental Dashboard. A more common scenario is when a department head purchases a Software-as-a-Service (SaaS) BI solution from a SaaS BI vendor, of which there are many. Here, an organization’s source systems and data warehouse remain in the corporate data center but the dashboard and associated data mart run in the cloud. (See figure 2.)
Figure 2. Cloud-based Departmental Dashboard
SaaS BI tools are popular among department heads who want a dashboard on the cheap and don't want to involve corporate IT. Unfortunately, designing a data mart, whether in the cloud or on premises, is never easy or quick, especially if it involves integrating multiple operational sources. (See "Expectations Versus Reality in the Cloud: Understanding the Dynamics of the SaaS BI Market.")
This is not a problem if organizations are willing to pay the costs of creating a custom data mart and wait about three to four months, which is the time it usually takes to build out a relatively complex, custom environment. It’s also not a problem if they simply want to visualize an existing spreadsheet. But if they believe the cloud provides quick, easy, and inexpensive deployments for any type of BI deployment, they will be disappointed. Also, they still need to transfer data to the cloud and users may experience response time delays due to internet latencies.
Scenario #3. BI in the Cloud Without the Data. To eliminate security, privacy, and data transfer issues, companies may want to keep data locally in a corporate data center while maintaining the BI application in the cloud. (See figure 3.) BI developers can configure the SaaS BI tool to meet their branding and workflow requirements, gaining the speed and cost advantages of cloud deployments, while minimizing data security and privacy problems.
Figure 3. BI in the Cloud Without Data
While this scenario sounds like it optimally balances the risks and rewards of cloud-based BI deployments, it has a major deficiency: it requires the IT department to open a port in the corporate firewall to support incoming queries. If an organization is worried enough about data security to keep its data locally, it will likely kill this approach as soon as it recognizes the security vulnerability that open port presents.
Scenario #4. Data Warehouse in the Cloud. The final scenario is to put the entire data warehousing environment in the cloud. (See figure 4.) Today, this only makes sense if all your operational applications also run in the cloud. Obviously, this scenario only applies to a few companies, namely internet startups that have fully embraced cloud computing for all application processing. However, these companies have to manage all the problems associated with the public cloud (i.e., security, reliability, availability, and vendor viability). At some point in the future, this architecture may prove dominant once we get past security and latency hurdles.
Figure 4. Data Warehouse in the Cloud
There are three major deployment options for cloud computing: public, private, and hybrid. As with most things in life, there is rarely a clear-cut solution. So, too, with cloud computing. Organizations will experiment with public and private clouds, and most will probably have a mix of both. Most data center shops have already implemented virtualization, which is the first step on the way to private clouds. Once they get comfortable with private clouds, they will soon experiment with hybrid cloud computing to support peak loads rather than spend millions on new hardware to support a few days or weeks of peak processing a year. And if the data is particularly sensitive, they may begin with a virtual private cloud inside a public cloud data center to ease their fears about security, privacy, and reliability.
When push comes to shove, economics and convenience always trump principles and ideals. This is how e-commerce overcame the security bogeyman and gained its footing in the consumer marketplace, and I suspect the same will happen with cloud computing.
Article source: http://www.b-eye-network.com/blogs/eckerson/archives/2011/07/deployment_opti.php