Maximize IT Flexibility and Lower Costs with Grid Computing on Windows

By Randy Hietter

Randy Hietter is a director at Oracle Corporation with the Real Application Clusters product management team. His extensive experience in enterprise software product marketing and management includes specialization in performance management software, databases, and J2EE application servers with companies such as EMC/Luminate, Sybase, and Persistence Software.

Introduction

This document discusses grid computing: Oracle’s definition of it, its benefits, and the elements that make up an Oracle Enterprise Grid Computing environment. It also highlights an Oracle customer, TALX, a part of Equifax, which recently moved aggressively to grid computing in a Windows environment. It then covers when to implement grid computing and grid implementation strategies, and closes with a discussion of Oracle’s commitment to Windows and its database integration with Windows and .NET.

IT Challenges

Figure 1 shows some drivers for grid computing. In discussions with clients, the same challenges keep coming up: limited power, space, and cooling, and scarce and expensive resources, primarily human resources. It is hard to find skilled talent that can manage these complex environments. Changing business requirements (mergers, acquisitions, new product lines, and new facilities being opened) all create change that the IT organization must accommodate. Underutilized infrastructure has driven the adoption of virtual machines, as people try to consolidate workloads and push server utilization above 10% to 15%.

These challenges lead to trends that are affecting the way IT infrastructure is architected and deployed today. One key trend is the use of increasingly powerful, low-cost commodity servers, especially in the x86 world. Another is server virtualization as a way to increase the utilization of the servers in a data center. Finally, we see increasing use of server pooling: aggregating a number of discrete physical servers and presenting that aggregate to a higher layer as a single virtual resource. Server pooling is a key part of grid computing.

Many customers are addressing these challenges and exploiting these trends by improving the efficiency and economics of their IT operations. The three watchwords for successful data centers are standardization, consolidation, and automation, which together are the essence of grid computing.

To start, customers select a small number of preferred vendors, then standardize on hardware, software, and configurations based on the kinds of workloads they have, such as OLTP, batch, and data warehousing.

Next they consolidate servers, which in many cases can be done with virtualization. Then they consolidate their databases; a number of Oracle customers have done this using grid computing. After that they consolidate applications and data, which is the second step of the database consolidation exercise.

Last, they take advantage of the automation capabilities available in Oracle Database 11g, Oracle Enterprise Manager, and Oracle Real Application Clusters (“RAC”). To manage the proliferation of servers, workloads, databases, and users inherent in a grid computing environment, you need a system that automatically monitors, manages, and reacts to events in order to maintain your quality of service.

Oracle Enterprise Grid Computing

What is grid computing? Many of us know the early definition of grid computing, which we associate with a very compute-intensive workload that is parceled out to a variety of discrete physical servers. These servers crunch on that workload and the results are then reassembled. Often this is for scientific work or other compute-intensive calculations, and the workload is pushed out largely to independent servers.

But five years ago, Larry Ellison introduced what he termed Enterprise Grid Computing: an IT infrastructure that dynamically adapts to the varying workloads and changes that occur in an enterprise’s data center, based on the user population and the dynamics of the business.

Oracle defines Enterprise Grid Computing as an IT infrastructure, built with clustered commodity servers (usually running Linux or Windows) and low-cost storage, that adapts to changing business needs. As Figure 2 shows, the key grid capabilities are resource pooling and sharing, dynamic resource provisioning, and automated monitoring and management. The idea is to plug workloads, users, and customer environments into an IT infrastructure grid that works just like a power grid. With resource pooling and sharing, a company dynamically assembles resources to serve a particular workload. This dynamic pooling is a key element of a grid, and it requires an IT infrastructure designed from the ground up to monitor workloads and react to their changes.

Benefits of Grid Computing

Since its introduction five years ago, more than 10,000 customers have implemented production systems in an Oracle Enterprise Grid Computing environment. The business benefits they have realized include a more agile and responsive IT environment and significantly lower server and storage costs. They are also able to offer a higher quality of service, which translates into better uptime: unplanned outages drop sharply, and planned outages for upgrades can be minimized as well.

One technical benefit is that the infrastructure is dynamically configurable. A company may need three servers supporting a particular workload between 10 am and 5 pm, but the batch reporting workload rises sharply in the evening, so they want to shift more horsepower to that work. You can reconfigure the environment to follow the changes in workload that occur in a business and on its calendar.

Oracle has built into the grid the ability to balance workloads across this infrastructure, so that you do not have to monitor the environment constantly to make those decisions. An Oracle Enterprise Grid has no single point of failure, which is achieved largely through the use of Oracle RAC, among other components.

When Oracle introduced grid computing, an independent third party polled some of the early adopters and asked them, “Where are you realizing the benefits of grid computing?” As Figure 3 shows, the benefits come from hardware and software license savings or from labor savings (fewer people required to do the same job), depending on the environment. Hardware savings were significant for the Stock Exchange. Migration from a mainframe environment was the key: they were able to support the same user population and workloads with an Oracle Grid in a much less expensive, commodity server-based environment.

The next example in the table, the tech provider, realized great savings in software licenses. They consolidated a number of their distributed data centers into one and, because of that and because they began to rely on x86 commodity servers, they reduced the number of software licenses they had to pay for, including operating system, database, and application licenses. The third example is a bank that realized significant savings in the number of people required to get the job done, thanks to the automation features and database consolidation.
Thus, the grid can have a significant impact on the economics of an operation in one or more of three key areas: hardware, software, and labor.

Elements of an Oracle Enterprise Grid Computing Environment

There are four key elements in an Oracle Enterprise Grid Computing environment, as shown in Figure 4. First is the Fusion Middleware layer. This is the application server layer, where application code lives and runs.

Applications need access to data, so the essence of grid computing lies at the Real Application Clusters level, the second element, where the databases run. Real Application Clusters enables a database to run on a pool of servers, virtualizing the database to the Fusion Middleware layer.

Automatic Storage Management (“ASM”) is the third element. It provides a way to manage all the database files, eliminating the need to manage the raw volumes that many Oracle DBAs used in the past to gain performance. ASM is a free feature of Oracle Database 11g.

The fourth element is Grid Control, which is part of Oracle Enterprise Manager. It allows a company to manage and monitor the grid environment all the way from the application layer down to the storage layer.

The foundation of any Oracle Enterprise Grid environment is the database. Oracle Database 11g delivers industry-leading performance, scalability, security, and reliability on a choice of clustered or single servers running Windows, Linux, and UNIX.
Oracle Database 11g delivers the benefits of grid computing with more self-management and automation, making it easier to:

- Change IT systems without risk, using Real Application Testing
- Partition and compress tables to store more data and run queries faster
- Securely protect and audit data, and enable total recall of data
- Integrate and manage the lifecycle of all enterprise information
- Run your business 24 x 7 with a unique, maximum-availability architecture

Cameron Sturdevant of eWEEK Labs said of Oracle Database 11g, “It is the cornerstone of the vendor’s dynamically allocated computing grids and should garner the attention of database managers with its improved management, recovery, and table compression capabilities. Oracle Database 11g also takes much of the guess-work out of the advanced database tuning.” Go to www.otn.oracle.com to download Oracle Database 11g for free and get acquainted with it.

Oracle Real Application Clusters

Oracle RAC is the essence of what makes Oracle’s Enterprise Grid Computing work. It is the cluster database technology that Oracle has been developing since the late 1980s, starting with VAX/VMS clusters. RAC pools a number of physical servers and presents them as a single virtual database to the tier above, the application tier. The result is superior high availability and more efficient use of server resources. If an outage occurs, Oracle RAC handles it transparently, migrating user connections to another available instance of the database in the cluster.

Active-Active Versus Active-Passive

Many companies today run an active-passive failover environment, in which a single server runs a single database; if that server goes down, a cold standby has to be started. Users then have to reconnect, and they have lost minutes of productive work; in some cases, depending on the application and the size of the database, they may have lost hours. It is also an inefficient use of server resources, because the idle standby system occupies power and space yet does no work. It is a difficult and costly environment to support.

Contrast that with an Oracle RAC active-active environment. All the servers in an Oracle RAC cluster are active all the time, which is a distinctive aspect of Oracle RAC: incoming work is directed, via RAC workload balancing, to one of the instances in the cluster, based either on a round-robin mechanism or on the runtime connection load-balancing algorithm available with Oracle Database. No resources sit idle. If a database instance goes down or a physical server fails, Oracle RAC automatically migrates those connections and the work keeps going. Many firms buy Oracle RAC for high availability; it is one of the key mechanisms that gives an Oracle Enterprise Grid no single point of failure and enables business continuity.

Pay as You Grow

Another reason people deploy Oracle RAC is the pay-as-you-grow benefit it offers. A company can start small and grow incrementally, adding capacity on demand with zero downtime. You can start with a two-node cluster and, as workload or business requirements change, add cluster nodes and extend the shared database across them.

Figure 5 contrasts Oracle RAC with Big Iron installations. With Big Iron, meaning larger specialized machines rather than x86 commodity servers, you must plan ahead for the workload volume anticipated years down the road and size the machine for it. The pink section above the SMP2 shows all the underutilized capacity you pay for when the server is rolled in and hope to use over time. In contrast, the chart below it shows a very fine-grained way of adding capacity with low-cost commodity servers.
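The contrast in Figure 5 can be made concrete with a little arithmetic. The Python sketch below is purely illustrative: the yearly demand figures, the 25-unit node capacity, and the function names are hypothetical, and capacity is measured in arbitrary units rather than real hardware metrics. It compares one large server sized up front for the final year's peak with a cluster that starts at two nodes and adds nodes only when demand requires them, then totals the capacity that is paid for but idle.

```python
import math

# Hypothetical yearly demand, in arbitrary capacity units (years 1 to 5).
DEMAND = [20, 35, 50, 70, 95]
NODE_CAPACITY = 25  # assumed capacity of one commodity node


def big_iron_capacity(demand):
    """One large server, sized on day one for the final year's peak."""
    peak = max(demand)
    return [peak] * len(demand)


def scale_out_capacity(demand, node_capacity, min_nodes=2):
    """Start with a two-node cluster; add nodes only when demand exceeds capacity."""
    nodes = min_nodes
    capacity = []
    for d in demand:
        # Nodes are only ever added, never removed, in this simple model.
        nodes = max(nodes, math.ceil(d / node_capacity))
        capacity.append(nodes * node_capacity)
    return capacity


def idle_capacity(capacity, demand):
    """Capacity that was paid for but not used, summed over all years."""
    return sum(c - d for c, d in zip(capacity, demand))


if __name__ == "__main__":
    smp = big_iron_capacity(DEMAND)
    rac = scale_out_capacity(DEMAND, NODE_CAPACITY)
    print("big iron :", smp, "idle =", idle_capacity(smp, DEMAND))
    print("scale-out:", rac, "idle =", idle_capacity(rac, DEMAND))
```

With these made-up numbers, the up-front server carries 205 unit-years of idle capacity versus 55 for the incrementally grown cluster; real savings depend on actual node sizes, prices, and growth rates.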
As a firm needs more horsepower to support its database processing workload, it adds another node, so Real Application Clusters grows capacity in a fine-grained, less expensive way.

Mercado Libre, a Latin American online auction company partly owned by eBay, has realized improved scalability and performance by deploying an Oracle Grid. They were an early adopter of Oracle Database 10g, launching their auction site in Brazil on a four-node cluster. Their auction transaction volume grew steadily over the years: within the first three months they added another node, and within six months they had added three or four more. They now have a 50-node cluster, reflecting a tenfold or greater increase in their auction activity. They have a 15-node cluster of x86 machines and have not had to take the operation down for a forklift upgrade. They simply added nodes to the cluster incrementally, bringing each online through Oracle Enterprise Manager, and Oracle RAC automatically routes connections and work to the new node without disrupting the cluster or database activity.

Flexibility and Agility

The primary reasons for choosing an Oracle Enterprise Grid Computing environment with Oracle RAC are flexibility and agility. Many firms have dedicated silos of hardware and software for each application area. This approach provides little inherent scalability. What’s more, resources are underutilized, each silo needs its own operational procedures and separate backups, and there is likely a separate standby system for each mission-critical environment. A grid can enhance flexibility and agility significantly. Figure 6 shows a cluster database in an Oracle RAC environment: a consolidated database in which a customer has combined four discrete application areas (ERP, data warehousing, CRM, and web e-commerce) into a single database.
With Oracle you can merge separate physical databases and then expose in the cluster what Oracle calls “services”; services are part of Oracle Database 10g and Oracle Database 11g. Services represent workloads, and they provide great flexibility. Figure 6 shows various application workloads: ERP running on two servers, data warehousing running on one, one server in the cluster designated as test, and another server running web e-commerce and a CRM workload.

When users from the application tier connect to the database tier, they do not specify a particular host and IP address; they specify a “service” and a virtual IP address. This is part of the virtualization that Oracle Database 11g offers. The connect string in the application environment refers to a service rather than to a host ID, IP address, or other physical detail, so it is no longer hard-coded as it was in the past. When Oracle RAC receives a connection request for a service, it routes the request, in a workload-balanced way, to an appropriate instance in the cluster running that service. You do not have to worry about a particular node getting overloaded; Oracle RAC keeps track of the load and routes each connection request accordingly.

By introducing services in Oracle Database 10g, Oracle virtualized the whole database tier to the upper-level application tiers. Services also enable companies to consolidate their databases so they no longer need multiple physical, discrete databases.

One customer, TalkAmerica, now called Cavalier Telephone, did this early on. They consolidated 60 discrete Informix databases into a single Oracle Database and are heavy users of services. That eliminated the need to back up 60 databases: now they back up one. It also eliminated much of the extract, transform, and load (ETL) activity that moved data out of one database, consolidated it with data from others, and loaded it into yet another for reporting. The ETL activity and scripts in that environment went away.
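The service-based routing and failover described in this section can be illustrated with a toy model. The Python sketch below is not how Oracle RAC is implemented (in a real cluster, listeners and the database's load-balancing logic make these decisions); the `Instance` and `ServiceRouter` classes, the node names, and the fewest-sessions rule are hypothetical stand-ins. Clients ask for a service, never a host; the router picks the least-loaded running instance that offers the service, and when an instance fails, its sessions are reconnected to surviving instances offering the same service. The node layout loosely follows Figure 6.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    services: set[str]
    sessions: list[tuple[str, str]] = field(default_factory=list)  # (session_id, service)
    up: bool = True


class ServiceRouter:
    """Toy model: clients connect to a service name, never to a specific host."""

    def __init__(self, instances):
        self.instances = {inst.name: inst for inst in instances}

    def connect(self, session_id, service):
        candidates = [i for i in self.instances.values()
                      if i.up and service in i.services]
        if not candidates:
            raise RuntimeError(f"no running instance offers service {service!r}")
        # Fewest current sessions stands in for runtime connection load balancing.
        target = min(candidates, key=lambda i: len(i.sessions))
        target.sessions.append((session_id, service))
        return target.name

    def fail(self, name):
        """Mark an instance down and reconnect its sessions elsewhere."""
        failed = self.instances[name]
        failed.up = False
        orphaned, failed.sessions = failed.sessions, []
        return {sid: self.connect(sid, svc) for sid, svc in orphaned}


# Hypothetical layout loosely following Figure 6.
router = ServiceRouter([
    Instance("node1", {"erp"}),
    Instance("node2", {"erp"}),
    Instance("node3", {"dw"}),
    Instance("node4", {"test"}),
    Instance("node5", {"web", "crm"}),
])
print(router.connect("s1", "erp"))  # node1 (tie broken by order)
print(router.connect("s2", "erp"))  # node2 (fewer sessions)
print(router.fail("node1"))         # {'s1': 'node2'}
```

Because the application names only the service, reassigning the test node to web e-commerce during a spike amounts, in this model, to adding "web" to node4's service set; no connect string changes.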
There are many advantages to moving to a consolidated database environment, which is one benefit of moving to a grid solution.

If a workload spikes, for example during the holiday season when the workload mix and volumes change and more horsepower is needed, the test node in the cluster could be brought up as an additional instance of the web e-commerce service, and another data warehousing service instance could be added to one of the nodes in the cluster. This is easily done, either manually or via Oracle Enterprise Manager, which can monitor the environment and automatically start additional service instances in the cluster.

Oracle Clusterware

After Oracle Database and Oracle Real Application Clusters, the next element of the grid is Oracle Clusterware, which is part of an Oracle RAC environment. Oracle Clusterware can also be used on its own in a cluster to monitor resources such as applications, and to fail over application instances in the cluster.

Figure 7 shows Oracle Clusterware running across a four-node cluster, with Protected Apps A, B, and C and the database. If Protected App A fails, or the physical node it runs on fails, Oracle Clusterware detects the event and restarts the node or the application, or moves the application to another running node in the cluster. If we view the database in Figure 7 as an application running as a single instance, Oracle Clusterware can restart that database on the same node or on another node in the cluster. Used this way in a single-instance environment, Oracle Clusterware can fail over the database to keep applications and the database running and available; however, it does not provide the availability and flexibility of a RAC environment. Oracle Clusterware is free if you are running an Oracle product in the cluster.

Automatic Storage Management

Another key capability of Oracle Grid is Automatic Storage Management (“ASM”), shown in Figure 8.
ASM is a volume manager and cluster file system for Oracle datafiles that simplifies file and volume management. It is tightly integrated with Oracle Database 11g and Oracle RAC and is the foundation for the storage grid. Its capabilities include software mirroring, striping, and automatic rebalancing: as data in a shared storage environment grows, ASM keeps it balanced. A key ASM concept is the Disk Group. Disk Groups are composed of disks, which are LUNs provisioned by a storage administrator.

ASM makes sure that the consumption and utilization of each disk in a Disk Group are evenly balanced, avoiding hotspots as data grows. ASM constantly monitors and rebalances data across all the spindles or disks in a Disk Group, providing more predictable and even performance.

You can easily add disks to or remove disks from a Disk Group with a button click in Oracle Enterprise Manager. When you add new disks to a Disk Group, ASM automatically moves data onto them, quickly or slowly, depending on how much impact you are willing to accept in production. If you add a disk during peak production hours and want the least intrusion on the production workload, you would have Oracle Enterprise Manager rebalance slowly. Alternatively, you could rebalance quickly. Either way, new-generation disks can be introduced very easily without taking anything down. TalkAmerica/Cavalier Telephone introduced next-generation storage technology non-disruptively by adding the new disks to their Disk Group and then, after the rebalance, removing the older disks automatically. ASM is a very flexible tool for managing a shared storage environment, and it is part of the Oracle Grid.

The key value of ASM is manageability: simple provisioning, and migration to newer storage array technology without taking storage down or disrupting the production workload.
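The rebalancing behavior described above can be sketched with a simple model of a Disk Group as a mapping from disk name to extent count. This Python sketch illustrates the idea under stated assumptions and is not ASM's actual algorithm: the function names are hypothetical, and the `power` argument (the maximum number of extents moved per step) is only a rough analogue of choosing a slow or fast rebalance, similar in spirit to ASM's rebalance power setting. A higher `power` finishes in fewer steps but moves more data at once.

```python
def rebalance_step(disks, power):
    """Move up to `power` extents, one at a time, from the fullest disk to the emptiest.

    Returns the number of extents moved (0 once the group is balanced).
    """
    moved = 0
    for _ in range(power):
        fullest = max(disks, key=disks.get)
        emptiest = min(disks, key=disks.get)
        if disks[fullest] - disks[emptiest] <= 1:
            break  # balanced to within one extent
        disks[fullest] -= 1
        disks[emptiest] += 1
        moved += 1
    return moved


def rebalance(disks, power):
    """Repeat rebalance steps until nothing moves; return the number of steps taken."""
    steps = 0
    while rebalance_step(disks, power):
        steps += 1
    return steps


def drop_disk(disks, name, power):
    """Retire a disk: spread its extents over the survivors, then rebalance."""
    extents = disks.pop(name)
    survivors = sorted(disks)
    for k in range(extents):
        disks[survivors[k % len(survivors)]] += 1
    return rebalance(disks, power)


# Hypothetical Disk Group: three older disks holding 300 extents each.
group = {"old1": 300, "old2": 300, "old3": 300}
group["new1"] = 0                  # add a new-generation disk
print(rebalance(group, power=10))  # 23 steps: 225 extents moved, 10 per step
print(group)                       # every disk now holds 225 extents
```

Retiring the older disks after the new ones are in place, as in the TalkAmerica example, would be a series of `drop_disk` calls in this model.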
ASM also delivers storage cost savings by pooling storage and eliminating the discrete, siloed storage many firms use today. Because ASM is part of the Oracle Database environment, there is no additional cost.

Oracle Enterprise Manager

Oracle Enterprise Manager, shown in Figure 9, allows you to manage the many elements of a grid as one, or drill down to the performance of individual grid elements. An Oracle RAC database can exist as several instances running on several discrete servers within the cluster; an administrator can look at the performance of the clustered database in aggregate or drill down to a specific instance on a specific host and manage and monitor it individually.

The same is true of application servers: you can look at the application server tier in aggregate or drill down to individual application server instances running in the mid-tier. The same is also true of hosts: Oracle Enterprise Manager provides visibility into host metrics such as utilization, I/O, and memory consumption. You can look at individual applications and monitor total response time from application to storage and back through Beacon, which was introduced in Oracle Enterprise Manager 10g. You can also monitor groups of users, their respective authorizations, and the performance they are receiving, as shown in Figure 9.

Standardization

Policy-based standardization is another key aspect of Oracle Enterprise Manager. You can set policies around response times or other metrics, and you can monitor and set thresholds for many different metrics. If a threshold is exceeded, you can be alerted, or Enterprise Manager can take action. For example, some customers using services monitor CPU utilization or session count. If the monitored metric on a cluster node exceeds its threshold, Oracle Enterprise Manager reacts, for example by executing a script that starts another service instance in the cluster.
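A threshold policy of the kind described above can be sketched in a few lines of Python. This is a hypothetical model, not the Enterprise Manager API: the `ThresholdPolicy` class, the metric name `cpu_pct`, the threshold values, and the corrective action are illustrative assumptions. Each node's sample is compared against a warning and a critical threshold; crossing the warning level only raises an alert, while crossing the critical level also runs an attached action, such as a script that starts another service instance.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ThresholdPolicy:
    metric: str                     # hypothetical metric name, e.g. "cpu_pct"
    warning: float
    critical: float
    action: Callable[[str], str]    # corrective action run on a critical breach


def evaluate(policy, samples):
    """Return (node, level, action_result) for every node that crosses a threshold."""
    events: list[tuple[str, str, Optional[str]]] = []
    for node, metrics in samples.items():
        value = metrics[policy.metric]
        if value >= policy.critical:
            events.append((node, "critical", policy.action(node)))
        elif value >= policy.warning:
            events.append((node, "warning", None))  # alert only
    return events


def start_extra_service_instance(node):
    # Hypothetical stand-in for a customer script that adds a service instance.
    return f"started another 'web' service instance to offload {node}"


policy = ThresholdPolicy("cpu_pct", warning=80, critical=90,
                         action=start_extra_service_instance)
samples = {"node1": {"cpu_pct": 55}, "node2": {"cpu_pct": 84}, "node3": {"cpu_pct": 93}}
for event in evaluate(policy, samples):
    print(event)
```

Separating the check from the action mirrors the idea in the text: the same policy can simply alert, or it can be wired to whatever corrective script a site chooses.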
Policies like these give you tremendous flexibility for automatically managing and reacting to activity in the grid environment.

TALX

One customer that has successfully moved to the grid is TALX, a division of Equifax. TALX provides outsourced payroll and HR services online to mid-sized and smaller businesses, and it runs these payroll and HR solutions on Oracle RAC on Windows. TALX migrated to a grid because of scalability issues and because they wanted to move to 64-bit machines, as the 32-bit operating system environment was proving unsatisfactory. A node failure in their existing environment hurt customer service and satisfaction levels. They wanted to be able to survive a two-node outage with minimal impact, so they implemented an Oracle Grid solution based on Windows Server 2003. They have six Oracle RAC clusters based on Oracle Database 10.2.0.3. They use Oracle Clusterware, Oracle Automatic Storage Management, and Oracle Partitioning (a database option), and they run services for workload management. Oracle Enterprise Manager manages the clusters.

Figure 10 shows their three databases. They did not consolidate all their databases into one, but merged them into three main ones: Single Sign-on is a 1 TB database, Work Number is 3 TB, and ePayroll is 750 GB, with details shown at the bottom of the figure. They moved to a grid environment for availability and ease of scalability. When they did, they had an option: stay with Windows or move to another operating system. They chose to stay with Windows because 99% of their IT operation runs on Windows, they have deep Windows experience, and it is easy to hire talent. They like the multicore processor licensing and scale-out architecture that Windows provides. They are also very comfortable with the .NET development platform and like Microsoft support for the Windows environment.
Their question was: why change?

TALX chose 64-bit because they needed a large number of connections. Their growth was constrained by the 4 GB per-process address space of 32-bit systems, and they needed to run multiple concurrent instances of Windows. When choosing Oracle RAC, the expense of scaling up was key. They did not want a forklift upgrade or an SMP environment; they went to Oracle RAC to do fine-grained scaling on commodity x86 machines. They wanted the larger memory that 64-bit environments offer, which Oracle RAC can exploit through very large shared caches on each machine, along with fast recovery time. Oracle RAC offers completely transparent failover in the event that a server or database instance goes down.

Fault tolerance with Oracle RAC is significantly better than with the fail-safe environment, the cold failover setup on the 32-bit machines from which TALX migrated. They have no memory constraints on 64-bit machines, plan to move up to 48 GB servers, and believe that Windows is an excellent platform for their high-availability Oracle RAC environment. They achieved the benefits of grid computing on the familiar Windows operating system. Because they did not have to change their applications, they were able to migrate their single-instance applications into a grid environment as-is. You can slide a grid infrastructure underneath the current application environment without changing the application code, and still take advantage of the flexibility and high availability the grid offers.

Grid Implementation Strategies

Figure 11 shows the path to the grid. On the lower left is High Availability (“HA”). This is where many companies start, but you can actually leap to any point on the path. Many start with a two-node cluster running a single workload; in this example, green represents a single type of workload running in a two-node cluster. The next step is scaling: adding nodes to the existing cluster while still running a single workload. All the workload in the figure is green, representing, for example, an ERP system. Some companies then move from a cluster to a grid; the grid in this example has four nodes, showing different kinds of workloads, such as ERP, CRM, and batch reporting, against two databases. This is a shared cluster with more than one database and multiple workloads running across the nodes in the cluster.

The next step in grid computing is consolidating all databases into a single shared database and running multiple workloads as services in a cluster. With this step, you extend the availability and scalability benefits of grid computing and gain the operational efficiencies that a single consolidated database environment offers.

Going forward, Oracle’s grid computing strategy will include a focus on Real-Time Infrastructure, in which the pool of shared resources available to both database and application server instances can be reprovisioned as the desired mix of application servers and database servers changes.

When to Implement

If you are upgrading a database version, especially if you are a current Oracle customer migrating from Oracle Database 9i to Oracle Database 10g or Oracle Database 11g, now is a good time to consider moving to a grid environment.

If you are introducing new applications or projects and deciding what their infrastructure should look like, you should also consider a grid or HA initiative. Often you want something better than a cold failover environment, and a grid is an excellent option for achieving better high availability or adding capacity. If you are contemplating a forklift upgrade, you may want to move to a cluster of x86 commodity servers instead.

Oracle’s Commitment to Windows

Only Oracle offers the benefits of an Enterprise Grid Computing environment for the Windows platform.
Oracle Database 11g and Oracle RAC can be easily integrated into a Windows environment and have been optimized to make the most of the Windows operating system’s features. Oracle’s first relational database for Windows NT was released in 1993, and we have supported the Windows platform ever since. Today Oracle holds the number one position for the TPC-C price/performance benchmark on Windows; customers who deploy Oracle Database on Windows benefit from the best performance at the lowest cost.[1]

Oracle collaborates closely with Microsoft engineering teams; in fact, Microsoft uses Oracle Database workloads to test various versions of its server offerings. We have access to prerelease drops of Microsoft products, we are part of Microsoft’s Premier Marketing Program, and we have a Premier Support Agreement that enables our support teams to work closely together to resolve joint customer issues.

Oracle has also provided tight integration with Visual Studio and .NET for over five years, as Figure 12 shows, including free Oracle tools integrated with Visual Studio. This integration takes advantage of advanced Oracle Database features and is productive for beginners and advanced users alike.

Summary

Grid computing comes with many benefits. Foremost are an agile and responsive IT operation and lower server and storage costs, from taking advantage of commodity x86 servers and shared low-cost commodity storage. A grid also delivers a higher quality of service because of the business continuity it offers: because you can scale and apply patches without taking down workloads, you can perform rolling upgrades of operating systems, patches, and database versions. Thousands of customers are benefiting from deploying Oracle Database 11g on Windows in a grid fashion.

More information is available at www.oracle.com/grid, www.oracle.com/windows, and www.oracle.com/database.
Feel free to download the latest version of Oracle Database 11g and experiment with it.

Common Questions

Question: With regard to RAC and clustering, you mentioned the ability to reduce ETL. If you reduce ETL, what replaces the movement of data from one database to another?

Answer: When you consolidate databases, you combine the schemas of the various discrete databases. That may take some reconciliation of schemas and possibly some rework at the application level, but it eliminates the need for activity against one database to be carried through ETL to another, and another, and another database. All activity occurs against that one consolidated database, so reporting runs against live, up-to-the-second data, with no extract, transform, and load step in between. What replaces the movement is working on a single instance of the data. You eliminate multiple copies of, for example, a customer balance, because instead of five or six discrete customer databases or customer tables, you now have only one.

Question: What is the difference between grid and RAC?

Answer: Grid is a philosophy: a way of architecting your environment, while RAC is a specific Oracle product that enables it. The grid virtualizes resources so they can be grown or shrunk dynamically as your business needs dictate. Oracle RAC is a software product that lets you create a cluster database that virtualizes the database layer to the upper mid-tier layers. By pooling a number of discrete servers, that virtualized database layer creates agility and flexibility in your infrastructure, so it can grow and shrink as needed.

Question: What is Oracle’s plan to support Windows Server 2008?

Answer: It is being certified right now. Oracle certifies new operating systems in conjunction with the platform vendors, so we are working with Microsoft on that, and we will be running a number of certification tests over the next few months. There is no specific end date, but it is in process.

Question: Is there a performance difference between running Oracle on Windows and Oracle on Linux?

Answer: No. We have run a number of tests that show an imperceptible performance difference. If you are comfortable with Windows and intrigued by some of the benefits Oracle Grid can offer, you should get started.


[1] Source: Transaction Processing Performance Council (TPC), www.tpc.org, as of 06/08/07: HP ProLiant ML350G5, 82,774 tpmC, $0.84/tpmC, available 3/27/07, versus HP ProLiant ML350G5, 102,454 tpmC, $0.73/tpmC, with Oracle Database 11g Standard Edition One running on Microsoft Windows 2003 Standard x64 Edition SP1R2, available 12/31/07.
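The TPC-C price/performance metric cited in footnote [1] is total system price divided by throughput (tpmC), so a lower $/tpmC is better. The short Python calculation below multiplies the two published figures to estimate each system's implied total price and compares the two results. Because the published $/tpmC values are rounded to the cent, the implied prices are approximations; the label "first result" is a placeholder, since the footnote does not identify that system's database software.

```python
# Figures copied from footnote [1]; "first result" is a placeholder label.
results = {
    "first result": {"tpmC": 82_774, "usd_per_tpmC": 0.84},
    "Oracle Database 11g SE One": {"tpmC": 102_454, "usd_per_tpmC": 0.73},
}

# Price/performance = total system price / tpmC, so price ~= tpmC * ($/tpmC).
for name, r in results.items():
    implied_price = r["tpmC"] * r["usd_per_tpmC"]
    print(f"{name}: about ${implied_price:,.0f}")

a = results["first result"]
b = results["Oracle Database 11g SE One"]
throughput_gain = b["tpmC"] / a["tpmC"] - 1
cost_per_tpmC_drop = 1 - b["usd_per_tpmC"] / a["usd_per_tpmC"]
print(f"throughput: {throughput_gain:+.1%}, cost per tpmC: {-cost_per_tpmC_drop:+.1%}")
```

By this arithmetic, the second system delivered roughly 24% more throughput at about 13% lower cost per tpmC, at a somewhat higher implied total price.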

