
Autonomic approach for fault tolerance using scaling, replication and monitoring of servers in cloud computing

by Ashima Garg and Sachin Bagga

Master's Thesis, 2015, 56 pages

Computer Science - Technical Computer Science

Excerpt

TABLE OF CONTENTS

ACKNOWLEDGMENT

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1. Cloud Computing Evolution
1.2. Introduction to Cloud Computing
1.3. Characteristics of Cloud Computing
1.4. Cloud Computing Services
1.4.1. Software as a Service (SaaS)
1.4.2. Platform as a Service (PaaS)
1.4.3. Infrastructure as a Service (IaaS)
1.5. Deployment Models of Cloud Computing
1.5.1. Public cloud
1.5.2. Private Cloud
1.5.3. Community Cloud
1.5.4. Hybrid Cloud
1.6. Layered Architecture of Cloud Computing
1.7. Research Issues in Cloud Computing
1.8. Virtualization
1.8.1. Overview of X86 Virtualization
1.8.2. CPU Virtualization
1.9. Introduction to Fault Tolerance
1.9.1. Importance of Fault Tolerance in Cloud Environment
1.9.2. Management of Fault Tolerance
1.9.3. Fault Tolerance Techniques
1.10. Structure of Thesis

CHAPTER 2 LITERATURE SURVEY

CHAPTER 3 PRESENT WORK
3.1. Problem Definition
3.2. Objectives
3.3. Framework Design
3.3.1 VMware Workstation
3.3.2 Ubuntu
3.3.3 HAProxy
3.3.4 Docker
3.3.5 Servlet Application
3.3.6 Nginx
3.3.7 MySQL Database
3.3.8 Nagios
3.4. Interaction Diagram
3.5. Implementation of the Framework

CHAPTER 4 RESULTS AND DISCUSSION

CHAPTER 5 CONCLUSIONS AND FUTURE SCOPE

REFERENCES

ACKNOWLEDGMENT

DEDICATED TO ALL THOSE PEOPLE WHO CHANGE THE LIVES OF OTHERS BY LEADING BY EXAMPLE AND BY EMPOWERING OTHERS FOR GREATNESS

THANK YOU

HAVE YOU EMPOWERED SOMEBODY FOR ABSOLUTE GREATNESS TODAY?

ABSTRACT

Cloud-based systems are increasingly popular in today's world, but fault tolerance in the cloud is a gigantic challenge because it affects reliability and availability for end users. A number of tools have been deployed to minimize the impact of faults. A fault-tolerant system continues to operate and produce correct results, up to some extent, even after components fail. Moreover, the huge amount of data in the cloud cannot be monitored manually by the administrator; automated tools and dynamic deployment of additional servers are basic requirements of today's cloud systems for handling unexpected traffic spikes in the network. This work introduces an autonomic perspective on managing fault tolerance that ensures scalability, reliability and availability. HAProxy is used to scale the web servers and balance their load in a proactive manner; it also monitors the web servers for fault prevention at the user level. The framework performs autonomic mirroring and load balancing of data across the database servers using MySQL master-master replication and Nginx respectively: Nginx balances the load among the database servers by shifting each request to an appropriate DB server. The administrator keeps an eye on the servers through the Nagios tool, since 24x7 monitoring cannot be done manually by the service provider. The proposed work has been implemented in a cloud virtualization environment, and experimental results show that the framework deals with fault tolerance very effectively.

LIST OF FIGURES

Figure 1.1: Evolution of Cloud Computing from Centralized Computing

Figure 1.2: Top Cloud computing Services Providers

Figure 1.3: Service Models

Figure 1.4: Layered Architecture

Figure 1.5: X86 Virtualization Layer

Figure 1.6: Hypervisor Manages VMM

Figure 3.1: Traditional Approach

Figure 3.2: Framework Design

Figure 3.3: Virtualization

Figure 3.4: Virtual Machine Settings

Figure 3.5: Authentication Page for HAProxy

Figure 3.6: Configuration File of Nginx

Figure 3.7: Interaction Diagram

Figure 3.8: Configuration file of HAProxy

Figure 3.9: Showing the master status

Figure 4.1: Statistics Report of HAProxy

Figure 4.2: Statistics Report of HAProxy with one server down

Figure 4.3: Application Running through HAProxy

Figure 4.4: Data entered in one database

Figure 4.5: Data copied on another server

Figure 4.6: Database Entries

Figure 4.7: Statistic Report of Nagios

Figure 4.8: Shows the Service State Information of server

Figure 4.9: Commands on Terminal

LIST OF TABLES

Table 1.1: Definitions of Cloud computing

Table 1.2: Deployment Models

Table 1.3: Comparison of x86 processor Virtualization Techniques

CHAPTER 1

INTRODUCTION

1.1. Cloud Computing Evolution

The conception of Cloud Computing came into existence in the 1950s with the implementation of mainframe computers, accessible via thin/static clients. Since then, cloud computing has evolved from static clients to dynamic ones and from software to services [1]. Figure 1.1 [2] below illustrates the evolution of cloud computing:

illustration not visible in this excerpt

Figure 1.1: Evolution of Cloud Computing from Centralized Computing

Centralized computing performs all computation at a central location, with terminals connected to a central computer. The central computer controls all peripherals, either directly or through the terminals. Centralized computing offers strong security because all processing is handled at the central location. Grid computing gathers computing resources from different locations to attain a specific goal [3]; it uses the parallel computing approach to solve large problems. Utility computing embodies the idea of metered services and offers rapid scalability, because the user pays only for the services actually used and need not pay for extra services.

The following Table 1.1 shows various definitions of cloud computing given by different organizations.

Table 1.1: Definitions of Cloud computing

illustration not visible in this excerpt

1.2. Introduction to Cloud Computing

Cloud computing is typically defined as computing that relies on a shared pool of resources instead of local servers. The word "Cloud" is a metaphor for the Internet, so cloud computing is also known as Internet-based computing. It offers various services such as storage, applications and platforms. Cloud computing has several benefits, including fault tolerance, reliability, on-demand service and peak-load handling. It provides reliability to a great extent by optimizing the infrastructure. Data in the cloud can migrate very quickly to remain available to the user, and this migration is transparent to the user. The environment in which applications run should be compatible with the cloud environment.

The following Figure 1.2 [6] shows examples of various cloud service providers. Among these, Amazon is the giant cloud service provider; it entered the market in 2002 with services such as large-scale computation and high-security storage for big data. In 2006, Amazon introduced its new product EC2 (Elastic Compute Cloud), which offers commercial web services. In 2009, the well-known company Google started offering its cloud-based services, such as Google Apps, which provide email, data storage and many shared services like calendars and spreadsheets. Microsoft then launched its own cloud, Microsoft Azure, in February 2010; it provides both Infrastructure as a Service and Platform as a Service (PaaS), including business analytics, access management, data management and identity.

illustration not visible in this excerpt

Figure 1.2: Top Cloud computing Services Providers

1.3. Characteristics of Cloud Computing

(i) Cost Reduction: Cloud computing provides a pay-per-use facility. It is very beneficial for newcomers, who need not invest a large amount of money in infrastructure and technology; one can simply access the required data or resources for the time they are needed.
(ii) Rapid Elasticity and Scalability: Rapid elasticity allows services to scale in and scale out quickly. This scalability lets the user access data easily from anywhere after paying the cost, and it is transparent to the consumer.
(iii) Improved Reliability: Services offered by the cloud are continuous, i.e. without interruption. High reliability is achieved by maintaining replicas of the data so that users can access it from anywhere at any time.
(iv) Measured Services: Cloud computing provides metered services: users pay only for the services they access. Rather than purchasing whole software packages or infrastructure, one can use cloud services anywhere and at any time, with no need to pay for things that are not used.
(v) Resource Pooling: A large number of resources and services are present in the cloud and are shared among all users. Multiple users can access the same or different services at the same time from anywhere.

1.4. Cloud Computing Services

On the basis of the services provided, the cloud has three service models, each characterized by the properties of the services it delivers: SaaS, PaaS and IaaS, as shown in Figure 1.3 below.

1.4.1. Software as a Service (SaaS)

The SaaS service model is also known as a delivery model. With this model, multiple users can access, at any time, the data or software provided by the service providers, and they can do so remotely. To run the given software, the user can also use PaaS and IaaS services. In today's market various companies, such as Google, Microsoft and Amazon, provide software this way; Gmail is a well-known example.

1.4.2. Platform as a Service (PaaS)

In PaaS, the providers set up a platform (such as APIs) on which cloud users can create their own applications and run them on the infrastructure supplied by the provider. Various tools are available to users online. With PaaS one can create innovative applications quickly and at low cost. Google App Engine and Windows Azure are famous PaaS offerings.

illustration not visible in this excerpt

Figure 1.3: Service Models

1.4.3. Infrastructure as a Service (IaaS)

IaaS is also known as the delivery service model or hardware as a service. In IaaS, the cloud providers offer various applications and resources to the users: the organizations supply storage devices, NICs, processors and so on to the clients. The clients can configure these systems according to their operating system requirements and needs, create their own applications and put them on the cloud. Examples of IaaS are Amazon EC2, Nimbus and Rackspace.

1.5. Deployment Models of Cloud Computing

Deployment models describe the location of the cloud, i.e. where the cloud is hosted. There are four types of deployment models:

1.5.1. Public cloud

A public cloud is hosted by a third party, outside the vendor's premises. Its services are available to and shared by all public users, who have no knowledge of where the cloud is hosted or who manages it. It is very cost-effective because you pay only for what you use. Examples of public clouds are Blue Cloud by IBM and Google App Engine.

1.5.2. Private Cloud

Private clouds are built for organizations to keep confidential data that they do not want to share with others. They are of two types: on-premises clouds and externally hosted clouds. An on-premises cloud is hosted by the company itself, which performs all the data management; this is more secure. An externally hosted cloud is hosted by a third party that works for the organization and performs the data management on the cloud; externally hosted clouds are less expensive than on-premises ones. Examples are OpenStack and Amazon's private cloud offering.

Table 1.2 below gives a brief summary of the deployment models.

Table 1.2: Deployment Models

illustration not visible in this excerpt

1.5.3. Community Cloud

Community clouds are formed for companies that have common requirements and are hosted by a third party. They are also used by organizations in a joint venture or working on a similar project. Government organizations are an example of community cloud users.

1.5.4. Hybrid Cloud

A hybrid cloud combines a public cloud with a private cloud, or a community cloud with a private cloud. When an organization must share some of the data that resides on its private cloud, that data is placed on the public cloud for sharing; together the two act as a hybrid cloud.

1.6. Layered Architecture of Cloud Computing

The following Figure 1.4 [8] shows the service-oriented layered design of cloud computing. The aim of the user-level middleware [9] is to provide the PaaS capabilities. The top layer aims to provide software services built on the services of the lower layers [10]. PaaS/SaaS services are mostly developed and provided by third parties that are distinct from the IaaS providers.

(i) User-level middleware: This layer includes frameworks such as AJAX that help developers create attractive, cost-effective user-interface applications, as well as various composition tools and programming environments.

(ii) Core-level middleware: This layer includes the programming environments that run the applications built using the user-level layer. Examples of services managed at this layer are Amazon EC2, Google App Engine, etc.

(iii) System Level: In cloud computing the required computing power is supplied by the data centre. At the system level [11], a large number of attached physical resources power the data centres.

illustration not visible in this excerpt

Figure 1.4: Layered Architecture

1.7. Research Issues in Cloud Computing

Cloud computing is an emerging technique in the IT industry, but it still has open issues. The main research issues related to cloud computing are the following:

(i) Data Security Issues: Because of the availability of various resources, clients store their data online in the cloud without any knowledge of its location or of whether it is secure [5], [12]. This security gap means the data could be used by others, which makes many organizations reluctant to share information on the cloud. Cloud computing also offers many technologies that themselves require a high level of security [13], such as transaction management and resource scheduling.
(ii) Privacy Issues: Many organizations use cloud infrastructure to store their data, but privacy [14] issues remain. Sometimes, due to workload, data has to migrate from one cloud to another, which requires authentication. Data on a public cloud is shared by multiple users and therefore requires privacy and safety.
(iii) Data Management Issues: A large amount of data is stored [15] on the cloud, and managing that big data is a challenge. Most organizations store their data on the cloud without knowing where it resides; it may even be in another country, which makes it difficult for the vendor to manage the data and increases the chances of data theft. If organizations want to modify the data or use the cloud's resource-allocation benefits, they must consult the third party to make the changes.
(iv) Availability Issues: This concerns whether the same cloud is available at the same time. Vendors operate large, geographically spread clouds, so it is not guaranteed that the cloud used to store the data [16] will be the same the next time; even the vendors do not know the exact location of the stored data, which can be anywhere in the world.
(v) Performance Issues: Measuring cloud performance is difficult for test and development engineers because the location of the cloud is not known.
(vi) Fault Tolerance Issues: Cloud providers suffer from various faults. To be fault tolerant, a system should keep working even after faults occur. This demands very high consideration, because the cloud infrastructure is built from many hardware and software components; moreover, cloud data does not reside in a single data centre, so the chances of a fault occurring increase.

1.8. Virtualization

Virtualization is the concept of sharing the resources of one operating system with another. In computing, virtualization [17], [18] means creating virtual versions of resources such as the network interface card, processor, memory, hard disk and even the operating system itself, and distributing these resources among two or more execution environments.

1.8.1. Overview of X86 Virtualization

illustration not visible in this excerpt

In X86 virtualization, a virtual layer is inserted between the hardware and the operating systems so that resources can be shared. A host operating system that shares its resources among guest operating systems installs the virtual environment and runs the virtualization layer as an application on top of the operating system, whereas a hypervisor has direct access to the hardware and shares its resources with the operating systems. Figure 1.5 below illustrates the sharing of resources in the X86 architecture.

Figure 1.5: X86 Virtualization Layer

A hypervisor does not need to run through an operating system layer; compared with hosted X86 virtualization it therefore provides a more efficient architecture, delivering better performance, scalability and flexibility. The functionality of the hypervisor varies with the architecture and implementation. Figure 1.6 shows the virtualization of resources using the hypervisor.

illustration not visible in this excerpt

Figure 1.6: Hypervisor Manages VMM

1.8.2. CPU Virtualization

X86 operating systems are designed to run directly on bare-metal hardware, so they assume they own the full hardware of the system. For virtualization, a virtualization layer is therefore placed under the operating system to share the resources among the virtual machines.

There are three alternative techniques for CPU virtualization on the X86 architecture:

(i) Full Virtualization using Binary Translation

The combination of binary translation and direct execution enables VMware to virtualize any X86 operating system. Instructions that cannot be virtualized are replaced with new instruction sequences by translating the kernel code; these new instructions act on the virtual hardware. The virtual machine monitor allows the virtual machines to share the services of the physical system, i.e. the virtual hardware and memory management. In full virtualization the guest operating system is not aware of the virtualization and needs no modification, and no hardware or operating system support is required to virtualize the privileged instructions.

(ii) OS Assisted Virtualization or Paravirtualization

The word "para" means "beside" or "alongside", so paravirtualization is literally "alongside virtualization". Paravirtualization improves efficiency in the communication between the guest operating system and the hypervisor. It differs from full virtualization in that, under full virtualization, the guest operating system is unaware of the virtual environment and sensitive instructions are virtualized through binary translation, whereas paravirtualization needs no binary translation. The performance of paravirtualization varies with the workload.

(iii) Hardware Assisted Virtualization

In hardware-assisted virtualization there is no need for binary translation or paravirtualization, because sensitive calls are set to trap to the hypervisor directly.

(a) Memory Virtualization

After CPU virtualization, the next critical concern is memory virtualization, i.e. sharing the physical memory and allocating it dynamically to the virtual machines. In today's X86 systems the MMU (Memory Management Unit) is used to optimize VM performance; to run multiple virtual machines on a single host operating system, memory virtualization is done with the help of the MMU.

Table 1.3 compares the x86 processor virtualization techniques described above: full virtualization, hardware-assisted virtualization and OS-assisted virtualization.

Table 1.3: Comparison of x86 processor Virtualization Techniques

illustration not visible in this excerpt

(b) Device and I/O Virtualization

The most important requirement in virtualization is the virtualization of I/O devices. The hypervisor virtualizes the physical hardware and presents each virtual machine with a set of virtual devices. I/O virtualization covers the sharing of NICs, Ethernet, sound cards, USB, etc.

1.9. Introduction to Fault Tolerance

A system is said to be fault tolerant if it keeps working even after a fault occurs, such as the failure of any hardware or software component. Even if the system cannot remove the fault [19], [20], fault tolerance still permits it to perform its tasks, albeit at lower efficiency and a reduced rate.

1.9.1. Importance of Fault Tolerance in Cloud Environment

Since cloud computing is an emerging technique, it has become possible to run real-time applications in the cloud environment [21], [22]; the scalability of cloud computing lets real-time applications benefit from it. For real-time applications [23] running on the cloud, it is essential to make the system fault tolerant and efficient, otherwise the chances of data loss increase greatly. Real-time computing works under time bounds, so the system must perform its tasks within the given time, without latency or data loss. The technique used to make such a system fault tolerant is replication.
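The replication technique mentioned above can be illustrated with a small, hedged Python sketch (the replica names and the simulated failure below are invented for the example, not taken from this thesis): the same request is sent to every replica and the first successful reply is returned, so one server's failure does not lose the request.

```python
import concurrent.futures

def call_replica(name, payload, fail=False):
    """Stand-in for a network call to one replica server (hypothetical)."""
    if fail:
        raise RuntimeError(f"{name} is down")
    return f"{name} processed {payload}"

def replicated_request(payload):
    """Send the request to every replica; return the first successful reply.
    The request succeeds as long as at least one replica is healthy."""
    replicas = [("server-a", True), ("server-b", False)]  # server-a simulated as failed
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(call_replica, n, payload, fail=f) for n, f in replicas]
        errors = []
        for fut in concurrent.futures.as_completed(futures):
            try:
                return fut.result()
            except RuntimeError as exc:
                errors.append(exc)
    raise RuntimeError(f"all replicas failed: {errors}")

print(replicated_request("query-42"))  # server-b answers despite server-a's failure
```

The client sees a normal reply even though one replica is down, which is exactly the availability property replication buys for real-time workloads.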

1.9.2. Management of Fault tolerance

In a working system a fault can occur at any time and at any stage. In the cloud computing environment, three types of fault tolerance management techniques are used.

(i) Application Fault Tolerance: This type of fault occurs at the consumer level. The technique applied to recover from the fault depends on the nature of the application. To keep applications working through failures, sensors (special software deployed by the consumer) are attached to each application; the sensor executes a repair method to recover the application from the fault.

(ii) Virtual Machine Fault Tolerance: VM fault tolerance applies at both the consumer level and the cloud provider level. On the consumer side, faults are repaired by sensors that check the virtual machine [24] during its lifetime, but if the VM fails, the chance of the sensor failing with it also increases. The fault can then be repaired in the following manner:

(a) Request a new VM.
(b) Send a request to the cloud to free the failed VM.

At the cloud provider level, VM [25] faults are repaired more accurately than at the consumer level, because all the VMs share a single hypervisor. Compared with VM fault tolerance at the consumer level, fewer VM sensors are needed, since all the sensors are integrated into the same hypervisor [24]; this decreases the time complexity.

(iii) Physical Machine Fault Tolerance: A hardware failure is difficult to recover from at the consumer level because this type of failure is visible only on the consumer side, and the sensors deployed in the VMs cannot repair it: all the virtual machines fail with the failure of the physical machine. To make such systems recoverable, consumers should place some restrictions on the location of the sensors so that they integrate both the cloud provider and the consumer. At the cloud provider, a hardware failure can be repaired by shifting the entire workload onto a new system; checkpoints applied to the VMs let the new system start working from the point where the failure occurred.
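The consumer-level repair steps for a failed VM, (a) request a new VM and (b) free the failed one, can be sketched as an autonomic monitoring pass. This is a minimal illustration with hypothetical `check_vm` and `provision_vm` stand-ins, not real hypervisor calls:

```python
def check_vm(vm):
    """Hypothetical health probe; a real sensor would ping the VM or its hypervisor."""
    return vm["healthy"]

def provision_vm(vm_id):
    """Hypothetical call asking the cloud to start a replacement VM."""
    return {"id": vm_id, "healthy": True}

def repair_cycle(vms):
    """One pass over the fleet: keep healthy VMs, replace failed ones.
    Step (a): request a new VM; step (b): the failed VM would be released here."""
    repaired = []
    for vm in vms:
        if check_vm(vm):
            repaired.append(vm)
        else:
            repaired.append(provision_vm(vm["id"] + "-new"))  # (a) request new VM
            # (b) a real system would now ask the cloud to free the failed VM
    return repaired

fleet = [{"id": "vm1", "healthy": True}, {"id": "vm2", "healthy": False}]
print([vm["id"] for vm in repair_cycle(fleet)])  # ['vm1', 'vm2-new']
```

Running such a pass in a loop is the essence of the autonomic approach: failures are detected and repaired without an administrator in the loop.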

1.9.3. Fault Tolerance Techniques

Various fault tolerance techniques exist to make systems fault tolerant; a system is fault tolerant if it keeps performing its tasks even after a fault occurs. These techniques are applied during the development of the cloud. The first family is the reactive techniques, in which faults are handled after they occur; it includes replication, job migration, retry, checkpoint/restart, SGuard, etc. Checkpoint/restart resumes the work from the point where it failed; the fault is not removed completely, so a probability of its recurrence remains. The proactive techniques of fault tolerance [26], on the other hand, are used to remove faults before the job starts: they predict a fault before it occurs and repair it. The main advantage of proactive fault tolerance is that it can remove faults in distinct applications [27]; its policy is to predict faults before they occur, repair them, and replace the suspicious component with a new one. Proactive fault tolerance can be achieved with software rejuvenation, self-healing and pre-emptive migration. The techniques mentioned above are explained as follows:

(i) Checkpoint/Restart: When a task fails completely, it is restarted from the point where it failed. Checkpoints let the task resume from the failing point rather than from the beginning [28].
(ii) Replication: Several replicas of a task are maintained and run on different resources, so that when one of them fails, another can be used.
(iii) SGuard: Uses a rollback technique to recover from the fault; with it, more resources can be made available.
(iv) Job Migration: The job is migrated to another machine or server when it fails [29].
(v) Retry: The failed task is run again on the same cloud resource or machine.
(vi) Proactive FT using Self-Healing: Multiple instances run on multiple machines, and any of the instances can be used to recover from the fault [30], [31].
(vii) Self-Rejuvenation: The system restarts itself with a clean state, rebooting to recover from the fault [32].
(viii) Pre-emptive Migration: Works in a continuous loop, checking and analyzing the application repeatedly [33].
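Two of the reactive techniques above, retry and checkpoint/restart, are simple enough to sketch directly. The following is a hedged illustration; the "flaky task" and the doubling step stand in for real work and are not from this thesis:

```python
class FlakyTask:
    """A task that fails on its first two runs, then succeeds (simulated transient fault)."""
    def __init__(self):
        self.calls = 0
    def run(self):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("transient fault")
        return "result"

def retry(task, attempts=3):
    """Reactive 'retry': rerun the failed task on the same resource."""
    for attempt in range(attempts):
        try:
            return task()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # give up only after the last attempt

def checkpointed_job(items, checkpoint):
    """Reactive 'checkpoint/restart': resume from the last saved position
    instead of restarting the whole job from the beginning."""
    results = []
    for i in range(checkpoint["done"], len(items)):
        results.append(items[i] * 2)   # stand-in for real processing
        checkpoint["done"] = i + 1     # persist progress after each item
    return results

print(retry(FlakyTask().run))          # succeeds on the third attempt
cp = {"done": 3}                       # pretend the job failed after item 3
print(checkpointed_job([1, 2, 3, 4, 5], cp))  # only items 4 and 5 are reprocessed
```

Note the trade-off the text describes: retry repeats the whole task on the same resource, while checkpoint/restart pays the cost of saving progress so that only the remaining work is redone.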

1.10. Structure of Thesis

The chapters of the thesis are arranged in the following manner:

Chapter 2: This chapter of the thesis explains the existing research and literature survey in detail.

Chapter 3: This chapter describes the problem statement, objectives, framework design, interaction diagram and the implementation of the framework.

Chapter 4: This chapter includes the result of the given approach.

Chapter 5: This chapter includes the future scope and conclusion of the given approach.

CHAPTER 2

LITERATURE SURVEY

Hines et al. (2009) presented a technique for post-copy migration of virtual machines across a gigabit LAN, which postpones the transfer of the virtual machine's memory contents until after the processor state has been sent to the target host. Post-copy migration is compared with pre-copy migration: post-copy provides a win-win strategy that reduces the migration time while maintaining the liveness of the migrating virtual machines, whereas pre-copy first copies the contents over multiple iterations and then sends the state to the final host.

Sidiroglou et al. (2009) presented a technique that uses rescue points to help recover from software faults, including unknown faults that occur while maintaining availability and system integrity. For recovery, it mimics system behaviour under known error conditions: the rescue points restore execution to a particular point and help the program keep running after recovery.

Kaushal and Bala (2010) presented a fault tolerance solution that handles faults at the customer level by maintaining replication of the server queries, using HAProxy as the load balancer. The given work is applicable only to SaaS clouds; there is no partition between the cloud provider and the customer.
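The kind of HAProxy setup such work relies on can be sketched with a minimal configuration fragment. The backend name, addresses and ports below are illustrative assumptions, not values from the paper; the `check` keyword enables the health checks that let HAProxy stop routing to a failed server:

```
frontend web_front
    bind *:80
    mode http
    default_backend web_servers

backend web_servers
    mode http
    balance roundrobin
    # 'check' enables periodic health checks; a failed server is
    # taken out of rotation and traffic shifts to the survivors
    server web1 192.168.1.11:8080 check
    server web2 192.168.1.12:8080 check
```

Round-robin balancing spreads the query replicas across the backends, and the health checks give the customer-level fault handling described above.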

Zhao et al. (2010) presented the idea of Low Latency Fault Tolerance (LLFT). LLFT provides fault tolerance for distributed applications using the leader/follower approach and consists of a low-latency messaging protocol, a virtual determinizer framework and a leader-determined membership protocol. The messaging protocol provides reliable, ordered, direct group-to-group multicast; the virtual determinizer captures the ordering at the primary replica and enforces it on the backups to keep the replicas consistent; and the membership protocol provides recovery and reconfiguration services when a replica is faulty. LLFT maintains replication consistency with low end-to-end latency.

[...]

Details

Pages: 56
Year: 2015
ISBN (eBook): 9783668271098
ISBN (Book): 9783668271104
File size: 2.1 MB
Language: English
Catalog Number: v336348