Loading...

High-Performance Persistent Storage System for BigData Analysis

Master's Thesis 2014 104 Pages

Computer Science - Applied

Excerpt

TABLE OF CONTENTS

CERTIFICATE

DECLARATION

ACKNOWLEDGEMENT

ORGANIZATION CERTIFICATE

ABSTRACT

Chapter 1 - INTRODUCTION

Chapter 2 - LITRATURE REVIEW
2.1 Advantages of using Hadoop
2.2 Big Data
2.3 Project Architecture
2.4 Goal: HDFS-HMFGR
2.5 INTERCONNECT TECHNOLOGIES (Hardware Solution)
2.5.1 Traditional Interconnect 10GigE Network
2.5.2 Infiniband Technology
2.5.2.1 IPoIB Interconnect Technology
2.5.2.2 RDMA-IB Interconnect Technology
2.6 Memory Allocation with MemCached (Software Solution)

Chapter 3 - Experimental Testbed System
3.1 An Insight into SSD and HDD

Chapter 4 - Installation, Designing and Implementation of System
4.1 Set a LAN and System service of Network
4.2 Hadoop Installation Guide
4.2.1 Steps On each Machine
4.2.1.1 Install prerequisites
4.2.1.2 Adding a dedicated Hadoop system user
4.2.1.3 Setup hostname
4.2.2 Steps On Master
4.2.2.1 Install Hadoop
4.2.2.2 Configuration ssh
4.2.2.3 Install NFS
4.2.2.4 Configuration
4.2.3. Run Hadoop
4.2.3.1 Run HDFS
4.2.3.2 Rum Map Reduce Job
4.2.3.3 Run All Daemon
4.2.3.4 Hadoop Web Interfaces
4.3 Enable Ethernet and InfiniBand (SR-IOV)
4.3.1 Enable Ethernet SR-IOV
4.3.2 Enable InfiniBand SR-IOV
4.4 Steps to setup IPoIB on Quanta machines
4.5 Install MemCached on CentOS

Chapter 5 - Benchmarking
5.1 ATTO Disk Benchmarking
5.2 HD Tune Pro:
5.3 Linux Disk Utilities:
5.4 HiBench (Hadoop Benchmarking Suit)
5.4.1 Micro-Benchmarks
5.4.1.1 Sort
5.4.1.2 Word Count
5.4.1.3 TeraSort
5.4.1.4 Enhanced DFSIO
5.4.2 Web Search:
5.4.2.1 Nutch Indexing
5.4.2.2 Page Ranking
5.4.3 Machine Learning:
5.4.3.1 Bayesian Classification
5.4.3.2 K-means Clustering
5.4.4 Analytical Query
5.4.4.1 Hive Join
5.4.4.2 Hive Aggregation

Chapter 6 - Performance Evaluation of SSD and HDD on Hadoop using 10GigE
6.1 Sort Work Load:
6.2 Word Count Work Load
6.3 Tera Sort Work Load:

Chapter 7 - Performance Evaluation of SSD and HDD on Hadoop using IPoIB
7.1 Sort Work Load:
7.2 Word Count Work Load
7.3 Tera Sort Work Load

Chapter 8 - Performance Evaluation of SSD and HDD on Hadoop by RDMA-IB
8.1 Sort Work Load
8.2 Word Count Work Load
8.3 Tera Sort Work Load

Chapter 9 - Performance Comparison between 10GigE and IPoIB
9.1 Performance Comparison of Sort Workload
9.1.1 Performance Comparison of SSD
9.1.2 Performance Comparison of HDD
9.2 Performance Comparison of WordCount Workload
9.2.1Performance Comparison of SSD
9.2.2 Performance Comparison of HDD
9.3 Performance Comparison of TeraSort Workload
9.3.1 Performance Comparison of SSD
9.3.2 Performance Comparison of HDD

Chapter 10 - Performance Comparison between IPoIB and RDMA-IB
10.1 Performance Comparison of Ssort Workload
10.1.1 Performance Comparison of SSD
10.1.2 Performance Comparison of HDD
10.2 Performance Comparison of WordCount Workload
10.2.1Performance Comparison of SSD
10.2.2 Performance Comparison of HDD
10.3 Performance Comparison of TeraSort Workload
10.3.1 Performance Comparison of SSD
10.3.2 Performance Comparison of HDD

Chapter 11 - Performance Comparison between 10GigE and RDMA-IB
11.1 Performance Comparison of Sort Workload
11.1.1 Performance Comparison of SSD
11.1.2 Performance Comparison of HDD
11.2 Performance Comparison of WordCount Workload
11.2.1Performance Comparison of SSD
11.2.2 Performance Comparison of HDD
11.3 Performance Comparison of TeraSort Workload
11.3.1Performance Comparison of SSD
11.3.2 Performance Comparison of HDD

Chapter 12 - Overall comparison of 10GigE, IPoIB and RDMA-IB. `

Chapter 13 - Conclusion

Chapter 14 - Future Scope

APPENDICES

REFERENCES

ABSTRACT

Hadoop and Map reduce today are facing huge amounts of data and are moving towards ubiquitous for big data storage and processing. This has made it an essential feature to evaluate and characterize the Hadoop file system and its deployment through extensive benchmarking.

We have other benchmarking tools widely available with us today that are capable of analyzing the performance of the Hadoop system but they are made to either run in a single node system or are created for assessing the storage device that is attached and its basic characteristics as top speed and other hardware related details or manufacturer’s details.

For this, the tool used is HiBench that is an essential part of Hadoop and is comprehensive benchmark suit that consist of a complete deposit of Hadoop applications having micro bench marks & real time applications for the purpose of benchmarking the performance of Hadoop on the available type of storage device (i.e. HDD and SSD) and machine configuration. This is helpful to optimize the performance and improve the support towards the limitations of Hadoop system.

In this research work we will analyze and characterize the performance of external sorting algorithm in Hadoop (MapReduce) with SSD and HDD that are connected with various Interconnect technologies like 10GigE, IPoIB and RDBA- IB. In addition, we will also demonstrate that the traditional servers and old Cloud systems can be upgraded by software and hardware up gradations to perform at par with the modern technologies to handle these loads, without spending ruthlessly on up gradations or complete changes in the system with the use of Modern storage devices and interconnect networking systems.

This in turn reduces the power consumption drastically and allows smoother running of large scale servers with low latency and high throughput allowing use of the utmost power of the processors for the big data flowing in the network.

LIST OF FIGURES

Figure 2.1) A typical Hadoop Software Stack

Figure 2.2) High Performance Big Data Appliance Software Stack

Figure 2.3) Switched Fabric Architecture

Figure 2.4) InfiniBand Layers

Figure 2.5) InfiniBand Architecture

Figure 2.6) Various Interconnect Technologies and architecture

Figure 2.7) Memory allocation in MemCached

Figure 3.1) Snapshots of Quanta Stack Server.

Figure 3.2) Setting up 4 nodes on Quanta Server with other Lab Mates

Figure 3.3) Quanta Server Stack Placement Diagram

Figure 3.4) NAND Flash based Solid State Drive

Figure 3.5) Platter Based Hard Disk Drive

Figure 3.6) Power Usage in Watts

Figure 5.1.) ATTO Benchmarking of 100 Gb Solid State Drive

Figure 5.2.) ATTO Benchmarking of 100 Gb Hard Disk Drive

Figure 5.3) 100Gb SSD performance HD TUNE

Figure 5.4) 100Gb HDD performance HD TUNE

Figure 5.5) Linux Disk Utilities Benchmarking of SSD

Figure 5.6) Linux Disk Utilities Benchmarking of HDD

Figure 5.7) Sort Workload Graph

Figure 5.8) WordCount Workload Graph

Figure 5.9) TeraSort Workload Graph

Figure 6.1.) SORT workload on Solid State Drive

Figure 6.2) SORT workload on Hard Disk Drive

Figure 6.3) WORD COUNT workload on Solid State Device

Figure 6.4) WORD COUNT workload on Hard Disk Drive

Figure 6.5) TERASORT workload on Solid State Drive

Figure 6.6) TERASORT workload on Hard Disk Drive

Figure 7.1) SORT workload on Solid State Drive

Figure 7.2) SORT workload on Hard Disk Drive

Figure 7.3) WORD COUNT workload on Solid State Device

Figure 7.4) WORD COUNT workload on Hard Disk Drive

Figure 7.5) TERASORT workload on Solid State Drive

Figure 7.6) TERASORT workload on Hard Disk Drive

Figure 8.1) SORT workload on Solid State Drive

Figure 8.2) SORT workload on Hard Disk Drive

Figure 8.3) WORD COUNT workload on Solid State Device

Figure 8.4) WORD COUNT workload on Hard Disk Drive

Figure 8.5) TERASORT workload on Solid State Drive

Figure 8.6) TERASORT workload on Hard Disk Drive

Figure 9.1) Improvement in Sort Workload execution on SSD using 10GigE and IPoIB

Figure 9.2) Improvement in Sort Workload execution on HDD using 10GigE & IPoIB

Figure 9.3) Improvement in WordCount Workload execution on SSD using 10GigE & IPoIB

Figure 9.4) Improvement in WordCount Workload execution on HDD using 10GigE & IPoIB

Figure 9.5) Improvement in TeraSort Workload execution on SSD using 10GigE & IPoIB

Figure 9.6) Improvement in TeraSort Workload execution on HDD using 10GigE & IPoIB

Figure 10.1) Improvement in Sort Workload execution on SSD using IPoIB and RDMA-IB

Figure 10.2) Improvement in Sort Workload execution on HDD using IPoIB & RDMA-IB

Figure 10.3) Improvement in WordCount Workload execution on SSD using IPoIB & RDMA-IB

Figure 10.4) Improvement in WordCount Workload execution on HDD using IPoIB & RDMA-IB

Figure 10.5) Improvement in TeraSort Workload execution on SSD using IPoIB & RDMA-IB

Figure 10.6) Improvement in TeraSort Workload execution on HDD using IPoIB & RDMA-IB

Figure 11.1) Improvement in Sort Workload execution on SSD using 10GigE and RDMA-IB

Figure 11.2) Improvement in Sort Workload execution on HDD using 10GigE & RDMA-IB

Figure 11.3) Improvement in WordCount Workload execution on SSD using 10GigE & RDMA-IB

Figure 11.4) Improvement in WordCount Workload execution on HDD using 10GigE & RDMA-IB

Figure 11.5) Improvement in TeraSort Workload execution on SSD using 10GigE & RDMA-IB

Figure 11.6) Improvement in TeraSort Workload execution on HDD using 10GigE & RDMA-IB

Figure 12.1) SSD performance graph with various combinations

Figure 12.2) HDD performance graph with various combinations

Figure 12.3) SSD & HDD cluster performance graph with various combinations

Figure 12.4) Cluster performance graph with various combinations

Figure 12.5) HDD performance graph with various combinations

LIST OF TABLES

Table 2.1) Difference between Fabric and Bus Technology

Table 3.1) Quanta Server Table

Chapter 1 - INTRODUCTION

In today’s digital age, a big measure of data is been processed on the internet. Allotting optimal data processing with advantageous response duration acts the output to the requests by the consumer. There are frequent users that assay to enter the alike data above the web and it is a challenging task for the server to deliver optimal result. The large amount of data the internet has to deal with every day has made conventional solutions extremely uneconomical. There are difficulties like processing large documents split into many disaffiliated sub-tasks, which are segmented with the available nodes, and processed in parallel. Due to this, MapReduce and Hadoop came into existence.

Hadoop is a free-of-cost, programming architecture that is java-based and wires the handing out of huge amounts of information in a scattered setting.1 It’s developed by Apache. Hadoop has large capacity to execute programs on multiple computers that have numerous nodes and involves multiple pentabytes of data. The Hadoop Distributed File System helps faster info transport speeds between the computers & makes cluster toward persistent functioning performances nonstop in condition of computer failure. The system actually reduces the risk of complete system breakdown also when a noteworthy no. of systems (nodes) are in-operative.2

Hadoop was motivated by MapReduce (Figure 1) that was introduced by Google, a software structure that breaks up a program into small pieces or threads. All these broken pieces of codes have the ability to be run and processed by any node system in the cluster.3

MapReduce based research has being done to achieve effective processing of information and large amounts of data on Hadoop. Hadoop executes on the collection of systems that have the ability to handle and process huge quantities of data and can handle distributed programming.4 In the last few years, many researchers have been done to improve the throughput and efficiency hadoop. One of hindrances is the performance issues of the storage device used as it is connected to the system by a slower connecting interface like Bus. Even the difference in the Devices used for storage creates the hindrance.5

The performance of the Hadoop system is also bound on the type of workload that we consider. This is why we consider HiBench as the standard model for testing Hadoop Distributed File System (HDFS). In this paper, we try to study and evaluate the performance of Hadoop Distributed File System on a Hadoop Cluster system that contains Solid State Drive & Hard Disk Drive by optimizing each parameter on HiBench.

Technology has advanced fast, and datasets have grown even faster as it is easier to generate and incarcerate data. The large Big Data, are a warehouse of information. The primary challenge in the investigation of Big Data is to conquer the I/O blockage present on modern systems.6 Lethargic I/O systems overpower the very use of having high end processors. They cannot provide data fast adequate to utilize all of the accessible processing power. Outcome of this is wastage of power and increases in the price of in commission large clusters. An approach is the use of Modern interconnects like IPoIB and RDMA-IB in place of Traditional Interconnects like 10GigE.

Chapter 2 - LITRATURE REVIEW

Today, big data is generated in a rapid way with a variety of formats due to the development of hardware devices (mobile devices, wireless sensor networks, etc.) and software systems (Google services, Facebook, Twitter, Blog, etc.). Examples of big data are electronic business transactions and browsing records, streaming data, different kinds of contents (audio, video, text, etc.) in a social network etc. With the deployment of wireless sensor networks, IoT, 4G and 5G, we can expect big data will be generated from everywhere in the world. It is difficult for a single data center to store and analyze big data generated in such environment. Several data centers, which located in different locations geographically, cooperated to each other to process big data as a possible solution for such difficulty.

2.1 Advantages of using Hadoop

Hadoop has got a huge force on the great Web 2.0 organizations like Google and Facebook that uses Hadoop to accumulate and supervise their enormous data sets. It has also established valuable for many other conventional enterprises. The five big advantages of Hadoop are6 7:

- Scalable

It can accumulate and allocate very big data sets transversely numerous inexpensive servers that work and compute in parallel. Dis-similar to the conventional relational data base sys. (RDBMS) which can not range to compute huge quantities of info and data, Hadoop helps to execute programs over hundreds of systems concerning penta bytes of data.

- Cost efficient

Hadoop gives an economical place to keep outsized quantities of data. The issue with conventional RDBMS is the high cost to range a big level in order to practice huge loads of info and data that is easily possible in the Hadoop System.

- Flexible

Hadoop provides easy access to novel information foundation and work with diverse type of info and data (planned and unplanned) for creating values from the set of data. Thus Hadoop could be used to obtain helpful imminence from sources.

- Swift

Hadoop's exclusive storing technique is made on a disseminated file arrangement with the aim of essentially 'mapping' the data anywhere it is positioned on a group. The gear used for data handing are located on the similar servers anywhere the data is positioned, this results in a lot earlier data dispensation.

- Flexible to Breakdowns

A benefit of using Hadoop is its flexibility to breakdowns. While info is shared with a single system, the same info is as well copied to further systems in the group with the purpose that in case of breakdown there does exists an additional duplicate.

2.2 Big Data

There is no clear definition for big data. In general, big data is the word used for a group of big and compound data deposits that becomes hard to practice by means of available database administrative tools or else traditional data dispensation program which has five characters (5V), viz:8 9 10

- Volume: This is the most obvious character of big data, that is, the data volume is huge.
- Velocity: This is the character of data effectiveness for a given period of time.
- Variety: This is the content and format character of data.
- Veracity: This is the reliability character of data.
- Value: This is the most important character of data.

Overall, the grand challenge of big data is- How to derive valuable information from data with volume, variety, and veracity characters in a given period of time.

The most commonly used big data software is Apache Hadoop software stack. Figure 2.1 shows a typical Apache Hadoop software stack, including Hive, HBase, MapReduce framework, and HDFS. These components in Apache Hadoop software stack have several boundaries. For example,

- Hive only provides SQL-like syntax. It does not support full SQL syntax;
- HBase lacks multi-tenancy, fault-tolerance, transaction, caching and reconfigurable mechanisms.
- Map Reduce lacks run-time optimization, caching and reconfigurable mechanisms.
- HDFS does not support heterogeneous storage, multi-tenancy, fault-tolerance, global address and reconfigurable mechanisms.

Summing up the limitations, The Apache Hadoop software stack cannot meet the requirements of various big data applications since it is only suitable for applications that require performing a little amount of computation but need to process big data.

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.1) A typical Hadoop Software Stack

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.2) High Performance Big Data Appliance Software Stack

2.3 Project Architecture

The design of HPBDA will be based on Apache Hadoop software stack. We will enhance and extend the Apache Hadoop software stack to meet the hardware and software co-design requirements of HPBDA. (Figure 2.1) shows a typical Apache Hadoop software stack. (Figure 2.2) shows the enhanced and extended Apache Hadoop software stack that we will develop includes:

- SQL-to-NoSQL Translator: A SQL to NoSQL translator that supports full SQL operations to allow a SQL program accessing HBase without modification;
- Data Adapter: Provide data connector and data converter for big data import and export;
- HBase-MFTCR: An enhanced HBase with multi-tenancy (M), fault-tolerance
(F), transaction (T), caching (C) and reconfigurable (R) mechanisms;
- MapReduce-OCR: an enhanced MapReduce framework with run-time optimization (O), caching (C), and reconfigurable (R) mechanisms;
- HDFS-HMFGR: An enhanced HDFS with heterogeneous storage (H), multi- tenancy (M), fault-tolerance (F), global address and reconfigurable (G) mechanisms;
- Accelerator Library: A suite of GPU and Xeon Phi libraries to speed up the execution of computations, data access, and data transmission of big data applications;
- System Monitor& Analyzer: A suite of system monitoring and analysis tools to record and analyze all events occur in HPBDA for further system optimization.

2.4 Goal: HDFS-HMFGR

This research project provides storage service in the big data analytic platform, and conduct research on I/O performance optimization.

Here we have three main research directions:

- First, build a persistent storage service based on HDFS distributed file system, and enhance HDFS by adding new features, such as multi-tenancy and fault tolerance.
- Second, study and verify how to improve the storage system performance using high performance devices, such as SSD and InfiniBand.
- Third, optimize performance in heterogeneous storage environment.
The main working items are (1) Heterogeneous Storage with SSD; (2) Global Address with RDMA & InfiniBand; and; and (3) Auto Data Placement Configuration.

The high-performance big data appliance proposed in this project can be distinguished from others by its four features

(1) High performance and reliable computing and storage
(2) Data life cycle optimization
(3) Automatic performance analysis and re-configurable system,
(4) RDMA based communication mechanism.

Besides building a Big data analytic platform with the high performance equipments mentioned above, the main project will also use realistic big data applications to evaluate and demonstrate the performance and usability of our platform system.

Currently, we have chosen five applications for our propose, and we briefly describe each of them as follows.

1. Recommendation system
2. Image Processing and Recognition
3. Social Network Computation
4. Patent Data Processing
5. Semi-Conductor Data Analysis

2.5 INTERCONNECT TECHNOLOGIES (Hardware Solution)

2.5.1 Traditional Interconnect 10GigE Network

10 gigabit Ethernet is a communication methodology which can give data transfer rate up and about to 10 billion bits per second. 10 gigabit Ethernet is as well recognized as 10GbE, 10GE & 10GigE.11

It supports full duplex connections that can be connected by network switches and shared medium operation with CSMA/CD.12 It can work properly with the existing protocols. Since the 10GigE works in full-duplex method, it doesn’t requires Carrier Sense Multiple Access/Collision Detection protocols that is extremely important as this improves the efficiency and the speed of 10 Gb Ethernet as it can be easily deployed in the existing network, thus giving a costefficient methodology that hold up express, low latency necessities.13

10-Gigabit Ethernet offers distance involving physical positions up and about 40 kilometers over a sole channel fiber and multi channel fiber systems.

Technically 10 Gb Ethernet is a Layer 1 and Layer 2 protocol which follows the Ethernet characteristics like Media Access Control (MAC) protocol, the Ethernet structure design, min & max structure size. This technology supports both LAN and WAN standards. (Fig 2.)

Issues faced in deploying 10 Gb Ethernet are due to the costs of fibre channels, but the benefits received are very large.14 15

2.5.2 InfiniBand Technology

InfiniBand as well as recognized as Bandwidth out of the Box.16 17 Introduction:

- A control base sequential Input Output inter-connect arrangement
- Executes on a basic pace of 2.5 Gb/s otherwise 10 Gb/s in every course.
- Gives both QoS (Quality of Service) and RAS (Reliability, Availability and Serviceability)
- Superset of the Virtual Interface Architecture
- Useful in high performance cloud computing along with big data centres

Technical Outline:18

- Control-based P2P interrelate arrangement
- Every single connection supports a 4-wire 2.5Gb/s both directional attachment.
- Arrangement connotes a hierarchic hardware protocol (Physical, Link, Network, Transport Layers) and software layer
- Every association be capable of uphold manifold transfer assignments in favor of trustworthiness

Abbildung in dieser Leseprobe nicht enthalten

Table 2.1) Difference between Fabric and Bus Technology

InfiniBand Characteristics:

- Layered Protocol- Physical, Link, Network, Transport, Software Layer

- Package support broadcasting

- Quality Of Service

- Three link speeds
1. 1X - 2.5 Gb/s with 4 Wires
2. 4X - 10 Gb/s with 16 Wires
3. 12X - 30 Gb/s with 48 Wires

- Copper as well as Fiber network interconnect

- Remote DMA approved

- Multicast & Unicast approved

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.3) Switched Fabric Architecture

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.4) InfiniBand Layers

Switches & Routers:

- Switch

- Comprises extra one InfiniBand dock station & preludes pack from one of its dock to other.
- Configurable to promote either uni-cast packets or multi-cast packets.

- Router

- Moves packets from one subnet to another without reconstructing packets.
- Assimilates the Global Route Header to the package based on IPv6 address.

Advantages:

- Excellent throughput
- Low-latency
- High-throughput
- Fabric composition and small energy application
- Faithful, consistent attachments
- Data comprehensiveness
- Highly interoperable environment

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.5) InfiniBand Architecture

Drawbacks:

- Complicated in architecture
- Little platforms have approved
- Bleeding edge, requirement to conduct wide-ranging experimenting

Applications:

- Program Grouping
- Inter-Processor Communication (IPC)
- Storage Area Networks (SAN)

Result:

- better CPU throughput
- Acknowledges RAS
- Appears together in the box and approves Bandwidth Out of the Box
- Acknowledges flexibility to scale.19

2.5.2.1 IPoIB Interconnect Technology

InfiniBand (IB) 20 is a uniform organization regional Network that is applicable in HPC and data centre environments.

InfiniBand Technology has soaring speed data transport at a very low latency time. To allow the legacy IP based applications over Internet Protocol based apps over InfiniBand in Data Centers. Internet Protocol over InfiniBand protocol uses a crossing point on apex of InfiniBand ‘Verbs’ which permit the programs running on sockets to use host based TCP/IP protocol load that be converted keen on local InfiniBand Verbs that looks invisible to the program. Sockets Direct Protocol (SDP)21 is a development of the socket stand boundary, permit the process to avoid the TCP/IP protocol and interpret socket stand package in the verbs coat RDMA process, still preserving TCP streaming socket symbolism.22

SDP have the benefit with trespassing program layer which is required at IPoIB. The results of this are SDP has better latency and performance than IPoIB.

The uses of the InfiniBand are in modern computing and high performance computing. The benefits of IPoIB are dropping communiqué latency as well as given that superior accessible bandwidth to clients in the limited DCN.

The administration of network load is of concern in the new networking technologies. Quality of Service availability could be consumed to control the passage for within the network loads that could include main concern regarding the input data stream (IDS). The traffic loads and elasticity of excellent modification of the efficiency of network is also a bottle neck for the system wide performance. Such technology would improve the throughput of traditional data centers which still work on Ethernet. Finest Attempt Tune-up with low or else none requirement for modifying the conventional socket applications.

It’s significant to evaluate behavior of H/W rank QOS availibility for InfiniBand network from applications on the fine tuned socket support protocols.

This pace to use of this new technology to harness high-speed interconnects for on hand Internet programs.

In this research, analysis of and performance improvements in case of modern interconnects like IPoIB in comparison of the traditional Bus interconnects or 10GigE hardware will be done.

InfiniBand is a prominent cluster interconnecting technology with very stumpy concealment along with very amplitudinous throughput. Inhabiting InfiniBand verbs exists in the basic software level of the InfiniBand connection that allocates govern user-level approach to IB Host Channel Adapter (HCA) domains by omitting the OS . At the IB verbs category, a chain pairing shape existing is employed for correspondence underneath by either Send/Receive along with RDMA semantics. InfiniBand expects the consumer to schedule the buffer preceding employing it for broadcasting.

InfiniBand HCAs adhere 2 ports that archive conduct as 4X InfiniBand or 10- GigE. The arrangement of HCA comprises a stateless offload appliance for network interface card (NIC) based protocol processing.23 24

Sockets Direct Protocol had been contrived initially for InfiniBand that has immediacy been redefined as a transfer -agnostic protocol for RDMA network based architectures. It endured acted grasped to amend along with benefit the practice germinal sockets adjacent applying the RDMA protocol adjunct the InfiniBand network. SDP endures a byte-stream protocol that is built on TCP flow socket confirmations. SDP uses a protocol alternate inside the OS pivot that directly oscillates between kernel TCP/IP stack above IB (IPoIB) along with the SDP above IB (which sidesteps the TCP/IP stack)25.

SDP acquires bi-form layouts of data interchange. Buffered duplicate arrangement, the alternating information is duplicated beforehand in preregistered buffer foregoing the network transport. For no duplicate arrangement, consumer buffer is lucidly recorded to broadcast for bypassing information reproduction. (Figure 5)26 27 28

2.5.2.2 RDMA-IB Interconnect Technology

InfiniBand Host Channel Adapters (HCA) and further network equipments can be approached by the upper layer software using an interface called Verbs. It is a low level communiqué interface that goes after the line duo (communiqué ends) model.29

Line duos are required to establish a channel between the two communicating entities. Each line duo has a fixed no. of job row basics. Superior stage programs establish a job demand on the consecution doublet that is next processed by the HCA. When a work element is completed, it is positioned in the conclusion row.30 Superior stages programs are capable of detecting conclusion by census the conclusion row. Verbs used to transport info are totally Operating Sys. bypassed. (Figure 2.6)31 32

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.6) Various Interconnect Technologies and architecture 14

2.6 Memory Allocation with MemCached (Software Solution)

What is MemCached?33

- Easy in memory accumulating arrangement
- Can be utilized as a alternate in-memory data archive
- Archives data in-memory only.
- Exceptional read performance.
- Immense record throughput.
- Data disbursement between many servers.
- API feasible for leading languages
- Stands For “Memory Cache Daemon”
- Served by prevalent major sites along with Flickr, Slashdot and Wikipedia.

Introduction to ‘MemCached’

- A Gaint Hash Table dispersed across many machines.
- A server that caches Name Value Pairs (NVPs) in memory.
- This couldn’t be an extended data archive!
- This couldn’t be queryable for a file of complete things.
- No central safeguard approaches.
- No fail-over/high-availability approaches. (server is down, data is volatile)
- A surrogate storage sector where recurrent converged data can be stockpiled for random data access.
- Deal memory/disk for speed benefits.

MemCached Layout:

- Basically consumer <=> (NVP)Server Memcached Layout
- Server simple quick data retrieval based on a key.
- Length of key can’t overrun 250 characters.
- Capacity of value can’t deluge 1 MB.
- Each MemCached server is atomic, i.e. it neither recognizes nor cares about any alienation MemCached server.
- Consumer libraries for most every major language.

When memcached is started, the memory is not automatically allocated until it is configured. In other terms, memcached starts allocating and reserving physical memory after it has started saving data into the cache.

When the data is stored on to the cache, memcached doesn’t assign the memory for the information on an item by item basis. Instead, a slab assignment is exercised to optimize memory utilization as well as avert memory shortfall when information ceases from the cache.

With slab commission, memory is preserved in blocks of 1MB. The slab is split up into a no. of blocks of same size. On storing a value in the cache, memcached verifies the value of the size that is added to the cache and decide which slab has the right size contribution for the block. If a slab with the article size already endures, the article is recorded to the block within the slab.

In case the new item to be added is larger than the block size of any existing block then a totally new slab is created and divided up into blocks of suitable sizes. If a slab of the right block size exist in the case that no free blocks then also a new slab is created. Updating a present segment with facts that occurs expanded than the essential block allotment for that key, erstwhile it is re-allocated into an admissible slab.

If data size to be stored is larger than its value then the size of the block is increased with the chunk size factor unless a block size is large enough to keep the value assertive. Block size is a function of the factor of scale, i.e. rounded up to size of the block that is precisely divisible in the chunk size.

The outcome lasts that you hold many pages delegated within the field adjunctly memory assigned to memcached. Each page endures to 1MB in capacity (by default), as well as is cut into a differing no. of fractions, according to the fraction capacity needed to archive the key/value doublets. Each prototype has many pages assigned, as well as a page is consistently constructed when a fresh item desires to be constructed needing a fraction of a characteristic capacity. A slab may consist of many pages, and each page within a slab accommodates an identical no. of segments.34

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.7) Memory allocation in MemCached

The segment capacity of a introductory slab is ascertained by the bottom portion capacity assembled with the fraction size magnification factor.

Assigning the pages in this manner affirms that memory appears not get discontinued. Although, depending on the disbursement of the units that you store, it may lead to an undesirable disintegration of the slabs as well as divisions if you have explicitly contrasting sized items. For instance, adhering a relatively few no. of factors within each fraction capacity may damage a large portion of memory with just little fractions in each assigned page.

One can balance the amplification factor to lessen this consequence with use of the -f command line alternative, which alters the amplification factor addressed to make accrual fruitful utilization of the fractions and slabs apportioned.

If your operating system accepts it, you can additionally begin memcached with the -L command line alternative. This alternative preallocates all the memory in the course of startup employing big memory pages. This can enrich throughput by mitigating the no. of misses in the CPU memory cache.

Chapter 3 - Experimental Testbed System

For the Experiments and the test beds to be constructed on the server stack, I

was provided with 4 Stack Nodes of the one of the finest and highest configuration of the computer servers available. This system is provided by Quanta Company for corporate research projects.35

Abbildung in dieser Leseprobe nicht enthalten

Figure 3.1) Snapshots of Quanta Stack Server.

Abbildung in dieser Leseprobe nicht enthalten

Figure 3.2) Setting up 4 nodes on Quanta Server with other Lab Mates.

Hardware Specification:

Node Capacity: 60, All 1U Servers (All nodes with the same Hardware Spec) Central Processing Unit: 2 Intel Xeon X5670 CPU’s, each one has 6 cores = 12 physical cores, 24 in HyperTerminal mode.

Memory: 96GB

Network: Ethernet and InfiniBand

Disk: 1 piece 100 GB SSD, 2 piece 2TB HDD

Abbildung in dieser Leseprobe nicht enthalten

Software Specification:

Abbildung in dieser Leseprobe nicht enthalten

Figure 3.3) Quanta Server Stack Placement Diagram 20

[...]

Details

Pages
104
Year
2014
ISBN (eBook)
9783656721611
ISBN (Book)
9783656722847
File size
3.2 MB
Language
English
Catalog Number
v278725
Grade
82.00
Tags
High Speed Cloud Computing

Author

Share

Previous

Title: High-Performance Persistent Storage System for BigData Analysis