Forecasting Cloud Storage Consumption Using Regression Model

by Lecturer Abdallah Ziraba (Author) and Mbata David (Author)

Scientific Study 2017 13 Pages

Computer Science - Commercial Information Technology


Table of Contents

1.0. Introduction
1.1. Background of the study
1.2. Problem statement
1.3. Objectives of the study:
1.4. Research hypothesis:

2.0. Related works/ Review of related literature

3.0. Methods
3.1. Method and source of data collection
3.2. Sample size
3.3. Method of data analysis, procedure and instrument used for analysis
3.4. The regression model
3.5. Dependent and Independent Variable
3.6. Validation of model

4.0. Analysis
4.1. Regression Equation
4.2. Discussion
4.3. Key findings

5.0. Conclusion

6.0. References


The primary aim of the study was to develop a regression model for forecasting monthly cloud storage consumption. The second aim was to ascertain whether the month is a reliable predictor of cloud storage capacity consumed. The model was developed using Minitab 18 statistical software. The dependent variable was cloud storage capacity consumed, and the independent variable was the month of cloud storage consumption. The model was validated by checking the assumptions of regression to establish its suitability for making future predictions. A twelve-month data set was analyzed to predict consumption for each passing month. The model's predictions were close to the actual cloud storage consumed in each month, and the model provides intervals for monthly storage consumption. The study concluded that the month is a globally significant linear predictor of cloud storage capacity consumed over a period.

Key words: Cloud computing, forecasting model, storage capacity

1.0. Introduction

Cloud computing is one of the most rapidly growing fields of Information Technology (IT) and has been adopted by many organizations for better IT infrastructure. Cloud computing platforms give companies easy access to high-performance computing and storage infrastructure through the cloud provider's web services. According to Harvey (2017), cloud platforms are relatively cheaper than dedicated storage infrastructure. Cloud storage also offers high scalability, nearly 100 percent reliability, and high performance.

Consequently, many business activities are now performed through cloud computing. Cloud storage is charged on the metered capacity consumed: depending on service consumption and business needs, organizations are billed per gigabyte (GB) per month. Organizations therefore need to forecast their storage consumption so that IT managers can budget ahead. A cloud storage forecasting model becomes important in the era of the Internet of Things (IoT), in which, according to McKendrick (2016), large amounts of data will be generated. The model is even more important as we approach the predicted era of the Internet of Everything (IoE), when cloud Infrastructure as a Service (IaaS) will be in high demand for data storage.
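To make the budgeting point concrete: under metered billing, a month's storage charge is the consumption in GB multiplied by a per-GB-month rate. The Python sketch below assumes a single flat rate (`PRICE_PER_GB_MONTH`, a hypothetical figure, not any provider's actual price) and hypothetical helper names; tiered pricing, request fees, and data-transfer charges are deliberately not modelled.

```python
# Minimal sketch: turning monthly storage forecasts into an expected bill.
# PRICE_PER_GB_MONTH is a HYPOTHETICAL flat rate for illustration only.
PRICE_PER_GB_MONTH = 0.02  # hypothetical currency units per GB-month


def monthly_bill(consumed_gb: float, price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Charge for one month of metered consumption at a flat per-GB-month rate."""
    if consumed_gb < 0:
        raise ValueError("consumption cannot be negative")
    return consumed_gb * price_per_gb_month


def budget(forecast_gb: list[float], price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Sum the expected charges over a sequence of monthly consumption forecasts."""
    return sum(monthly_bill(gb, price_per_gb_month) for gb in forecast_gb)


if __name__ == "__main__":
    forecast = [500.0, 520.0, 545.0]  # hypothetical GB forecasts for three months
    print(f"Expected storage charge: {budget(forecast):.2f}")
```

Feeding the month-by-month forecasts of a regression model into `budget` gives an expected storage charge for the planning horizon.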

Therefore, the researchers considered forecasting cloud storage consumption using a regression model as a means of determining intervals of cloud storage consumption for organizations. Forecasting cloud storage usage based on historical data has the potential to guide IT managers in effective budgeting and information system auditing.

1.1 Background of the study

Forecasting cloud storage consumption based on historical data can serve as a valuable source of guidance in IT budgeting and effective decision-making. According to , regression as a statistical modelling technique is very helpful for forecasting future events based on timely and reliable figures. , submitted that

Regression models can predict cloud infrastructure performance and even the availability of cloud network resources such as servers. Similarly, a regression model can predict cloud storage capacity consumed, in megabytes (MB) or GB, with high precision and close to actual values, especially for long data series. The model gives ranges of cloud storage consumed for each passing month or period, which can guide decision-making in an organization.
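The "ranges" mentioned above correspond to prediction intervals around a regression forecast. The following is a minimal sketch, assuming ordinary least squares with the month number as the only predictor and a made-up 12-month series (not the study's data, which was analyzed in Minitab 18); the names `fit_line` and `prediction_interval` are hypothetical. The critical value 2.228 is the two-sided 95% Student's t quantile for 12 - 2 = 10 degrees of freedom.

```python
import math

# HYPOTHETICAL monthly consumption in GB for months 1..12 (illustration only).
MONTHS = list(range(1, 13))
STORAGE_GB = [120, 131, 139, 152, 160, 171, 178, 190, 201, 209, 220, 231]

# Two-sided 95% t critical value with n - 2 = 10 degrees of freedom.
# Valid only for a 12-point series; other lengths need a different quantile.
T_CRIT_95_DF10 = 2.228


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def prediction_interval(x, y, x_new, t_crit=T_CRIT_95_DF10):
    """Point forecast and prediction interval for one new observation at x_new."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual standard error
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    y_hat = b0 + b1 * x_new
    # Width grows with residual spread and with distance of x_new from the mean month.
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, y_hat - margin, y_hat + margin


if __name__ == "__main__":
    forecast, low, high = prediction_interval(MONTHS, STORAGE_GB, 13)
    print(f"Month 13: {forecast:.1f} GB (95% PI {low:.1f} to {high:.1f} GB)")
```

Because the critical value is fixed for ten degrees of freedom, a series of a different length would need the matching t quantile (for example from `scipy.stats.t.ppf`).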

1.2 Problem statement

Kondo (2009) asserted that the cost-benefits of cloud computing compared with traditional IT infrastructure, and what constitutes the cost of cloud computing (computational size, time, and storage), are not perfectly clear to some organizations and their IT managers. Linthicum (2014) affirmed that, owing to dynamic workloads and changing cloud prices, most enterprises seem to be getting worse at understanding their actual storage costs.

McKendrick (2016) stated that in the coming years one of the major forces driving cloud storage services in organizations will be the Internet of Things (IoT) and the associated big data. The report notes that an unwelcome implication of this evolution is that organizations do not have enough storage capacity to handle the terabytes of data that will be generated. As a result, nearly all workloads will be cloud-borne, and organizations will contract storage services. More recently, RightScale (2017) reaffirmed this problem through its annual State of the Cloud survey, which shows that understanding cloud storage service consumption and managing cloud costs have become top challenges for companies. The researchers therefore developed a regression forecasting model for predicting future cloud storage consumption from historical data. Without such a forecasting model, organizations will find it difficult to determine the ranges of their cloud storage consumption and budget accordingly.

1.3 Objectives of the study:

The first objective of the study was to develop a new forecasting model for predicting cloud storage consumption in organizations. The second was to determine whether the month is a good predictor of cloud storage capacity consumed.

1.4 Research hypothesis:

H0: β1 = 0 (the change in cloud storage capacity consumed does not depend on the change in the month of consumption).

HA: β1 ≠ 0 (the change in cloud storage capacity consumed depends on the change in the month of consumption).

Decision Rule: Reject H0 if the probability value (p-value) < α = 0.05 (5% level of significance).

2.0. Related works/ Review of related literature

Many studies have reported different forecasting models for cloud computing. Yanshuang, Na, Hong, and Yongqiang (2015) predicted energy consumption in cloud data centers using several regression models (nonlinear, linear, support vector, polynomial, and exponential regression models). After comparing the models' performance, they concluded that all kinds of linear models had similar prediction performance, so linear models were suitable for modelling energy consumption from the performance counters. This recorded success of regression models in predicting energy consumption in cloud data centers motivated the researchers to investigate the performance of a linear regression model in predicting cloud data storage consumption.

Baughman, McAvoy, McCrory, and O'Connell (2016) developed a forecasting model for cloud server provisioning. The model was implemented in Java. With the model, servers are provisioned independently based on the forecast demand, and the model determines how many cloud resources to provision or de-provision. In the model, P(t) represents the number of servers to provision at time t, while βc represents the total capacity of the server. Using IBM as a case study, the authors concluded that the model could forecast the availability of many servers for future workloads. Although this model is reported to work very well for forecasting server availability, it is not suitable for predicting monthly cloud storage consumption, because a server-provisioning model cannot determine the intervals of monthly cloud storage consumption.

Similarly, Lu, Panneerselvam, Liu and Wu (2016) reported the success of a workload forecasting model for smart cloud computing. The model was based on a neuron model and was developed in MATLAB. It addresses smart workload prediction in cloud computing and achieves high precision, reliability, and prediction accuracy for workload characteristics. Despite this recorded success in forecasting workload, the model does not address prediction of ranges of monthly cloud storage consumption. Vazquez, Krishnan, and John (2016) studied cloud data center workloads for dynamic resource provisioning using time series models, which differ from regression models; that approach was not adopted here because the researchers are not interested in investigating trend.

Chandini, Pushpalatha, and Ramesh (2016) also investigated cloud server load prediction using several methods: the Bayesian model; prediction based on Phase Space Reconstruction (PSR); the Group Method of Data Handling (GMDH) based on an Evolutionary Algorithm (EA); support vector regression with a Kalman smoother; and neural network load prediction. Despite their recorded success in forecasting workload, these models also do not address prediction of the ranges of cloud storage capacity consumed over a period.

Khan (2016) developed a model for cloud data center load forecasting using a dependent mixture model based on Bayesian inference. The model aimed to predict unforeseen day-ahead cloud data center load, and its efficiency in scheduling and operating a cloud data center was reported. However, the model has no record of efficiency in forecasting cloud storage capacity consumed with intervals of monthly consumption.

None of the models reviewed above addresses the forecasting of cloud storage capacity consumed using historical data. This gap establishes the uniqueness of, and the need for, the regression model developed by the researchers. The researchers relied on regression following the view of ORACLE (2013) that regression is the most popular and reliable method for identifying a linear relationship and forecasting from historical data.
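For reference, the linear relationship tested by the hypotheses in Section 1.4 can be written in the standard textbook form below, with x_i the month, y_i the cloud storage capacity consumed, and ε_i a random error term. The ordinary least squares estimates shown are the usual ones for simple linear regression; this section does not itself state the estimation settings used in Minitab 18.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n,\\
\hat{y} &= b_0 + b_1 x,\\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad b_0 = \bar{y} - b_1 \bar{x}.
\end{aligned}
```

Under this model, H0: β1 = 0 states that the expected storage consumed does not change linearly with the month.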

