Assessment of Data Integrity Risks in Public Blockchain Systems

Textbook 2019 100 Pages

Computer Science - Commercial Information Technology

Excerpt

Table of Contents

Assessment of Data Integrity Risks in Public Blockchain Systems

Acknowledgements

List of Figures

List of Tables

List of Abbreviations

1 Introduction
1.1 Research Gap and Research Question
1.2 Motivation
1.3 Value Proposition
1.4 Outline

2 Background
2.1 Centralized, Decentralized & Distributed Systems
2.2 Blockchain Technology
2.3 Data Security
2.4 IT Risk Management
2.5 Previous research

3 Qualitative Research Methodology
3.1 Data Collection Method
3.2 Focus Interview
3.3 Ontology & Epistemology
3.4 Content Analysis
3.5 Data Presentation
3.6 Scientific Quality Criteria

4 Results for Risk Identification
4.1 Interview Participants
4.2 Identified Risks
4.3 Differences in Public and Private Blockchains

5 Discussion / Explanation of identified Risks and Differences
5.1 Identified Risks
5.2 Differences public and private blockchains

6 Quantitative Research Methodology
6.1 Online Survey Design
6.2 Data Analysis & Visualization
6.3 Scientific Quality Criteria

7 Results for Risk Evaluation
7.1 Survey Participants
7.2 Research Results

8 Discussion & Implications of the Results from the Risk Evaluation
8.1 Discussion
8.2 Implications of Results

9 Limitations of the Research

10 Conclusion

11 References

Appendix
1 Category Definitions
2 Survey
3 Code for Visualizations
4 Consent Sheet

Assessment of Data Integrity Risks in Public Blockchain Systems

Since its first use in 2008, blockchain technology has come a long way, developing from a simple distributed ledger into distributed virtual machines that execute smart contracts, and much more. Blockchains have potential applications in many industries and offer great innovation potential for organizations. With all the opportunities and value that new technologies can deliver, the risks are often neglected. This paper identifies risks to data integrity on blockchains and assesses the differences between private and public blockchains regarding data integrity. For the risk identification and the comparison between public and private systems, a qualitative method with focus interviews is used, while the risk assessment is done with a quantitative online survey. The identified risks are evaluated by their likelihood of occurrence and their possible consequences for the integrity of the data. Overall, 11 risks applicable to public blockchains have been identified. Even though some of them were rated as a “High Risk”, there is currently no evidence that a blockchain should be considered insecure. The identified risks should be taken into consideration before a public blockchain is implemented. The differences between public and private blockchains regarding data integrity are not rated; hence, based on the collected data, no general statement can be made about which design is more secure. The research results facilitate the decision between public and private systems. Based on the collected data and the literature review, the author discusses some actions that can be taken to mitigate the identified risks.

Acknowledgements

At this point I would like to thank all the people who accompanied and supported me during my master thesis. Special thanks to my family, without whose support my thesis would not have been possible. I would like to thank all interview partners and survey participants for their trust and support throughout the research process. A special thank you goes to the advisors at the Management Center Innsbruck and the University of Nebraska at Omaha, who supported me the entire time.

Furthermore, I want to thank my fellow master students and the special people I met and made friends with throughout the study program.

List of Figures

Figure 1: Architecture styles

Figure 2: Simplified bitcoin block design

Figure 3: Merkle Tree

Figure 4: Importance and Severity of core attributes and enablers

Figure 5: Scope of COBIT 5 for Risk

Figure 6: Risk Assessment Process

Figure 7: Inductive Category Development

Figure 8: Survey Logic

Figure 9: Age Distribution

Figure 10: Highest degree of Survey Participants

Figure 11: Occupations of Survey Participants

Figure 12: Likelihood of Risks

Figure 13: Consequences of Risks

Figure 14: Risk Matrix

List of Tables

Table 1: Examples of Decentralized Systems

Table 2: Types of nodes

Table 3: Used Consensus Mechanism and Hash Algorithms

Table 4: Interview Guiding Questions

Table 5: Exemplary illustration of categories

Table 6: Measures to enhance research quality

Table 7: Interview Participants

Table 8: Identified Threats to Data Integrity in Blockchain Systems

Table 9: Differences Public & Private Systems

Table 10: Considered LinkedIn Groups

Table 11: Risk Priority Categories

Table 12: Numerical Results for each Category

List of Abbreviations

[List not included in this excerpt]

1 Introduction

The term blockchain has become one of the main IT-related buzzwords in the industry. Some organizations have even added the term blockchain to their company name to increase their share value. An article by Easton (2018) shows that an iced tea maker was able to boost its share price temporarily by 180% by changing its business name to “Long Blockchain Corp.”. Blockchain found its first real-world use in the cryptocurrency Bitcoin, which was developed under the pseudonym Satoshi Nakamoto in 2008 and launched in 2009. Since Bitcoin's launch, more than 1500 other cryptocurrencies have entered the market, most of which also use blockchain technology (CoinMarketCap, 2018). Almost every industry is trying to engage with blockchain to advance its business. The “Gartner Hype Cycle for Emerging Technologies”, a depiction of the maturity and adoption of trending technologies, shows in its most recent available issue that blockchain technology is currently in the phase called “Peak of Inflated Expectations”.

With all this hype, the question arises whether it is justified or whether blockchain technology will soon disappear again. A general answer regarding the success of blockchain is almost impossible, but from a technical perspective the advantages that blockchain could offer are indisputable, especially when it comes to the basics of data security, which are also known as the CIA (Confidentiality, Integrity, Availability) triad.

When there is a big hype about a new technology, there is always a gold-rush mood in which many new people join and expect great things. In such cases, disadvantages or risks are often forgotten or simply ignored.

1.1 Research Gap and Research Question

When implementing a technology that is not yet mature, organizations have to be aware of its disadvantages. Some literature has been published about security problems within blockchain technology. Karame and Androulaki (2016) studied the security of blockchain, especially of Bitcoin. Bitcoin uses a public and permissionless blockchain, where every participant in the network can change the ledger stored on the blockchain (Nakamoto, 2008). Many researchers address the advantages that blockchain can bring to industries and organizations, but the author's literature review showed that there are currently no publications that address the threats that blockchain can pose to data integrity. As described by Boritz (2005), data integrity is an essential part of data quality and should therefore have a high priority in any information system. While there is already a lot of discussion about how blockchain can achieve data integrity, there is currently no publication that addresses the risks that the implementation of a blockchain can pose to data integrity. To narrow down the research, this thesis focuses only on public blockchains, although a short comparison with private blockchains is included to deliver more complete research results. While the differences are often discussed, for example by Antonopoulos (2015) or Bashir (2018), they are not compared at the level of data integrity. To enhance the contribution to the current state of research, the identified risks are rated by their likelihood and their impact on data integrity.

This results in the following research questions:

RQ1: WHAT ARE RISKS WITHIN PUBLIC BLOCKCHAIN SYSTEMS REGARDING DATA INTEGRITY?

RQ2: WHAT ARE THE DIFFERENCES FOR DATA INTEGRITY WITHIN PUBLIC AND PRIVATE BLOCKCHAINS?

RQ3: WHAT ARE THE LIKELIHOOD AND CONSEQUENCES FOR EACH IDENTIFIED RISK?

1.2 Motivation

Blockchain is currently one of the trending topics in the IT industry. Such innovative technologies always offer a great opportunity for research. The author is also interested in blockchain technology and has already come into contact with it by using cryptocurrencies. Blockchain and cryptocurrencies (especially Bitcoin) have proven their right to exist, as can be seen from the constantly increasing adoption and awareness among consumers and organizations (Thompson, 2018). Big tech companies like IBM or Microsoft offer blockchain services in their cloud environments or even contribute to the development of open-source blockchains. Also, for a future employee in the IT and consulting industry, it is necessary to stay knowledgeable about new and innovative technologies, especially when a technology has the potential to disrupt whole industries. Furthermore, the author is convinced that the conducted research will deliver significant value to organizations that consider implementing blockchain technologies in any way.

1.3 Value Proposition

The aim of this research is to contribute to the current state of research on blockchain technology and to support organizations when considering the implementation of a blockchain in their IT infrastructure.

Value is delivered to organizations by identifying risks that they may not have been aware of. Organizations can treat the evaluation of the identified risks as a baseline, adding or removing risks depending on the system and design they intend to use.

From an academic perspective, research about blockchain technology can be done in various fields of study. According to Risius and Spohrer (2017), most of the publications are in the fields of computer science and information systems, but there are also publications involving blockchain in finance, political science or law. Risius and Spohrer (2017) proposed a research framework that serves as a guideline for which topics of blockchain research should be pursued. This framework defines several levels of analysis: “Users & Society”, “Intermediaries”, “Platforms” and “Firms and Industries”. These levels can overlap in a research project, but their primary purpose is to inspire future research. Besides the levels of analysis, the framework defines different activities: “Design & Features”, “Measurement and Value” and “Management and Organization”. The research of this thesis is conducted on the level “Platforms” and assesses “Design & Features”. Following the blockchain research framework provided by Risius and Spohrer (2017) ensures the academic relevance of the work.

1.4 Outline

The background section of this master thesis introduces the reader to the basic literature on blockchain technology and data security, especially data integrity and IT risk management. At the end of this section the reader should understand what a blockchain is and how it works, as well as what data integrity is and why it is an essential part of data / information security. Chapter 3 explains the research methodology for the risk identification, while the following Chapter presents the first empirical results. Next, the results of the risk identification are briefly discussed and explained. Chapter 6 describes the research methodology for the risk evaluation, and the subsequent Chapters present and discuss the results and their implications. Chapter 9 delineates the limitations and assumptions that apply to the conducted research. The last Chapter states the conclusion of the thesis. The raw data of the research is not included in the appendix, but instructions on how to reproduce the conducted research are.

2 Background

This Chapter provides background information on the topics decentralized systems, blockchain technology & design, data security and IT risk management. The goal of this Chapter is to provide the knowledge needed for the conducted research.

2.1 Centralized, Decentralized & Distributed Systems

Blockchains are distributed and decentralized systems; hence it is important to understand the properties and parameters of these system designs.

At the beginning of the computer era, systems were large and expensive to acquire and maintain. These systems processed 1 instruction per second, whereas today's systems can execute millions of instructions per second. All processing was done in a single unit, which is called a centralized system (Tanenbaum & van Steen, 2016). The centralized architecture is still used by mainframes, even though mainframes are also able to operate in a cluster and can therefore form a distributed computing system (Weller, 2007).

In the literature there is no single definition of the term distributed system. Tanenbaum and van Steen (2016) define a distributed system as:

“[…] a collection of independent computers that appear to the user of the system as a single computer.”

Andrews (2000) states that a distributed system consists of numerous computing systems that each have their own random access memory (RAM). As there is no general definition of a distributed system, this thesis uses the definition by Tanenbaum and van Steen (2016).

According to Grosch (1953), the power of a computer increases in proportion to the square of its cost. While this was true in the mainframe era, it no longer holds because of the use of microprocessors in computers. Several computers (also called nodes) with less computing power each are cheaper than one single system with the same total processing power. This gives distributed systems an economic advantage over centralized systems. Distributed systems can also achieve a higher overall performance than would be achievable with a single, centralized processing unit, simply due to physical limitations. By using more than one processing unit, distributed systems avoid a single point of failure (SPOF), and adding more nodes increases the reliability of the system. Distributed systems also have drawbacks: they need a reliable network and face security problems when it comes to data access. They also need specialized software to be able to deliver the advantages described (Tanenbaum & van Steen, 2016).

A very common perspective on distributed and decentralized systems was published by Baran (1964). According to Baran, decentralized systems use a hierarchy, distributed systems are organized in a mesh, and centralized systems are organized in a single star. Baran's publication focuses on communication networks; the three styles are shown in Figure 1.

[Figure not included in this excerpt]

Figure 1: Architecture styles

(Baran, 1964, p. 2)

The Ethereum co-founder, blockchain expert and researcher Buterin (2017) provides a more in-depth perspective on decentralized architecture. According to him, decentralization in a system can be achieved on different levels. While centralization and distribution focus on the physical architecture of a system, decentralization can be achieved on an architectural, a political and a logical layer. A system that is decentralized on an architectural level is like a distributed system, where a number of individual nodes are interconnected. Political decentralization describes the control of a system. If a system is controlled by a single entity it is seen as centralized, otherwise as decentralized. The third level, logical decentralization, focuses on the data structure of the system. A system with one monolithic data object is logically centralized, whereas amorphous systems do not share their data and are therefore logically decentralized.

Table 1 lists some technologies that achieve a certain level of decentralization. A content delivery network (CDN) is architecturally decentralized because it uses web servers across the globe. These servers are controlled by one company, which makes the CDN politically centralized. Logical decentralization can be seen in this case in the separation of databases: CDNs often use a different database on each web server and are therefore logically decentralized. The BitTorrent network offers decentralization on all 3 levels. Unlike a CDN, BitTorrent is not regulated or controlled by a single entity and is therefore also politically decentralized. The third example, blockchain systems, offers architectural and political decentralization. A blockchain behaves like a single computer and has a commonly agreed state, which makes it logically centralized.

[Table not included in this excerpt]

Table 1: Examples of Decentralized Systems

based on Buterin (2017)

Following the approaches of Tanenbaum and van Steen (2016) and Buterin (2017), it can be concluded that a distributed system offers an architecture in which the processing power is distributed among several nodes, which brings advantages in scalability, economy and overall performance. The decentralization of politics (control) enables a system to work on its own, independently of organizations and governments. Logical centralization exists when the system appears to be a single system. According to the definition by Tanenbaum and van Steen (2016) provided earlier, a distributed system is therefore never fully decentralized, because it appears as a single system to the user.

2.2 Blockchain Technology

After discussing the properties of decentralized and distributed systems in Chapter 2.1, this section explains the properties and parameters of blockchain systems. This includes the origin, components and design of a blockchain system.

There is no common definition of a blockchain in the literature; most definitions refer to the cryptocurrency Bitcoin. Bitcoin is a decentralized and trustless cash system whose transactions are stored on a public ledger called the blockchain (Swan, 2015). From a technical perspective, a blockchain is a network of public databases which, in the case of bitcoin, keep track of all conducted transactions.

This Chapter uses the bitcoin blockchain as an example of architecture and protocol design, simply because it is the most established and developed blockchain and has therefore proven the validity of the concepts used.

2.2.1 Genesis

The basic principles of a blockchain are not new within the IT industry. A major influence on blockchain technology was the publication by Haber and Stornetta (1991). As “a naive solution”, Haber and Stornetta (1991) described a time-stamping scheme in which all clients send their documents to a trusted time-stamping service (TSS), which time-stamps them and keeps them on record. This design raised several questions about privacy, bandwidth, storage and trust, which made it unusable. To tackle those problems, the use of cryptographic hashes was proposed, so that only the hash is transmitted to the TSS instead of the document itself. Another addition was the use of digital signatures. After receiving a hash, the TSS digitally signs it and sends the signed hash (certificate) back to the client, which makes it unnecessary for the TSS to store the data. With this design the TSS could still issue false timestamps, so Haber and Stornetta introduced a linking function between the issued hashes. This prevents the TSS from issuing wrong timestamps: it cannot issue a future timestamp, because each issued certificate must contain bits from the immediately preceding requests.

The design proposed by Haber and Stornetta (1991) was an immutable chain of document certificates consisting of document hashes, timestamps and signatures of the participating parties. This was a big step towards the current state of the art in blockchain design, but Bayer, Haber, and Stornetta (1993) added another essential feature, the Merkle tree. The previously proposed solution was vulnerable to a flood of banal transactions, so Bayer et al. (1993) published a solution in which many unnoteworthy events are merged into one big event. This is done by hashing 2 documents into one single hash and publishing only this hash; the resulting structure is called a hash tree or Merkle tree. Another important step in the development of the blockchain were the publications by Dwork and Naor (1993) and Back (2002), which presented a principle called hashcash. Hashcash was initially designed to prevent spam emails and denial-of-service (DoS) attacks by forcing the user (for example the sender of an email) to spend a moderate amount of computing power to generate a hash, which is included in the email header. This hash needs to fulfill certain requirements (e.g. start with a zero) and can only be found via trial and error. The process of finding a hash via trial and error is called proof-of-work (PoW).

By adopting the technical principles described above, Nakamoto (2008) published the first design of a blockchain, the bitcoin blockchain. The intent of bitcoin is to send and receive digital coins by using digital signatures and hashing algorithms. While bitcoin was proposed in 2008, the first block, also called the genesis block, was mined on 3 January 2009. Bitcoin is designed as an electronic peer-to-peer cash system on top of the internet (Nakamoto, 2008). In a peer-to-peer (P2P) network, participants are equal and are connected to each other in a mesh. Referring back to Chapter 2.1, a P2P network is distributed on an architectural level, with no central server (Tanenbaum & van Steen, 2016). The bitcoin network is a collection of nodes operating in a P2P network and using the bitcoin protocol for communication.

2.2.2 Components in a Blockchain System

A blockchain has several components in its system, which can be physical or digital. This Chapter explains the main components / terms of the bitcoin blockchain.

Digital Signatures

According to the National Institute of Standards and Technology (2013), a digital signature is a mechanism to verify the origin, authenticity and integrity of electronic data. A digital signature can also be used to detect whether the data has been changed after signing. Each signer holds a key pair consisting of a private and a public key. The private key is needed to sign a document; it should therefore stay secret and be available only to the owner of the key pair. A digital signature cannot be forged by simply copying it from a previously signed document. The public key is used to validate a digital signature; there is no need to keep this key secret, because it cannot be used to sign documents. Key pairs are most often generated for the digital signature algorithm (DSA) or the elliptic curve digital signature algorithm (ECDSA), and in some cases for the Rivest–Shamir–Adleman (RSA) algorithm. With these mathematical algorithms, it is currently not feasible to derive the private key from the public key. Digital signatures can also include domain parameters; for example, if a digital signature is created to replace an analog signature, this may be the name of the person or of the organization the individual is working for. All domain parameters within digital signatures are public (National Institute of Standards and Technology, 2013).

Address

Bitcoin is designed as a payment solution and therefore needs an address to transfer the digital coins to. To generate a bitcoin address, first a public/private key pair is created using ECDSA. The address is not the public key itself but is derived from it in several steps, which include the hash functions SHA-256 and RIPEMD-160 and the Base58Check encoding scheme (Christin & Safavi-Naini, 2014). In the bitcoin environment an address is a unique string of 26 to 35 characters that transactions use to identify origin and destination. An example of a bitcoin address is 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa, which was the first address ever used for receiving bitcoin. The balance related to an address is publicly available on the blockchain and can be retrieved by looking up the address in a blockchain explorer. The smallest possible amount of bitcoin is 1 satoshi, which equals 10^-8 bitcoin. It is also possible to create an address that is shared between 2 or more parties. To execute a transaction from such an address, a defined number of signatures from the parties is required. For example, if 3 people share an address, they can define that at least 2 of the 3 need to sign with their private keys to make a transaction. This type of address is called a multi-signature (or multisig) address. Transactions are signed with the 256-bit private key (Karame & Androulaki, 2016). A new bitcoin address can be generated offline, to prevent a third party from getting access to the private key. It is therefore theoretically possible that 2 individuals generate the same address, although this is highly unlikely because there are 2^256 potential private keys for bitcoin (Antonopoulos, 2015). As explained earlier, the address generation uses RIPEMD-160, which means there are only 2^160 addresses available for 2^256 private keys. Hence it is theoretically possible that the same address is derived from more than one private key.
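The last step of the address derivation, the Base58Check encoding, can be sketched with the Python standard library alone. This is a minimal illustration, not the reference implementation: it deliberately skips the ECDSA key generation and the SHA-256 / RIPEMD-160 hashing of the public key, and the function names are chosen here for illustration only. It assumes the usual layout of one version byte, a 20-byte hash and a 4-byte checksum taken from a double SHA-256.

```python
import hashlib

# Base58 alphabet used by Bitcoin (0, O, I and l are left out to avoid look-alikes).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def checksum(data: bytes) -> bytes:
    """Return the first 4 bytes of a double SHA-256 over the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    """Encode version byte + payload + checksum as a Base58 string."""
    raw = bytes([version]) + payload
    raw += checksum(raw)
    number = int.from_bytes(raw, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    # Every leading zero byte is written as the character "1".
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def base58check_decode(address: str) -> tuple[int, bytes]:
    """Decode an address and verify its checksum; returns (version, payload)."""
    number = 0
    for char in address:
        number = number * 58 + ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_ones = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_ones + body
    data, check = raw[:-4], raw[-4:]
    if checksum(data) != check:
        raise ValueError("checksum mismatch")
    return data[0], data[1:]


# Decode the example address from the text and inspect its parts.
version, payload = base58check_decode("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(version, len(payload) * 8)

# On average, this many private keys map to one 160-bit address.
print(2**256 // 2**160 == 2**96)
```

The checksum protects against typing errors in an address; it does not change the fact that the 160-bit hash is much shorter than the 256-bit key space, which is why several private keys can in theory lead to the same address.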

Wallet

According to Franco (2015), a bitcoin wallet is a collection of one or more private keys. Unlike physical wallets, bitcoin wallets can easily be copied by duplicating the private keys. As explained in the previous paragraph about addresses, it is possible to operate a distributed wallet by using a multi-signature address. When talking about a wallet, people often mean the wallet software. A regular wallet holds banknotes, and the owner of the wallet can check the amount by counting the notes. A bitcoin wallet works differently, because the available amount of bitcoin is not stored in it. To see the balance of an address, the wallet software queries the whole blockchain and collects all transactions with the provided address as origin or destination. The result of this query is called unspent transaction output (UTXO) and shows the delta between inputs and outputs of the address (Franco, 2015).
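The balance lookup described above can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: the `Transaction` and `Output` classes, the readable names used as addresses and the three sample transactions are all hypothetical, and real bitcoin transactions carry scripts and signatures that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    address: str
    amount: int  # amount in satoshi


@dataclass(frozen=True)
class Transaction:
    txid: str
    inputs: tuple[tuple[str, int], ...]  # (txid, output index) pairs being spent
    outputs: tuple[Output, ...]


def balance(chain: list[Transaction], address: str) -> int:
    """Sum all outputs sent to the address that no later input has spent."""
    spent = {ref for tx in chain for ref in tx.inputs}
    total = 0
    for tx in chain:
        for index, out in enumerate(tx.outputs):
            if out.address == address and (tx.txid, index) not in spent:
                total += out.amount
    return total


# Hypothetical history: alice receives 50, pays bob 30 (20 change), bob pays carol.
chain = [
    Transaction("tx0", (), (Output("alice", 50),)),
    Transaction("tx1", (("tx0", 0),), (Output("bob", 30), Output("alice", 20))),
    Transaction("tx2", (("tx1", 0),), (Output("carol", 30),)),
]

for name in ("alice", "bob", "carol"):
    print(name, balance(chain, name))
```

The point of the example is that no balance field exists anywhere: the wallet software derives it from the transaction history each time.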

Block

The blockchain is a linked list of blocks that contain transaction data. Each block has a unique hash by which it can be identified. A blockchain block consists of a header and transactional data. Within the bitcoin blockchain, the header includes the software version used, the hash of the previous block, the Merkle root, a timestamp, the difficulty target and a nonce. Another way to identify a block is by its height, which can be seen as the number of the block in the chain (Antonopoulos, 2015). The first block had height 0, and the last published block at the time of writing this thesis has height 537017 and contains 2467 transactions. Blocks are created by miners and then published to the network; mining is explained in the next paragraph.
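The six header fields listed above can be made concrete with a short sketch. It assumes the widely documented 80-byte serialization of a bitcoin block header (little-endian integers, byte-reversed hashes, double SHA-256) and uses the publicly known values of the genesis block as sample data; the `BlockHeader` class itself is a hypothetical helper and not part of any library.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass
class BlockHeader:
    version: int
    prev_hash: str    # hex string as shown by block explorers
    merkle_root: str  # hex string as shown by block explorers
    timestamp: int    # Unix time in seconds
    bits: int         # difficulty target in compact form
    nonce: int

    def serialize(self) -> bytes:
        # Integers are little-endian; hashes are byte-reversed relative to display form.
        return struct.pack(
            "<L32s32sLLL",
            self.version,
            bytes.fromhex(self.prev_hash)[::-1],
            bytes.fromhex(self.merkle_root)[::-1],
            self.timestamp,
            self.bits,
            self.nonce,
        )

    def block_hash(self) -> str:
        digest = hashlib.sha256(hashlib.sha256(self.serialize()).digest()).digest()
        return digest[::-1].hex()


# Publicly known header values of the bitcoin genesis block (height 0).
genesis = BlockHeader(
    version=1,
    prev_hash="00" * 32,
    merkle_root="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)
print(len(genesis.serialize()), genesis.block_hash())
```

Because the previous block's hash is one of the hashed fields, the identifier of a block depends on its entire history, which is what the chain design in Chapter 2.2.3 builds on.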

Mining

When a transaction is made, it is published to the network. Nodes operating on this network collect transactions and verify them against given rules, but do not confirm them yet. A transaction is unconfirmed as long as it has not been included in one of the blocks of the chain. The pool of unconfirmed transactions is called the transaction pool or memory pool (Antonopoulos, 2015). To get a transaction confirmed, it must be included in a block. A node that participates in mining on the network sets up a block, includes transactions and tries to find a valid hash for the block using the SHA-256 algorithm. To create a valid block, the hash of the block header must start with a defined number of zeros. A higher number of leading zeros makes it more difficult for miners to find a valid block and is therefore referred to as the mining difficulty. When the block header is hashed with SHA-256 and the requirements for the hash are not met, the miner changes the nonce in the header and creates a new hash. This is the implementation of the hashcash algorithm in bitcoin for proof-of-work. After finding a hash that meets the criteria, the miner publishes the new block to the network. Every other network node verifies the validity of the block and adds it to the existing chain (Bashir, 2017).
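The trial-and-error search over the nonce can be sketched as follows. This is a toy version under stated assumptions: the header is an arbitrary byte string instead of a real 80-byte block header, the nonce is 8 bytes wide, and the difficulty is expressed as a number of leading zero bits that is far lower than anything used on the real network. The function names are illustrative.

```python
import hashlib


def pow_hash(header_prefix: bytes, nonce: int) -> bytes:
    """Double SHA-256 over the header bytes followed by the nonce."""
    candidate = header_prefix + nonce.to_bytes(8, "little")
    return hashlib.sha256(hashlib.sha256(candidate).digest()).digest()


def is_valid(header_prefix: bytes, nonce: int, zero_bits: int) -> bool:
    """A hash is valid if it starts with at least zero_bits zero bits."""
    target = 1 << (256 - zero_bits)
    return int.from_bytes(pow_hash(header_prefix, nonce), "big") < target


def mine(header_prefix: bytes, zero_bits: int) -> tuple[int, str]:
    """Increment the nonce until the hash meets the difficulty requirement."""
    nonce = 0
    while not is_valid(header_prefix, nonce, zero_bits):
        nonce += 1
    return nonce, pow_hash(header_prefix, nonce).hex()


nonce, digest = mine(b"toy block header", 12)
print(nonce, digest)
```

Finding the nonce takes a few thousand attempts on average at this difficulty, while checking it with `is_valid` takes a single hash. This asymmetry is what lets every other node verify a published block cheaply.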

Nodes

Not every node on the bitcoin network operates as a miner. According to Antonopoulos (2015), there can be distinct types of nodes on a blockchain network. They are depicted in Table 2, where the first column gives the name of the node type and the following columns describe the functions the node performs.

[Table not included in this excerpt]

Table 2: Types of nodes

adapted from Antonopoulos (2015)

The terms of mining and wallet software have already been covered in the previous paragraphs. A full copy of the blockchain means a node has stored the whole blockchain, not only the headers, on its local hard drive. This enables a node to validate transactions and blocks that are sent across the network. This is where the network routing functionality comes in. This function enables the node to participate in the P2P network, and to send / receive new transactions and blocks.

2.2.3 Bitcoin Blockchain Design

The bitcoin blockchain is a chain of blocks that contain financial transactions. Every block is linked to the previous block by including the hash of the previous block, which proves that a certain state was valid at the defined point in time, because the past cannot be changed without a very high investment in computing power. This is due to the parent / child relationship between blocks. A block contains the hash of the previous block header, so if a block changes, all subsequent blocks would have to change too. A block therefore becomes more secure the more subsequent blocks exist (Antonopoulos, 2015). A valid hash for a block has to start with a defined number of zero bits. PoW is used to find this hash. After successfully finding a suitable hash, the miner is rewarded with a mining incentive, which depends on the block height. The bitcoin network is a collection of individual nodes that operate in a distributed manner, and it can be joined by anyone without any special permission, which is why it is also called a permissionless public blockchain (more on that in Chapter 2.2.5).
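The parent / child relationship can be demonstrated with a small model. The sketch below is an assumption-laden simplification: blocks are plain dictionaries hashed as JSON with a single SHA-256, there is no proof-of-work, and the helper names are invented for this example.

```python
import hashlib
import json
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash a block's full content, including the link to its parent."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def build_chain(payloads: list[str]) -> list[dict]:
    """Create blocks that each store the hash of their predecessor."""
    chain = []
    prev = "0" * 64  # the first block has no parent
    for height, data in enumerate(payloads):
        block = {"height": height, "prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def first_broken_link(chain: list[dict]) -> Optional[int]:
    """Return the height of the first block whose parent hash does not match."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain(["a pays b", "b pays c", "c pays d"])
print(first_broken_link(chain))   # intact chain

chain[0]["data"] = "a pays mallory"
print(first_broken_link(chain))   # the child of the altered block no longer matches
```

To hide the change, an attacker would have to recompute the altered block and every block after it, and in bitcoin each of those recomputations requires a new proof-of-work. Note that this simple check cannot detect a change to the newest block, because no child references it yet, which matches the statement that a block becomes more secure as more blocks follow it.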

Bitcoin increases the level of privacy for users by eliminating the trusted third party normally used for validating financial transactions (Nakamoto, 2008). The simplified blockchain header in Figure 2 shows the inclusion of the Merkle root of all transactions. Within the bitcoin network, all transactions are at the bottom of the tree and are hashed in pairs until only one hash (the Merkle root) remains. This procedure works with an arbitrary number of transactions, and the Merkle root is always 32 bytes long. Figure 3 shows what a Merkle tree of bitcoin transactions looks like. Bitcoin uses the collision-resistant hash algorithm SHA-256 (Bashir, 2018).

[Figure not included in this excerpt]
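The pairwise hashing can be sketched in a few lines. This is a minimal illustration that assumes bitcoin's conventions of double SHA-256 and of duplicating the last hash when a level has an odd number of entries; the transactions are placeholder byte strings rather than real serialized transactions.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash the transactions in pairs, level by level, until one hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
root = merkle_root(txs)
print(len(root), root.hex())
```

Whatever the number of transactions, the result has the fixed size of one hash, and changing any single transaction changes the root and therefore the block header hash.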

Figure 2: Simplified bitcoin block design

(Nakamoto, 2008)

[Figure not included in this excerpt]

Figure 3: Merkle Tree

(Antonopoulos, 2015)

Blockchain is often referred to as a database, which is not correct, because the blockchain network doesn’t have a database itself, but nodes that participate in the system have one. For example, Bitcoin Core, which is used to run a node on the bitcoin network, uses a LevelDB to store data of the bitcoin blockchain. The above described architecture (blocks that contain hashed transaction data and are linked to the previous block) is the basic blockchain architecture. Most blockchain system are based on this architecture and are then individualized by using a different protocol. The protocol contains specifications about the block size (e.g. bitcoin uses a maximum block size of 1MB at the moment), used hashing algorithm, etc. As mentioned the bitcoin blockchain is a public and permissionless system and therefore a consensus must be achieved within the network. In other financial systems this step is done centralized by a trusted bank automatically or manually. As there is no central authority, that can be trusted in the bitcoin network a decentralized consensus must be achieved. Every node running on the bitcoin network has a full copy of the bitcoin blockchain. If a new block is added to the chain every node can check its validity by simple rules that are defined. In bitcoin, this consensus is emergent because there is no voting or similar going on (Antonopoulos, 2015).

For the further reading it is important to understand that the blockchain itself is not a medium for data storage but a distributed network of nodes that store data, which in the case of bitcoin is financial transaction data. In conclusion, the unique architectural characteristics of a blockchain are the decentralized consensus and the chain of blocks, in which every block is linked to its parent block, making the blockchain immutable. These properties differentiate blockchain systems from other systems, but within blockchain systems a wide variety of types is possible. These distinct types are covered in the following Chapter.
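
How the link to the parent block makes tampering detectable can be illustrated with a small sketch. The block layout (`height`, `prev_hash`, `data`) and the function names are assumptions for this example and are far simpler than a real block header.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder parent hash for the first block


def block_hash(block: dict) -> str:
    """Hash a block's full content, including the hash of its parent."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(payloads):
    """Create blocks in which each block stores the hash of its parent."""
    chain = []
    prev = GENESIS_PREV_HASH
    for height, data in enumerate(payloads):
        block = {"height": height, "prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain) -> bool:
    """Check that every block references the actual hash of its parent."""
    prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True


chain = build_chain(["alice pays bob 1", "bob pays carol 2", "carol pays dan 3"])
print(is_valid(chain))            # True
chain[0]["data"] = "alice pays mallory 100"
print(is_valid(chain))            # False: block 1 no longer matches its parent
```

In this toy version a change to the newest block would go unnoticed, because no later block references it. In a real network the other nodes hold their own copies and the consensus mechanism protects the tip of the chain.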

2.2.4 Categories of Blockchains

Blockchains differ in their functionality, depending on their state of evolution. In the literature, blockchains are therefore often classified in categories from blockchain 1.0 up to blockchain 3.0. The bitcoin blockchain is a prime example of a 1.0 blockchain. Blockchains in this category aim to be a cryptocurrency and to solve problems regarding financial transactions. A blockchain of the second generation includes contract functionality. This enables a blockchain system to operate beyond simple financial transactions, across a whole market. This could include stocks, mortgages, smart contracts or smart property. The most advanced blockchain category is 3.0. Blockchains in this category are used not only in financial markets but also in government, health, science and many more areas (Swan, 2015).

Within this thesis, this three-tier approach is used for categorizing blockchain systems. There are also sources (e.g. Unibright.io) that distinguish four categories of blockchains, where 3.0 refers to decentralized applications (DApps) and blockchain 4.0 to industry adoption. Swan (2015) sees DApps as part of blockchain 2.0 systems. In conclusion, the literature offers no clear definition of the different tiers of blockchain systems.

Blockchain systems with smart contracts and decentralized applications are covered in more depth in Chapter 2.2.6.

2.2.5 Properties of Blockchains

Blockchains have unique properties in their system and protocol design which define their area of application. Within this Chapter the properties accessibility / visibility, tokenization, block design and consensus mechanism are discussed.

Accessibility / visibility

The bitcoin blockchain is not owned by any company, central authority or the developer team. Anyone who is willing to can participate as a node in the bitcoin network and read from and write to the blockchain; hence bitcoin is a public blockchain, accessible and visible to everyone. In contrast to public chains, private blockchains are only open to a selected group of individuals or organizations. Private blockchains are owned by a single entity and are not open to the public, which makes them appear more like a cryptographically secured database (Bashir, 2018). Compared to traditional distributed databases, a private blockchain offers the use of smart contracts (Lai & LEE Kuo Chuen, 2018). Compared to public blockchains, private ones are not trustless and often not fully decentralized. In discussions of public and private blockchains, the words permissionless and permissioned are often used as synonyms for them, which is misleading. A permissioned blockchain is a trade-off between the public and the private design. In a permissioned blockchain system, potential users must verify their identity before being able to participate. After verification, a user can take on different roles within the blockchain system (Bashir, 2018).

Referring back to the discussion about decentralization in Chapter 2.1, and in more detail to Table 1: Examples of distributed systems, based on Buterin (2017), public blockchains achieve more decentralization than private or permissioned systems because public systems use decentralized politics. More decentralization does not mean that all public chains are superior to private or permissioned ones; this depends on the area of application and the purpose the blockchain has to serve.

Tokenization

The intended purpose of the bitcoin blockchain, as stated by Nakamoto (2008), is to create a digital cash system with its own currency. Bitcoin therefore issues its own coin as a currency, which makes it a tokenized blockchain. In the context of blockchain, coins are a unit of a cryptocurrency, while tokens are a digital representation of an asset. These digital coins or tokens operate as an incentive layer on the blockchain, mostly as an incentive for mining. Tokenization is therefore closely related to the consensus mechanism and the accessibility of a blockchain. Public chains use PoW or PoS to achieve consensus and distribute tokens or coins to the miner of the newest block. Private chains often do not rely on mining or staking as a consensus algorithm and therefore do not issue a token or coin; such a chain is called a token-less blockchain (Lai & LEE Kuo Chuen, 2018).

Block design

Bitcoin currently has blocks of up to 1 MB in size that are created approximately every 10 minutes and contain financial transactions. The size and frequency of blocks are specified in the source code of the bitcoin protocol and can therefore be unique to every blockchain. The time between blocks in a PoW-secured blockchain is not defined as a fixed timeframe but results from a difficulty set for the PoW mechanism. This difficulty is determined dynamically, based on the total processing power on the network. An increase in mining power since the last difficulty calculation leads to a difficulty increase, while a reduction in mining power causes a reduction in mining difficulty. In the case of bitcoin, the difficulty adjustment is done every 2016 blocks, which equals approx. 14 days (Antonopoulos, 2015).
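
The retargeting rule can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `retarget` and the sample numbers are made up for this example, and the sketch omits details of the real client such as the compact target encoding and the cap at the maximum allowed target. It works on the target, where a lower target means a higher difficulty, and limits each adjustment to a factor of four, as the bitcoin protocol does.

```python
TARGET_BLOCK_TIME = 600          # seconds, i.e. 10 minutes per block
RETARGET_INTERVAL = 2016         # blocks between two adjustments
EXPECTED_TIMESPAN = TARGET_BLOCK_TIME * RETARGET_INTERVAL  # 1,209,600 s = 14 days


def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by how long the last 2016 blocks actually took.

    Blocks that arrived too fast shrink the target (mining gets harder);
    blocks that arrived too slowly enlarge it (mining gets easier).
    """
    # Limit the adjustment to a factor of 4 in either direction.
    clamped = max(EXPECTED_TIMESPAN // 4,
                  min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN


old_target = 1_000_000  # illustrative value, not a real bitcoin target
print(retarget(old_target, EXPECTED_TIMESPAN))       # 1000000: on schedule
print(retarget(old_target, EXPECTED_TIMESPAN // 2))  # 500000: hash power doubled
print(retarget(old_target, EXPECTED_TIMESPAN * 10))  # 4000000: capped at factor 4
```

Because the rule only reacts to the elapsed time, the 14 days mentioned above are an expectation rather than a guarantee: the interval is shorter when mining power grows and longer when it shrinks.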

Compared to bitcoin, the Ethereum blockchain uses a different approach to block size and a slightly different difficulty adjustment. On Ethereum the block size is not limited by the size of its content in bytes but by the amount of fees that accrue. Fees for the execution of transactions or smart contracts on Ethereum are specified in gas. When a block reaches its gas limit (currently 8 million), the block is considered full. The gas limit is set by the miners, who can vote on increasing or decreasing it. In contrast to bitcoin, the block time on Ethereum does have a targeted time range. When the previous block was mined in under 10 seconds, the difficulty increases; between 10 and 19 seconds, the difficulty stays the same; a block time of 20 seconds or more decreases the mining difficulty. Ethereum also uses a “Difficulty Bomb”, which at a certain point increases the mining difficulty so much that mining is no longer profitable. This method is used to force the switch from the PoW to the PoS consensus mechanism (Antonopoulos & Wood, 2018).
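
The time ranges above match the Homestead-era difficulty rule (EIP-2), which can be sketched as below. The function name and the sample values are assumptions for this example. Later protocol upgrades changed the formula (for instance to account for uncle blocks and to delay the bomb), so this should be read as an illustration of the mechanism, not as the rule in force on the network at any given time.

```python
MIN_DIFFICULTY = 131_072  # lower bound used by the Homestead rule


def next_difficulty(parent_difficulty: int, parent_timestamp: int,
                    timestamp: int, block_number: int) -> int:
    """Homestead-style difficulty for a block, derived from its parent."""
    block_time = timestamp - parent_timestamp
    # +1 below 10 s, 0 from 10 to 19 s, negative from 20 s on (floor at -99).
    adjustment = max(1 - block_time // 10, -99)
    difficulty = parent_difficulty + parent_difficulty // 2048 * adjustment
    # "Difficulty bomb": an extra term that doubles every 100,000 blocks.
    period = block_number // 100_000
    if period >= 2:
        difficulty += 2 ** (period - 2)
    return max(difficulty, MIN_DIFFICULTY)


parent = 2_048_000  # illustrative parent difficulty
print(next_difficulty(parent, 0, 5, 1))    # 2049000: fast block, harder
print(next_difficulty(parent, 0, 15, 1))   # 2048000: within range, unchanged
print(next_difficulty(parent, 0, 25, 1))   # 2047000: slow block, easier
```

The bomb term is negligible at low block numbers but grows exponentially, so at some height it outweighs the regular adjustment and block times rise regardless of the available mining power.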

Consensus Mechanism

Blockchain solved the problem of reaching decentralized consensus on a certain state. To achieve consensus across the network, different mechanisms can be used. As explained in Chapter 2.2.3, bitcoin uses PoW. Achieving consensus with the PoW mechanism requires a lot of computational resources, which leads to high energy consumption. A variety of hashing algorithms can be used for PoW.
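
Why PoW is computationally expensive to produce but cheap to check can be shown with a toy example. The names `mine` and `verify`, the header bytes and the difficulty of 12 leading zero bits are assumptions chosen so that the example runs quickly. For simplicity it uses a single SHA-256 pass, whereas bitcoin hashes its 80-byte block header twice.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> int:
    """Hash header and nonce together and interpret the digest as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces until the hash falls below the target (brute force)."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes a single hash computation."""
    return pow_hash(header, nonce) < (1 << (256 - difficulty_bits))


header = b"example block header"   # hypothetical header content
nonce = mine(header, difficulty_bits=12)
print(verify(header, nonce, 12))   # True
```

Each additional difficulty bit doubles the expected number of attempts in `mine`, while `verify` always costs one hash. This asymmetry is what ties block creation to computing power and, in turn, to energy consumption.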

Another mechanism that is already used in blockchains for cryptocurrencies is Proof-of-Stake (PoS). This mechanism relies on market forces rather than computing power. The number of tokens owned by the participating node defines its chance of adding a block to the existing chain. The more tokens a miner owns in the network, the greater the miner's interest in securing the network and thereby those tokens. Another type of PoS is Delegated Proof-of-Stake (DPoS), where the users vote for witnesses which generate consensus (Lai & LEE Kuo Chuen, 2018). The mechanisms described are suitable for blockchains that use tokens or coins to incentivize the miner (PoW) or the voter (PoS & DPoS) and as a protection against 51% attacks. Token-less blockchains use different and often proprietary mechanisms that rely on classical state machine replication (SMR) to provide fault tolerance (Lai & LEE Kuo Chuen, 2018). State machine replication can be seen as copies of a system (at least 3 to provide fault tolerance) which maintain the same state or output (Schneider, 1990). For example, Hyperledger Fabric utilizes state machines to achieve Byzantine fault tolerance (BFT) when ordering transactions, which results in one transaction list that every participating node has agreed on, even if there were malicious nodes on the network (Bashir, 2018; Castro & Liskov, 2002).
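
The idea that stake defines the chance of adding a block can be sketched as a weighted random draw. This is a minimal illustration that assumes the chance is directly proportional to the stake. Real PoS protocols add further elements such as verifiable randomness, lock-up periods and penalties, none of which are modelled here. The participant names and stake values are hypothetical.

```python
import random
from collections import Counter


def pick_block_producer(stakes: dict, rng: random.Random) -> str:
    """Draw one participant with probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical token holdings of three participants.
stakes = {"alice": 60, "bob": 30, "carol": 10}

rng = random.Random(42)  # fixed seed so the run is repeatable
wins = Counter(pick_block_producer(stakes, rng) for _ in range(10_000))
for name in stakes:
    print(name, wins[name] / 10_000)  # shares close to 0.6, 0.3 and 0.1
```

Under this assumption, controlling the majority of block production requires holding the majority of the staked tokens, which is the PoS counterpart to the 51% attack mentioned above.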

Table 3 shows a summary of the hash algorithms and consensus mechanisms used by the biggest cryptocurrencies as of June 2018.

[Table not included in this excerpt]

Table 3: Used Consensus Mechanism and Hash Algorithms5

2.2.6 Smart Contracts & Decentralized Applications

Contracts are a critical element in an economy, especially in finance. Bitcoin, as a single payment transaction system, is currently not able to process complex financial contracts6. While economic contracts consist of a mandatory payment in exchange for a service or good, financial contracts are purely cash-flow based. Financial contracts are mostly expressed in numbers and are therefore well suited for self-execution and smart contracts (Brammertz & Mendelowitz, 2018). The principles of smart contracts are not new; they originate from Szabo (1997). The most used blockchain-based smart contract platform is Ethereum. There are currently (May 2018) more than 26,364 contracts that have a verified source code7. A commonly used standard for Ethereum smart contracts is the ERC20 token standard, which is used to issue and manage digital tokens based on the Ethereum blockchain. The specific code of every smart contract is publicly available and can be viewed with a blockchain explorer8. In the Ethereum whitepaper, published by Buterin (2014), the use cases of smart contracts are separated into full-financial, semi-financial and non-financial. As discussed earlier, Brammertz and Mendelowitz (2018) see applications only for cash-flow contracts, but Buterin (2014) planned to use Ethereum also for non-cash-flow contracts such as voting, decentralized governance and even cloud computing. The execution of a smart contract requires a fee, which is paid to the miner that executes the contract code (Buterin, 2014).
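
What issuing and managing tokens means in practice can be illustrated with a toy model of the ERC20 interface. The sketch below mirrors the standard's core functions (`totalSupply`, `balanceOf`, `transfer`, `approve`, `allowance`, `transferFrom`) as a plain Python class. It is not Solidity and not a deployable contract: on Ethereum the caller is given implicitly by the transaction, whereas here it is passed as an explicit argument, and events and gas costs are omitted. The class name and the account names are hypothetical.

```python
class ToyToken:
    """In-memory model of an ERC20-style token ledger."""

    def __init__(self, total_supply: int, owner: str):
        self.total_supply = total_supply
        self.balances = {owner: total_supply}   # account -> token balance
        self.allowances = {}                    # (owner, spender) -> amount

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def transfer(self, sender: str, to: str, amount: int) -> None:
        """Move tokens from the sender's own balance to another account."""
        if amount < 0 or self.balance_of(sender) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[to] = self.balance_of(to) + amount

    def approve(self, owner: str, spender: str, amount: int) -> None:
        """Allow a spender to move up to `amount` of the owner's tokens."""
        self.allowances[(owner, spender)] = amount

    def allowance(self, owner: str, spender: str) -> int:
        return self.allowances.get((owner, spender), 0)

    def transfer_from(self, spender: str, owner: str, to: str, amount: int) -> None:
        """Move tokens on behalf of the owner, within the approved limit."""
        if self.allowance(owner, spender) < amount:
            raise ValueError("allowance exceeded")
        self.transfer(owner, to, amount)
        self.allowances[(owner, spender)] -= amount


token = ToyToken(total_supply=1_000, owner="issuer")
token.transfer("issuer", "alice", 100)
token.approve("alice", "exchange", 40)
token.transfer_from("exchange", "alice", "bob", 25)
print(token.balance_of("alice"), token.balance_of("bob"))  # 75 25
```

Because every token contract exposes the same set of functions, wallets and exchanges can handle any ERC20 token without token-specific code, which helps explain why the standard is so widely used.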

Like smart contracts, DApps are not specifically built to run on a blockchain, but most decentralized apps are a derivative of smart contracts. The BitTorrent peer-to-peer network is used for several DApps. For example, OpenBazaar is a decentralized selling platform. Centralized platforms for offering goods and services, like eBay, have policies and fees for using the platform. OpenBazaar works with the BitTorrent protocol and therefore has no central authority, and bitcoin is used for decentralized payments. To be a completely decentralized application, an app has to satisfy four criteria: no single point of failure (SPOF), an internal currency (tokens), decentralized consensus and open source code (Raval, 2016). A recent example of a DApp running on the Ethereum blockchain is CryptoKitties9, a game where users can collect and breed digital cats. For each action that can be taken with one of those digital cats, a smart contract exists. The complete code for the application can be seen at the contract address 0x06012c8cf97bead5deae237070f9587f8e7a266d.

2.3 Data Security

This Chapter will introduce data security with an emphasis on data integrity, which is a main component of this thesis. Furthermore, the current situation of data integrity in blockchain systems will be discussed.

The amount of intangible assets within an organization is continually increasing. According to Vacca (2017), intangible assets reached an average of 84% within S&P 500 organizations in 2015. To protect corporate value, it is necessary to have a data and information security plan in place. The National Institute of Standards and Technology (2018) defines a cybersecurity framework to help organizations identify, assess, and manage cyber risks. According to the cybersecurity framework by NIST (2018), data security is achieved when:

“Information and records (data) are managed consistent with the organization’s risk strategy to protect the confidentiality, integrity, and availability of information.”

The definition above is also known as the CIA (confidentiality, integrity and availability) triad. In the following, each of the three dimensions is discussed in detail, with particular emphasis on integrity.

The definition used by NIST, published in 44 U.S.C § 3542 Definitions, describes confidentiality in the context of data security as:

[...]


1 This data can be seen with any kind of bitcoin block explorer, for example:
https://blockchain.info/block-index/14849/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f

2 https://blockchain.info/en/tx/4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b

3 See block #537017: https://www.blockchain.com/btc/block/0000000000000000002268d0e61dc6e4c7bbdb69696abfa8248503aec6508594

4 Within the bitcoin system this is a variable and is recalculated every 2016 blocks. This step is also known as difficulty adjustment.

5 The cryptocurrencies are ranked regarding market capitalization, retrieved from https://www.coinmarketcap.com on June 6 2018

6 Bitcoin can execute simple code with Bitcoin Script

7 https://etherscan.io/contractsVerified

8 Example EOS Token: https://etherscan.io/address/0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0#code

9 Cryptokitties website: https://www.cryptokitties.co
