Blockchains are a distributed, append-only way of storing data. They rely on a large network for their maintenance, and can only grow in size. Jeni Tennison, the ODI’s Technical Director, looks at scenarios for how blockchains might evolve over time.
Blockchains are emerging from their origins in cryptocurrencies and being explored as a mechanism for storing data of other kinds. We are very early in our understanding of when and how it’s best to use blockchains as a technology. We need to anticipate and plan for what happens when blockchains scale from low levels of use to potential ubiquity for other applications like recording marriages, registering land ownership, ensuring musicians get paid, and maintaining supply chain provenance.
Blockchains are maintained by a distributed network of nodes: computers that store the blockchain and may add data to it. There are drivers for having a few blockchains that are each maintained by a large number of nodes and for having many blockchains that are each maintained by a small number of nodes. We’ll end up somewhere in the middle. The right number of blockchains will change over time. We have to ensure that it’s possible to adjust: to split blockchains or to merge blockchains as required.
Blockchains are attractive as a data store because they are maintained by a network of nodes, making them robust and tamper-proof. Robustness ensures that the data is always available. A blockchain being tamper-proof guarantees data integrity; even if some nodes are compromised, the other nodes won’t accept changes they make to the blockchain.
These important characteristics arise when a blockchain is maintained by a large network of nodes. If a blockchain is maintained by a single node, that node could be struck by a hardware failure, or could rewind and rewrite the blockchain it is maintaining without detection.
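To make the single-node risk concrete, here is a minimal sketch of a hash-linked chain in Python. It is a deliberate simplification and not how bitcoin or any particular blockchain is implemented: there is no proof of work or consensus protocol, and the names `build_chain`, `is_internally_consistent` and `peers_agree` are hypothetical, chosen for illustration. It shows that a node which rebuilds its whole chain from altered records produces something that still checks out on its own, and that the rewrite only becomes visible when other nodes hold a different copy.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a list of blocks where each block commits to its predecessor."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_internally_consistent(chain):
    """Check that every stored hash matches its contents and its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


def peers_agree(chain, peer_tip_hashes):
    """Return True if a strict majority of peers report the same latest hash."""
    tip = chain[-1]["hash"]
    matching = sum(1 for peer_tip in peer_tip_hashes if peer_tip == tip)
    return matching * 2 > len(peer_tip_hashes)


original = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dan 1"])

# A lone node rewinds and rebuilds the chain with an altered second record.
rewritten = build_chain(["alice pays bob 5", "bob pays carol 200", "carol pays dan 1"])

# Three other nodes (an assumption for this sketch) still hold the original chain.
honest_tips = [original[-1]["hash"]] * 3

print(is_internally_consistent(rewritten))  # the rewrite looks valid in isolation
print(peers_agree(rewritten, honest_tips))  # but the other nodes disagree with it
```

With no other nodes to compare against, the first check is the only one available, which is why a single maintainer can rewrite history undetected.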
Blockchains that are only maintained by small numbers of nodes can get into situations where the majority of the network is controlled by a single organisation. This happened with the GHash bitcoin mining pool in 2014, and was the reason behind Onename’s recent migration from Namecoin to the bitcoin blockchain. When more than half of the network maintaining a blockchain is controlled by a single organisation, it is possible for that organisation to alter the content of the blockchain or to accept invalid transactions.
The fact that small networks of nodes undermine the utility of a blockchain is a driver towards having a few, large-scale blockchains maintained by many nodes.
The size of a blockchain grows over time because it is an append-only data store: you can add data to a blockchain, but you can never remove it.
The bitcoin blockchain is currently about 49GB in size. It’s been growing steadily by about 2.5GB/month (though that rate is increasing). With a limit of 1MB/block and 1 block every 10 minutes, the maximum rate of increase in size will be just over 4GB/month.
The bitcoin blockchain is relatively small as bitcoin transactions take up very little data. Other applications for blockchains may require more storage, or a speedier rate of growth (larger blocks and/or more frequent blocks).
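The growth ceiling quoted above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 30-day month and the use of 1GB = 1,000MB are assumptions, and the function name is hypothetical. The second call uses made-up parameters to show how larger or more frequent blocks change the picture.

```python
def max_growth_gb_per_month(block_size_mb, minutes_per_block, days_per_month=30):
    """Upper bound on blockchain growth if every block is completely full."""
    blocks_per_month = days_per_month * 24 * 60 / minutes_per_block
    return blocks_per_month * block_size_mb / 1000  # assumes 1GB = 1,000MB


# Bitcoin's limits as described above: 1MB blocks, one block every 10 minutes.
bitcoin_ceiling = max_growth_gb_per_month(block_size_mb=1, minutes_per_block=10)
print(f"{bitcoin_ceiling:.2f} GB/month")  # 4,320 blocks of 1MB each

# A hypothetical application with 2MB blocks every 5 minutes.
larger_ceiling = max_growth_gb_per_month(block_size_mb=2, minutes_per_block=5)
print(f"{larger_ceiling:.2f} GB/month")
```

Doubling the block size and halving the block interval quadruples the ceiling, which is why applications that need more storage or faster confirmation put much more pressure on every node.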
Every node in a blockchain network needs to be able to store and process the entirety of the blockchain. Blockchains that are used by lots of applications will be large in size. The vast majority of data within a blockchain supporting multiple applications will be irrelevant to the application any particular node is interested in. Some nodes might only be interested in the land registry data, some only in statements of copyright ownership.
Large blockchains require nodes that are interested in a given application to take on all the data from other applications using the blockchain. They might hesitate not only because of the size of that data but also because of ethical concerns about what it contains. These are drivers towards having more, smaller and more specialised blockchains.
Blockchain applications need to be able to migrate from blockchain to blockchain without interruption of service. What patterns make this possible? For example, it would be useful for a transaction in one blockchain to be able to point to a transaction in another blockchain, either as a precursor or an original. How can transactions within another blockchain be addressed and pointed at? What does a URL for a transaction look like?
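As one way of thinking about the addressing question, here is a purely illustrative Python sketch of a cross-blockchain pointer. Nothing in it is an existing standard: the `chain` URL scheme, the `/tx/` path layout, the `TransactionRef` class and the example identifiers are all assumptions made up for this sketch. The point is only that a pointer needs at least two parts, something that identifies the blockchain and something that identifies the transaction within it.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

SCHEME = "chain"  # hypothetical URL scheme, not a registered one


@dataclass(frozen=True)
class TransactionRef:
    """A pointer to a transaction that lives in some (possibly other) blockchain."""

    chain_id: str  # hypothetical identifier for the blockchain
    tx_hash: str   # identifier of the transaction within that blockchain

    def to_url(self):
        """Serialise the pointer as a URL-like string."""
        return f"{SCHEME}://{self.chain_id}/tx/{self.tx_hash}"

    @classmethod
    def from_url(cls, url):
        """Parse a URL produced by to_url, rejecting anything else."""
        parts = urlparse(url)
        segments = parts.path.strip("/").split("/")
        if (
            parts.scheme != SCHEME
            or not parts.netloc
            or len(segments) != 2
            or segments[0] != "tx"
        ):
            raise ValueError(f"not a transaction URL: {url!r}")
        return cls(chain_id=parts.netloc, tx_hash=segments[1])


# A transaction in a new blockchain pointing back at its original elsewhere.
original_ref = TransactionRef(chain_id="example-land-registry", tx_hash="ab12cd34")
print(original_ref.to_url())
print(TransactionRef.from_url(original_ref.to_url()) == original_ref)
```

A scheme like this leaves the harder questions open, such as who assigns the blockchain identifiers and how a reader locates nodes that hold the blockchain being pointed at.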
What happens to blockchains when they come to the end of their life? Like other internet phenomena, blockchains should be automatically archived for posterity. Is this a role that the Internet Archive should be taking on, or is there another institution that can provide this service? How can blockchains be discovered to enable this to happen automatically?
While we are at early stages of the use of blockchains, we are already encountering some of these challenges. We have to start working through the implications now to avoid them becoming big problems in the future.
How do you see the blockchain ecosystem evolving? What are the challenges you see coming down the line? What are the solutions people are already working on? Tell us what you think in comments below or by emailing email@example.com.
This post was supported by Deutsche Bank and is part of ODI Labs’ work on data infrastructure. If you’re interested in supporting our R&D work on data infrastructure and blockchains, email firstname.lastname@example.org.