Last month, craft brewery BrewDog announced it would release all of its recipes for free. Beer geek, homebrewer and ODI web developer Stuart Harrison discusses why this is a positive move, and how BrewDog could make it even better.
The ODI’s Jeni Tennison shares her frustration in trying to access ‘free’ Land Registry property data, and explains that even when data is publicly available and free to access, restrictive licensing makes users’ lives harder than they would be if the data were open.
Drawing on poems by R.D. Laing, ODI Technical Director Jeni Tennison explores ‘knots’ that open data owners and users get trapped in when they don’t take time to develop a common language, build trust and identify shared goals
Blockchains are a distributed append-only way of storing data. They are reliant on a large network for their maintenance, and can only grow in size. Jeni Tennison, the ODI’s Technical Director, looks at scenarios for how blockchains might evolve over time
ODI Labs recently delivered a talk to the Sydney Blockchain Workshops, asking whether blockchains are always the right choice for global data infrastructure. Blockchains are a great technology for shared-write, trust-free data collaboration networks, but not everything needs those capabilities, and for many applications the technology may have undesirable aspects.
Data stored in blockchains cannot be changed, which means that personal data they contain cannot be removed. Jeni Tennison, the ODI’s Technical Director, explains why it’s really important that we design blockchains to protect people’s privacy
Legal grey areas can create risks for organisations that rely on data-based products and services from third parties. These grey areas affect reusers of government data, large and small, as well as organisations relying on commercial products. If they don’t know exactly what they can do with the data they have, they may end up either exposing themselves to legal action or unnecessarily restricting what they do with that data.
Keeping personal data separate from open data is a core principle to protect people from harm. Anonymisation is an effective process for managing the release of data derived from individuals. At present, guidance for anonymising data is sparse and unclear, which makes it harder to safely release important information that others could benefit from.
We're launching a new tool today called __[Open Data Pathway](http://pathway.theodi.org)__. It's a self-assessment tool that will help you assess how well your organisation publishes and consumes open data, and identify actions for improvement.
During the last week of February, the ODI teamed up with the University for the Creative Arts (UCA) to create a five-day workshop for students. The aim was to engage them with open data; this is what happened next.
We often refer to open data as a public good, but what does this mean? And what does it imply about how our national information infrastructure should be managed?
When I say that open data is a public good, sometimes people reply, “but people can do bad things with open data (so it’s not really good)” or “but it is more easily used by people who are data literate (so it’s not really public)”. These statements are both true, but based on a misunderstanding of what a ‘public good’ is. Wikipedia says:
In economics, a public good is a good that is both non-excludable and non-rivalrous in that individuals cannot be effectively excluded from use and where use by one individual does not reduce availability to others.
In other words, a public good is something that you can’t stop anyone using, and that doesn’t get used up. The examples of public goods that people tend to use are “clean air”, “lighthouses” or “public parks”. Open data also fits this economic definition.
The objections to describing open data as a public good apply equally to public parks. People can do immoral or unlawful things in public parks: they can let their dogs make a mess, deal drugs, mug people. We don’t deal with this misuse by closing public parks or having identity checks at every entrance, but through the same laws and social norms that apply elsewhere. Similarly, misuse of open data such as providing out-of-date flood alerts or misrepresenting statistics can and should be addressed through laws and social norms, not through restricting access to that data.
Public parks, like other public goods, don’t benefit everyone equally: those that live close by and those with children or dogs benefit more than those who live further away or those with impaired mobility. Lighthouses benefit those who own and man ships, and their families, far more than anyone else. Similarly, open data might bring most benefits to those for whom the data is relevant, those who are data literate, or those who already have lots of data. But from an economic point of view, these public goods are non-excludable: once they are available, it’s impossible to prevent others from benefiting from them.
Public goods cost money to create and maintain, but because it’s not just the people who pay for them that benefit, it can be hard to get enough people to contribute to their maintenance. This is known as the free rider problem: it’s the feeling of “why should I pay when they’re not?” If enough people feel that way, contributions to maintenance fall away, the public good falls into disrepair, and everyone loses.
When it comes to open data, the fear of the free rider problem leading to open data disappearing can become a problem in itself. On several occasions I’ve heard people say they would rather pay for data because doing so reassures them that the data will continue to be available long term. Who would invest in developing a product or service reliant on a resource that could disappear at any time?
We can learn about how open data should be paid for by looking at how other public goods are maintained. There are several methods:
Government: Funding by government is the usual solution to the free rider problem. It removes people’s individual choices over contributing to public goods: we pay our taxes; government uses the money to create public goods; democracy and accountability act as controls over which public goods are created and maintained.
Collaboratives: Groups can club together to create a public good that all the members benefit from. To make this sustainable, and avoid members reneging on their commitment to contribute (while still being able to benefit from the public good the other members continue to maintain), such groups usually need to have a contractual obligation to ongoing contributions.
Cross-subsidy: When the group that is the primary beneficiary of a public good also has to pay for a private good (something members from outside the group don’t benefit from), a portion of the payment for the private good can be used to subsidise the public good.
Volunteering: Volunteering can be a powerful mechanism for maintaining public goods. It usually needs to be supplemented with other types of contribution (eg you can’t pay for servers with volunteer time), and it requires an infrastructure that actively provides volunteers with something useful to do.
Social norms: Public goods can be maintained simply due to social pressure: creating the public good becomes the Thing To Do (and not maintaining it disapproved of); contributing to a public good either becomes a normal cost of doing business or is a target for charitable donations.
For open data to thrive as a public good, we will need to draw on all these models. How much you think the taxpayer should fund public goods such as open data will depend on your political outlook. But one thing is certain: as the role of the state in society changes, we will see changes in the way open data is maintained — away from government, as public services are mutualised or privatised, and towards the crowd, as the internet enables collaboration.
We need to learn which governance models work well together; what guarantees are needed for reusers to trust the supply of open data; and about the range of roles that government, and everyone else in society, can play. And this is particularly necessary for the governance of open national information infrastructure, something that ODI will be particularly focused on this year.
The Ordnance Survey has adopted the Open Government Licence (OGL) as the default licence for all of their open data products. This is great news for the open data community as it simplifies licensing around many important UK open datasets. It’s also an opportunity for other data publishers to reflect on their own approach to data licensing.
The original “OS Open Data licence” was based on a customised version of the first version of the OGL. Unfortunately, these changes left the open data community in some doubt about how the new clauses were to be interpreted. For example, the OpenStreetMap community decided that the terms were incompatible with the Open Database Licence, requiring them to seek explicit permission to use the open data. These are exactly the problems that standard open licences are meant to avoid.
By switching licence the Ordnance Survey have not only resolved outstanding confusion but have also ensured that their data can be freely and easily mixed with other UK government sources. The knock-on effects will also simplify the licensing of local government data released under the Public Sector Mapping Agreement. The result is a much clearer and simpler open data landscape in the UK.
At the ODI we’ve previously highlighted our concerns around the proliferation of open government licences. Many of these licences have taken a similar approach to the OS Open Data licence and are derived from earlier versions of the OGL.
We think this is a good time for all data publishers to consider their licensing choices:
If your custom licence is derived from the OGL then consider adopting the original version unchanged.
If you’re using a bespoke licence then consider how adopting a standard licence such as the OGL or the Creative Commons Attribution licence could benefit potential reusers.
For more background read through our guidance on open data licensing and our draft guidance on problematic licensing terms.
Ultimately, simplification of the open data licensing landscape benefits everyone and we ask other publishers to follow the Ordnance Survey’s lead.
Image: Adrian Scottow via Flickr (CC BY-SA 2.0)
The decision about which licence to use is one of the most important steps in publishing an open dataset.
While lots of licences aim to be "open", the terms they include may fall short of what open data requires, making reuse of the data very difficult.
In order to make an informed decision, publishers need a clear understanding of what they hope to achieve by opening up their data, the ways they hope it will be used, and the types of users they wish to engage.
From 17–18 November the United Nations Development Programme and the World Bank ran two days dedicated to open data in Bishkek, the capital of Kyrgyzstan. Kyrgyzstan is a small country in Central Asia, bordering China, Kazakhstan and Tajikistan. Since independence from the Soviet Union, the Kyrgyz Republic has developed into a democratic republic.
CSV stands for comma-separated values. It is a simple format for tabular data and relatively easy to process. We analysed more than 20,000 links to CSV files on data.gov.uk – only around one third turned out to be machine-readable.
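A check along these lines could be sketched in Python. This is a minimal, hypothetical heuristic — the analysis mentioned above will have used its own methodology — that treats a file as machine-readable CSV only if it parses into a consistent table rather than, say, an HTML error page served in place of the data:

```python
import csv
import io

def is_machine_readable_csv(text, min_rows=2):
    """Heuristic: does this text parse as a consistent tabular CSV?

    Returns False for HTML pages, near-empty files and ragged rows.
    """
    stripped = text.lstrip()
    # A common failure mode: the link returns an HTML page, not data
    if stripped.startswith("<"):
        return False
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if len(rows) < min_rows:
        return False
    # Every row should have the same number of columns as the header
    width = len(rows[0])
    return width > 1 and all(len(r) == width for r in rows)

good = "id,name,value\n1,alpha,10\n2,beta,20\n"
bad = "<!DOCTYPE html><html><body>404 Not Found</body></html>"
print(is_machine_readable_csv(good))  # True
print(is_machine_readable_csv(bad))   # False
```

A real survey would also need to fetch each link, follow redirects and handle character encodings, but even a simple structural check like this catches many of the broken files.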
What follows is a technical collection of tips. It assumes you are familiar with base R, how to install packages and how to do basic operations. Two interactive introductions to R are DataMind and Code School.