Open data is still in its infancy. The focus so far has been on encouraging and supporting owners of data to publish it openly. A lot has been written about why opening up data is valuable, how to build business cases for open data sharing, and how to publish data in order to make it easy for people to reuse.
But while it’s great that there is so much advice for data publishers, we don’t often talk about how to be a good reuser of data. One of the few resources that give users advice is the Open Data Commons Attribution-Sharealike Community Norms.
I want to build on those points and offer some more tips and insights on how to use open data better.
It almost goes without saying that in order to use data you need to understand it first. But effective reuse involves more than just understanding the structure and format of some data. We are asking publishers to be clear about how their data was collected, processed and licensed. So it’s important for reusers to use this valuable information and make informed decisions about using data.
That information may show that the data is not fit for the purpose you intend, or it may simply reveal caveats that affect how the data should be interpreted. Share these caveats whenever you present your own analysis or conclusions based on the data.
Attribution is a requirement of many open licences and reusers should be sure they are correctly attributing their sources. But citation of sources should be a community norm, not just a provision in a licence. Within research communities the norm is to publish data under a CC0 licence, because attribution and citation of data are already well embedded as best practice: every scientific paper has a list of references.
The same principles should apply to the wider open data community. Acknowledging sources not only helps credit the work of data publishers, it also helps to identify widely-used, high-quality datasets.
Consider adding a page to your application that lists both the open source software and open data sources that you’ve used in developing it. The Lanyrd colophon page provides one example of how this might look.
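If you'd like a starting point, here is a minimal sketch of how such a page could be generated, assuming you keep a simple list of your sources in code. The `Source` fields, the `render_credits` function and the example entries are all hypothetical names and placeholders made up for illustration; they are not taken from Lanyrd or any particular framework.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Source:
    """One dependency to credit. All fields are illustrative assumptions."""
    title: str
    publisher: str
    url: str
    licence: str
    kind: str  # "data" or "software"


def render_credits(sources: list[Source]) -> str:
    """Return an HTML fragment crediting open data first, then software."""
    sections = []
    for kind, heading in (("data", "Open data"),
                          ("software", "Open source software")):
        # Escape every field so odd characters in titles cannot break the page.
        items = [
            f'<li><a href="{escape(s.url)}">{escape(s.title)}</a>, '
            f"{escape(s.publisher)} ({escape(s.licence)})</li>"
            for s in sources
            if s.kind == kind
        ]
        if items:  # skip a heading that would have nothing under it
            sections.append(
                f"<h2>{heading}</h2>\n<ul>\n" + "\n".join(items) + "\n</ul>"
            )
    return "\n".join(sections)


# Placeholder entries only; replace them with the sources you actually use.
EXAMPLE_SOURCES = [
    Source("Example Bus Stops", "Example City Council",
           "https://example.org/bus-stops", "CC BY 4.0", "data"),
    Source("examplelib", "Example Project",
           "https://example.org/examplelib", "MIT", "software"),
]

if __name__ == "__main__":
    print(render_credits(EXAMPLE_SOURCES))
```

Keeping the list in one place means adding a credit is a one-line change, and the licence sits next to each source so you can check the attribution wording it requires.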
If you’re using someone’s data, tell them! Every open data publisher is keen to understand who is using their data and how. It’s by identifying the value that comes from reuse of their data that publishers can justify continued (and additional) investment in open data publishing.
Engage with publishers when they ask for examples of how their data is being reused. Provide constructive feedback on the data itself and identify quality issues if you find them. Point to improvements in how the data is published that might help you and others consume it more easily.
If it was hard for you to get in touch with the publisher, encourage them to provide clearer contact details on their website. Getting them to complete an Open Data Certificate will help make this point: you can’t get a Pilot rating unless you provide this information.
If open data is a benefit to your business, then share your story. Evidence of open data benefits provides a positive feedback loop that can help people to unlock more data.
In some cases it’s not easy or possible to provide feedback directly to publishers, so share what you learn about working with open data with the wider community.
Do you have some tips about how to consume a dataset? Consider writing a blog post to share them. Maybe you can even share some open source code to help others work with the data.
Have you identified some issues with a dataset? Those issues may well affect others, so share your observations with the wider community, not just the data publisher.
The open data commons consists of all of the openly licensed and inter-connected datasets that are published to the web. The commons can grow and become more stable if we all contribute to it. There are various ways to achieve this beyond attribution and knowledge-sharing.
For example, if you’ve made improvements to a dataset, perhaps to enrich it against other sources, consider sharing that new dataset under an open licence. This might be the start of a more collaborative relationship with the original publisher or open up new business opportunities.
Some datasets are built and maintained collaboratively. Consider contributing some resources to help maintain the dataset, or contributing your own fixes and improvements. The more people do this, the more valuable the whole dataset becomes.
Direct financial contributions might also be an option, especially if you’re a commercial organisation making large-scale use of an open dataset. This is a direct way to support open data as a public good.
A mature open data commons will consist of a network of datasets published and reused by a variety of organisations. All organisations will be both publishers and consumers of open data. As we move forward with developing open data culture we need to think about how to encourage and support good practice in both roles.
The suggestions in this post should prompt further discussion. We’d like to develop them further into some guidance for open data practitioners.
What do you think should become part of the community norms around open data? What have we missed? Share your ideas in the comments.
We’re always looking for interesting open data stories here at the ODI. If you have ideas, experiences or perspectives you’d like to share, pitch our Editor a blog.
Happy open data reuse!