Introducing Open Data



Dr David Tarrant · @davetaz

Overview

  • Defining open data
  • Benefits
  • Open, closed and personal data
  • Why now?
  • Law and Licensing
  • Ensuring data quality

Exercise

What is Open Data?

Definition of Data (1)

A collection of facts, information and statistics that can be analysed to develop new knowledge

Definition of Data (2)

A collection of numbers assigned as values to quantitative variables and/or characters assigned as values to qualitative variables

Definition of Data (3)

The lowest level of abstraction from which information and then knowledge are derived.

data stack

Data: Information without context

okf

Definition of Open (OKF)

A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject only, at most, to the requirement to attribute and/or share-alike.

- Summary of Open Definition (v1.0)

Retired August 2014

ODI (Version 1)

Open data is information that is available for anyone to use, for any purpose, at no cost.

- Open Data Institute FAQ

Retired November 2014

opendefinition.org

  • Access
  • Redistribution
  • Reuse
  • Integrity
  • Attribution
  • Non-discriminatory
  • Remix and Combine

Availability and Access

The data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

Reuse and Redistribution

The data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.

Universal Participation

Everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
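As a rough illustration of the definition above, the check below treats a licence as a set of condition labels and accepts it only if nothing beyond attribution and share-alike is required. The labels and function names are hypothetical, chosen for this sketch; real licences need to be read in full.

```python
# Conditions the Open Definition summary tolerates.
# The label strings are illustrative assumptions, not an official vocabulary.
ALLOWED_CONDITIONS = {"attribution", "share-alike"}


def is_open(conditions):
    """Return True if every licence condition is one the summary permits."""
    return set(conditions) <= ALLOWED_CONDITIONS


def blocking_conditions(conditions):
    """List the conditions that stop a licence from counting as open."""
    return sorted(set(conditions) - ALLOWED_CONDITIONS)


if __name__ == "__main__":
    print(is_open(["attribution"]))
    print(is_open(["attribution", "non-commercial"]))
    print(blocking_conditions(["attribution", "non-commercial", "education-only"]))
```

A 'non-commercial' or 'education-only' condition fails the check, which matches the Universal Participation requirement.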

Cross sector benefits

Social Data

Open data helps us get from A to B quicker (image: trains)

Open data can help make decisions that affect our safety (image: fire)

Economic Data

Open data reveals how countries spend (or underspend) their budgets (image: government spending).

Open data identified a £1bn industry (P2P lending) and changed how it was regulated.

FT Coverage of SMTM
Lend Invest policy change

Environmental Data

Open data brings the evidence of climate change

Open data makes us aware of the impact we have on our planet (image: Rosling)

  • Public Services - innovation and savings
  • Civil Society - enables greater scrutiny
  • Private Sector - enhances services

Enables benefits across the economy

Challenges and Risks

Image Credit: Ulrich Atz

Types of personal data

Open personal data

Data about people, not a person.

Available to anyone

Has been anonymised

e.g. number of people attending an event, gender split, age ranges. Bigger numbers are better.
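One way to picture "data about people, not a person" is to publish only counts per age range and to withhold any range with very few people in it. The sketch below assumes a minimum group size of 5 and ten-year bands; both values, and the function names, are assumptions made for illustration, and this is not a complete anonymisation method.

```python
from collections import Counter

# Assumed threshold: groups smaller than this are withheld.
MIN_GROUP_SIZE = 5


def age_range(age, width=10):
    """Map an exact age to a band label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def publishable_counts(ages, min_group_size=MIN_GROUP_SIZE):
    """Count people per age band, replacing small counts with None."""
    counts = Counter(age_range(a) for a in ages)
    return {
        band: (n if n >= min_group_size else None)
        for band, n in counts.items()
    }


if __name__ == "__main__":
    attendees = [21, 22, 23, 24, 25, 29, 34, 35]
    print(publishable_counts(attendees))
```

With larger events, fewer bands fall under the threshold, which is one reading of "bigger numbers are better".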
Available personal data

Data about a person

Available to the person only!

Often known as MiData

e.g. credit scores, energy consumption and spending.

Closed personal data

Data about a person which is neither open nor available

Might belong to you or be collected by a company

Opportunity

Why now?

  • Policy Drivers
  • Technical Standards
  • Best Practice Guidelines

Participating Bodies

Policy drivers

g8 g20 world bank

Technical bodies

w3c

Best practice developers

OKF ODI

A Global Movement

Barometer
http://theodi.github.io/open-data-barometer-viz/

Open Data Burkina Faso: Our schools, our data from Open Data Institute on Vimeo.

Law and Licensing
(please note, I am not a lawyer and this section should not be treated as legal advice)
okf

Definition of Open (OKF)

A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject only, at most, to the requirement to attribute and/or share-alike.

- Summary of Open Definition (v1.0)

Retired August 2014

Exercise

What are Intellectual Property Rights?

IPR and Licensing

Law and IPR Venn Diagram

Rule of Thumb

  • Do you have rights or permission to publish?
  • Do you have rights to use the information/data?
  • Is the data derived from other sources?
  • What are the permissions concerning those sources?

Personal Data

  • Data Protection Act 1998
  • Data relating to a living identifiable person must be processed fairly and lawfully
  • Processing that is not immediately apparent to users, e.g. cookies, is covered by new laws and guidance
  • Damages are available to data subjects

Anonymisation is hard:

http://bit.ly/WuMdiJ & http://bit.ly/H6b9cK

Open Government Licence (OGL)

The Open Government Licence

Other Licences


Creative Commons 4

ODbL
Open Database Licence


Re-Mixing

Licence compatibility tool

http://clipol.org/tools/compatibility

Public domain assumption and myth

Endangered rhinos

Is the picture clear?

Who should own crowdsourced data?

Ensuring Data Quality
Best Practice Guidelines

Exercise

What makes data usable?

Guidelines

Open Data Certificate
5 Stars

http://5stardata.info

★ Available on the web (whatever format) but with an open licence, to be Open Data
★ ★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★ ★ ★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★ ★ ★ ★ Use URIs to denote things, so that people can point at your stuff
★ ★ ★ ★ ★ All the above, plus: link your data to other people’s data to provide context
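Because each star builds on the ones before it, a rating can be sketched as counting criteria in order and stopping at the first one that is not met. The `Dataset` fields and `star_rating` function below are hypothetical names for this illustration, not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Yes/no answers to the five criteria (field names are illustrative)."""
    on_web_with_open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds):
    """Count stars in order; a missed criterion ends the count."""
    criteria = [
        ds.on_web_with_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    open_csv = Dataset(True, True, True, False, False)
    print(star_rating(open_csv))
```

An openly licensed CSV file with no URIs or links scores three stars under this sketch, while an openly licensed Excel file stops at two even if later criteria happen to be met.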

Adding context

General

  Raw Pilot Expert Small
Title/Description/Publisher/URL
Release Type

Legal

  Raw Pilot Expert Small
Right to publish
Data licensed
Content licensed
Clear privacy statement
Sources of data documented  
Audited Anonymisation    

Practical

  Raw Pilot Expert Small
Usable period described
Availability period described  
Discoverable from home page    
Listed in a collection    
Referenced from publication/application    
Quality problems listed    
Quality control process described      

Technical

  Raw Pilot Expert Small
Data hosted online
Type of data defined
Machine readable metadata  
Clear technical documentation  
Persistent & common identifiers used
5-Star Linked Data      
Machine readable provenance      
Data can be verified      

Social

  Raw Pilot Expert Small
Support for improving/fixing  
Email support  
Discussion groups/forums    
Social media channels    
Supported community      
Tools and guides available to work with data      

Demand driven open data

Demand driven data, Gurin 2014

A Global Activity

Open Data Certificates - Contribute
http://certificates.theodi.org

One Objective

  • 3-Star data
  • Standard level Open Data Certificate

★ ★ ★

Recap

  • Defining open data
  • Benefits
  • Open, closed and personal data
  • Why now?
  • Law and Licensing
  • Ensuring data quality

The biggest evolution of the web, since the web itself.

Knowledge
for everyone

ODI Creative Commons