CSVlint.io


Publishing data that doesn't suck


James Smith · @floppy

CSV is amazing

CSV is the data Kalashnikov: not pretty, but many wars have been fought with it and kids can use it.
Friedrich Lindenberg - @pudo

CSVs are terrible

Stick .CSV on it
and call it data

From "What is a CSV? A case study of CSVs on data.gov.uk" by Ulrich Atz

Reuse-Ready Data

CSVlint

http://csvlint.io

RFC 4180

Errors

  • wrong content type
  • ragged rows
  • blank rows
  • invalid encoding
  • not found
  • stray quote
  • unclosed quote
  • whitespace
  • line breaks
  • no header
  • empty column name
  • duplicate column name
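Several of the errors above can be detected with an ordinary CSV parser. The following is a minimal Python sketch, not CSVlint's own implementation (that is the Ruby gem csvlint.rb). The function name `structural_errors` and the snake_case error labels are hypothetical. Quoting, encoding and HTTP checks are left out.

```python
import csv
import io
from collections import Counter


def structural_errors(text):
    """Return (row_number, error_type) pairs for a subset of the structural
    checks: empty/duplicate column names, blank rows, ragged rows and
    leading/trailing whitespace. Row numbers count parsed records, with the
    header as row 1 (hypothetical labels, not csvlint.rb's API)."""
    # newline="" stops StringIO from translating line endings, so the csv
    # module sees the raw text and handles line breaks inside quoted fields.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return [(1, "no_header")]  # an empty file has no header row at all
    header = rows[0]
    errors = []
    for name, count in Counter(header).items():
        if name.strip() == "":
            errors.append((1, "empty_column_name"))
        elif count > 1:
            errors.append((1, "duplicate_column_name"))
    for number, row in enumerate(rows[1:], start=2):
        if all(cell == "" for cell in row):  # also true for an empty line ([])
            errors.append((number, "blank_rows"))
        elif len(row) != len(header):
            errors.append((number, "ragged_rows"))
        elif any(cell != cell.strip() for cell in row):
            errors.append((number, "whitespace"))
    return errors
```

Using a real CSV parser instead of splitting on newlines matters because RFC 4180 allows line breaks inside quoted fields, so one record can span several lines.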

Warnings

  • no encoding specified in HTTP response
  • encoding not UTF-8
  • no content type in HTTP response
  • Excel (.xls extension)
  • check options (only a single column found: wrong delimiter?)
  • inconsistent values
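The HTTP-level warnings depend only on the response headers and the URL. Here is a minimal Python sketch under a few assumptions: headers arrive as a plain dict, "encoding not UTF-8" is judged from the declared charset only (not by sniffing the bytes), and `http_warnings` and its labels are hypothetical, not CSVlint's API.

```python
from email.message import Message
from urllib.parse import urlparse


def http_warnings(url, headers):
    """Return warning labels for an HTTP response serving a CSV file.
    `headers` is a plain dict; names are matched case-insensitively."""
    headers = {name.lower(): value for name, value in headers.items()}
    warnings = []
    content_type = headers.get("content-type")
    if content_type is None:
        warnings.append("no_content_type")
    else:
        # Let the standard library parse the Content-Type parameters.
        message = Message()
        message["Content-Type"] = content_type
        charset = message.get_content_charset()  # lower-cased, or None
        if charset is None:
            warnings.append("no_encoding")
        elif charset != "utf-8":
            warnings.append("encoding")  # aliases such as "utf8" also land here
    # Check only the path, so a query string does not hide the extension.
    if urlparse(url).path.lower().endswith(".xls"):
        warnings.append("excel")
    return warnings
```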

Information

  • non-RFC line breaks (not CRLF)
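RFC 4180 specifies CRLF line breaks, but many files use bare LF (or, rarely, bare CR). One way to classify them is sketched below in Python, assuming access to the raw bytes. The function names and the `nonrfc_line_breaks` label are hypothetical. For simplicity, line breaks inside quoted fields are counted too.

```python
def line_break_style(raw: bytes) -> str:
    """Classify the line endings in raw CSV bytes as "CRLF", "LF", "CR",
    "mixed" or "none". Breaks inside quoted fields are counted too."""
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    cr = raw.count(b"\r") - crlf  # bare CR, not part of a CRLF pair
    styles = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not styles:
        return "none"
    return styles[0] if len(styles) == 1 else "mixed"


def line_break_info(raw: bytes) -> list:
    """Return an informational message when line breaks are not CRLF."""
    style = line_break_style(raw)
    return [] if style in ("CRLF", "none") else ["nonrfc_line_breaks"]
```

This is reported as information rather than an error because most parsers, including Python's `csv` module, accept LF line endings without complaint.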

Status badges


From "What is a CSV? A case study of CSVs on data.gov.uk" by Ulrich Atz
{
  "fields": [
    {
      "name": "id",
      "constraints": { "required": true }
    },
    {
      "name": "price",
      "constraints": { "required": true, "minLength": 1 }
    },
    {
      "name": "postcode",
      "constraints": {
        "required": true,
        "pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"
      }
    }
  ]
}
JSON Table Schema
http://dataprotocols.org/json-table-schema/
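The schema above could be applied to a CSV roughly as follows. This is a minimal Python sketch under these assumptions: each constraint is checked against the raw string value, `pattern` must match the whole value (via `re.fullmatch`), and a missing required value suppresses that field's other checks. `validate_rows` and the error labels are hypothetical. This is not how csvlint.rb is implemented.

```python
import csv
import io
import re

# The JSON Table Schema above, as a Python dict (e.g. what json.load returns).
SCHEMA = {
    "fields": [
        {"name": "id", "constraints": {"required": True}},
        {"name": "price", "constraints": {"required": True, "minLength": 1}},
        {"name": "postcode", "constraints": {
            "required": True,
            "pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"}},
    ]
}


def validate_rows(text, schema):
    """Return (row_number, field_name, error_type) triples; the header is row 1."""
    errors = []
    reader = csv.DictReader(io.StringIO(text, newline=""))
    for row_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            name = field["name"]
            constraints = field.get("constraints", {})
            value = row.get(name) or ""  # short rows give None; treat as empty
            if constraints.get("required") and value == "":
                errors.append((row_number, name, "missing_value"))
                continue
            if "minLength" in constraints and len(value) < constraints["minLength"]:
                errors.append((row_number, name, "min_length"))
            pattern = constraints.get("pattern")
            if pattern and value and not re.fullmatch(pattern, value):
                errors.append((row_number, name, "pattern"))
    return errors
```

For example, a row with an empty price yields `missing_value`, and a postcode such as `not a postcode` yields `pattern`.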

Constraints

  • required
  • unique
  • minLength
  • maxLength
  • pattern (regular expression)
  • type (XML Schema types)
  • minimum
  • maximum
  • datePattern (strftime)
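The remaining per-column constraints can be checked one column at a time, as in the minimal Python sketch below. It assumes `minimum` and `maximum` are given as numbers and compared numerically, and that empty values are left to `required`. Because Python's `strptime` uses the same directives as strftime, it stands in for `datePattern`. `check_column` and the labels (including `date_pattern`) are hypothetical.

```python
from datetime import datetime


def check_column(values, constraints):
    """Check one column's string values against a constraints dict.
    Returns (row_index, error_type) pairs; indices are 0-based."""
    errors = []
    seen = set()
    for i, value in enumerate(values):
        if value == "":
            continue  # missing values are the "required" constraint's job
        if constraints.get("unique"):
            if value in seen:
                errors.append((i, "unique"))
            seen.add(value)
        if "maxLength" in constraints and len(value) > constraints["maxLength"]:
            errors.append((i, "max_length"))
        if "minimum" in constraints or "maximum" in constraints:
            try:
                number = float(value)
            except ValueError:
                number = None  # non-numeric values are left to the type check
            if number is not None:
                if "minimum" in constraints and number < constraints["minimum"]:
                    errors.append((i, "below_minimum"))
                if "maximum" in constraints and number > constraints["maximum"]:
                    errors.append((i, "above_maximum"))
        if "datePattern" in constraints:
            try:
                datetime.strptime(value, constraints["datePattern"])
            except ValueError:
                errors.append((i, "date_pattern"))
    return errors
```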

Data types

  • String
  • Integer, Float, Double
  • URI
  • Boolean
  • Non-Positive Integer
  • Positive Integer
  • Non-Negative Integer
  • Negative Integer
  • Date
  • Date Time
  • Year
  • Year Month
  • Time
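Since the `type` constraint refers to XML Schema types, one approach is to key a table of checks by XSD datatype URI. The minimal Python sketch below assumes types are given as URIs in the `http://www.w3.org/2001/XMLSchema#` namespace. It deliberately simplifies the XSD lexical rules: no time zones, no fractional seconds, lenient month and day digits, and four-digit years only. `TYPE_CHECKS` and `type_errors` are hypothetical names.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

XSD = "http://www.w3.org/2001/XMLSchema#"
DOUBLE = r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|-?INF|NaN"


def _strptime(fmt):
    """Build a check that accepts values parseable with the given format."""
    def check(value):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            return False
    return check


def _integer(predicate):
    """Build a check for XSD integer syntax plus a range predicate."""
    def check(value):
        return re.fullmatch(r"[+-]?[0-9]+", value) is not None and predicate(int(value))
    return check


TYPE_CHECKS = {
    XSD + "string": lambda v: True,
    XSD + "integer": _integer(lambda n: True),
    XSD + "nonPositiveInteger": _integer(lambda n: n <= 0),
    XSD + "positiveInteger": _integer(lambda n: n > 0),
    XSD + "nonNegativeInteger": _integer(lambda n: n >= 0),
    XSD + "negativeInteger": _integer(lambda n: n < 0),
    XSD + "float": lambda v: re.fullmatch(DOUBLE, v) is not None,
    XSD + "double": lambda v: re.fullmatch(DOUBLE, v) is not None,
    XSD + "boolean": lambda v: v in {"true", "false", "1", "0"},
    XSD + "anyURI": lambda v: urlparse(v).scheme != "",  # very rough
    XSD + "date": _strptime("%Y-%m-%d"),
    XSD + "dateTime": _strptime("%Y-%m-%dT%H:%M:%S"),
    XSD + "gYear": lambda v: re.fullmatch(r"[0-9]{4}", v) is not None,
    XSD + "gYearMonth": _strptime("%Y-%m"),
    XSD + "time": _strptime("%H:%M:%S"),
}


def type_errors(values, type_uri):
    """Return 0-based indices of non-empty values that fail the type check."""
    check = TYPE_CHECKS[type_uri]
    return [i for i, v in enumerate(values) if v != "" and not check(v)]
```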

Schema Errors/Warnings

  • missing value
  • min length
  • max length
  • pattern
  • header name
  • missing column
  • extra column
  • unique
  • below minimum
  • above maximum
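The header-level problems (header name, missing column, extra column) can be found by comparing the CSV header with the schema's field names. The minimal Python sketch below assumes that columns are matched by position. `header_errors` and its labels are hypothetical.

```python
def header_errors(header, schema):
    """Compare a CSV header (list of names) with schema["fields"] by position.
    Returns (column_index, error_type) pairs; indices are 0-based."""
    expected = [field["name"] for field in schema["fields"]]
    errors = []
    for i, name in enumerate(header):
        if i < len(expected) and name != expected[i]:
            errors.append((i, "header_name"))
    if len(header) < len(expected):
        errors.extend((i, "missing_column") for i in range(len(header), len(expected)))
    elif len(header) > len(expected):
        errors.extend((i, "extra_column") for i in range(len(expected), len(header)))
    return errors
```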

Multi-file datasets

  • Data Package (datapackage.json)
  • .zip upload
  • Separate validation for each file
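For a .zip upload, "separate validations" can be as simple as running the single-file checks once per CSV inside the archive. The minimal Python sketch below assumes members are UTF-8 and identified by a `.csv` extension, and it takes the per-file validator as a parameter (`validate_csv`, hypothetical). `validate_zip` and the `invalid_encoding` label are also hypothetical.

```python
import io
import zipfile


def validate_zip(data: bytes, validate_csv):
    """Validate each .csv member of a zip archive separately.
    Returns {member_name: result of validate_csv(text)}."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue  # skip folders and non-CSV files such as READMEs
            try:
                text = archive.read(info).decode("utf-8")
            except UnicodeDecodeError:
                results[info.filename] = ["invalid_encoding"]
                continue
            results[info.filename] = validate_csv(text)
    return results
```

Keeping one result per file means a problem in one file does not hide the state of the others.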

Source code

App: http://github.com/theodi/csvlint

Gem: http://github.com/theodi/csvlint.rb

All MIT licensed (of course)

Cid

Continuous Integration for Data

Future plans

  • data analysis
  • heuristics
  • integrations with other services

Please use it!

http://csvlint.io

Contribute!

http://github.com/theodi/csvlint

Stuart Harrison, Sam Pikesley, James Smith, Jeni Tennison

Open Data Institute Tech Team
@ukoditech
info@theodi.org
irc.freenode.net #theodi

ODI

http://theodi.github.io/presentations

ODI Creative Commons