#Opendata: notes on the way to (I)
The open data world is here!
After the age of hardware came the age of software. Now the age of data has arrived.
Why Now?
World Bank Is Opening Its Treasure Chest of Data
At least, this is a funny reason for an ephemeral
As a fan of visualization tools and interfaces, I have been taking notes for some time while talking with several people, to find out what the main problems are when you try to work with the new open data repositories.
Second, the formats. I suggest that public institutions publish their datasets in several formats at the same time. Of course, I am talking about free, standard formats: CSV, ODS, XML/RDF, KML, JSON, … We can open a public discussion on which free formats we prefer. By standardizing the published formats we can achieve at least two goals. On the one hand, lots of people will no longer waste time converting files. On the other hand, we avoid the conversion step itself, which is far from being an easily reproducible process: different versions of the file editors and converters can mess things up.
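To make the idea of publishing in several formats at once more concrete, here is a minimal sketch in Python. It takes one small table and serializes it to CSV, JSON and a plain XML layout using only the standard library. The sample records, the function names (`to_csv`, `to_json`, `to_xml`, `export_all`) and the XML tag names are all hypothetical, chosen just for illustration; a real publisher would load its own data and agree on its own schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, not taken from any real dataset.
ROWS = [
    {"name": "Library A", "lat": "41.40", "lon": "2.17"},
    {"name": "School B", "lat": "41.98", "lon": "2.82"},
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_xml(rows, root_tag="dataset", row_tag="record"):
    """Serialize the rows as XML, one element per row and per field.

    The tag names are assumptions; this is not RDF, only a simple layout.
    """
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(record, key).text = value
    return ET.tostring(root, encoding="unicode")


def export_all(rows):
    """Produce every format from the same source rows in one step."""
    return {"csv": to_csv(rows), "json": to_json(rows), "xml": to_xml(rows)}
```

Because every format is generated from the same rows by the same script, nobody downstream has to run a converter by hand, and the publisher can rerun the export whenever the data changes.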
Hey, people, data is loading!
Data life
I see every public dataset as a work in progress. Most public datasets can be improved. Quite often the data is inaccurate, contains errors, or could be updated and extended. All these improvements give each dataset a real life: a timeline of its history. For example, I dived into the list of all the public buildings in Catalonia (Catalan Government). It is great to have it, but the geolocation of the buildings needs a lot more improvement. The people best placed to improve this dataset are the citizens, and especially the data workers. All the efforts to clean and improve a dataset must be put together, so that we do not lose them.
- to follow the evolution of the datasets (contributors, contributions, time)
- to be sure that two or more people are using exactly the same dataset
- to improve the dataset with a simple git pull request
- to publish a dataset as a software release
- to change the dataset with a simple git fork
- to use live & dynamic data, reading directly from the repository.
- … and a lot of new ideas that will come out…
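
One point in the list above, being sure that two or more people are using exactly the same dataset, can be sketched in a few lines of Python. The idea is to compare a content fingerprint (here a SHA-256 digest) instead of trusting file names or dates. Git identifies content by hash in a similar spirit, but this sketch does not use git at all, and the function names `dataset_fingerprint` and `same_dataset` are hypothetical.

```python
import hashlib


def dataset_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()


def same_dataset(a: bytes, b: bytes) -> bool:
    """True only if both copies are byte-for-byte identical."""
    return dataset_fingerprint(a) == dataset_fingerprint(b)
```

Two collaborators can each compute the fingerprint of their local copy and compare the short strings; a single changed cell produces a different digest.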