Obstacles in working with open data

I gave a presentation about open data and the obstacles you run into in practice. Along the way I also put together an open data maturity model that holds up reasonably well.

This was fun to do and I am curious whether Rotterdam can act quickly in this area. See my favorites on Twitter for the reactions.

Update:
Rotterdam Open Data, April 21, 2010
Another photo of the event.

Week 273: Objects, Hack de Overheid, Copenhagen, European Data Forum, Linked Data, Metropolis Lab, all new Foursquare

I’ve been on something of a speculative realism binge lately, reading quite a few books and even more blogs from the field of current-day philosophy. Last Monday I finished Ian Bogost’s Alien Phenomenology, which is highly recommended if you want to read up on object oriented ontology.

Preparations for our Hack de Overheid hackathon are entering their last weeks and things are speeding up. If you want a nice day of civic hacking with friendly people and good food and drinks, I’d say head on over to our signup page.

I got some work done and then it was off to Copenhagen on the Tuesday night train. Travelling that way, with your own bedroom, going to sleep in one city and waking up in another, is by far the most relaxing way to go (except when the train has a two-hour delay before your 00:32 departure).

You try to travel by rail because it's good and stuff but things go wrong too regularly. Stuck at HBF at night with a two hour delay.

And now by magic I will go to sleep in Berlin and wake up in Copenhagen.

I visited Copenhagen for the European Data Forum to see what the data-driven discussions were about at the European level. We were informed about a lot of European programmes, there was a lot of talk about Linked Data, and not much of it pertained to the things we do from day to day. Some friends from the open data movement were present and the event was quite informative all in all.

The focus on Linked Data among many of the participants is heartening and understandable, but ultimately it is a doomed approach. I got into an argument about this during lunch with some developers. There are problems on two levels. On the low level, Linked Data does not solve any actual problems for developers, but it does cause many for them: lack of tooling, learning curves, interoperability costs and so on. This is a problem of both proposition and marketing, but it is not seen as such by the Linked Data community. Until that is recognized, adoption of Linked Data technologies will remain as dismal as it is right now.

On the higher level, the fact that there is so little interoperation and so many problems standardizing and getting things to work together may be a symptom of the models of the world being aimed for being too complicated. Engineers will always mistake the map for the territory, but it is curious that they have been able to sell so many other people on it. The engineers’ answer to the fact that things do not work yet is of course that they need more time/money/resources thrown at the problem. The fact that the cost/benefit ratios have become completely skewed goes unnoticed because it is in no one’s best interest to notice.

Fortunately, people on the ground doing real work in open data, such as us and the Open Knowledge Foundation, are encountering these problems and fixing them, because in the real world we have no other choice. Rufus Pollock presented on the folly of perfect models and APIs and he’s right on both counts (I have presented about this myself before).

Government agencies that can’t release their data on a website properly are probably never going to have APIs that are usable or stable enough for anybody to build something serious on. They would do better to dump the data and let the developers with a vested interest build their own APIs or whatever else they need. Similarly, Rufus argued against overmodeling to a room of European-funded academics. I’m not very hopeful, but some of it may have changed some hearts and minds.

The same day Berlin celebrated its own open data day, which I unfortunately had to miss. I hear that a lot of people showed up, which is good because a lot of work remains to be done in that field. A list has been started to discuss open data in public transit, which should be a high priority. After going around Copenhagen for a couple of days with its Google Transit support, not having such a transit facility in a city is such an annoyance and cause of opportunity cost that it should be counted as a criminal offense on the part of the transit operators.

European Data Forum - Going to be interesting at least

After two days of talking about data I also visited the Metropolis Lab at the Overgaden art institute, where there were talks about developing the creative city. It was a nice and cozy event where artists, architects and festival curators discussed their work, pretty much the complete opposite of the previous one I had visited. Given the description of the event I had expected a bit more about games and other procedural media/systems.

I did see Tor Lindstrand present about architecture and I must say that was an awesome experience.

Metropolis Laboratory - another gathering for which we are too practical from the looks of it (now discussing authenticity and authority)

The rest of the time in Copenhagen I spent eating and drinking quality things. Coming back to Berlin, that was one of the most important differences I noticed: food and drinks in Copenhagen were about three times as expensive, but also at least twice as good as what I had in Berlin.

The other is that the opulence and organization of a Nordic capital are a stark contrast to what we are used to in Berlin. It is nice being in a city that is not destitute for a while, though Copenhagen may be too polished to live in for any amount of time.

Nice cross station where the train suddenly is street level and there is no wall.

Egg muffin from heaven

New place, totally game

I also browsed the website of the Avignon festival, which I will be visiting in July, and came across this item on the programme by Sévérine Chavrier, who is staging a play, “Plage ultime”, inspired by the works of J.G. Ballard. I will be arriving just too late to see it, but I do wish more theater makers would take note. My experience so far indicates that France is doing well in theater innovation (Gisèle Vienne is another name to watch out for), and Kornél Mundruczó, whom I saw before in Rotterdam, is also showing a work, “Disgrace”, at Avignon.

It's raining outside and the food here is sublime. I don't think I'm going anywhere.

Kaffe & Vinyl win @straboh

And then it was back to Berlin on Friday night.

At the end of the week we also got surprised by the all-new Foursquare, with a major update to both the mobile client and the website.
Can you tell where the commercial messages are going to be?

I have to say that I absolutely love the new engagement that this view allows. The main timeline that you now see, though noisy, can stand up to the best that Facebook, Path or Instagram have to offer, and that showdown is clearly the direction Foursquare is headed. Engagement around pictures, likes and comments is high and this update may very well increase it.

I have been a bit annoyed by some changes, but then again I may very well be too much of a power user while they are going for a mass market appeal. For most users what they have changed is an improvement.

For some others like myself and Tantek Çelik, the lack of a local friends view is a bit of an annoyance, especially if —like me— most of your friends live somewhere else. I quite like knowing what everybody in Amsterdam has been up to, but it does not have to be front and center to my experience because I can’t act on it (except in virtual ways).

For most users this is unlikely to be an issue because all of their friends will be in the same city anyway. Because complaining was only going to fix so much, I made a single-serving view of Foursquare with only the people within a 50 km radius: Old Fashioned Checkins.

This was very easy to do because of Foursquare’s excellent developer APIs and support. Another feature missing from the mobile client right now is being able to explore venues that you have not visited yet. If I look around my house now, I almost only see places I have already been to. Not much serendipity in that. These are undoubtedly things that will be improved upon in future updates, but this has been one of the first Foursquare changes that felt this jarring.
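For the curious, a filter like the one behind Old Fashioned Checkins only takes a few lines. The sketch below is a minimal illustration under some assumptions, not the actual implementation: the Foursquare v2 recent-checkins endpoint and response layout are assumed, and the home coordinates and token are placeholders.

```python
# Hedged sketch: filter a Foursquare friends timeline down to checkins within
# 50 km of home, roughly what Old Fashioned Checkins does. The endpoint name,
# response layout and token handling are assumptions, not a spec.
import math
import requests  # third-party; pip install requests

HOME = (52.52, 13.40)       # Berlin, as an example
TOKEN = "YOUR_OAUTH_TOKEN"  # placeholder
API = "https://api.foursquare.com/v2/checkins/recent"

def haversine_km(a, b):
    """Great-circle distance between two (lat, lng) pairs in kilometres."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def local_checkins(radius_km=50):
    """Yield (friend, venue) for recent checkins within radius_km of HOME."""
    params = {"ll": "%f,%f" % HOME, "oauth_token": TOKEN, "v": "20120601"}
    checkins = requests.get(API, params=params).json()["response"]["recent"]
    for c in checkins:
        loc = c.get("venue", {}).get("location", {})
        if "lat" in loc and haversine_km(HOME, (loc["lat"], loc["lng"])) <= radius_km:
            yield c["user"]["firstName"], c["venue"]["name"]

if __name__ == "__main__":
    for who, where in local_checkins():
        print(who, "@", where)
```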

The rest of the week, work to finish saba continued apace as well.

Friendlier and more open government data

This is an English rewording of a post in Dutch earlier this week just in time for this Reboot session.

I’ve recently been involved with several government initiatives to make government data more accessible and to show what’s possible if that data is publicly available. The premise is that if government data is open, developers and other third parties can reuse that data and make useful stuff with it. Stuff that can, for instance, serve as inspiration for our Dutch ‘show us a better way’ contest: Dat Zou Handig Zijn.

Most recently we had a Hack the Government devcamp/unconference where people interested in this stuff could exchange ideas and build stuff.

Hack de Overheid
photograph by Anne Helmond

A while back I did a project on widgets which was mostly a supply side initiative triggered by a listing of readily available data.

Simultaneously Ton and James did a project on process recommendations for the public sector when it comes to releasing their data. As a part of that project I helped them sift through the currently available data sources from a technical point of view and to see which of those sources could be repackaged in an interesting and more user and developer friendly way.

That wasn’t very easy. The Dutch government does publish a ton of data online, but usually in rather inaccessible formats and interfaces and without clear descriptions of what it is. We picked two data sources and managed to realize two new websites based on that data: Schoolvinder (‘School finder’) and Vervuilingsalarm (‘Pollution alert’).

School finder

A Dutch institution called the CFI already has a store with a lot of data on schools. We used their search interface and output (which, though ugly and slow, was remarkably reusable) and rebranded that into the simplest possible school search engine, one that is easily understood by parents looking for schools in their area.

Besides that, we also created a canonical URL for every school in the Netherlands so other parties can refer to it and we can build stuff on top of those school pages.

The first problem this fixes is that the original website is barely usable and worded almost exclusively in jargon. Employees from the ministry of education told us later that the CFI data is not meant to be used by the public, but we think this is still a fix.

Secondly it fixes the fact that this information is quite hard to find and to refer to. Our search engine is easy and open and school data is republished at unique URLs using microformats.

We would have liked to link our school pages to some numerical data from another CFI database, but that was too hard to realize within the allotted time. Even linking to that other site proved too hard because those webpages were shielded inside a needlessly complex ASP.NET environment.
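To illustrate the canonical URL idea (this is a minimal sketch, not the actual Schoolvinder code), here is roughly what one stable page per school looks like on the legacy Google App Engine webapp framework the sites ran on. The /school/&lt;code&gt; route shape, the School model and keying on the national school register code are my assumptions.

```python
# Hedged sketch of the canonical-URL idea behind Schoolvinder: one stable,
# human-readable URL per school that other parties can link to and that we
# can layer more data onto later. Targets the legacy App Engine Python runtime.
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app


class School(db.Model):
    code = db.StringProperty(required=True)  # assumed national school register code
    name = db.StringProperty()
    address = db.StringProperty()
    city = db.StringProperty()


class SchoolPage(webapp.RequestHandler):
    def get(self, code):
        school = School.all().filter("code =", code).get()
        if school is None:
            self.error(404)
            return
        # The real pages were HTML5 with microformats; plain text keeps the sketch short.
        self.response.out.write("%s\n%s, %s" % (school.name, school.address, school.city))


# /school/<code> is the canonical URL for each school.
application = webapp.WSGIApplication([(r"/school/(\w+)", SchoolPage)])

if __name__ == "__main__":
    run_wsgi_app(application)
```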

Pollution alert

Pollution alert takes the predicted particulate values for a number of sensor stations and makes those accessible. I made a scraper to take the predicted data from the RIVM site and store it in Google App Engine. From our own store in Google App Engine, we show the geocoded stations, push alerts out to Pachube and Twitter, graph the data and provide an API. We believe there is a lot of latent interest among the general public for this kind of data, but usable presentation forms have not been forthcoming.
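As an illustration of the scrape-and-store step, here is a minimal sketch in the style of the legacy App Engine Python runtime. The RIVM URL, the page structure and the field names are placeholders rather than the real scraper.

```python
# Hedged sketch of the Vervuilingsalarm scrape-and-store step. The RIVM URL,
# the assumed table layout and the field names are placeholders; the real
# scraper parsed the actual forecast pages.
import re

from google.appengine.api import urlfetch
from google.appengine.ext import db


class Forecast(db.Model):
    station = db.StringProperty(required=True)  # sensor station name
    pm10 = db.FloatProperty()                   # predicted particulate value (assumed unit)
    fetched = db.DateTimeProperty(auto_now_add=True)


RIVM_URL = "http://www.lml.rivm.nl/"  # placeholder for the forecast page
ROW_RE = re.compile(r"<td>(?P<station>[^<]+)</td>\s*<td>(?P<pm10>[\d.]+)</td>")  # assumed layout


def scrape():
    """Fetch the forecast page and store one Forecast entity per station."""
    html = urlfetch.fetch(RIVM_URL).content
    for match in ROW_RE.finditer(html):
        Forecast(station=match.group("station"),
                 pm10=float(match.group("pm10"))).put()
```

From entities like these, pushing to Pachube and Twitter, charting and a read API are thin layers on top of the same datastore.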

RIVM to their credit does publish most of their data online, but definitely not in the most accessible formats nor in ways that enable normal people to audit their living environment.

Principles

The sites we built are very advanced prototypes which are completely functional and even highly scalable. During building we followed some principles which may be interesting to touch upon here.

Friendly design

Both sites have a pleasant and friendly design created by Buro Pony. This is important because people are more inclined to use sites that look nice and experience those sites as being more user friendly.

A nice design needs to be accompanied by clear and simple writing without jargon that connects with the way people think about the stuff you’re describing.

Most websites can be improved massively by just implementing these two points.

Playing well with others

Both websites also connect with a bunch of other sites that improve upon the concept. They’ve been built on Google App Engine, a platform which is easy to develop for and readily scalable. Maps are retrieved from Google Maps, graphs are provided by Google Charts, and sensor data is pushed to Pachube and Twitter.

The experience on the main site is just a part of the whole. The data needs to be accessible from and easily pushable to where it’s needed most.

At the Hack the Government event somebody said something along these lines: ‘systems integration is difficult and complicated and if you’re good at it, you can make a lot of money with it’. This is a well-known Enterprise IT mantra, but if there’s one thing the abundance of mashups proves, it’s that integrating systems on the open web is anything but complex.

On the open web we have usable and developer-friendly API standards with tooling1; besides that, we have proper standards for identity and authentication2.

If you don’t dig yourself into a hole, it really doesn’t have to be that difficult. And none of this is exactly new: this technology has been around for ages and it just builds on the strengths of the internet.
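To make that concrete: consuming a typical JSON-over-HTTP API on the open web takes a handful of lines (footnote 1’s cURL one-liner does the same from the command line). The endpoint and field names below are hypothetical.

```python
# A few lines are enough to consume a typical JSON-over-HTTP API on the open
# web. The URL and the "name"/"address" fields are stand-ins, not a real API.
import json
import urllib.request

url = "https://example.org/api/schools?city=Utrecht"  # hypothetical endpoint
with urllib.request.urlopen(url) as response:
    schools = json.load(response)

for school in schools:
    print(school["name"], school["address"])
```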

Standards based

Both sites are completely standards based. As an experiment I wrote both in a conservative form of HTML5 (and validated that), partly out of curiosity to see how it would turn out and partly because I think that the current Dutch industry standard, XHTML, is a dead end.

Added to that I have sprinkled in some microformats in places where it was obvious to do so (e.g. school addresses). The notion that it takes significant extra time to add microformats to a project is absurd3 and these days the advantages of adding them keep piling up.

Yes, it takes some effort to learn to use standards and microformats properly, but once learned I think it actually takes more effort not to use them.

Quickly

Finally, both sites were built in a couple of days over the course of about a week and a half. We wanted to show that when we’re talking about a quick project, it really can be quick, and that building a non-trivial, usable, beautiful website does not need to cost a lot of time or money.

All of this can be improved. Let’s get at it4.

  1. cURL as the simplest example
  2. And it’s all open which means it can’t be hijacked by one party and controlled or made needlessly complex.
  3. Client: “Yeah, you can do that when you have time to spare.”
  4. Or like Chris says it: “This can all be made better. Ready? Begin.”

Highlights for Designing Data-Intensive Applications

Amazon RedShift is a hosted version of ParAccel. More recently, a plethora of open source SQL-on-Hadoop projects have emerged; they are young but aiming to compete with commercial data warehouse systems. These include Apache Hive, Spark SQL, Cloudera Impala, Facebook Presto, Apache Tajo, and Apache Drill [52, 53].
In these situations, as long as people agree on what the format is, it often doesn’t matter how pretty or efficient the format is. The difficulty of getting different organizations to agree on anything outweighs most other concerns.
Therefore, to maintain backward compatibility, every field you add after the initial deployment of the schema must be optional or have a default value.
That means you can only remove a field that is optional (a required field can never be removed), and you can never use the same tag number again (because you may still have data written somewhere that includes the old tag number, and that field must be ignored by new code).
If you are using a system with multi-leader replication, it is worth being aware of these issues, carefully reading the documentation, and thoroughly testing your database to ensure that it really does provide the guarantees you believe it to have.
Today, most data systems are not able to automatically compensate for such a highly skewed workload, so it’s the responsibility of the application to reduce the skew.
Unfortunately, these tools don’t directly translate to distributed systems, because a distributed system has no shared memory—only messages sent over an unreliable network.
Safety is often informally defined as nothing bad happens, and liveness as something good eventually happens.
A much better solution is to build a brand-new database inside the batch job and write it as files to the job’s output directory in the distributed filesystem, just like the search indexes in the last section. Those data files are then immutable once written, and can be loaded in bulk into servers that handle read-only queries. Various key-value stores support building database files in MapReduce jobs, including Voldemort [46], Terrapin [47], ElephantDB [48], and HBase bulk loading [49].
A complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. John Gall, Systemantics (1975)
When copies of the same data need to be maintained in several storage systems in order to satisfy different access patterns, you need to be very clear about the inputs and outputs: where is data written first, and which representations are derived from which sources? How do you get data into all the right places, in the right formats?
It would be very natural to extend this programming model to also allow a server to push state-change events into this client-side event pipeline. Thus, state changes could flow through an end-to-end write path: from the interaction on one device that triggers a state change, via event logs and through several derived data systems and stream processors, all the way to the user interface of a person observing the state on another device.
But this choice is not free either: if a service is so popular that it is “regarded by most people as essential for basic social participation”[99], then it is not reasonable to expect people to opt out of this service—using it is de facto mandatory.

Don’t release anonymized datasets

There is no such thing as an anonymized dataset. Anybody propagating this idea, even tacitly, is doing a disservice to the informed debate on privacy. Here’s a round-up of some recent cases.

Re:publica

Just today Berlin visualization outfit Open Data City published a visualization of the devices that were connected to their access points during the Re:publica conference earlier this month. The visualization is a neat display of the ebb and flow of people in the various rooms during the event.

It is also a good attempt to change the discourse about data protection in Germany. That discourse tends to be locked in a full-stop stance where absolutely ‘nothing is allowed’ without a ton of waivers. Because of that hassle, a lot of things which could be useful are not implemented. A more relaxed approach with case-by-case decisions would be better. In the case of Re:publica there does not seem to be any harm in making this visualization or in releasing the data (find it here on Fusion Tables, where I uploaded it).

What I do find a disservice to the general debate is the use of ‘pseudonymized’ data, where the device IDs have been salted and hashed. The identifying characteristics have been removed, but the IDs are still linked across sessions, making it possible to link identities to devices and figure out exactly who was where and when during the conference.
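A minimal sketch shows why this kind of pseudonymization keeps sessions linkable: with a fixed salt, the same device always maps to the same token, so only the label changes, not the linkability. The salt and MAC addresses below are made up.

```python
# Minimal sketch: hashing device IDs with a fixed salt removes the raw MAC
# address but keeps every session of the same device linkable, because the
# same input always maps to the same token. Salt and MACs are made up.
import hashlib

SALT = b"conference-2013"  # fixed, dataset-wide salt (made up)

def pseudonym(mac):
    return hashlib.sha1(SALT + mac.encode()).hexdigest()[:12]

sessions = [
    ("aa:bb:cc:dd:ee:ff", "day 1, hall 1"),
    ("aa:bb:cc:dd:ee:ff", "day 2, workshop room"),  # same device, other day
    ("11:22:33:44:55:66", "day 1, hall 1"),
]

for mac, where in sessions:
    print(pseudonym(mac), where)
# The first two lines print the same token: anyone who can tie one session to
# a person (e.g. a speaker in hall 1 on day 1) can follow that device everywhere.
```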

http://twitter.com/stefanwehrmeyer/status/337972064306724866

To state it again: at a professional conference such as Re:publica there would in all likelihood be no harm done if the entire dataset were de-anonymized. The harm lies in the pretense that processing a dataset in this way and then releasing it with the linkage across sessions intact is a good idea.

Which brings me to my next point.

Equens

http://twitter.com/AlexanderNL/status/337693480014970880

Yesterday Equens, the Dutch company that processes all payment card transactions, announced a plan to sell those transactions to stores. Transactions would be anonymized but still linked to a single card. This would make it trivial for anybody with a comprehensive secondary dataset (say, Albert Heijn or Vodafone) to figure out which real person belongs to which anonymized card. That last fact was not reported in any of the media coverage of this announcement, which is also terrible.

http://twitter.com/ThijsNiks/status/337699593464709120

After a predictable uproar this plan was iced, but they will keep on testing the waters until they can implement something like this.

Today Foursquare released all real-time checkin data but with suitable anonymization. They publish only the location, a datetime and the gender of the person checking in. That is how this should be done.

License plates

Being in the business of opening data, we at Hack de Overheid had a similar incident where a dataset of license plates was released in which the plates had been MD5’ed without a salt. This made it trivial to find out whether any given license plate was in the dataset.

This was quickly fixed. Again, this is not a plea against opening data, which is still a good idea in most cases, but a plea for thinking about the things you do.
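To make the license plate case concrete, here is a minimal sketch (with made-up data) of why unsalted MD5 offers no protection: you can test any plate you are curious about directly, or simply enumerate a whole plate series, because the space of valid plates is small.

```python
# Minimal sketch: with unsalted MD5 you can test whether a given plate is in
# the released dataset, or brute-force an entire plate series by hashing every
# candidate. The example plate and the one-entry "dataset" are made up.
import hashlib
from itertools import product
from string import ascii_uppercase, digits

def md5_plate(plate):
    return hashlib.md5(plate.encode()).hexdigest()

released = {md5_plate("12-ABC-3")}  # stand-in for the published hash dump

# Direct membership test for one plate you are curious about:
print(md5_plate("12-ABC-3") in released)  # True

# Or brute-force one plate series (two digits, three letters, one digit):
# roughly 10*10*26^3*10 candidates, cheap to hash exhaustively.
def enumerate_series():
    for a, b in product(digits, repeat=2):
        for letters in product(ascii_uppercase, repeat=3):
            for c in digits:
                plate = "%s%s-%s-%s" % (a, b, "".join(letters), c)
                if md5_plate(plate) in released:
                    yield plate

print(next(enumerate_series()))  # recovers "12-ABC-3" after a few seconds of hashing
```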

AOL search data

The arch-example of poorly anonymized search data is of course still the AOL search data leak from back in 2006. That case has been extensively documented, but not extensively learned from.

Memory online is frightfully short, as is the nature of the medium, but that becomes annoying when we want to make progress on something. Maybe it would be better altogether to lose the illusion that progress on anything can be made online.

For the privacy debate it would be good to keep in mind that increasingly advanced statistical inference means that almost all anonymization is going to fail. The only way around this is to not store data unless you have to, or to accept the consequences when you do.

Week 270: Amsterdam encounters, data visualization, foundational work

Last week was a week for some work in the Netherlands and some much deserved catchup with friends and colleagues over there.

On Monday the minutes of the meeting we had in the Berlin parliament about open transit data were published. They contain all the proceedings and slides.

On Tuesday I went to Hilversum to give a workshop on journalistic data visualization over there. It’s always fun to give these and it’s going to be even more fun to see the results coming out of it.

Full house

After that I bounced over to Utrecht to relax a bit in the Village. It had been too long, and it’s still the best coffee place in the Netherlands. After that I went to Hubbub headquarters for some future planning with Kars Alfrink.

On Wednesday we had a lot of stuff to do with the (Open State) foundation (more on which later). That same evening we had a board meeting.

Pavement anti-aliasing

On Thursday I had a nice lunch with Tim de Gier and finished my next game review for the paper.

Today's office

This is the view from the Amsterdam office. Pure luxury for that city.

Bought paper

Also, I had to buy the new book “Koorddansen in de Kaukasus” by Olaf Koens about his adventures in the Caucasus. It is a fast-paced collection of stories from this very bizarre part of the world.

Approaching the saucer

I also managed to visit the newly opened EYE film institute on the IJ shore. A beautiful building with a stunning view, heralding a new era for this part of Amsterdam.

Today's office

Next it was the train back to Berlin and prototypes for some new applications.

Week 257: moving office, Kotti, to Amsterdam again, Open Coop kicking it off, Social Cities of Tomorrow and explorations in theory and practice

Writing these notes on a Sunday afternoon with a mug of steaming coffee within reach, as they are meant to be written.

This Monday I finally made it out to the Finanzamt with a fully filled-in form for the Steuerliche Erfassung (or something like that). After that I went to the Agora Collective to get my stuff. It is a great place, but I don’t want to be fixed in a coworking space. There are myriad reasons why that is not a great fit, but being able to shape and own your own workplace is built into most offices and is purposefully left out of coworking.

Then I moved into the contur & konsorten office on Adalbertstraße with my stuff: a Bürogemeinschaft with 10 people where everybody has their own independent desk, with its own walls and bookshelves, a place to put my professional library and hang my posters. In short: a place to call my own. By total coincidence I am now a staircase neighbour of my friends at the Maker’s Loft, which could lead to more serendipity in the future.

The office is smack on Kotti, the most important urban maelstrom in Berlin. It is a place where many large streams of traffic and people meet, with the U-Bahn transport hub (connecting the U1 and U8) and the roundabout connecting the main thoroughfare of Skalitzer Straße with Kottbusser Damm. Betahaus, co-up, the Maker’s Loft and many other creative places are within throwing distance, and the area sports equal numbers of hipster cafés and Turkish eateries, with the addicts holding their own on the main square. They can be a hassle, but their presence is inseparable from the conditions that made that part of Kreuzberg exactly what it is: a free haven for people looking for cheap housing, be they immigrants or artists, or both.

Tuesday was spent at the new office on presentation prep, with the evening closed off by a meeting with the local Open Knowledge Foundation chapter. It was a fruitful discussion, exchanging various ideas on how to boost the openness movement in Berlin.

OKFN meetup

On Wednesday I took a leisurely train ride to Amsterdam, which seems to feel shorter and shorter the more I get into the rhythm. That day the long-awaited Code 4 video launched. I’m immensely proud of the work we did and I don’t think there’s anybody who has pulled off a game like that anywhere in the world, so it might well be worth a look:

A more detailed write-up on that project is forthcoming.

Thursday I continued working on my presentation at the Open Coop. I also ripped the video of minister of economic affairs Maxime Verhagen endorsing open data from the NOS site, because their site sucks.

Friday was the big day of Social Cities of Tomorrow, where I had the honor of being the first to present our ‘Apps for Amsterdam’ case to the assembled audience. It was a wonderful event put together by our esteemed friends and colleagues of the Mobile City, Michiel de Lange and Martijn de Waal, who have been leaders in this field for the better part of the past ten years. The keynotes by Usman Haque, Natalie Jeremijenko and Dan Hill were superb and they remain a source of inspiration for our creative work.

Getting our aeropress on with a new device that does tenth of a centigrade precise temperature with built-in scales.

I feel like I have to remark on two things that I thought of during the conference:

The entire day was infused with a critical stance towards open data and transparency within government. Usman Haque served the opening volley with a criticism of indiscriminate data transparency and an approach to furthering civic engagement by giving people the tools to collect data themselves. After that, Dan Hill added some criticism of traditional methods of social change.

I agree with their points and criticisms and I would have liked to address them, but that was impossible in the time given to me to present our case. I will say that if anybody in the Netherlands has been deeply involved at all levels of the government transparency movement and is acutely aware of the problems, issues and realities of data transparency, it is probably us1. Besides that, we have employed most of the techniques Dan Hill presented over the last couple of years: shaping decision-making processes, deploying long-lasting interventions and using the sleights of hand required to realign large organizations and work with far too many people.

We have been and will remain hardliners for the cause of government transparency, out of necessity and conviction. I will always defend the position that data which has already been collected by government and carries no privacy or national security issues belongs to the public and should be accessible by the public.

The other issue is that the conference was probably most valuable to the people in the Netherlands who are not as current on design and technology as I have come to take for granted. The lack of reflection was painfully clear in some of the questions asked by the audience. This is a common issue, but I have seen it often in the past, during Mobile Mondays or the lecture Manuel DeLanda gave in Amsterdam.

Dan Hill talked about going from the matter to the meta level and back again and all three keynoters showed that they are very capable of doing that. In the Netherlands I have found that many practitioners struggle a lot with the matter and they don’t have the time or the interest to ascend to the meta level, even though that would feed back positively into their material undertakings.

I have been looking for collaborators in the Netherlands who look beyond their narrow field and manage to recombine multiple theoretical and practical strands back into their work, but there are very few. I hosted the UX Book Club Amsterdam for a while, but found that most attendees there took their field of design too narrowly and the field of UX too seriously. Similarly, the Berlage Institute is running a postdoctoral course ‘to explore the forces that shape the built environment in the contemporary world’ which is limited to architects. I don’t know anybody who believes that the problems that will plague our cities in the next fifty years will be solved by drawing from the monoculture of architecture school.

It is as if most people in the Netherlands are trapped within the operational closure of their own practice.

I don’t know where I would fall, but I struggle every day with striking a balance between theory and practice and I think if you do not feel that struggle you should take a long hard look at what it is you are doing.

After Social Cities of Tomorrow we had a party at our offices in the Open Coop, because they officially incorporated as a cooperative and are set to do great things. The party was rather tremendous, and good parties are key to getting things done in Amsterdam.

And then there was this band playing in the office. #nofilter

And now it is Sunday while I am typing these notes and because of a lack of gourmet coffee, it is off to the Hubbub studio in Utrecht to be the murder board for Kars’s LIFT presentation.

  1. For a primer on the issue, read danah boyd’s “Six Provocations for Big Data”.

Week 254: game designing, data journalism, django, Praxis and game jam

Winter light

Last week started with recuperating from the second massive move we did, hauling heavy wooden furniture from Saxony. That recuperating took the form of a long-overdue first visit to the Barn here.

The next day I peeked in a bit on the game design process at Hubbub.

Then I went to the Django meetup in Berlin organized by Jannis Leidel over at The Maker’s Loft.

I was also pleased with this write-up by Kevin Slavin of the Social Cities of Tomorrow conference over on his Tumblr (which is pure gold by the way).

The event Social Cities of Tomorrow is also intended as an alternative to the increasingly popular idea of ‘smart’ or ‘intelligent’ cities.

It is good to see our friends from the Mobile City be so well attuned to the international cutting edge when it comes to smart city rhetoric.

Berlin data journalism meetup

Wednesday I visited the Daten & Journalisten meetup at the taz headquarters here in Berlin and I presented some of the data journalism projects we did both with Hack de Overheid and with Monster Swell.

Got my metagame deck!

On Thursday I dropped by Praxis, the office of Rainer Kohlberger, and worked there for a bit. That day also marked the awards ceremony for the Apps voor Nederland contest, and its success allowed us to get our minister of economic affairs to side with open data on television.

Trying out this view

On Friday I was off to Friedrichshain to receive my team for the game jam, and that ended the week. The results of the game jam are in this event write-up.

Still jamming

Dutch Train Times are open

It has been two years since I harangued the Dutch national railways on the radio about their closed data policy and debunked all their arguments for not opening up this data.

This month the NS opened up their data via an official API. And one of the first applications is this live train map of the Netherlands which is just wonderful. It simply exposes something that we knew implicitly and displays it very fluently.

This is just one of the many open data dominoes falling this year, but a very nice one and yes it looks like victory is within our grasp.

Text of the NRC.next article on public transport data

Writing pieces without any hyperlinks in them feels strange, but that is part of the deal when it has to be printed on newsprint. In any case, a piece of mine was published today on the opinion page of nrc.next: “Dat moet toch beter kunnen? — Geef vervoersinformatie vrij” (‘Surely that can be done better? Release transport information’).

In that piece I argue that public transport data should be freely available to travelers. This is part of a longer-running theme on this blog and of a recent development around open data in the broadest sense of the word.

Here is the full text as I submitted it (and which was probably published with small changes1), now with links:

I want a mobile phone that tells me exactly when I have to leave for my next appointment, where to board, where to get off and exactly how to walk or cycle afterwards. In Japan such systems already exist, while here we often do not even know when the next tram is coming. We could build this here too if we were allowed access to the data. But that data is locked away in one of the information gold mines Alexander Klöpping wrote about here on July 27.

In the Netherlands the public transport companies have decided that they collect the information at 9292 and that what they offer is good enough for everybody. Unfortunately, it is not. At 9292ov.nl you will find a site straight from the ’90s where you can plan a trip as best it will let you. Nowadays some operators also have iPhone applications. The one from 9292 is workable and the NS now finally has one of its own as well (though it remains to be seen whether it keeps working once it starts snowing).

But what if you don’t have an iPhone? That is exactly the problem. Because the operators sit on the data and play gatekeeper, you are at the mercy of whatever their platform of the day is. If your phone is not hip enough, or too alternative, you are out of luck. Of course they do not have to build an application for everybody, but we are not allowed to do it ourselves either. They then say that ‘the data is theirs’. What they forget is that they are there for us, and that we have already paid for that data twice: through our taxes and through our ticket.

Your mobile phone is only one place where this data comes in handy. The possibilities are endless, but we will only know what works once everybody who has an idea can try it out. If we have to wait for the creativity of our transport operators, we are waiting for a bus that will never come.

Japan is far ahead on this, but in the US, for example, this data is already open as well, and in London Transport for London has just released everything. What they see there is that a large group of techies is itching to get to work with it. Full-fledged applications that we would have to wait years for are launched there within a matter of days.

Like at a recent Science Hack Day in London (a day where techies and scientists work together), where a handful of people built a live tube map showing the positions of all the trains. A bike-sharing system (the Boris Bikes) has also just launched there, already with several competing mobile applications that show you where you can rent a bike. And this goes beyond your phone: people are building watches that show live departure times for nearby stops, free SMS and phone services, and there is even an augmented reality view of the tube lines. Just not in the Netherlands…

All the operators have to do is release their data; developers will then build for money, prestige or fun, and the traveler wins, because they get to choose from more and better applications.

Now, our government may be slow, but it is not entirely backward either. In a couple of years (generously late) a National Public Transport Data Warehouse (NDOV) is coming that all the data has to go into. The tender starts soon, but it needs a few hard requirements attached if we want to keep up with the rest of the world.

1. All data in the NDOV concerning journey planning, real-time locations, departure times and disruptions must be readable by people (via a website) and by computers (via an API). 2. The data must be freely available to everyone, without restrictions. And 3. a high-quality basic journey planner must always be offered, but alongside and on top of it others must be able to innovate.

The Netherlands is clogging up, and the vision on mobility as expressed in the Speech from the Throne goes no further than more and more asphalt. Opposite that stands the hard-core folding-bike brigade, but it does not have to be so black and white. With the right information at the right moment you can make the best possible choice, whether that is public transport, the bicycle, the car or a combination of them.

Good information can help us create a single transport network within which you can travel worry-free. Think of the great feeling when you catch your train and your connections and actually arrive on time. That can happen more often and more easily. Public transport may never become a party, but it can become a lot less miserable (and maybe even fun!).

Being published in the newspaper is nice, of course, but above all I hope this piece becomes something to push this story forward with, and to further the understanding among the people in the right places.

In any case, thanks to Alexander Klöpping for the prompt and to Reinier Kist and Antoinette Brummelink for the feedback.

Here is also a photo of the piece.

  1. I am abroad, so I have not yet read the piece in the newspaper myself.