Elections and Campaign Data Visualization

With the upcoming Dutch elections the campaigns are heating up and we will be allowed to choose our parliamentary representatives yet again. It is interesting to see how statistics and their visualization are used to clarify and position issues in our complex world.

Issues with visualization

You cannot release statistics and visualizations without thinking through the ramifications of these actions. Every non-trivial bit of information has biases and values attached to it. You can never know what will happen, but you can at least think about it.

The other day at the Rotterdam Open Data meeting, someone vehemently defended the point of view that we should not publish data because it could be spun in a way that is harmful to society. A wholly subjective and belittling point, of course, which we countered by noting that unfounded claims can be launched already, and that without authoritative data sources we have no good way of debunking them.

A lot can go wrong when using data visualization. Just see this video of a presentation by Alex Lundry, which covers familiar territory but does so nicely and quickly:

Or this recent example from a so-called Dutch quality newspaper about Greece and other European countries edging towards the brink of financial ruin, spotted by the great Sargasso:

This is guilty of the fiddling with origins and axis scales that is so common in juicing up statistics for presentation purposes. Other faults from the video: “sin of omission”, “correlation is not causation”, “pie-charts suck”. Most of this is treated pretty well in Tufte’s The Visual Display of Quantitative Information, where he calculates the lie factor of faulty infographics.
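Tufte's lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch of that calculation follows; the bar chart numbers in it are made up for illustration and are not taken from the newspaper chart:

```java
// Lie factor: relative change as drawn, divided by relative change in the data.
final class LieFactor {

    // Relative change going from the first value to the second.
    static double effectSize(double first, double second) {
        return (second - first) / first;
    }

    static double lieFactor(double dataFirst, double dataSecond,
                            double drawnFirst, double drawnSecond) {
        return effectSize(drawnFirst, drawnSecond) / effectSize(dataFirst, dataSecond);
    }

    public static void main(String[] args) {
        // Hypothetical chart: the data goes from 100 to 110, but the y-axis
        // starts at 90, so the bars are drawn 10 and 20 units tall.
        double factor = lieFactor(100, 110, 100 - 90, 110 - 90);
        System.out.println("lie factor: " + factor);
    }
}
```

A value close to 1 means the graphic is honest about the change. Moving the origin of the axis, as in this made-up case, is enough to inflate it by an order of magnitude.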

US Job Losses

Data visualizations in elections, especially charts of statistics, are nothing new, but with the increase of open data and data processing tools we are bound to see more of them. I especially hope to see more dynamic ones.

The Obama Job Chart (below, taken from Creekside Chat) is a very static traditional chart which could have just as easily been punched out of Excel (though the extra visual touches are nice), but the most important part of this chart is how it supports an overall narrative:

I take issue with the poster’s critique because the chart clearly says that it shows “Job Loss” and not absolute unemployment. Any turnaround of the economic situation will necessarily be accompanied by a trend like the one in the chart (losses have to edge back to zero before they can become gains). The comparison to the amount of money in a wallet also does not really work, because money spent is an absolute loss, while the number of people in the job market is a pool which is in flux.
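The difference between monthly losses and the absolute level can be shown with a running total. This is only a sketch with invented numbers (in thousands of jobs), not the figures behind the chart:

```java
// Turns monthly net job changes into the running employment level.
final class JobLossTrend {

    static long[] employmentLevels(long start, long[] monthlyChange) {
        long[] levels = new long[monthlyChange.length];
        long level = start;
        for (int i = 0; i < monthlyChange.length; i++) {
            level += monthlyChange[i];
            levels[i] = level;
        }
        return levels;
    }

    public static void main(String[] args) {
        // Hypothetical series: losses shrink every month, then turn into gains.
        long[] change = {-700, -500, -300, -100, 50, 150};
        long[] levels = employmentLevels(130000, change);
        for (int i = 0; i < levels.length; i++) {
            System.out.println("month " + (i + 1) + ": change " + change[i]
                    + ", level " + levels[i]);
        }
    }
}
```

In this made-up series the level keeps falling for four months even though the monthly losses improve every time, and it only starts rising once the change crosses zero.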

UK Job Losses

The UK will have their General Elections next Thursday. In the run-up to the elections, Russell Davies spotted this nice interactive chart by Labour to clarify how they helped reduce unemployment.

Jobs Interactive Map | The Labour Party

It remains to be seen how far these kinds of more technocratic online methods support the narratives and media plays that an election revolves around. It does not look like it has helped Labour that much in their struggle.

Combined Approaches

So how do you combine online material, which is more mechanical and easy to ignore, with the mass-media appeal of legwork on the campaign trail?

What’s more likely to be pivotal is the canny use of the latter to leverage the former: ensuring that every casual contact goes into a database, every issue raised by a constituent (or inferred from a pattern of facts on the ground) is captured and tracked, everything that shows up in the gillnet of your feeds is exploited for its propaganda or organizational value. —“Harvey Milk, community development and the digital balance sheet” by Adam Greenfield

As suggested by Adam Greenfield, a combination of both may be the best option, but besides the much praised Obama campaign we haven’t seen much successful work along those lines yet and even the Obama grassroots organization has been underutilized since the inauguration.

The Dutch Situation

One question would be: where is the Dutch job loss chart? If I can massage the correct statistics from the CBS, I’ll see if I can whip something up.

Many political organizations in the Netherlands do not have the budget or the maturity in web infrastructure to quickly create and deploy bespoke applications that are situated within their workflows and fit within campaign deadlines.

A small but comprehensive overview of online activities for the Dutch general elections can be found on Spotlight Effect (in Dutch) but small really is the operative word. I am aware of a couple more initiatives due to come out but it’s quite meagre.

Also when talking about the overarching themes, I haven’t spotted the ones that our election is supposed to be about yet. Unless it is whether you envision a divided Netherlands where a discontented white proletariat rules over both foreigners and intellectual elite alike or whether you want a whole country governed by sane and rational people.
Issues such as education, technology, healthcare, immigration, urban and ex-urban planning for a decreasing population, our international position, energy and food security and all of those with a vision of at least 10 years into the future are sorely lacking. This is probably because most of the population is too shuttered inside their blocks and suburbs to be able to look over the rim of the nearest enclosing dyke.

Update:
This seems to be the overarching theme of the elections for the PvdA.
Iedereen telt mee (“Everyone counts”)

Update:
Alex Lundry notices that the Obama job loss chart is being updated by the Washington Monthly. Here’s the April version:

And here’s the same chart for the Netherlands albeit a lot less granular (if anybody has that data, I’d greatly appreciate it):
Unemployment in the Netherlands (Werkloosheid Nederland)

Table Viewer for Music Hackday

This entire weekend was taken up by Amsterdam Music Hackday for which Alex, Dirk and I had planned to build a prototype version of a surface table projector for music discovery.

The functionality we envision helps ad-hoc groups of people who find themselves in the same location/venue/party to compare their music tastes and see where the overlaps and where the holes are. The table would be a turn-taking jukebox with tangible interactions and nice visuals for all users and spectators.

Easier said than done, of course. We spent a great part of the week and most of the weekend hacking, building, eating, drinking coffee, staying up to the wee hours, literally stabbing ourselves with scalpels, cursing a great deal and drinking whisky to get the thing together, when finally on Sunday in the last hour before the presentation we managed to integrate everything to the level that we could shoot a demo video.

Pictures of the process and demo videos below:

Real men eat meat!

BENQ

Top view

Point to interface

Last.fm

What we built was just an initial step on the way to the jukebox I described above, but it seemed to look promising enough to net us the first prize from last.fm at Music Hackday for which we were very happy.

We would like to thank last.fm, the organizers and the participants of Music Hackday. It was a great event and for us it was a great occasion to finally get this project started.

We will develop the table further and build out the functionality we had envisioned to make it a real locus for social music discovery. It should be hanging in one of our studios soon, so get in touch and visit if you want to try it out.

Obstacles in working with open data

I gave a presentation about open data and the obstacles you run into in practice. Along the way I also made an open data maturity model that does hold some water.

This was fun to do and I am curious whether Rotterdam can act quickly in this area. See my favorites on Twitter for the reactions.

Update:
Rotterdam Open Data, April 21 2010
Another photo of the event.

New disciplines for a real-time data world

Some posts that had been sitting in my browser tabs for a while have been combined into a brand new job guide for 2010. You can also read this as a follow-up to my previous post on Why developers are important: this one is about which developers are important. This post has been lying in my drafts folder for a while, but it has only become more relevant.

Some interesting jobs for the coming year(s):

Scalability Engineer

These have been highly sought after ever since Twitter was failwhaling half of the time. The competency to keep a website running while it is experiencing massive growth will only become more valuable. Some technologies such as Google App Engine promise to make this easy, but they introduce a set of problems of their own. Traditional relational databases are increasingly being abandoned for the looser, often schemaless variety of BigTable-like NoSQL databases that live in the cloud or can be scaled at will (CouchDB, HSQL, Cassandra, MongoDB, Tokyo Cabinet etc.). If you want to get up to speed on this stuff really fast, there’s a NoSQL conference in London April 20-22nd.

Also knowing your Scala, Tornado, Twisted, NodeJS or other non-blocking framework is increasingly important, since we’re slowly moving out of the request/single response paradigm for the web.

Key skills: Everything command line, functional programming, traditional database management, SQL, virtual machine configuration, puppet

Client Side King

The web-based client was already the biggest delivery mechanism for functionality and experience, and it is going to become more and more important. Functionality which you would not have thought possible in a web application will become available. Some apps may at first be functionally inferior to their native versions, but the fact that they are web native and inherently social will draw people in. After a while either the apps will become more capable or the users won’t care anymore.

Developments in HTML5 bode well for those of us who would like to abolish slow and crashy plugins (yes, Flash). Audio, video, hardware accelerated 3D graphics and much more will soon be native to the web. Just look at Quake II written in JavaScript. There is still a place for adding Silverlight and Flash to certain sites, but the benefits of those technologies are much harder to argue for.

Key skills: JavaScript in all variants, styles and frameworks; web-native UI design; marginal IE skills required (since you will not be building for that platform but you should know its limitations); iterative development; guerrilla user testing

Algorithm Cook

Ridiculous amounts of data require strong analytics, very capable navigation and a new sort of editorial process. These databases draw more and more information from the real world:

“The advent of inexpensive high-bandwidth sensors is transforming every field from data-poor to data-rich,” Edward Lazowska, […] said (NYT) and “Today,” he added, “you have real-time access to the social structuring and restructuring of 100 million Facebook users.” (same source)

Better algorithms will allow us to make better sense of all this data and will provide inputs for the other fields. Everything can have an interestingness in a given context for a given person.

Key skills: multivariate statistics, data wrangling, screen scraping, machine learning, data mining, Excel, SPSS, R/SPlus, Matlab, NumPy, digital signal processing

Visualization Artist

Making sense of all the information requires condensed views with aesthetic qualities. There is simply too much data out there for us to be able to grasp it, so being able to filter and mine the datasets with the help of the other disciplines is essential. But after that step any data needs to be refined, represented and made interactive.

“Decode” ends with “Network,” which examines the interconnections of mobile technologies and the Internet. It also illustrates how digital imagery is helping us to make sense of a frenzied, often confusing world. (NYT)

There are tons of frameworks, tools and libraries in a variety of languages for anybody who wants to try out visualizing stuff. In the end no single one will fit the bill and the best result is achieved by combining, mixing and writing something yourself.

There’s a new O’Reilly book coming out for anybody in the finer arts who’s interested in getting their feet wet with Processing: “Processing for Visual Artists”. Then after a while you may be able to produce stuff such as:

Key skills: aesthetic sense, 2D/3D graphics, cognitive psychology, Processing, OpenGL, JavaScript, SVG, design tools, Tufte

And we haven’t even treated the Natural Language Processor, the Urban Information Planner and the Machine Vision Trainer yet, but there’s considerable overlap with the above disciplines. If you have any others that we should look at, please suggest them in the comments.

Update: And the New York Times is just looking for somebody with roughly this description.

In code we trust: Trust in government, bureaucracy and technology

A fascinating finger-on-the-pulse session about open government in America with Noel Hidalgo (New York), Dmitry Kachaev (Washington D.C.) and Alissa Black (San Francisco).


Alissa Black (very cool!) gives an overview of the various initiatives that the city of San Francisco (with Twitter mayor Gavin Newsom) has taken to promote open government. It might be a good idea to pick someone as the new mayor of Amsterdam who at least understands what the internet is.

San Francisco is of course far ahead in this kind of thing (noblesse oblige). They have adopted a data inventory and a data guideline, and a status report is published every quarter. They build sites such as DataSF, RecoverySF and a Citypedia to share knowledge within the city.

Her steps to take:

  1. Appoint a team to grow the initiative
  2. Issue guidelines
  3. Make sure it is an expectation of the government.
  4. Implement policy.

Also, to the question of which software is used to promote citizen participation, the answer was: Uservoice and IdeaScale.

And the tip to participate in the O’Reilly online conference for Gov 2.0 International.

In America there are a great many initiatives working on this at all levels (city, state, federal), and there seems to be a lot of web literacy and also budget for all of these things. Still, my gut feeling is that we in the Netherlands are hardly behind.

At the government level we have Ambtenaar 2.0, Hack de Overheid, and a series of other initiatives (I am still waiting for data.gov.nl). In Amsterdam we could for instance adopt the Open City manifesto, but something like Amsterdam Opent is already very cool.

Where the Netherlands can do better:
1. Sustainable web platforms built on open technology by companies that get it, with an eye to the future (instead of delivering once and throwing something unusable over the fence).
2. Appointing permanent, approachable teams that master the subject matter for these topics.

(This blog post was written quickly by Alper after a session at SxSWi in Austin, Texas. It was also cross-posted to the group blog for all Dutch attendees.)

Sea fare

Just saw this picture of the Wellington RFID farecard system at Adam Greenfield’s Flickr stream (CC-by-nc-sa photograph):

It’s called a Snapper card.

Compare this to the London based Oyster card:

And the Hong Kong Octopus card:

This international seafood theme makes me think that we have definitely missed a branding opportunity here (and this for a country of fishermen). Our entire OV-chipkaart system has been grossly underdesigned on all fronts, so no surprises there.

So I’ve got two proposed alternative names for our low countries farecard system:

  • Herring card (Haringkaart), or
  • Mussel card (Mosselkaart)

How do we get this change implemented? And does anybody care to mock up a concept?

Municipal boundaries of the Netherlands

Goal

I want to create a simple visualization tool for Processing so I can input a set of values for each Dutch municipality and then color a map based on those values.

This is harder than it seems because there is no convenient source for the cartographic data for the boundaries of the Dutch municipalities. So the first step is to acquire those boundaries.

OSM

This blogpost in Dutch put me on track for this dataset. OpenStreetMap has a pretty complete picture of the Netherlands and they track municipal boundaries under boundary admin_level=8.

The data dump contains all administrative boundaries on several levels and is something of a mess. The informationfreeway link in the blogpost should generate an OSM file with only the relations with admin_level=8, but that specific API seems to be down. The approach of importing the OSM file back into a PostGIS database before rendering anything also struck me as somewhat too cumbersome. So an alternative approach was called for.

There are some readily available dumps at CloudMade both with OSM files (format) and Shapefiles for these administrative boundaries. That seemed to be a useful starting point.

An OSM file is just an XML file with a series of nodes, ways and relations in it. It can be filtered with the generic processing tool Osmosis. The options I guessed at were --way-key-value to filter ways with the admin_level tag and --node-key-value to do the same for nodes. Osmosis does not specify anything comparable for relations, which I want to preserve to be able to identify each boundary by the name of the municipality.

Having done that, the resulting OSM file needed to be drawn to the screen. I made a simple XML reader in Processing to display the resulting boundaries. The result was not quite what I wanted: geographical features were missing or unlinked and, I think, not everything was properly labelled. For this particular application a wiki-map does not seem like the most suitable source of data.
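Since the relations are plain XML, they can also be picked out directly with the XML parser that ships with Java. The sketch below is not the reader I used; it assumes the standard OSM layout of relation elements with tag children carrying k and v attributes, and the ids in the sample are invented:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class OsmBoundaryNames {

    // Returns the "name" tag of every <relation> tagged admin_level=8.
    static List<String> municipalityNames(String osmXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(osmXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse OSM XML", e);
        }
        List<String> names = new ArrayList<>();
        NodeList relations = doc.getElementsByTagName("relation");
        for (int i = 0; i < relations.getLength(); i++) {
            Element relation = (Element) relations.item(i);
            String level = null;
            String name = null;
            NodeList tags = relation.getElementsByTagName("tag");
            for (int j = 0; j < tags.getLength(); j++) {
                Element tag = (Element) tags.item(j);
                String key = tag.getAttribute("k");
                if (key.equals("admin_level")) level = tag.getAttribute("v");
                if (key.equals("name")) name = tag.getAttribute("v");
            }
            if ("8".equals(level) && name != null) names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Tiny hand-written sample; the ids and refs are placeholders.
        String sample = "<osm version='0.6'>"
                + "<relation id='1'><member type='way' ref='10' role='outer'/>"
                + "<tag k='admin_level' v='8'/><tag k='name' v='Leek'/></relation>"
                + "<relation id='2'><tag k='admin_level' v='4'/>"
                + "<tag k='name' v='Groningen'/></relation>"
                + "</osm>";
        System.out.println(municipalityNames(sample));
    }
}
```

Loading a whole country dump into a DOM tree like this will use a lot of memory, so for the real file a streaming parser would probably be the better choice.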

CBS

CBS (the Dutch statistics office) also provides a dataset with administrative boundaries of the Netherlands. It is a bit hard to track down on the site and there’s a reference to the Kadaster which isn’t entirely clear, but the generalized Shapefile is workable.

One problem is that Shapefiles are only understood properly by GIS people and there are hardly any libraries for web developers to work with the data format. Sunlight Labs recently released their ClearMaps library to aid developers wanting to work with Shapefiles, which is a big step in the right direction.

Another problem is that the file on the CBS site is from 2006 and that several municipalities have merged/split, so that adds some problems for correlating it to data. And come to think of it, this municipal rejiggering makes any historical data view of the Netherlands a daunting task. Somebody on Wikipedia has generated a similar map of the Netherlands for 2010 supposedly using data from the CBS, but I can’t find that dataset on the site.
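A quick way to see how much the merges and splits hurt is to compare the names in the boundary file with the names in the data set. A minimal sketch, using placeholder names rather than real municipalities:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class NameMismatch {

    // Names present in `left` but absent from `right`, sorted for stable output.
    static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder names: "A" and "B" stand for municipalities that have since merged into "AB".
        Set<String> boundaryNames = new TreeSet<>(Arrays.asList("A", "B", "C"));
        Set<String> dataNames = new TreeSet<>(Arrays.asList("AB", "C"));

        System.out.println("boundaries without data: " + missingFrom(boundaryNames, dataNames));
        System.out.println("data without boundaries: " + missingFrom(dataNames, boundaryNames));
    }
}
```

Both lists have to be empty before a map can be colored completely. Anything left over needs a manual mapping from old to new municipalities.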

Oddly enough there’s also nothing readily available to draw Shapefiles in Processing. Perusing the forum yields this post which points to Geotools which is a massive set of Java libraries consisting mostly of a huge dependency nightmare mitigated somewhat by Eclipse and Maven.

Geographical data

Viewing the Shapefile in QGIS shows that it does indeed contain the municipal boundaries with correct labeling. Having verified that, we need to extract the geographical data from the file into a format for easier reuse. Linking all the Geotools dependencies to my Processing sketch does not seem like an attractive proposition. Using the Geotools quickstart to set up Eclipse to pull in the libraries and run the Java code did work pretty conveniently.

Poking around the Shapefile with Java, using the very poor javadocs (the User Guide’s usefulness turned out to be extremely limited) and the Geotools sources that are available online, yielded something worthwhile after a full day’s work. I also found lots of forum posts by very confused people, with few replies and little insight to be gleaned from them. This really seems to be an underdeveloped field.

It turns out the Shapefile read with Geotools contains SimpleFeature classes (UML for those, and the Wikipedia entry for the OpenGIS standard) on which you can call the getDefaultGeometry() method.

Geotools also provides a default Drawer.java which you can use to display the features (via LiteShapes) in the Shapefile using Java AWT Graphics. This turned out to be useful mainly for debugging purposes and to verify that Geotools does indeed properly read in the Shapefile. Using a GeomCollectionIterator to walk through the points and extract the coordinates that way turned out to be a dead end (especially because I didn’t get the role of the various Transforms).

Another idea was to generate SVG from the Shapefile, but the GenerateSVG class did not seem to be included in my library checkout and fiddling with the Maven file seemed risky.

Finally the following piece of code yielded for me the two pieces of data I was looking for, the names of the municipalities and the content of the SimpleFeatures as MULTIPOLYGONs.


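  // GeoTools imports; package names may differ between releases
  import java.io.File;
  import java.io.FileWriter;
  import java.io.PrintWriter;
  import java.util.Collection;
  import org.geotools.data.FeatureSource;
  import org.geotools.data.FileDataStore;
  import org.geotools.data.FileDataStoreFinder;
  import org.geotools.feature.FeatureCollection;
  import org.geotools.feature.FeatureIterator;
  import org.opengis.feature.Property;
  import org.opengis.feature.simple.SimpleFeature;
  import org.opengis.feature.simple.SimpleFeatureType;

  String gemShapefile = "/Users/alper/Documents/projects/muniboundaries/cbs/buurt_2008_gen2/gem_2008_gn2.shp";
  File file = new File(gemShapefile);

  // Open the Shapefile and get an iterator over its features
  FileDataStore store = FileDataStoreFinder.getDataStore(file);
  FeatureSource<SimpleFeatureType, SimpleFeature> featureSource = store.getFeatureSource();

  FeatureCollection<SimpleFeatureType, SimpleFeature> features = featureSource.getFeatures();
  FeatureIterator<SimpleFeature> iter = features.features();

  int counter = 0;

  PrintWriter pw = new PrintWriter(new FileWriter("/tmp/geo.txt"));

  try {
      while (iter.hasNext()) {
          SimpleFeature feature = iter.next();

          Collection<Property> props = feature.getProperties();

          // Process every feature except the first one
          if (counter > 0) {
              String gemNaam = "";
              String geoWKT = "";

              for (Property property : props) {
                  String propertyName = property.getName().getLocalPart();

                  // Geometry; its toString() is Well-known text
                  if (propertyName.equals("the_geom")) {
                      geoWKT = property.getValue().toString();
                  }

                  // Name of the municipality
                  if (propertyName.equals("GM_NAAM")) {
                      gemNaam = property.getValue().toString();
                  }
              }

              // One line per municipality: name; geometry
              pw.println(gemNaam + "; " + geoWKT);

              System.out.println(gemNaam);
          }

          counter++;
      }
  } finally {
      // Release the Shapefile and flush the output even if reading fails
      iter.close();
      pw.close();
  }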

The funny thing is every feature has a property “the_geom” which contains the geometry data and its toString() method yields the geometry data as Well-known text. That turned out to be all we needed.

It turns out the coordinates in the Shapefile are in Rijksdriehoekscoördinaten, which are Cartesian coordinates based on a custom projection and associated rectangular grid for the Netherlands. This is easily verified by using this web form to convert a pair back to GPS coordinates and looking that up in Google Maps.

The above code produces lines such as:
Leek; MULTIPOLYGON ( ( (224599.999985 582499.999985, 224999.999985 581299.999985, […] 223366.621185 581866.621185, 223499.999985 581999.999985, 224440.97068499998 582470.485385, 224499.999985 582499.999985, 224599.999985 582499.999985)))

This is a simple list of coordinates that define the boundaries of the polygons. A MULTIPOLYGON contains one or more POLYGONs. A POLYGON consists of one list of coordinates for the outer boundary, followed by zero or more lists defining any holes within that boundary. I figured that out by looking at the specification for GeoJSON (same data model, different markup), which is the format I am going to republish this information in.
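To make the nesting concrete, here is a rough sketch of a parser for lines like the one above that also writes the same structure out as a GeoJSON MultiPolygon geometry. It only handles this one geometry type, the class and method names are my own invention, and the sample input is a made-up square with a hole rather than real CBS data:

```java
import java.util.ArrayList;
import java.util.List;

// Parses "MULTIPOLYGON (((x y, ...), (hole ...)), ...)"; not a general WKT parser.
final class WktMultiPolygon {
    private final String text;
    private int pos = 0;

    private WktMultiPolygon(String text) { this.text = text; }

    // Returns polygons -> rings -> points, each point being {x, y}.
    static List<List<List<double[]>>> parse(String wkt) {
        String trimmed = wkt.trim();
        if (!trimmed.startsWith("MULTIPOLYGON")) {
            throw new IllegalArgumentException("not a MULTIPOLYGON");
        }
        WktMultiPolygon p = new WktMultiPolygon(trimmed.substring("MULTIPOLYGON".length()));
        List<List<List<double[]>>> polygons = new ArrayList<>();
        p.expect('(');
        do { polygons.add(p.polygon()); } while (p.accept(','));
        p.expect(')');
        return polygons;
    }

    // First ring is the outer boundary, any further rings are holes.
    private List<List<double[]>> polygon() {
        List<List<double[]>> rings = new ArrayList<>();
        expect('(');
        do { rings.add(ring()); } while (accept(','));
        expect(')');
        return rings;
    }

    private List<double[]> ring() {
        List<double[]> points = new ArrayList<>();
        expect('(');
        do { points.add(new double[] { number(), number() }); } while (accept(','));
        expect(')');
        return points;
    }

    private double number() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && "+-.0123456789eE".indexOf(text.charAt(pos)) >= 0) pos++;
        return Double.parseDouble(text.substring(start, pos));
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    // Same nesting, different markup: a GeoJSON geometry object.
    static String toGeoJson(List<List<List<double[]>>> polygons) {
        StringBuilder sb = new StringBuilder("{\"type\":\"MultiPolygon\",\"coordinates\":[");
        for (int i = 0; i < polygons.size(); i++) {
            sb.append(i > 0 ? ",[" : "[");
            List<List<double[]>> rings = polygons.get(i);
            for (int j = 0; j < rings.size(); j++) {
                sb.append(j > 0 ? ",[" : "[");
                List<double[]> ring = rings.get(j);
                for (int k = 0; k < ring.size(); k++) {
                    if (k > 0) sb.append(',');
                    sb.append('[').append(ring.get(k)[0]).append(',').append(ring.get(k)[1]).append(']');
                }
                sb.append(']');
            }
            sb.append(']');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Made-up square with a square hole in it.
        String wkt = "MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1)))";
        System.out.println(toGeoJson(parse(wkt)));
    }
}
```

The output lines of the extraction code can be fed to this by splitting each line on the first semicolon and passing the second half to parse().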

Drawing

With these coordinates, it became quite easy to write a Processing sketch to draw these boundaries. I looked up a datasource for the last European elections and hooked that up for the colors.
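The main thing to get right when drawing is the mapping from Rijksdriehoek coordinates to pixels. This is a sketch of that step only, not my actual Processing code: the bounding box used in main is a rough guess for the Netherlands, and the class and method names are made up.

```java
// Maps map coordinates onto a canvas, keeping the aspect ratio and flipping the y-axis.
final class MapProjection {
    private final double minX;
    private final double minY;
    private final double scale;
    private final int height;

    MapProjection(double minX, double minY, double maxX, double maxY, int width, int height) {
        this.minX = minX;
        this.minY = minY;
        this.height = height;
        // One scale for both axes so the country is not stretched.
        this.scale = Math.min(width / (maxX - minX), height / (maxY - minY));
    }

    double toScreenX(double x) {
        return (x - minX) * scale;
    }

    // Screen y grows downward while the northing grows upward, so flip it.
    double toScreenY(double y) {
        return height - (y - minY) * scale;
    }

    public static void main(String[] args) {
        // Assumed bounding box in meters, only approximately covering the country.
        MapProjection projection = new MapProjection(0, 300000, 280000, 625000, 560, 650);
        // First coordinate pair from the Leek example line.
        System.out.println(projection.toScreenX(224599.999985) + ", "
                + projection.toScreenY(582499.999985));
    }
}
```

In a Processing sketch the two methods would be called for every point of a ring between beginShape() and endShape(), passing the results to vertex().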

Results of the 2009 elections for European Parliament

Making iterative sketches in Processing with Eclipse is somewhat cumbersome because you need quite a high level of abstraction if you don’t want your classes to interfere with each other. Still, Eclipse allows me to work quickly and lets me write Java 1.5 level code against the Processing core.jar (that alone is worth the effort).

I’m going to release a generic Processing sketch where you only need to add a data file with colors or values for each name. Publication of these boundaries as both GeoJSON and SVG is also forthcoming. I’m also looking for the more recent 2010 Shapefile from the CBS with all the current municipal boundaries in it. If there’s demand I can also extract the living quarter and neighborhood level administrative boundaries which are in the other Shapefiles.

Update: Some research shows there’s a very promising avenue to do this stuff by converting the entire Shapefile to GeoJSON as explained in this StackOverflow post and then drawing that using either ProcessingJS or OpenLayers.
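The data file for such a sketch could be as simple as one "name;value" pair per line. That format is an assumption on my part, as are the color ramp and the placeholder names below; this is only meant to show how values could turn into fill colors:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChoroplethColors {

    // Linear blend from white (value = min) to dark blue (value = max), as 0xRRGGBB.
    static int colorFor(double value, double min, double max) {
        double t = max > min ? (value - min) / (max - min) : 0.0;
        t = Math.max(0.0, Math.min(1.0, t));
        int r = (int) Math.round(255 + t * (0 - 255));
        int g = (int) Math.round(255 + t * (0 - 255));
        int b = (int) Math.round(255 + t * (128 - 255));
        return (r << 16) | (g << 8) | b;
    }

    // Parses "name;value" lines and assigns each name a color scaled to the value range.
    static Map<String, Integer> colorsFromLines(List<String> lines) {
        Map<String, Double> values = new LinkedHashMap<>();
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (String line : lines) {
            String[] parts = line.split(";");
            if (parts.length != 2) continue; // skip blank or malformed lines
            double value = Double.parseDouble(parts[1].trim());
            values.put(parts[0].trim(), value);
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        Map<String, Integer> colors = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : values.entrySet()) {
            colors.put(entry.getKey(), colorFor(entry.getValue(), min, max));
        }
        return colors;
    }

    public static void main(String[] args) {
        // Placeholder names and values, not real statistics.
        List<String> lines = Arrays.asList("A; 0", "B; 5", "C; 10");
        for (Map.Entry<String, Integer> entry : colorsFromLines(lines).entrySet()) {
            System.out.println(entry.getKey() + " -> #" + String.format("%06X", entry.getValue()));
        }
    }
}
```

The names in the file have to match the names in the boundary data exactly, which is where the merged and split municipalities will cause trouble again.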

Update: Managed to convert the data to GeoJSON and draw it using ProcessingJS:
Processing the Netherlands

This opens up a ton of possibilities for interactive visualization and sharing. More to follow.

Election Results: Largest Parties Visualized

I quickly put together a visualization based on my European Parliament visualization from last week. The current 2010 municipal council elections (click to enlarge):

The largest national party per municipality has been taken where possible. Some municipalities are not taking part this time (see Wikipedia for the list) and the results of 22 municipalities are still to come.
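Picking the color for a municipality comes down to finding the national party with the most votes. A minimal sketch of that selection follows; the vote counts and the local party name are invented, and the list of national parties is deliberately incomplete:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class LargestParty {

    // Returns the national party with the most votes, or null if none took part.
    static String largestNationalParty(Map<String, Integer> votes, Set<String> nationalParties) {
        String best = null;
        int bestVotes = -1;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            if (nationalParties.contains(entry.getKey()) && entry.getValue() > bestVotes) {
                best = entry.getKey();
                bestVotes = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> national = new HashSet<>(Arrays.asList("CDA", "PvdA", "VVD", "PVV"));

        // Invented result for a single municipality; "Lokaal Belang" is a made-up local party.
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("Lokaal Belang", 900);
        votes.put("CDA", 700);
        votes.put("PvdA", 650);

        System.out.println(largestNationalParty(votes, national));
    }
}
```

A municipality where only local parties ran gets null here and would stay uncolored on the map.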

The map with the largest parties in the previous elections (for the European Parliament in 2009):

It is hard to compare with the European elections because those are a very different kind of election with an even lower turnout (and visibly more gains for the CDA). Wilders took part in many more places in those elections and that is clearly visible. You can see that the PVV has now been replaced by the VVD, but if Wilders had taken part in all those municipalities, it could have turned out very, very differently.

Both images may be reused under a Creative Commons Attribution license, crediting Alper Cugun with a link to http://alper.nl.


Update at 02:00. Next and final update tomorrow morning.

Update: Busy with the last round of data entry, and still waiting for a few final results that are not on teletext. After that I may also add the data from the previous municipal council elections, because those make for a better comparison than the European elections.

Update: Still waiting for the results from Geldrop-Mierlo and Lelystad.

Update: The last delayed results have been processed as well.

Update: NRC also has an election map that looks nice but is a bit awkward to use here and there.

Increase in safety thanks to the OV-chipkaart

The OV-chipkaart is a big success in the Amsterdam metro. The number of fare dodgers has dropped and, correspondingly, so has the number of violent incidents.

Good, and it leaves me with even less sympathy for the people who are against the OV-chipkaart. It can still be improved, but the value of the chip card has now been clearly proven.

The GVB says in the press that it is not possible to tackle the hard core of fare dodgers. I think it is more a matter of not wanting to, or of the benefits not being worth the costs. The question is how far you go in sealing the gaps in a security system.

I am posting this here because the GVB is one of the most closed and least web-savvy companies in the Netherlands, so this is for the readers, without any hope of a reply.

1.

What has happened to me a few times is that some marginal type slips through the gates right behind you. I am usually too lost in thought with music on to notice quickly, but the GVB staff standing at those gates DO NOTHING. What use are they then? And what can I say about it if the GVB staff are fine with what is happening?

2.

Likewise for people who kick in the gates: flag them via the security cameras, subtly delay the metro they are on, have the police pull them out of the metro at the next station, place them under supervision and force them to pay for the damage.

It can be done if you just want to.

Elections

An overview of the results of the 2009 European elections per municipality:
Results of the 2009 elections for European Parliament

An explanation of how I made this map will follow soon.

I am getting ready to produce another map like this on Wednesday for the municipal council elections. I am just afraid that it will paint an even more pessimistic picture.