Data mountain becomes data heap

It should have taken years to index the data for his thesis into a database, but thanks to Tomas Zwinkels’ stratagem, it only took a few weeks. His colleagues can also enjoy the benefits.

He loves bad ideas, says Tomas Zwinkels, and he genuinely means it. ‘The good thing about a bad idea is that it puts you in a situation which you have to get yourself out of’, he says. ‘And that makes for a delightful challenge.’

Well, he would know.

The PhD student was awarded the NWO Talent grant three years ago. He wanted to examine the careers of every Dutch politician since the Second World War in order to determine the factors for success or failure and the real reason why too many or too few women are members of the House of Representatives, to name just a few points.

Chaotic mountain

What he had not thought about was the enormous number of politicians and the chaotic mountain of data that came with them. Indexing this alone into a database would take years. So what should he do?

That is when he came up with an idea, sketched literally on the back of a serviette: CodeThing, a programme that enabled him to reduce the input time from a few years to a few days. Handily, his brother is a programmer and could help him adjust the programme to suit his needs.

The idea? Man and computer working together to assess data and subsequently label it.

Cooperation

‘The idea is pretty simple, certainly for programmers’, Zwinkels admits. ‘However, when they write a computer programme, they try to make the computer act like a person and that is complicated. In this programme, I let people do what they are good at – assessing – and the computer do what it is good at, namely allocating codes to large quantities of data.’

This means that the computer gets a heap of information, for example the answers to the question ‘What are you doing right now?’ People may have answered with things such as: surfing the internet, reading, procrastinating, working… The computer assesses these answers and gives each of them a code. The person watches, decides whether the code is accurate and makes adjustments where necessary.
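How CodeThing does this internally is not described here, and the programme is not open source. Purely as an illustration of the division of labour Zwinkels describes, here is a minimal Python sketch in which the computer proposes a code and a person accepts or overrides it. Everything in it is an assumption made for the example: the function names, the code labels such as LEISURE and WORK, and the use of simple string similarity from Python's standard difflib module to come up with a suggestion.

```python
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch, NOT CodeThing's actual implementation.
# The computer suggests a code; a person has the final say.


def suggest_code(answer: str, codebook: Dict[str, str],
                 cutoff: float = 0.6) -> Optional[str]:
    """Suggest the code of the most similar answer coded so far, if any."""
    key = answer.strip().lower()
    if key in codebook:
        return codebook[key]
    # Fall back to fuzzy string matching against earlier answers.
    matches = get_close_matches(key, list(codebook), n=1, cutoff=cutoff)
    return codebook[matches[0]] if matches else None


def code_answers(answers: List[str], codebook: Dict[str, str],
                 review: Callable[[str, Optional[str]], str]
                 ) -> List[Tuple[str, str]]:
    """Label every answer: the computer proposes, the person decides."""
    coded = []
    for answer in answers:
        suggestion = suggest_code(answer, codebook)
        final = review(answer, suggestion)        # human accepts or corrects
        codebook[answer.strip().lower()] = final  # remember the decision
        coded.append((answer, final))
    return coded


# A scripted "person" so the example runs without keyboard input:
# they override the computer where they have an opinion, else accept.
corrections = {"procrastinating": "AVOIDANCE", "reading": "LEISURE"}


def scripted_review(answer: str, suggestion: Optional[str]) -> str:
    return corrections.get(answer, suggestion or "UNCODED")


codebook = {"surfing the internet": "LEISURE", "working": "WORK"}
answers = ["Surfing the Internet", "workng", "procrastinating", "reading"]
coded = code_answers(answers, codebook, scripted_review)

for answer, code in coded:
    print(f"{answer!r} -> {code}")
```

In real use, scripted_review would be replaced by something that shows each suggestion to a person and waits for a decision. Because every confirmed decision goes back into the codebook, the suggestions should improve as the heap shrinks, which is one plausible reason such a set-up could save so much time.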

Staggering

The result was staggering. In a few days, the heap of data, which should have taken years to process, disappeared. And it doesn’t only work with Zwinkels’ data.

‘We were approached by a group of researchers from Oxford who were researching timing and had also compiled an enormous heap of data. A student assistant had started the processing, but had only managed to process a few per cent in a few months. We pottered about one afternoon and had processed three quarters of the data within a few hours’, says Zwinkels. ‘The researchers were close to tears.’

Publicity

So far, around eleven or twelve people have been using CodeThing, which he has made free, for scholars at least. That is not very many, but that is partly because his invention has had hardly any publicity, he says. He hopes and expects that this will change. ‘After I get my PhD I will take a few years’ leave to develop it further’, says Zwinkels. ‘I have deliberately saved up to do that.’

This is also the reason why he has made it available to scholars but not open source. ‘If a large market research company should want to use it, then they can pay for it’, he reasons.

And if that does not work? ‘Then I can always hand it over to the open source community’, says Zwinkels.

For the time being, CodeThing belongs to him – and his brother, of course.

Interested? Go to codething.net

09-11-2015