Sunday, 25 December 2011

NS API: station list to an array

To use the list of stations directly from memory, I keep two arrays in my code: codes[] and names[].

Converting the API's XML output into two plain-text arrays was quite a chore, and one that kept coming back because NS kept opening new stations. So I decided to write a script.

It would without a doubt have been nicer to write a parser in a proper programming language, but the exercise with pipes and the commands sed, awk, grep, paste and cut was fun.

The script converts the uppercase station codes to lowercase, maps international characters (umlauts etc.) to plain ones, and removes a few stations such as "Utrecht" because a "Utrecht Centraal" already exists. I also decided to write names like München as Munchen (without the umlaut) to make typing easier.



#!/bin/bash
# Station codes: strip the XML tags, trim leading whitespace, lowercase,
# and emit quoted, comma-separated array elements.
grep code ns_api_stations.xml | sed 's,<[/a-z]*>,,g' | sed 's/^[ \t]*//' | awk '{print "\"" tolower($0) "\","}' > codes.txt

# Station names: the same, plus mapping accented characters to plain ASCII.
grep name ns_api_stations.xml | sed 's,<[/a-z]*>,,g' | sed 's/^[ \t]*//' | sed 's/[àâä]/a/g; s/[ÀÂÄ]/A/g; s/[éèêë]/e/g; s/[ÉÈÊË]/E/g; s/[îï]/i/g;
s/[ÎÏ]/I/g; s/[ôö]/o/g; s/[ÔÖ]/O/g; s/[ûüù]/u/g; s/[ÛÜÙ]/U/g; s/ç/c/g; s/Ç/C/g' | awk '{print "\"" $0 "\","}' > names.txt

# Drop entries shadowed by a more specific station (e.g. plain "Utrecht"
# next to "Utrecht Centraal"), then split back into the two files.
paste codes.txt names.txt | grep -v '"Amsterdam"' | grep -v '"Almere"' | grep -v '"Berlijn"' | grep -v '"Eschmarke"' | grep -v '"Den Haag"' | grep -v '"Utrecht"' | grep -v '"Leiden"' | grep -v '"Rotterdam"' > merged

cut -f1 < merged > codes.txt
cut -f2 < merged > names.txt
rm merged

Saturday, 24 December 2011

Moving to Google's App Engine

I transitioned the social analysis application to http://socialgraphanalysis.appspot.com/ last week. It turned out to be quite easy. One thing I ran into, among others:

While an application is under development, its database structures change often. In MySQL I could select my old data and convert it without much hassle. The queries you can run on App Engine are so limited that old data and new data cannot be properly distinguished. In the end I added a version attribute to all my entities, so that I could later do a "SELECT * WHERE version = '3'" to perform a migration.
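The workaround can be sketched in plain Python (dicts standing in for datastore entities; the property and field names here are hypothetical, not the app's actual schema):

```python
# Minimal sketch of schema versioning: every stored entity carries a
# "version" attribute, so records written under an old schema can still
# be selected for migration even with very limited query support.
CURRENT_VERSION = "3"

def entities_to_migrate(entities, current=CURRENT_VERSION):
    # Datastore equivalent: SELECT * WHERE version = '2' (or any
    # version other than the current one).
    return [e for e in entities if e.get("version") != current]

def migrate(entity):
    # Convert an old-schema entity in place and stamp the new version.
    entity.setdefault("screen_name", entity.pop("name", ""))
    entity["version"] = CURRENT_VERSION
    return entity
```

Once every entity is stamped, old and new records can be told apart with a single equality filter, which is about all the datastore query language allows.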

Thursday, 15 December 2011

Yahoo Pipes uses multiple IPs

For my Twitter analysis program I need access to quite a lot of data from the twitter api (about 200 requests per visitor). Unfortunately the twitter api has an awful rate limit of 150 requests per hour per IP, so it is impossible to route this all through my server. Fortunately the api accepts jsonp requests, so I decided to offload fetching the data to the client.

The problem is that for most of my users 150 requests just won't cut it. After looking up a single person with 150 followers, the user has to wait for an hour. A caching solution is needed.

I decided to create a workflow like this for every request that would normally go through the twitter api:

  1. Ask my cache at waleson.com for the info. If successful, return.
  2. Try the twitter api via jsonp. If successful, send to the waleson.com cache before returning.
  3. Ask my server to act as a proxy to the twitter api. (Actually, I've hooked up some other webservers that I happen to manage for my server to use as proxies.)
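The fallback chain can be sketched like this (Python standing in for the client-side JavaScript; each source is a callable and the endpoint names are hypothetical):

```python
# Sketch of the three-step fallback: cache first, then the rate-limited
# twitter api, then a server-side proxy. Each source returns the data
# or None on failure.
def fetch_with_fallback(request, cache_get, twitter_get, proxy_get, cache_put):
    data = cache_get(request)        # 1. ask the waleson.com cache
    if data is not None:
        return data
    data = twitter_get(request)      # 2. jsonp request straight to twitter
    if data is not None:
        cache_put(request, data)     #    populate the cache for other users
        return data
    return proxy_get(request)        # 3. last resort: proxy via my server
```

The key property is that every successful direct hit on twitter also fills the shared cache, so popular lookups stop costing rate-limited requests at all.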
I had played with Yahoo Pipes a few months ago (and I _really_ liked it). Pipes accepts jsonp requests too. Perhaps it could be used to bypass the 150/hour/IP limit? Or maybe not; maybe my Pipe would run on a single server, with all requests made from that server's IP address?

Well, I set up a pipe, made it do a request to a logger on my server and collected about 60 results. As it turns out, Pipes uses multiple IPs (this log contains 7 unique IPs; there might be many more, but this will do for now):

83.87.80.102
87.248.125.49
87.248.125.49
217.12.1.125
87.248.125.48
217.146.191.18
217.12.1.124
217.146.191.19
217.146.191.19
87.248.125.49
217.12.1.124
217.12.1.124
217.12.1.125
87.248.125.49
87.248.125.48
87.248.125.48
87.248.125.48
217.12.1.125
217.146.191.19
217.146.191.19
217.12.1.124
87.248.125.48
217.12.1.124
217.12.1.124
217.12.1.125
87.248.125.48
217.12.1.124
217.12.1.124
87.248.125.48
217.146.191.18
217.12.1.125
217.12.1.124
217.146.191.19
87.248.125.49
217.12.1.125
217.12.1.124
217.12.1.125
217.146.191.18
217.12.1.125
217.12.1.125
87.248.125.49
87.248.125.48
217.146.191.19
217.12.1.125
217.146.191.18
87.248.125.49
87.248.125.49
217.12.1.125
87.248.125.49
217.12.1.125
217.12.1.125
217.12.1.125
217.12.1.124
217.146.191.19
217.12.1.124
217.146.191.19
217.146.191.19
217.146.191.19
217.12.1.124
217.146.191.19
217.12.1.124


User agent: Mozilla/5.0 (compatible; Yahoo Pipes 2.0; +http://developer.yahoo.com/yql/provider) Gecko/20090729 Firefox/3.5.2

So, I am currently putting Pipes somewhere in my workflow to act as another set of proxies to support even more users. Awesome!

Wednesday, 14 December 2011

Coming soon, social graph analysis in your browser


Over the last couple of months I've been spending some of my spare time on this project: a web-based interface to twitter analysis.

It works like this:
  1. Generate a network (e.g. all followers of @xxxx plus @xxxx himself, and all followers of @yyyy but not @yyyy himself).
  2. Download the data from twitter (and cache the results on my server; twitter has a horrible 150-requests-per-hour policy).
  3. Let a grouping algorithm put similar people close together (in a web worker, hurray!).
  4. Let the user analyse the resulting network. In the image above I have selected the followers of some of my friends and am checking my influence within their connections.
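Step 3 can be sketched as iterations of a naive force-directed layout (a generic illustration only; I don't claim this is the app's actual grouping algorithm):

```python
import math

# One iteration of a naive force-directed layout: connected nodes
# attract, all pairs repel. Repeating this pulls densely connected
# people together and pushes unrelated people apart.
def layout_step(pos, edges, attract=0.1, repel=0.5):
    forces = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = -repel / dist                 # repulsion between every pair
            if (a, b) in edges or (b, a) in edges:
                f += attract * dist           # spring along follow edges
            fx, fy = f * dx / dist, f * dy / dist
            forces[a][0] += fx; forces[a][1] += fy
            forces[b][0] -= fx; forces[b][1] -= fy
    return {n: (pos[n][0] + forces[n][0], pos[n][1] + forces[n][1]) for n in pos}
```

In the browser this kind of O(n²) loop is exactly what you want to push off the UI thread into a web worker.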

Looks promising, when do you release?
Within a week or two. I have some cool stuff that I want to throw in, and I need to fix some caching/proxy issues.

How did you make this?
Javascript, jQuery, web workers, intelligence (in my brain), and simple html/css.

Idea time: Language Acquisition 2.0

I have to admit, I am somewhat hesitant about posting this here, because frankly there is serious money in this idea. Nevertheless, it's idea time!

I took a number of Language Acquisition courses during my English Language and Culture programme at Leiden University. The focus was on acquiring vocabulary and on pronunciation. Most educators believe that doing plus feedback is learning, and if we can trust Machine Learning, this is certainly true.

Nevertheless, the classes were the only time most students did any "doing". Sure, we were given tasks to practise at home, but I for one thought speaking to the mirror was a somewhat surreal exercise and hardly ever did it. (Moreover, I was often pressed for time, and somehow creating interesting programs always won over repeating the strut vowel a thousand times.) The practice and feedback within the classes wasn't exactly the pinnacle of modern learning either. Imagine a group of 20 awkwardly glancing individuals, speaking one after another, afraid to criticize each other's horrid pronunciation flaws and, without feedback, failing to identify their own. Not much "doing" was done, and the feedback loop for every student was routed through the language lab's teacher. The single teacher was capable, but the students were too; just a bit less capable, more hesitant and always ill-prepared.

Without a doubt there is room for improvement. This is where the idea comes in. Think stackoverflow meets blackboard meets youtube meets chatroulette: an online community where you send in videos or audio (youtube) or have conversations with others (chatroulette), and are graded on your performance (stackoverflow). The community is shielded from the outside; your college or school needs a subscription for you to access it. The content has one goal: to be reviewed by peers and/or teachers, one, two or perhaps three times. Then it is deleted or archived.

For instance, a student joins a Language Acquisition 101 course, and he (or she, but I'll refer to it as he, because screw you, this is my blog and I do what I want) gets access to the site and starts off with a rating of 0. The teachers have made a selection of areas to focus on in this course (the strut vowel, the silent r, getting a distinct British accent, you name it). The staff attaches instructional videos to the separate tasks, because the student naturally forgot what the heck the strut vowel is. The student then submits a video of himself pronouncing the sentences of the exercise. Other students later rate this video with regard to every separate task (how well did the student do on the strut vowel, how well on the silent r, etc.). The rating consists of a simple five-star selector, with optional comments. The scores are immediately added to the student's profile, stackoverflow style. Students can earn badges (Triple A*** strut vowel expert) and gain a higher status in certain traits. Once one has a higher status, one's feedback gets more weight, etcetera.
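The scoring mechanics could be sketched like this (a minimal sketch; the exact weighting scheme is my own assumption for illustration, not part of the original idea):

```python
# Sketch of peer-review scoring: each rating is 1-5 stars, and a rater's
# feedback counts for more as their own reputation grows. The specific
# formulas are hypothetical.
def weighted_score(ratings):
    """ratings: list of (stars, rater_reputation) tuples."""
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        return 0.0
    return sum(stars * rep for stars, rep in ratings) / total_weight

def update_reputation(reputation, score, points_per_star=10):
    # Stackoverflow style: good performance adds to a running total.
    return reputation + score * points_per_star
```

Weighting by rater reputation gives the "your feedback gets more value as your status grows" property directly, while the running total gives the badge-and-status ladder something to hang off.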

I believe that this is the next step in language acquisition. Students can take their time to get acquainted with the material (by watching the instructional videos), take their time to analyse each other's videos, are invited to be active in the community, and they learn at every single step of the process. When they do, they get feedback, and when they give feedback they analyse the speech of others.

Are you a developer with the time to build it? Go ahead! When you get rich, buy me some beer. Or wine. Are you an investor and think this idea is great? Give me a million bucks and I'll consider it. Are you a university and think this is great? Get some guys from the CS lab (or girls, oh no wait, never mind, it's the CS lab), and give them a million bucks.

Idea time: wikitours. Spoken guides on public transportation.

This idea dates from 2007, when I was strolling around Paris, and is very simple: free, guided tours of cities on your smartphone.

  1. Dictate a lot of wikipedia entries on buildings, statues, parks etc.
  2. Tag the entries with GPS coordinates.
  3. Map public transportation routes.
  4. Find the wikipedia entries that lie along each route.
  5. Get on a bus, open the application, enter the number of the route.
  6. Sit back, look out the window and learn about everything you pass by.
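Step 4 can be sketched as a proximity match between a route and the geotagged entries (a minimal sketch; the coordinates and names below are made up, and a real route would be a densely sampled polyline):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two lat/lon points.
    r = 6371000.0  # mean earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def entries_along_route(route, entries, radius_m=200.0):
    # route: [(lat, lon), ...]; entries: {name: (lat, lon)}.
    # Keep every entry within radius_m of any sampled route point.
    return sorted(
        name for name, (elat, elon) in entries.items()
        if any(haversine_m(rlat, rlon, elat, elon) <= radius_m
               for rlat, rlon in route)
    )
```

With the entries pre-indexed per route, the app on the bus only needs the route number to know what to narrate next.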
In 2011 I was reminded of this idea by some apps that were actually developed, Wiki Talking Tours for instance. Back in 2007 I thought about it for a couple of weeks, but the project was simply too big. Its essential parts were non-existent, so everything would have had to be built from scratch.

It is very promising to see that all these basic parts: the device itself (smartphone), a map interface (Google Maps), spoken wikipedia entries (Spoken Wikipedia) and transportation routes (Google Transit), have emerged in the last couple of years.

Creative innovation is much cheaper, easier and faster now that this "idea-infrastructure" is in place.

Idea time: supermarket navigation

This idea is from early 2008.

I'm a man and not too fond of shopping for groceries; I can't seem to find my way in supermarkets.

I feel like this mouse:


Compared to me, my wife finds exactly what she needs like lightning:



I admit it's a deficiency, and with time and energy I could probably get a little better at it, but finding every item on the groceries list is certainly not one of my talents. Stereotypes dictate that women feel the same about navigating in cars. For the benefit of these women (and in reality for most men as well) we have invented personal navigation devices (PNDs). Enter your destination, follow some simple steps, arrive. Unless you are on a road trip, it's getting to the destination that matters; the journey itself does not.

One navigation problem down, one to go: I still can't find my way in supermarkets. I found myself craving an old-fashioned grocery store where I could tell the clerk: good sir, I need dinner but am too poor to eat out, please provide me with some ingredients that match my appetite and budget. Such stores are rare and probably expensive.

In comes the navigation device. Back in 2008 I thought: let's mount a PND on a shopping cart. Now, in 2011, I'd say: just make an app. Either way, the interface could be outlined like this:
- [optional] make a shopping list on the store's (mobile) website
- go to the store
- present your credentials to the PND/app
- fetch your shopping list
- optionally select a meal and have all its ingredients added to the shopping list
- show the shortest route through the supermarket to collect all the items
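The last step can be sketched as a greedy nearest-neighbour walk over the items' shelf coordinates (a simplification: a real store constrains you to aisles, the coordinates are made up, and nearest-neighbour is only an approximation of the shortest route):

```python
import math

# Greedy nearest-neighbour ordering of shopping-list items by shelf
# position: from the entrance, repeatedly walk to the closest item
# still on the list.
def route_through_store(entrance, items):
    # items: {name: (x, y)} shelf coordinates; entrance: (x, y)
    remaining = dict(items)
    here, order = entrance, []
    while remaining:
        name = min(remaining, key=lambda n: math.dist(here, remaining[n]))
        here = remaining.pop(name)
        order.append(name)
    return order
```

Swapping in a proper aisle graph and a shortest-path search would be the obvious upgrade, but even this ordering beats my usual random walk through the store.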

The benefits for the user should be clear, and so should the benefits for the supermarket: they can manage their inventories better (in one hour, 400 people are going to buy turkey stuffing and we are almost out!), detect trends, do customer profiling and so on.

Privacy is an issue of course, but that's not my problem ;)

Idea time: RFID+E-Ink, electronic price tags without batteries

I got this idea years ago, somewhere in 2004, after I had heard of both RFID and E-Ink. The order is irrelevant ;) The idea is simple. We take:

1) RFID readers, which send radio waves to tags; the tags pick up some of the energy in the wave, do some computation and send a reply. The tags are brilliant: no batteries, no connected power source of any kind except for the antenna.

2) E-Ink displays, which need power only to change pixels. After the power is cut off, the pixels remain in their current state.

The result: a small tag with a display. The display gets initialised and updated by an RFID-reader and after that retains its state indefinitely.

Perfect for price tags on supermarket shelves, which need to be updated every now and then but are hellish to replace by hand.

I soon found out in 2004 that Epson had already done this: http://gizmodo.com/026090/epsons-electronic-ink-%252B-rfid--21st-century-price-tags

Idea time

I have lots of ideas for projects that I have absolutely no time to implement. I don't have the money to get patents either, and besides, I'd be happy if someone else made a profit. It's just that an idea creates a spark in my brain and it needs to get out. So I decided to put some of these projects on this blog. If you are a developer, and if you have the time, feel free to pick them up!

Also, I don't like patents on ideas. Am I right in assuming that once an idea is posted on the web, you cannot get a patent on it?

P.S., vanity is involved. Some of these ideas have been picked up by big companies that came up with the same thing. I'd like to be able to say: I thought of it way back in 20xx ;)

Idea time: Ticket dispenser 2.0

At most busy front offices you have these ticket dispensers, which need no explanation whatsoever:


I hate waiting in line. Some weeks ago, I had to wait 45 minutes to get some simple bureaucratic thing done. So it hit me: why not hook these ticket dispensers up to a twitter account? After I get a ticket I can start following the account, which simply spews out every number that is up. I can do some more shopping, grab something to eat, and check twitter to see if my number is about to come up.

Seems great, right? I'm not entirely sure about the ethics though. It comes close to jumping the line, but then again, I see people leave gigantic queues after collecting their ticket to grab a bite to eat. Being able to see which number is up on your smartphone's screen just makes that leaving less of a gamble.