The A/B City

Jan 9, 2012

One of the key ingredients of many successful online companies has been rapid iteration and improvement of their services via A/B testing. In essence, you split your users at random into two (or more) groups, serve each group a different variant of your service (e.g., different algorithms or user interfaces), and then sit back and measure how each group behaves. Once you are operating at web scale, the sheer number of visitors and the potential for rich data collection can really inform those companies about how they are performing and which ideas work better than others. Size matters the most: (we hope that) once we are dealing with a large enough randomly assigned sample, all the other confounding factors that may play a role in what we are trying to measure will balance out across the groups. In other words, the web turns the world into a living laboratory.
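
To make the split-and-measure idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how any particular company does it: the function names (`assign_variant`, `conversion_rates`), the experiment label, and the choice of hashing the user id to pick a group are all hypothetical. Hashing is used so that the same user always lands in the same group without storing any state.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to one variant (hypothetical helper).

    Hashing "experiment:user_id" spreads users roughly evenly across the
    variants and keeps each user in the same group on every visit.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(events):
    """Turn (variant, converted) pairs into a {variant: rate} dictionary."""
    totals, hits = {}, {}
    for variant, converted in events:
        totals[variant] = totals.get(variant, 0) + 1
        hits[variant] = hits.get(variant, 0) + (1 if converted else 0)
    return {variant: hits[variant] / totals[variant] for variant in totals}


if __name__ == "__main__":
    # Made-up observations: which group each visitor was in, and whether
    # they did the thing we care about (e.g., clicked "buy").
    observed = [("A", True), ("A", False), ("B", True), ("B", True)]
    print(assign_variant("user-42", "new-homepage"))
    print(conversion_rates(observed))
```

Comparing two raw rates like this is only the first step; with real traffic you would also want a significance test before declaring a winner.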

Unfortunately, this cool technique is not readily available in the physical world. Imagine, for example, trying to evaluate a policy that aims to reduce the number of children being killed by cars. How do you split your users (and ensure randomization)? How do you cope with the fact that people already behave differently in different geographic areas? (Moreover, how do you come to terms with the ethical questions?) The sad conclusion seems to be that, when we intervene in the physical world and observe a change, it remains difficult to speak of anything more than a correlation between what we did and the behavior we (hope to have) caused.

Luckily, alternative approaches exist. The equivalent of an A/B test when you can’t randomize your sample is a quasi-experiment. In a recent paper, we adopted this perspective in order to examine the impact of changing the user access policy to London’s (shared) Boris bikes. By splitting our data around the time of the change, carefully cleaning it, and ensuring that we maintained a large (temporal and spatial) scale of data, we examined how sensor readings from bicycle stations can be used to observe how the policy change propagated across time and the city. Interestingly, the data showed us that this change in policy played out differently at different stations: some stations that people used to arrive at in the morning and leave from in the evening flipped their pattern completely.
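
For a flavour of what a before/after split can look like in code, here is a rough Python sketch. It is not the analysis from the paper: the data layout (tuples of station id, timestamp, and bikes available), the morning and evening windows, and the function names are all assumptions made for illustration. It splits station readings around a change date and lists the stations whose morning-versus-evening pattern changes sign.

```python
from collections import defaultdict
from datetime import datetime


def morning_evening_gap(readings):
    """Per station: mean bikes available in the morning minus the evening.

    readings: iterable of (station_id, timestamp, bikes_available).
    The windows (07:00-09:59 and 17:00-19:59) are assumed, not from the paper.
    """
    sums = defaultdict(lambda: {"am": [0, 0], "pm": [0, 0]})
    for station, ts, bikes in readings:
        if 7 <= ts.hour < 10:
            slot = "am"
        elif 17 <= ts.hour < 20:
            slot = "pm"
        else:
            continue  # ignore readings outside both windows
        sums[station][slot][0] += bikes
        sums[station][slot][1] += 1
    gaps = {}
    for station, s in sums.items():
        if s["am"][1] and s["pm"][1]:  # need data in both windows
            gaps[station] = s["am"][0] / s["am"][1] - s["pm"][0] / s["pm"][1]
    return gaps


def flipped_stations(readings, change):
    """Stations whose morning/evening gap has opposite signs before and after."""
    readings = list(readings)  # we iterate twice
    before = morning_evening_gap(r for r in readings if r[1] < change)
    after = morning_evening_gap(r for r in readings if r[1] >= change)
    common = before.keys() & after.keys()
    return sorted(s for s in common if before[s] * after[s] < 0)


if __name__ == "__main__":
    change = datetime(2011, 1, 1)  # placeholder date, not the real policy date
    toy = [
        ("s1", datetime(2010, 12, 1, 8), 2), ("s1", datetime(2010, 12, 1, 18), 10),
        ("s1", datetime(2011, 2, 1, 8), 10), ("s1", datetime(2011, 2, 1, 18), 2),
    ]
    print(flipped_stations(toy, change))
```

A sign change on its own says nothing about why a station flipped; seasonality and weather differ on either side of the split too, which is why the cleaning and the scale of the data matter so much.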

The rest of the details are in the paper (reference below). However, since this is the year that politicians are taking on coding, maybe they should also take a few hints from web companies and start running their own A/B tests.

N. Lathia, S. Ahmed, L. Capra. Measuring the Impact of Opening the London Shared Bicycle Scheme to Casual Users. To appear in Transportation Research Part C (Elsevier), accepted December 2011.