Online Stalking: London, Paris, New York

Much like the Strava controversy a few weeks ago, this is a great example of how seemingly innocent data can be used for nefarious purposes.

Citymapper is a journey planning application that integrates all modes of transport (public, cycling, walking, driving) in major urban areas. Starting in London, Citymapper is now available in New York, Paris and Amsterdam as well as further afield (as you’ll see shortly).

Citymapper hasn’t disclosed the number of users it has. The Google Play store lists between 5 and 10 million downloads; assume the same, if not higher, for Apple’s App Store. Remember that the app is only available in major cities, which suggests that a large proportion of the people in those cities use it.

On a personal note, Citymapper is a ‘must-have’ app for anybody living in London, especially for a non-local. Citymapper’s ability to respond to train non-availability, cancellations and tube strikes whilst still delivering a live and accurate route recommendation has certainly saved a few people caught in the rain or running late for job interviews.

So, what kind of data does Citymapper have?

On any given day, in cities around the world, they know the exact routes of millions of people; they know where people are travelling, when, and even what modes of transport they are taking.

This information would be hugely valuable to any organisation that operates in one of the world’s major cities… it could also be used maliciously should any of it be publicly accessible.

In October 2015, Citymapper rolled out an update that allowed its users to share routes and arrival times with their friends. Even friends who don’t have the application can view the trip, as it all works through the web browser. Each time a trip is planned on Citymapper, a URL is generated that allows your friends to view your trip on a web page. Below is an example.

As you can see, there isn’t anything hugely compromising and no personally identifiable information is available. You have a start location, an end location, a route and some timing information. In this instance, a random inhabitant of London travelled from Tooting to Balham on the Northern Line before getting an Overground train to Battersea, all in all taking 26 minutes.

The eagle-eyed amongst you might see where this is going.

The URL (https://citymapper.com/trip/Tbs6odu) has a fairly short unique identifier: “Tbs6odu”, 7 characters long, drawn from uppercase, lowercase and numeric characters.

By way of comparison, online file-sharing services that generate random URLs often use upwards of 20 characters, including uppercase, lowercase, numbers and special characters (Aj5ye&hsk8Pq@3Hh%#3Q), which makes them exponentially harder to brute force.

My first attempt used a Python script to generate alphanumeric codes 7 characters in length and check whether each was valid by firing an HTTP request at Citymapper. This was initially sluggish. Even though it is a comparatively short URL ID, there are still ~3 x 10^12 combinations (62^7) to get through, which is slow progress if you need to remain below the threshold of Citymapper’s rate limiter. In an hour I had discovered fewer than 10 valid URLs.

However, there was a pattern!

  • T4v8muk
  • Tgg5743
  • Tbiwmq9
  • Tha7v1o
  • Tjrdjfp
  • Tdgv2zj
  • Tjgddh3
  • Twdwck3

Each of the URLs began with a capital ‘T’ and used no uppercase letters after the first character. Mathematically, this reduces the number of possible URL combinations from ~3 x 10^12 to ~2 x 10^9 (36^6: six remaining positions, each a lowercase letter or a digit).
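The arithmetic behind those two figures can be checked with a few lines of Python. This is only a sketch: the alphabets follow the description above (62 alphanumeric characters in all 7 positions, versus a fixed ‘T’ followed by 6 lowercase-or-digit characters), and the request rate is a purely illustrative assumption, not Citymapper’s actual limit.

```python
import math
import string

# Alphabet assumed by the first attempt: a-z, A-Z, 0-9 (62 symbols).
FULL_ALPHABET = string.ascii_letters + string.digits
# Alphabet suggested by the observed pattern: a-z and 0-9 (36 symbols).
REDUCED_ALPHABET = string.ascii_lowercase + string.digits

full_space = len(FULL_ALPHABET) ** 7        # all 7 positions free
reduced_space = len(REDUCED_ALPHABET) ** 6  # leading 'T' fixed, 6 positions free


def days_to_enumerate(keyspace, requests_per_second):
    """Days needed to try every ID at a constant request rate."""
    return keyspace / requests_per_second / 86_400


# Hypothetical rate, chosen only to make the comparison concrete.
ASSUMED_RATE = 10.0

print(f"full:    {full_space:.2e} IDs, {math.log2(full_space):.1f} bits")
print(f"reduced: {reduced_space:.2e} IDs, {math.log2(reduced_space):.1f} bits")
print(f"shrink factor: {full_space / reduced_space:.0f}x")
print(f"days at {ASSUMED_RATE:g} req/s: "
      f"{days_to_enumerate(full_space, ASSUMED_RATE):.0f} vs "
      f"{days_to_enumerate(reduced_space, ASSUMED_RATE):.0f}")
```

Whatever rate is plugged in, the ratio between the two search spaces stays the same, which is why spotting the pattern mattered more than any tuning of the script.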

A few tweaks to the Python script and it was possible to harvest over 35,000 valid URLs in just a few hours.

Whilst it was quite fun to browse to each trip individually, and see what the people of the world were up to, I decided to try and visualise all this data. With our list of valid URLs, it was then possible to use API requests to harvest the information available for each of the 35,000 trips.

Each API response (broadly!) followed this format:

{'status': 'arrived', 'last_updated': '2016-09-15T10:13:09.126014+00:00', 'region_id': 'uk-london', 'endaddress': '', 'endname': '', 'message': '', 'share_type': 'eta', 'title': None, 'eta': '2016-09-15T10:13:00+00:00', 'startname': '', 'signature': '{"duration": 544, "end": {"address": "Tudor Stacks, 1 Dorchester Dr, Herne Hill, London SE24 0DL, UK", "coords": "51.458745,-0.096573", "id": "google:ChIJhzq09XYEdkgRnJYjDWZtzsA", "name": "Tudor Stacks, 1 Dorchester Dr, Herne Hill, London SE24 0DL, UK", "source": "3"}, "kind": "cycle_personal/fastest", "legs": [{"distance": 1694, "duration": 544, "ec": "51.458573,-0.096713", "mode": "cycle", "sc": "51.468142,-0.095144"}], "region": "uk-london", "start": {"address": "Bessemer Road", "coords": "51.468135,-0.095137", "source": "1"}, "time": "2016-09-15T11:01:44+01:00/NOWISH", "version": 2}', 'startaddress': '', 'coords': [51.458855, -0.096722]}
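Note that the ‘signature’ field is itself a JSON-encoded string sitting inside the outer dictionary, so it needs a second decoding step before the route details can be read. A minimal sketch of that step, using a trimmed copy of the response above; the helper names `parse_coords` and `summarise_trip` are my own for illustration and not part of any Citymapper API:

```python
import json

# Trimmed copy of the response shown above; only a few fields are kept.
# json.dumps is used here to reproduce the string-inside-a-dict shape.
response = {
    "status": "arrived",
    "region_id": "uk-london",
    "signature": json.dumps({
        "duration": 544,
        "end": {"coords": "51.458745,-0.096573"},
        "legs": [{"distance": 1694, "duration": 544, "mode": "cycle"}],
        "start": {"address": "Bessemer Road", "coords": "51.468135,-0.095137"},
    }),
}


def parse_coords(text):
    """Turn a 'lat,lng' string into a (lat, lng) tuple of floats."""
    lat, lng = text.split(",")
    return float(lat), float(lng)


def summarise_trip(resp):
    """Decode the nested 'signature' string and pull out the key trip facts."""
    sig = json.loads(resp["signature"])  # the second decoding step
    return {
        "region": resp.get("region_id"),
        "start": parse_coords(sig["start"]["coords"]),
        "end": parse_coords(sig["end"]["coords"]),
        "modes": [leg["mode"] for leg in sig.get("legs", [])],
        "duration_min": round(sig["duration"] / 60),
    }


print(summarise_trip(response))
```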

As you can see, it is possible to harvest, en masse, the starts and ends of journeys, addresses, methods of transportation and lat/long coordinates.

Plotting all the lat/long coordinates generates the following maps.

(For any non-GIS aficionados, the easiest way I found to accomplish this was using Google Fusion Tables; a tutorial can be found here https://support.google.com/fusiontables/answer/2571232).
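Most mapping tools will accept a simple CSV of coordinates. A minimal sketch of that export step, assuming the coordinates have already been pulled out as (lat, lng) pairs; the column names and the `coords_to_csv` helper are my own choices rather than anything a particular tool requires:

```python
import csv
import io


def coords_to_csv(points, out):
    """Write (lat, lng) pairs as a two-column CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["latitude", "longitude"])
    for lat, lng in points:
        writer.writerow([lat, lng])


# Start and end coordinates taken from the sample response above.
points = [(51.468135, -0.095137), (51.458745, -0.096573)]

# An in-memory buffer keeps the example self-contained; swap it for
# open("trips.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
coords_to_csv(points, buffer)
print(buffer.getvalue())
```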

The World:

London:

However, not all API responses were created equal. Out of the ~35,000 API responses there were 1,706 usernames, 3,623 locations tagged as ‘home’ and 1,009 locations tagged as ‘work’. Combined with some OSINT research, we can start to attribute trips to ‘real people’. Take the following API response (anonymised with x’s where appropriate):

{'status': 'expired', 'last_updated': '2017-04-04T19:33:55+00:00', 'region_id': 'uk-london', 'endaddress': '', 'endname': '', 'message': '', 'share_type': 'eta', 'title': None, 'eta': '2017-04-04T20:27:00+00:00', 'startname': '', 'signature': '{"car": 18701, "duration": 3759, "end": {"address": "XXXXX, XXXXX, London E17 XXX, UK", "coords": "51.5XXXX,-0.0XXXXX", "name": "Home", "source": "5"}, "legs": [{"distance": 391, "duration": 346, "ec": "51.4XXXX,-0.1XXXX", "in_station": "0/60", "mode": "walk", "sc": "51.XXXXX,-0.1XXXXX"}, {"end": "Victoria", "mode": "transit", "route_ids": ["NationalRailSN"], "start": "BatterseaPark", "stop_count": 2, "stop_ids": ["Platform_BatterseaPark_NationalRail", "Platform_Victoria_BGeS"]}, {"distance": 0, "duration": 330, "ec": "51.4XXXX,-0.1XXXXX", "in_station": "1/330", "mode": "walk", "sc": "51.4XXXXX,-0.1XXXXX"}, {"end": "WalthamstowCentral", "mode": "transit", "route_ids": ["Victoria"], "start": "Victoria", "stop_count": 12, "stop_ids": ["Platform_Victoria_V_dN", "Platform_WalthamstowCentral_Underground"]}, {"distance": 1349, "duration": 1205, "ec": "51.5XXXXX,-0.0XXXXX", "from_exit": "WalthamstowCentral_E2903", "in_station": "2/120", "mode": "walk", "sc": "51.5XXXX,-0.0XXXXX4"}], "price_pence": 390, "region": "uk-london", "routing_request_id": "02ffc71d-daa5-4828-bea3-a31adf3c3c6e", "start": {"coords": "51.4XXXXX,-0.1XXXX", "source": "1"}, "time": "2017-04-04T20:29:04+01:00/NOWISH", "version": 2}', 'startaddress': '', 'coords': [51.4XXXX, -0.1XXXX], 'user_name': 'Chris'}

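As you can see, on 04 Apr 2017 at around 19:33 UTC, Chris took a journey from Battersea to his home address in E17. He walked to Battersea Park station, took a National Rail train to Victoria, changed onto the Victoria line to Walthamstow Central and walked the rest of the way home.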
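Tallies like the ones quoted above (usernames, ‘home’ and ‘work’ labels) can be produced by walking over the collected responses. The sketch below assumes the labels appear in the ‘name’ field of the start or end location, as ‘Home’ does in the response above; treating ‘Work’ the same way is my assumption by analogy, and the two sample records are cut-down illustrations rather than real data.

```python
import json
from collections import Counter


def tally_labels(responses):
    """Count responses exposing a user name, and locations labelled home/work."""
    counts = Counter()
    for resp in responses:
        if resp.get("user_name"):
            counts["user_name"] += 1
        sig = json.loads(resp["signature"])
        for endpoint in ("start", "end"):
            # Assumed location of the label: the endpoint's 'name' field.
            name = (sig.get(endpoint, {}).get("name") or "").lower()
            if name in ("home", "work"):
                counts[name] += 1
    return counts


# Cut-down illustrative records in the same shape as the responses above.
responses = [
    {"user_name": "Chris",
     "signature": json.dumps({"start": {}, "end": {"name": "Home"}})},
    {"signature": json.dumps({"start": {"address": "Bessemer Road"},
                              "end": {"name": "Tudor Stacks"}})},
]

print(tally_labels(responses))
```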

With a bit of help from electoral records and social media we can attribute Chris to an actual human being… with actual friends and an actual job.

Arguably this journey in isolation isn’t very useful to anybody, malicious or otherwise. If I ran my Python script for a month however, there would probably be enough data to start building a pattern of life for Chris (depending on how often he uses the application). This is especially pertinent as some of the journeys that I harvested were dated from over 2 years ago. However, I couldn’t confirm whether every journey ever made on Citymapper was available with such a small dataset.

What is interesting though is that if you take an ‘end location’ and work backwards you can see which individuals have been to certain locations.

In my dataset there were 5 instances of journeys planned to visit the Eiffel Tower in Paris; the 5 people had made their way there from shopping, bars, or hotels. Not surprising.

But what if we look at somewhere less reputable, such as Amsterdam’s red light district?

We can see that a handful of people may be unaware that their trips are publicly available. If we used OSINT to research these trips and people, might we find a happily married man to blackmail?

Would Oscar’s employers be happy to know that he was taking a trip home at 04:03 on a Wednesday morning?

The Fix

I wouldn’t classify this as a bug or a security flaw, per se, but there is more Citymapper can do to prevent these types of attacks from being used in the wild:

  1. To protect future URLs, increase the ID complexity, either by increasing the length or by including uppercase and special characters.
  2. Audit your historical trips and remove the links to trips over a few days old; there is no reason for a link to remain after a trip is complete.
  3. Remove first names and home labels from the publicly facing API.
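To make the three suggestions concrete, here is a rough sketch of what each might look like on the server side. It is not Citymapper’s actual fix: the function names, the three-day retention window and the set of private labels are all assumptions of mine, and the sample record is a cut-down illustration in the shape of the responses above.

```python
import json
import secrets
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=3)        # assumed retention window ("a few days")
PRIVATE_LABELS = {"home", "work"}  # labels treated as private in this sketch


def new_trip_id(n_bytes=16):
    """Fix 1: a URL-safe random ID; 16 bytes gives 128 bits of randomness."""
    return secrets.token_urlsafe(n_bytes)


def is_expired(eta, now):
    """Fix 2: true once the trip's ETA is more than MAX_AGE in the past."""
    return now - eta > MAX_AGE


def redact(resp):
    """Fix 3: drop the user's name and strip home/work labels before serving."""
    public = {k: v for k, v in resp.items() if k != "user_name"}
    sig = json.loads(public["signature"])
    for endpoint in ("start", "end"):
        place = sig.get(endpoint, {})
        if (place.get("name") or "").lower() in PRIVATE_LABELS:
            del place["name"]
    public["signature"] = json.dumps(sig)
    return public


# Cut-down illustrative record, not real data.
sample = {
    "eta": "2017-04-04T20:27:00+00:00",
    "user_name": "Chris",
    "signature": json.dumps({"end": {"name": "Home"}}),
}

eta = datetime.fromisoformat(sample["eta"])
print(new_trip_id())
print(is_expired(eta, datetime.now(timezone.utc)))
print(redact(sample))
```

Drawing the ID from the `secrets` module rather than extending a guessable pattern means that finding a handful of valid URLs tells an attacker nothing about the rest.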

Disclosure

We e-mailed Citymapper’s operations team to raise the issue and their engineering team promptly responded and fixed the issue within a week – thank you Citymapper!

  • 7th November ’17 — Research conducted
  • 9th November ’17 — Vendor notified
  • 16th November ’17 — Citymapper pushes out a patch, rendering this attack infeasible – seeking solutions to existing URLs and confidentiality issues.
  • 13th February ’18 — Article published