
Using NYC Taxi Data to identify Muslim taxi drivers

21 Jan 2015 | by:

Remember that NYC Taxi data set that allowed you to see who visited gentlemen’s clubs and which celebrity took a taxi where? Reddit user uluman now seems to have found a way to identify Muslim taxi drivers in the data set. He explains how:

Since Islam instructs followers to pray 5x daily at specific times, I wondered if one could identify devout Muslim hacks solely from their trip data. For drivers that do pray regularly, there are surely difficulties finding a place to park, wash up and pray at the exact time, but in many cases banding near prayer times is quite clear. I plotted a few examples.
Each image shows fares for one cabbie in 2013. Yellow=active fare (carrying passengers). A minute is 1 pixel wide; a day is 2 pixels tall. Blue stripes indicate the 5 daily prayer start times which vary with the sun’s position throughout the year.

  • Taxi data: http://www.andresmh.com/nyctaxitrips/
  • Prayer times: http://www.islamicfinder.org/prayerDetail.php?city=New%20York&state=NY&country=usa&lang=english
  • Tools: Python / Python Imaging Library
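
For anyone who wants to reproduce the visualisation, the sketch below shows one way such an image could be rendered with Pillow, the maintained fork of the Python Imaging Library. It is a minimal sketch rather than the Reddit user’s original script: the trip records and prayer times it expects are hypothetical placeholders that would have to be loaded from the sources listed above.

    # Minimal sketch, assuming Pillow is installed; `trips` and `prayer_times`
    # are hypothetical inputs, not the original data or script.
    from datetime import date, timedelta
    from PIL import Image

    MINUTES_PER_DAY = 24 * 60   # one pixel per minute
    ROW_HEIGHT = 2              # one day is two pixels tall

    def render_year(trips, prayer_times, year=2013, out_path="cabbie.png"):
        """trips: list of (pickup_datetime, dropoff_datetime) for one driver.
        prayer_times: list of datetimes marking the five daily prayer starts."""
        days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
        img = Image.new("RGB", (MINUTES_PER_DAY, days * ROW_HEIGHT), "black")
        pixels = img.load()

        def paint(dt, colour):
            # Map a datetime onto (minute-of-day, day-of-year) pixel coordinates.
            day_index = (dt.date() - date(year, 1, 1)).days
            if 0 <= day_index < days:
                x = dt.hour * 60 + dt.minute
                for dy in range(ROW_HEIGHT):
                    pixels[x, day_index * ROW_HEIGHT + dy] = colour

        # Yellow: every minute during which the cab was carrying a passenger.
        for pickup, dropoff in trips:
            minute = pickup.replace(second=0, microsecond=0)
            while minute <= dropoff:
                paint(minute, (255, 255, 0))
                minute += timedelta(minutes=1)

        # Blue: reference marks at the prayer start times, drawn last so
        # they remain visible on top of the fare pixels.
        for t in prayer_times:
            paint(t, (0, 0, 255))

        img.save(out_path)

Run over one driver’s 2013 trips, this produces a 1440 × 730 pixel strip in which the banding described above shows up as recurring gaps in the yellow pixels around the blue marks.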

The result is an eerie prediction of the religion (and devoutness) of a cab driver. Not everyone is convinced, as is evident from the Reddit thread.

(In)activity as sensitive personal data?

This data plotting brings up some interesting legal questions, especially from an EU perspective. Under the EU Data Protection Directive, the processing of personal data is subject to certain restrictions. For a special category of data considered sensitive, the regime is even stricter, as the default rule is that such processing is prohibited (Art. 8 Directive). This special category includes ‘personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of data concerning health or sex life’ (Art. 8 Directive, emphasis added). A question that comes to mind when looking at this data plotting is: if you can deduce someone’s religion by their (in)activity at certain times of the day, such as around prayer times, is that data then sensitive personal data?

Whatever the answer may be, it is clear that those releasing data sets should be careful when those sets include data on the (in)activity of people. Perhaps this is something that providers of open data and companies like Uber can take into account, seeing as the latter has plans to share data with the city of Boston.



By: , PhD researcher at Maastricht University, the Netherlands.



6 Responses

  1. Gilbert says:

    “If you can deduce someone’s religion by their (in)activity at certain times of the day, such as around prayer times, is that data then sensitive personal data?”

    Are you investigating this question or are you trying to answer that question through court proceedings?

  2. Tomasz says:

    “If you can deduce someone’s religion by their (in)activity at certain times of the day, such as around prayer times, is that data then sensitive personal data?”

    The source data is not. The deduced data is.

    And don’t get me wrong: using the movement data to, for example, optimize ride scheduling is still fine. Using it to profile religious preferences (and storing the results) is no longer OK as soon as it comes down to an identifiable individual.

    • Anna Berlee says:

      “The source data is not. The deduced data is.”

      Well, both could perhaps be considered personal data, at least if you look at it from a legal perspective. Under the EU Data Protection Directive, even data that cannot be directly linked to individuals can be personal data, namely when it is indirectly identifiable.

      On this issue, the Data Protection Directive reads: “to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person.” The Directive also says that it does not apply “to data rendered anonymous in such a way that the data subject is no longer identifiable”. That this data set was badly anonymized is clear (see the post by Vijay Pandurangan linked below), and hence the data might still be identifiable.

      Therefore, I think even the source data itself might also be personal data. For the EU, such a conclusion might not be the stretch it seems.

      ———
      For reference see:
      Post by Vijay Pandurangan: http://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/

      On the differences between EU and US privacy approaches, see for example:
      (1) The EU-U.S. Privacy Collision: A Turn to Institutions and Procedures by Schwartz on SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2290261 , and
      (2) Schwartz & Solove’s ‘Reconciling Personal Information in the United States and European Union’ on SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2271442

      On personal data and its definition in the EU:
      Opinion Article 29 Working Party on Personal Data: [PDF] http://ec.europa.eu/justice/policies/privacy/docs/wpdocs/2007/wp136_en.pdf

  3. Joly MacFie says:

    Vijay Pandurangan discussed this issue in a presentation to BetaNYC back in July 2014.

  4. Sebastian says:

    According to the CJEU, even a picture of a person wearing a plaster may indicate a broken wrist and hence be considered sensitive personal data [1]. So be careful next time you post a picture on FB… Time to adapt to the digital reality we live in?

    [1] C-101/01 Criminal Proceedings against Bodil Lindqvist (6 November 2003).

