Sunday, January 05, 2014

The Snapchat Leak

This was first published on the O'Reilly Radar
The number of Snapchat users by area code.
The number of Snapchat users by geographic location. Users are predominately located in New York, San Francisco and the surrounding greater New York and Bay Areas. 
While the site crumbled quickly under the weight of so many people trying to get to the leaked data—and has now been suspended—there isn't really such a thing as putting the genie back in the bottle on the Internet. Just before Christmas the Australian based Gibson Security published a report highlighting two exploits in the Snapchat API claiming that hackers could easily gain access to users’ personal data. Snapchat dismissed the report, responding that,

Theoretically, if someone were able to upload a huge set of phone numbers, like every number in an area code, or every possible number in the U.S., they could create a database of the results and match usernames to phone numbers that way.

Adding that they had various "safeguards" in place to make it difficult to do that. However it seems likely that—despite being explicitly mentioned in the initial report four months previously—none of these safeguards included rate limiting requests to their server, because someone seems to have taken them up on their offer.

Data Release

Earlier today the creators of the now defunct SnapchatDB site released 4.6 million records—both as an SQL dump and as a CSV file. With an estimated 8 million users (May, 2013) of the app this represents around half the Snapchat user base. Each record consists of a Snapchat user name, a geographical location for the user, and partially anonymised phone number—the last two digits of the phone number having been obscured. While Gibson Security's find_friends exploit has been patched by Snapchat, minor variations on the exploit are reported to still function, and if this data did come from the exploit—or a minor variation on it—uncovered by Gibson, then the dataset published by SnapchatDB is only part of the data the hackers now hold. In addition to the data already released they would have the full phone number of each user, and as well as the user name they should also have the—perhaps more revealing—screen name.

Data Analysis

Taking an initial look at the data, there are no international numbers in the leaked database. All entries are US numbers, with the bulk of the users from—as you might expect—the greater New York, San Francisco and Bay areas. However I'd assume that the absence of international numbers is  an indication of laziness rather than due to any technical limitation. For US based hackers it would be easy to iterate rapidly through the fairly predictable US number space, while "foreign" numbers formats might present more of a challenge when writing a script to exploit the hole in Snapchat's security. Only 76 of the 322 area codes in the United States appear in the leaked database, alongside another two Canadian area codes, mapping to 67 discrete geographic locations—although not all the area codes and locations match suggesting that perhaps the locations aren't derived directly from the area code data. Despite some initial scepticism about the provenance of the data I've confirmed that this is a real data set. A quick trawl through the data has got multiple hits amongst my own friend group, including some I didn't know were on Snapchat—sorry guys. Since the last two digits were obscured in the leaked dataset the partial phone number string might—and frequently does—generate multiple matches amongst the 4.6 million records against a comparison number. I compared the several hundred US phone numbers amongst my own contacts against the database—you might want to do that yourself—and generated several spurious hits where the returned user names didn't really seem to map in any way to my contact. That said, as I already mentioned, I found several of my own friends amongst the leaked records, although I only knew it was them for sure because I knew both their phone number and typical choices of user names.

Conclusions

As it stands therefore this data release is not—yet—critical, although it is certainly concerning, and for some individuals it might well be unfortunate. However if the SnapchatDB creators choose to release their full dataset things might well get a lot more interesting. If the full data set was released to the public, or obtained by a malicious third party, then the username, geographic location, phone number, and screen name—which might, for a lot of people, be their actual full name—would be available. This eventuality would be bad enough. However taking this data and cross-correlating it with another large corpus of data, say from Twitter or Gravatar, by trying to find matching user or real names on those services—people tend to reuse usernames on multiple services after all—you might end up with a much larger aggregated data set including email addresses, photographs, and personal information. While there would be enough false positives—if matching solely against user names—that you'd have a interesting data cleaning task afterwards, it wouldn't be impossible. Possibly not even that difficult. I'm not interested in doing that correlation myself. But others will.

17 comments:

  1. Great work man i would like to congratulate you on this effort http://mrleakdetection.com/

    ReplyDelete
  2. It’s an interesting approach. I commonly see unexceptional views on the subject but yours it’s written in a pretty unusual fashion. Surely, I will revisit your website for additional information. snapscore kopen

    ReplyDelete
  3. When your website or blog goes live for the first time, it is exciting. That is until you realize no one but you and your.Recover my hacked snapchat account

    ReplyDelete
  4. I am impressed by the information that you have on this blog. It shows how well you understand this subject.
    data analytics course
    data science course
    big data course in malaysia
    360DigiTMG
    big data analytics training in malaysia

    ReplyDelete
  5. Excellent post.I want to thank you for this informative read, I really appreciate sharing this great post.Keep up your work
    data science course
    360DigiTMG

    ReplyDelete
  6. can you want to remove your account SO READ THIS ARTICLE IT,S HELP you alot
    Delete Snapchat Account

    ReplyDelete
  7. Here it about precision of comprehension and to be inventive. smm panel

    ReplyDelete
  8. Great job! I would like to say that this is a well-written article as we are seen here. This article is very useful and I got so much information about it. Thanks for sharing this article here. can you make money on Snapchat

    ReplyDelete
  9. I will truly value the essayist's decision for picking this magnificent article fitting to my matter.Here is profound depiction about the article matter which helped me more.
    data analytics course

    ReplyDelete
  10. I think I have never watched such online diaries ever that has absolute things with all nuances which I need. So thoughtfully update this ever for us.
    difference between analysis and analytics

    ReplyDelete
  11. I see some amazingly important and kept up to length of your strength searching for in your on the site
    data science course

    ReplyDelete
  12. Somebody Sometimes with visits your blog normally and prescribed it as far as I can tell to peruse too.
    iot training in delhi

    ReplyDelete
  13. Extremely overall quite fascinating post. I was searching for this sort of data and delighted in perusing this one. Continue posting. A debt of gratitude is in order for sharing.Best data analytics course in Hyderabad

    ReplyDelete
  14. I see some amazingly important and kept up to length of your strength searching for in your on the site
    hrdf contribution

    ReplyDelete
  15. Snapchat is an engaging and fun way to remain in touch friends by look at this website sharing countless crazy images and videos. The images can vary either being taken at school, at the mall or even the bathroom.

    ReplyDelete