8M GitHub profiles were leaked from GeekedIn's MongoDB (troyhunt.com)
184 points by jashkenas on Nov 17, 2016 | 84 comments


I'm one of the affected users.

My initial reaction was that I had absolutely no idea that this site even existed. While this is publicly facing data, it might explain some situations that I experience, such as people sending me generic recruitment emails to my old email address (the one leaked here, that hasn't been on my GitHub page for months now) filled with the name of the GitHub repository with the most stars and telling me how they liked my code in that repository (even though the most popular one on my profile is not software-related at all).

Mini ask HN to those whose email address is public on GitHub: How frequently do you get these kinds of recruitment emails?


Maybe one a month. Usually "We saw your profile on GitHub which matches <our vague search criteria> and would you be interested in <hackathon/conference/somesuchthing>?".

To hijack your comment for my own follow-up question. I'm in the EU (Netherlands). Many of these mails are clearly mass E-Mails with just my name as a template sent by some US-based company, they'll include an unsubscribe link for future E-Mails, but aren't unsolicited mass-marketing E-Mails like these illegal in the EU, and if so would it be worthwhile reporting them, and who to?


Flag as spam with your email provider would be a good start.


If they don't follow the CAN-SPAM regulations, they're illegal in the US as well.


I once cloned a .NET repo to my Github and ever since I keep getting emails from recruiters about .NET job opportunities despite not having .NET (and things related to .NET) listed as one of my skills anywhere else


Heh. I made a PR to a haskell project with lots of stars (pandoc) and since that I've gotten a few mails about Haskell opportunities.

Even though a simple two character change to pandoc is my only Haskell experience...


My email has been public on my GitHub profile (it's the same as my git author email, which is also public) for the last few years, and I don't have a huge problem with spam. I get a couple of emails occasionally (a few every month), and they're easy to say "no thanks" to.

What made the most positive difference was deleting my LinkedIn account, I was getting way more spam there (via its own messaging system, not via email) that was very unpleasant.


Well I'm puzzled because my email is not public on my Github profile and I don't recall it ever was :S


Maybe your profile is private, but your commits aren't. The email in your commits can be scanned easily as well.


I thought so, but then they would have other emails too, and I don't see why they'd tag it as "primary" if they'd gone to those lengths.


>> How frequently do you get these kinds of recruitment emails?

I'm actively contributing to popular projects, have more than a dozen followers, and my email is public, but I have never received any recruitment emails. It's a pity.


I have ~450 contributions in the last year, ~100 followers, my email address is public, and I have two mildly popular repositories (with a couple of hundred stars).

I wish they were useful, but most of them are automated, and when you follow up, you usually get greeted by an email stating that they're looking for someone with a completely different background than you have.

You might be lucky not to receive them. I feel like, if I were looking for a job, I would have more luck with "Who wants to be hired" threads posted here on a monthly basis.


I don't understand how you can use that thread unless you're currently unemployed. Is the hope just that no one else at your company reads it?


> ...if I were looking for a job...

I didn't use it, and probably won't any time soon, but once I start looking for a job, I feel like my chances are much higher with that thread than with the emails I get from people who purchased the services this company is offering.


I get that, but how do you prevent your current company from seeing the post and identifying you?


Well, luckily, my contract does have a clause stating that any party needs to let the other party know in advance that they are leaving the position / getting fired, so I will not have a problem with that.


Maybe once every 2 or 3 months I get an email from a recruiter, and it gets immediately marked as spam.

Far more frequently though, I get emails from people asking interesting questions either about one of my projects or about something they're doing that is similar to one of my projects. I consider the (very low) cost of recruiter spam to be a price worth paying to enable those other interactions.


If you haven't already, change the email to some "noreply" address in the commits you author and add that to your GitHub profile so they can be attributed to you. I've built a similar scraper for recruiters as a source for talent leads when I used to work at a big company.


I don't see why people have such a huge issue with email being in public commits. I maintain a large project (runC) and sign all of my commits with my company email -- which makes noreply.github.whatever not a viable option. Not to mention that sometimes I get useful emails rather than recruiter spam, and I got my current job from using my email in commits (my employer approached me).

But for those who care, the best solution would be if GitHub didn't show emails in their webui or APIs -- you have to clone the repo to get the information. That makes it prohibitively expensive to scrape every repo on GitHub (so you'd have to narrow down what repos you're interested in).


I don't think I've ever received a recruitment email due to my email being public on GitHub. They probably get caught in the spam filter.

Also, people using GitHub should know better than to call this a leak. If it was public on GitHub, it's public on the internet.


I created a (semi) popular application using C++ & Lua. I used to get automated recruitment mails as a result of that every few weeks.

But then I created a github organization and moved the repositories beneath that instead of under my own login - and at that point the mails ceased, immediately.

I wonder if the crawlers just assumed "Organisation == Company". It was an unexpected benefit anyway.

The only downside is that those online sites where they rank your popularity/skills in different languages now only consider my personal repositories, and I'm no longer in the top 5% of C++ coders. I can live with that!


Fairly regularly. On the address leaked too.


Maybe 5-6 a month? Sometimes a bit more. (Excluding one company who would spam me all the time but stopped.)

But my resume is public, my email is my screenname@gmail so it's not like anyone has a hard time guessing who I am.

I also recently graduated from Berkeley - so I have no doubt that with a simple search things could rank me as being possibly looking for a job. Then again, there's no way in hell I'm a "Senior Blah Blah", so I would guess a decent amount of the emails I get have been from scraping. (I always gave real recruiters a better email address.)


Regularly to the public email listed, I just set mine to private, didn't even realize it was public...


Up to 5 times per month here.


From GeekedIn's announcement [0]:

    It is a site that crawls open-source code hosting sites (e.g. github and bitbucket) 
    and creates profiles of open-source projects and open-source developers.
    Those profiles include things like technologies used by a developer on open source
    projects (e.g. Scala/Java, .NET, Clojure, Python, etc), libraries and frameworks
    (e.g. Hibernate, Spring, JQuery, bootstraps, etc.), on many cases locations of
    the developer, and even when a developer used a particular
    technology (for Open Source projects)

[0] "I'm creating www.geekedin.net" https://www.linkedin.com/pulse/im-creating-wwwgeekedinnet-er...


If they're based in Spain, isn't this a clear breach of EU data protection legislation?

Is it even possible for the service to comply with the data protection directive? It's necessary to obtain consent before processing personal data, but that's not the case here.

https://en.wikipedia.org/wiki/Data_Protection_Directive


Boy, I gotta say I was really nervous when I saw "Have I been pwned" in my inbox. Took a while to convince myself that it's just my public-facing Github profile.


My email address has already been breached on three different services. The previous two were sites that I've used once and never again, this one I obviously never heard of until today, when the breach happened.

Even though I use a password manager and unique passwords, I still get the chills whenever I see an email from the HIBP domain.

It's just something that makes me feel uncomfortable, even though I know that the breach is specific to a site that I have never used frequently and doesn't affect any of my other profiles.


Then imagine how hard my heart dropped when I was taking a summer nap and suddenly Google notified me (through my Android phone) that some suspicious login activity was taking place in my main mail account. The panic!

Yes, I reused my Gmail password in $pwnd_service. Bad idea.

Fortunately Google managed to detect the unusual activity and lock them out, but maybe others aren't that lucky. That event led me to finally use a password manager and stop reusing passwords everywhere.

What an afternoon, trying to change password from all services I could remember. Can you remember all services you signed up for with your email? Cause I definitely didn't.

Quite a humbling experience.


> Can you remember all services you signed up for with your email? Cause I definitely didn't.

You might also want to check your browser settings to see which domains have saved cookies, saved passwords, or the "never save passwords on this domain" flag.


No kidding, definitely an "oh shit" email.


Init protocol 0: wipe everything, airgap with tails


[ Link to clip from Mr. Robot: "I need to wipe everything." ]


Init protocol -1: burn drives, start PDP-11/40, await further instructions

You don't want to know about protocol -2...


Prepare three envelopes...


How is this notable? It's just public data scraped from GitHub. I wouldn't care even if they intentionally redistributed this database to everyone.

The more in-depth metrics are also things you can scrape from GitHub. If you aren't comfortable with these metrics being calculated about your public actions, then you probably shouldn't have signed up for a "social coding" site.


I agree. Redistribution of public knowledge can hardly be described as a "leak".

I received an email from haveibeenpwned.com about this, but I don't see how this makes sense, given that no information was revealed that users hadn't consented to reveal.


Doesn't GitHub already make the majority of this data public? With the exception of emails... Previously they used gravatar hashes, which allowed email brute forcing (since removed). Then they also were not omitting emails from commit messages.

https://cloud.google.com/bigquery/public-data/github


It's like DMV records being public. You used to have to go there, or phone, and you were only looking for information about a specific person.

But if it's available on the internet, it becomes a different thing entirely. Rather than being track-downable for specific purposes, you become harvestable for mass purposes.

Merely being known is sometimes the first step in being victimized, and these kinds of things make being known easier and more frequent.


> You used to have to go there, or phone, and you were only looking for information about a specific person.

Exactly. If $INTERNET_DATA_HARVESTER had to pay someone to watch whenever it wanted to see what I did on the internet, it would cost them a lot of money, so they would only watch if they had a reasonable suspicion of profit. Even at starvation wages of $2/day, that's $700/yr/person, which would bankrupt the major data harvesters. Even if Facebook only paid each starving person to stumble over to the DMV one day a year, it would cost them $42 billion to monitor their users.


I used a recruiting tool about 2 years ago that crawled GitHub & other sources, building a profile which included email addresses. One developer's profile didn't have an email address, which I thought was odd, as the email would likely be in a git repo. Sure enough, it was.

As I write this, it occurs to me that there's not a foolproof way to verify an email in the developer's GitHub repos belongs to the same developer. GitHub user janedoe may or may not be the same person that made a commit w/ the email address jane@doe.com.


The email verification was easy when they provided gravatar ids, which used to just be MD5(email).
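
To make that concrete, here is a minimal sketch of the check, assuming the Gravatar scheme of hashing the trimmed, lowercased address with MD5. The function names and sample addresses are hypothetical, and this is only an illustration of the idea, not what any particular scraper actually ran:

```python
import hashlib
from typing import List, Optional


def gravatar_id(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased email (Gravatar-style)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def match_email(target_id: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate whose hash equals target_id, else None."""
    wanted = target_id.lower()
    for email in candidates:
        if gravatar_id(email) == wanted:
            return email
    return None


# Hypothetical data: an id as a profile might once have exposed it, plus
# candidate addresses collected from commit metadata.
exposed_id = gravatar_id("Jane@Doe.com ")
print(match_email(exposed_id, ["j.doe@example.org", "jane@doe.com"]))
```

The same loop is what made brute forcing feasible: anyone with a list of plausible addresses could hash each one and compare.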


Two days ago I received an email from a recruiter at http://sourced.tech/ mentioning they had analyzed my open source contributions and they had a position that matched me. It looks like source{d} brags about scraping personal data from sites for recruitment purposes.

The problem is the email they used is not publicly facing in my Github account, but it _is_ the login email that I use for Github. I never got the "Have I Been Pwned" email, but this is very concerning to me, I keep this email very private.

Coincidence?


Are you sure none of the commits on any of your repos accidentally have the author email as the private one you login with?


I was about to say the same thing. For example, if you go to someone's GitHub profile and they have a public e-mail listed, but you then go to their repositories, filter by "source" (repos created by them, not forks), go to the oldest one of that kind, git clone it, and then run

    git shortlog -s -e

you might find that they were using their school e-mail back then, for example.

Also, if you clone all their repositories and everything else they have contributed to, you might find an occasional commit in which they accidentally used, for example, the e-mail they have at work.
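
A rough sketch of tallying that output per address, assuming the usual `count<TAB>Name <email>` line layout of `git shortlog -s -e`. The helper name and the sample data are made up for illustration:

```python
import re
from collections import Counter

# Each `git shortlog -s -e` line is expected to look like "   12\tName <email>".
LINE_RE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^<>]*)>\s*$")


def emails_by_commit_count(shortlog_output: str) -> Counter:
    """Sum commit counts per email address, merging different author names."""
    counts = Counter()
    for line in shortlog_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(3).lower()] += int(match.group(1))
    return counts


# Hypothetical shortlog output for one cloned repository.
sample = (
    "    40\tJane Doe <jane@doe.com>\n"
    "     3\tJane Doe <jdoe@university.example>\n"
    "     1\tjane <jane@doe.com>\n"
)
for email, commits in emails_by_commit_count(sample).most_common():
    print(commits, email)
```

If you feed it real output from a script rather than a terminal, pass a revision explicitly (e.g. `git shortlog -s -e HEAD`), since otherwise shortlog tries to read the log from standard input.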


You can just retrieve the Git patch directly from the commit page (add .patch to the URL), which has all the metadata of the commit, including author info.
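
As a sketch of what that looks like in practice: the `.patch` output is in `git format-patch` mail format, so the author sits in the `From:` header and can be read with a mail parser. The helper names, the repository URL, and the sample patch below are hypothetical, and nothing here touches the network:

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Tuple


def patch_url(commit_url: str) -> str:
    """Build the patch URL by appending .patch to a commit page URL."""
    return commit_url.rstrip("/") + ".patch"


def author_from_patch(patch_text: str) -> Tuple[str, str]:
    """Return (name, email) from the From: header of a format-patch message."""
    message = message_from_string(patch_text)
    return parseaddr(message["From"])


# Hypothetical patch text, shaped like what a commit's .patch page returns.
sample_patch = (
    "From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001\n"
    "From: Jane Doe <jane@doe.com>\n"
    "Date: Thu, 17 Nov 2016 10:00:00 +0000\n"
    "Subject: [PATCH] Fix typo\n"
    "\n"
    "---\n"
)

print(patch_url("https://github.com/janedoe/example/commit/0123456"))
print(author_from_patch(sample_patch))
```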


Since you brought them up, I got an e-mail from them recently, too. It was a huge ROTFL for me.

They boast on their FAQ page that they are "different from a recruiter", and the contact person wrote "We have analyzed your open source contributions on Github and think that you could be a good fit for [web backend programmer]". The thing is, they sent me an e-mail to the address that is present in just one repository and has nothing to do with web at all.

Then, when I asked what repositories they had looked at, what caught their eye, and why they thought I would like the position, the contact person sent me a lengthy e-mail about what their recruitment process looks like, not addressing my questions at all.

All in all, they just sent me some e-mail templates filled after trivial keyword matching, which I find funny, because they claim they are so much better than "regular recruiters".

Though, to be honest, they did give the offer details upfront, without me asking about that.


I worked at sourced.tech

Back then we crawled all public github repos and took a look at all the commits so any email you've used for making commits with git and pushed to GitHub is public.

The analysis is somewhat legit. The email is ofc automated but it was quite high quality. I left in March this year so things may have changed an awful lot.

EDIT: By high quality I mean they give you all the possible details of the offer. Again, this might have changed.

If you are unhappy with them feel free to rant on Twitter mentioning them and they'll get back to you very quickly.


So it's just public info you can already get from Github?


Well, kinda, but based on the examples given, the data also includes assessments and assumptions about you presumably based on the aggregated information that is available.

They appear to be building recruiting profiles of github users based on their public GH profile and commit info.


But you just said they are all public data, it just happens someone spent the time to build a profile more "digestible". So in the end, really, this is public data.


I looked at my data. It's all public data anyway. I'm not surprised or concerned. Perhaps I'm missing something obvious?


Yeah not sure if this is dangerous in any way? Maybe this will allow me to get more stars on my Github profile.


For anyone who is not aware, Have I been pwned[0] is a great way to keep tabs on data breaches involving your data. You just give them your email and you'll be notified if it shows up in data dumps from any major breach.

[0] https://haveibeenpwned.com/


I just signed up to HIBP because of this.

But I'm still nervous about it, because now I'm known by another party, and they know where I've been compromised and with what email. HIBP is probably good people, but they can be breached too. It's why it's taken me this long to convince myself to try it.


All of HIBP's data is breaches that are already floating around the web, I believe.


And here they all are in one convenient place.


I'm responding to your worry about HIBP being breached–there is no new data to leak.


First breach for my current email. There goes my clean track-record, although I guess this one doesn't count that heavily.

_Edit_: Just checked my leaked info. Either Troy's service or geekedin didn't scrape the data properly, because it shows my very tiny secondary account, which I made long ago for some GitHub-specific testing (that required a second account).


I've long subscribed to HIBP, so I got the relevant email about this today; however, when I check on the site, I don't get the "raw geekedin data" button in Safari nor in Firefox. What's up?

EDIT: I basically re-subscribed, and once I re-verified the address, I got the button.


Grr, I'm affected by both this and the recent Donate Blood leak. Is there anything I should be doing to increase my security to protect against these kinds of leaks?

Or, even protect things like my Bank Accounts, Utility accounts etc. against spear phishing attacks?


http://i.imgur.com/fWT8usT.png

I really should have abused the github email bug further to do something more fun with scrapers.


I was a little surprised to see mr Hunt publishing the IPs of the Mongo servers. That's not something he usually does.


https://api.github.com/users/1 This isn't hard to do. All they did was iterate the user ids from 1-n. It is the Rails way to assign ids incrementally.


This isn't correct.

The users/1 doesn't take you to the user with id=1, it takes you to the user with username=1, which is very different. For example you can see my github profile on this api by doing

https://api.github.com/users/cvoege

The link you gave there is to the person with the username 1.

Second of all you can't see my email since it's not publicly available on github, and most people's aren't as far as I know.

Third of all incremental integer ids isn't the "rails way", this would be decided by the database and has nothing to do really with rails or whatever framework you use.
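
To illustrate the difference, here is a small sketch that only builds the request URLs. It assumes the GitHub REST endpoints `GET /users/{username}` (lookup by login) and `GET /users?since={id}` (list accounts in ID order, starting after the given ID); the helper names are made up, the API may have changed since this thread, and real crawling would also have to deal with rate limits:

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com"


def user_by_login_url(login: str) -> str:
    """URL for a single account, addressed by username (not numeric id)."""
    return f"{API}/users/{quote(login, safe='')}"


def users_since_url(last_seen_id: int, per_page: int = 100) -> str:
    """URL for one page of accounts whose numeric id is above last_seen_id."""
    query = urlencode({"since": last_seen_id, "per_page": per_page})
    return f"{API}/users?{query}"


def public_email(user: dict) -> Optional[str]:
    """The email field is null unless the user chose to show it publicly."""
    return user.get("email")


def fetch_json(url: str):
    """Perform the request; not called here so the example stays offline."""
    request = Request(url, headers={"User-Agent": "hn-thread-example"})
    with urlopen(request) as response:
        return json.load(response)


print(user_by_login_url("cvoege"))
print(users_since_url(0))
```

So `/users/1` is the first form with the login "1", while walking accounts in order would use the second form and advance `since` to the last id seen on each page.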


That's incorrect, you're looking at user "1" which is here: https://github.com/1

The "created_at": "2012-06-07T06:10:07Z" also gives it away.

i.e., my profile is here: https://api.github.com/users/diziet

Also, numerical profile ids in URLs are more of a SQL thing rather than rails.


I just found out that GitHub actually publicly publishes everything I've done on their site. That's a bit creepy. https://api.github.com/users/cyphar/events


> It is the Rails way to assign ids incrementally.

Well... their database's way. It's not 'the rails way'.


It's also the Rails way unless things have changed since I last used Rails.


Can geekedin help us hire developers that have some security awareness? Maybe make a site that lists those who work for departments that get breached in the future, and there will be some accountability for the first time.


my leak says I only have 2 years of experience?!?! puleaaze! how do I correct them?


mine says 47 years. I AM NOT THAT OLD ;p


It seems like GeekedIn is straightforward web-scraping by a greedy creep. This is why I just create a new throwaway account whenever I need to deal with Github in a logged-in way. Thank God for Mailinator.


Interestingly, I don't have my GH location set, and so their data defaults to ZA, and my location to 28.16256, -23.41612. Weird they couldn't just leave those values blank.


You can find some github profiles with old information in the waybackmachine. Not troyhunt (unfortunately as that would have been nice to show off my point!) but I found some others.


Demystified the Recruiter spam - "We've analyzed your GitHub contributions". Thanks Troy or else I would have kept wondering about these spams!


A friend of mine's github profile returned a 404 for a while today. I'm thinking it wasn't a coincidence.


I was wondering why I started getting recruitment emails. I'm glad I changed my email address on github recently.


[flagged]


I downvoted you because the point of your message is unclear and useless.


It's a message for the parent commenter. I realize that it is pretty useless to anyone else and for that I apologize.


The "data trading scene" referenced by the author. What/where is this? Dark web?


Public data LEAKED!!!


Yeah I agree. Full of shit. Seems like a guerrilla marketing tactic for Have I Been Pwned. I definitely feel as though I have been pawned now.

The data is not hacked and therefore you are legally entitled to expose it via your HIBP service. When I mentioned that the IPs were exposed they were taken down in less than a couple of minutes. Yet 18 minutes after, you replied saying that you contacted "someone" to take them down. Clever way to get on the front page of hacker news. Right out of the politics of fear playbook :) Have we been pwned?? Yes we have.

Traffic whore.

"One of the key projects I'm involved in today is Have I been pwned? (HIBP), a free service that aggregates data breaches and helps people establish if they've been impacted by malicious activity on the web. As well as being a useful service for the community, HIBP has given me an avenue to ship code that runs at scale on Microsoft's Azure cloud platform, one of the best ways we have of standing up services on the web today."



