Doesn't GitHub already make the majority of this data public? With the exception of emails... Previously they used Gravatar hashes, which allowed email brute forcing (since removed). They also weren't omitting emails from commits.
It's like DMV records being public. You used to have to go there, or phone, and you were only looking for information about a specific person.
But if it's available on the internet, it becomes a different thing entirely. Rather than being track-downable for specific purposes, you become harvestable for mass purposes.
Merely being known is sometimes the first step in being victimized, and these kinds of things make being known easier and more frequent.
> You used to have to go there, or phone, and you were only looking for information about a specific person.
Exactly. If $INTERNET_DATA_HARVESTER had to pay someone to watch whenever it wanted to see what I did on the internet, it would cost them a lot of money, so they would only watch if they had a reasonable expectation of profit. Even at starvation wages of $2/day, that's about $730/yr/person, which would bankrupt the major data harvesters. Even if Facebook only paid each starving person to stumble over to the DMV one day a year, it would cost them $42 billion to monitor their users.
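For anyone who wants to plug in their own numbers, here is a rough back-of-envelope sketch of that arithmetic. The function names are made up for illustration, and the head count is left as a parameter since the thread doesn't state one.

```python
DAYS_PER_YEAR = 365

def yearly_cost_per_person(daily_wage):
    """Cost of paying one watcher every day for a year."""
    return daily_wage * DAYS_PER_YEAR

def one_day_cost(people, daily_wage):
    """Cost of paying one watcher per person for a single day."""
    return people * daily_wage

def implied_people(total_cost, daily_wage):
    """How many one-day visits a given total would pay for."""
    return total_cost / daily_wage

print(yearly_cost_per_person(2))          # 730
print(implied_people(42_000_000_000, 2))  # 21000000000.0
```

At $2 per visit, a $42 billion total corresponds to 21 billion one-day visits, so the headline number depends heavily on the head count you assume.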
I used a recruiting tool about 2 years ago that crawled GitHub & other sources, building a profile that included email addresses. One developer's profile didn't have an email address, which I thought was odd, as the email would likely be in a git repo. Sure enough, it was.
As I write this, it occurs to me that there's not a foolproof way to verify an email in the developer's GitHub repos belongs to the same developer. GitHub user janedoe may or may not be the same person that made a commit w/ the email address jane@doe.com.
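To make the ambiguity concrete: the author name and email recorded in a commit come from whatever the committer set in their local git config (`user.name`, `user.email`), so they are self-declared. Below is a minimal sketch, assuming you have already dumped author lines with something like `git log --format='%an <%ae>'`. The helper name and the sample data are hypothetical.

```python
from collections import defaultdict

def emails_by_author(log_lines):
    """Group self-declared author emails by author name.

    Expects lines shaped like "Name <email>", e.g. the output of
    `git log --format='%an <%ae>'`.
    """
    seen = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line.endswith(">") or " <" not in line:
            continue  # skip lines that don't look like "Name <email>"
        name, _, email = line[:-1].rpartition(" <")
        seen[name].add(email.lower())
    return dict(seen)

# Hypothetical sample: the same email appears under two different names.
sample = [
    "Jane Doe <jane@doe.com>",
    "Jane Doe <jane@example.org>",
    "janedoe <jane@doe.com>",
]
print(emails_by_author(sample))
```

Nothing in this output says which of those names, if any, owns the GitHub account hosting the repo. It only shows what each committer typed into their config.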
https://cloud.google.com/bigquery/public-data/github