Harvesting phone numbers and email addresses from GitHub
Code-sharing site GitHub automatically sends email notifications to users. If you've commented on an issue, you'll get an email each time there's an update. That's pretty handy.
It also allows users to reply by email. The reply is then automatically posted in the issue thread. Also handy. But a little dangerous.
Lots of people have email signatures which contain personal details. When these people reply to a GitHub notification they may unwittingly share their contact details in public.
GitHub does a reasonable job of hiding this from the casual observer. When it detects the signature block, it hides it behind a button.
What happens if you click on that overflow button?
I've manually redacted the user's details.
Finding Numbers
A quick search for common email signatures like "This email and any attachments are confidential" reveals thousands of issues like this. Many have phone, fax, and postal details of the user.
Searching GitHub issues for "This email and any attachments" & "+44(0)" gives a hundred or so results - mostly from UK individuals replying to emails.
Similar searches for other formats of phone numbers reveal thousands of potential accidental leaks.
What should GitHub do?
GitHub can obviously detect when a reply has come in by email - it shows an icon on the issues screen to indicate this.
The signature detection isn't perfect - in the above example it hasn't recognised it.
GitHub could proactively scan incoming emails and offer to redact personal info from them.
Repository owners can also edit comments to make sure that personal data hasn't accidentally leaked.
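As a rough illustration of the kind of scan suggested above, here is a minimal Python sketch that replaces things which look like email addresses and phone numbers with placeholders. The `redact` function and both patterns are my own assumptions for illustration, not anything GitHub actually runs. Real contact details come in far more formats than two regular expressions can cover, and the phone pattern will happily match other long digit strings, so treat this as a starting point for offering a redaction rather than a reliable filter.

```python
import re

# Hypothetical patterns, for illustration only.
# Something that looks like name@domain.tld
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A "+" or a leading "0", followed by at least 8 digits which may be
# separated by spaces, dots, dashes, or brackets, e.g. "+44(0)20 7946 0000".
PHONE_RE = re.compile(r"(?<!\w)(?:\+|0)(?:[\s().-]*\d){8,}")


def redact(text):
    """Replace likely email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[email redacted]", text)
    text = PHONE_RE.sub("[phone redacted]", text)
    return text


print(redact("Call +44(0)20 7946 0000 or mail jane@example.com"))
```

A scan like this would make most sense as a prompt to the sender or repository owner ("this looks like a phone number, hide it?") rather than as silent deletion, because false positives are inevitable.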
This isn't just an issue of personal responsibility. When you receive an email, it is natural to reply. You may not even realise you have a corporately mandated signature. You may not know how to remove it. There's nothing in the email that suggests replying will post your details in public.
Yes, people should take a little more care - but I think GitHub should do a better job of detecting and deleting personal information.
Mike says:
I blame Microsoft. Not just because they own GitHub, but because no Microsoft mail client I've ever encountered has understood the concept of signatures being delimited with "-- \n".
Why do people put their email address in their email signature?
Alan Pope says:
Is the signature detection not just looking for the 'standard' dash-dash-space-carriage-return-line-feed combo which delineates the body from the signature? This age-old standard is well known in tech circles, but obscure to anyone else. Fixing email is hard, but I wonder if there's some commonality between those examples which aren't detected? Maybe they're all using some home-grown mail client or mis-configured webmail system? Fixing those might be easier than fixing every bug tracker.
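For readers who haven't met the delimiter discussed in these comments: by convention, a line consisting of exactly two dashes and a space separates the message from the signature. The Python sketch below shows how a detector relying only on that convention might behave. The `split_signature` function is hypothetical and is not a description of how GitHub's detection works. Using the last delimiter line in the message is my own choice here.

```python
def split_signature(body):
    """Split an email body at the "-- " delimiter line.

    Returns (message, signature). The signature is "" when no
    delimiter line is found, and the body is returned unchanged.
    """
    lines = body.replace("\r\n", "\n").split("\n")
    # Search from the bottom so that the last delimiter line wins.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == "-- ":  # two dashes and a trailing space, nothing else
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return body, ""


message, signature = split_signature("Thanks, fixed.\n-- \nJane\n+44(0)20 7946 0000")
print(repr(message))
print(repr(signature))
```

Note that a mail client which drops the trailing space, or never adds the delimiter at all, produces a signature that this approach cannot see. That would be consistent with the undetected examples discussed above.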