Harvesting phone numbers and email addresses from GitHub


Code-sharing site GitHub automatically sends email notifications to users. If you've commented on an issue, you'll get an email each time there's an update. That's pretty handy.

It also allows users to reply by email. The reply is then automatically posted in the issue thread. Also handy. But a little dangerous.

Lots of people have email signatures which contain personal details. When these people reply to a GitHub notification, they may unwittingly share their contact details in public.

GitHub does a reasonable job of hiding this from the casual observer. When it detects the signature block, it hides it behind a button.

A GitHub issue. There is an "overflow" button onscreen.

What happens if you click on that overflow button?

A user's email signature - the phone number has been blurred out.

I've manually redacted the user's details.

Finding Numbers

A quick search for common email signature boilerplate like "This email and any attachments are confidential" reveals thousands of issues like this. Many include the user's phone number, fax number, and postal address.

Searching GitHub issues for "This email and any attachments" combined with "+44(0)" gives a hundred or so results - mostly from people in the UK replying to notification emails.

Similar searches for other formats of phone numbers reveal thousands of potential accidental leaks.

What should GitHub do?

GitHub can obviously detect when a comment was sent by email - it shows an icon next to the comment on the issue page.

A comment on GitHub, there's an email icon next to the user name.

The signature detection isn't perfect - in the example above, GitHub hasn't recognised the signature, so it is displayed in full.
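To illustrate why detection can miss things, here is a minimal sketch of one long-standing convention: a signature starts after a line containing only two dashes and a space. I don't know how GitHub actually detects signatures, so treat this as an assumption. The function name `split_signature` is made up for this example.

```python
def split_signature(body: str) -> tuple[str, str]:
    """Split an email body at the conventional "-- " signature delimiter.

    Returns (message, signature). The signature is "" when no delimiter
    line is present, which is exactly the case where details leak.
    """
    # Normalise Windows line endings so the delimiter check is simple.
    text = body.replace("\r\n", "\n")
    lines = text.split("\n")

    # Search from the bottom so an earlier "-- " in quoted text is ignored.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == "-- ":  # two dashes and one trailing space
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])

    return text, ""
```

A mail client that skips the delimiter, or strips the trailing space, defeats this check entirely. The whole signature is then treated as part of the message.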

GitHub could proactively scan incoming emails and offer to redact personal info from them.
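As a rough sketch of what such a scan might look like, the snippet below masks anything resembling an email address or a phone number before a comment is published. The patterns and the `redact` function are my own illustrative assumptions, not anything GitHub uses. They are deliberately loose, so they will also catch some things which aren't personal data, such as long numeric IDs or timestamps.

```python
import re

# Illustrative patterns only: loose enough to catch common formats,
# not a complete or authoritative definition of either.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An optional "+", a digit, then at least 7 digits/spaces/brackets/dots/
# dashes, ending on a digit. Matches forms like "+44(0)20 7946 0123".
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace likely email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[email redacted]", text)
    return PHONE_RE.sub("[phone redacted]", text)


print(redact("Call +44(0)20 7946 0123 or mail jane@example.com"))
```

Because of the false positives, a scan like this is better suited to prompting the sender ("this looks like a phone number - post it anyway?") than to silently rewriting comments.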

Repository owners can also edit comments to remove personal data which has been accidentally leaked.

This isn't just an issue of personal responsibility. When you receive an email, it is natural to reply. You may not even realise you have a corporately mandated signature. You may not know how to remove it. There's nothing in the email that suggests replying will post your details in public.

Yes, people should take a little more care - but I think GitHub should do a better job of detecting and deleting personal information.


2 thoughts on “Harvesting phone numbers and email addresses from GitHub”

  1. Mike says:

    I blame Microsoft. Not just because they own GitHub, but because no Microsoft mail client I've ever encountered has understood the concept of signatures being delimited with "-- \n".

    Why do people put their email address in their email signature?

  2. says:

    Is the signature detection not just looking for the 'standard' dash-dash-space-carriage-return-line-feed combo which delineates the body from the signature? This age-old standard is well known in tech circles, but obscure to anyone else. Fixing email is hard, but I wonder if there's some commonality between those examples which aren't detected? Maybe they're all using some home-grown mail client or mis-configured webmail system? Fixing those might be easier than fixing every bug tracker.

