Who is the author "JC Shakespeare"?
Knowledge graphs are tricky beasts to create. Trying to extract semantic metadata from documents is a gargantuan task. Mix them together and you have a recipe for disaster.
While yak-shaving for my MSc, I found an interesting-looking research paper authored by one JC Shakespeare.
As you can probably tell from that snippet, there is something a bit hinkey going on here. Here's the page that Google Scholar has scraped:
It's pretty easy to see what has happened here. The algorithm (whether via simple AI or complex regular expression) "knows" that, in a reference, a typical surname followed by a comma followed by a typical given name is almost certainly the author's name.
And so "JC Shakespeare" becomes the author of a delightfully diverse set of papers.
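To make the failure mode concrete, here is a minimal sketch of that kind of heuristic. It is emphatically not Google Scholar's actual algorithm, which I haven't seen. The pattern, the function name `naive_author`, and the example strings are all my own assumptions, chosen purely for illustration.

```python
import re
from typing import Optional

# Hypothetical heuristic: a capitalised word, a comma, then one or more
# capitalised words is assumed to be "Surname, Given Names".
AUTHOR_PATTERN = re.compile(
    r"^(?P<surname>[A-Z][a-z]+), (?P<given>(?:[A-Z][a-z]+ ?)+)"
)


def naive_author(citation: str) -> Optional[str]:
    """Guess an author from the start of a citation string.

    Returns "<initials> <surname>", or None if the pattern doesn't match.
    """
    match = AUTHOR_PATTERN.match(citation)
    if match is None:
        return None
    # Collapse the "given names" down to their initials.
    initials = "".join(name[0] for name in match.group("given").split())
    return f"{initials} {match.group('surname')}"


# A real author is handled correctly...
print(naive_author("Eden, Terence"))               # T Eden
# ...but a playwright followed by the title of a play fools it.
print(naive_author("Shakespeare, Julius Caesar"))  # JC Shakespeare
```

The pattern has no way of telling a given name from a title. A human reader brings the context that "Julius Caesar" is a play; the regex only sees capital letters and a comma.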
Of course, Julius Caesar isn't the only play which gets picked up in this way:
Remember, AI is a great tool. It can be remarkably quick at drawing nearly correct conclusions from a diverse data set. When talking about AI, we usually discuss false positives and false negatives. But we also need to ask "is this the sort of mistake a human would make?"
As it happens, Google has been making this class of mistake for a few years:
Andy Mabbett said on twitter.com:
A cautionary tale for the @WikiCite / citation metadata crowd #WikiCite
Leonie Löwenherz 🇪🇺🏳️⚧️ said on twitter.com:
I had no idea that #Shakespeare had published so many scientific papers. What a guy. #KI