Who is the author "JC Shakespeare"?
Knowledge graphs are tricky beasts to create. Trying to extract semantic metadata from documents is a gargantuan task. Mix them together and you have a recipe for disaster.
While yak-shaving for my MSc, I found an interesting looking research paper authored by one JC Shakespeare.
As you can probably tell from that snippet, there is something a bit hinkey going on here. Here's the page that Google Scholar has scraped:
It's pretty easy to see what has happened here. The algorithm (whether via simple AI or complex regular expression) "knows" that a typical surname followed by a comma followed by a typical given name is almost certainly a reference.
And so "JC Shakespeare" becomes the author of a delightfully diverse set of papers.
Of course, Julius Caesar isn't the only play which gets picked up in this way:
Remember, AI is a great tool. It can be remarkably quick at drawing nearly correct conclusions from a diverse data set. When talking about AI, we usually discuss false positives and false negatives. But we also need to ask "is this the sort of mistake a human would make?"
As it happens, Google has been making this class of mistakes for a few years:
Google Scholar has parsed this cafeteria lunch menu as an author list, and it's delightful pic.twitter.com/jobE6Z7bpI
— Alex Klotz (@AlexanderRKlotz) April 21, 2020
Andy Mabbett says:
A cautionary tale for the @WikiCite / citation metadata crowd #WikiCite
Leonie Löwenherz 🇪🇺🏳️⚧️ says:
Wusste gar nicht, dass #Shakespeare so viele wissenschaftliche Arbeiten veröffentlicht hat.
Krasser Typ.
#KI
HackerNewsTop10 says:
Who is the author “JC Shakespeare”?
Link: shkspr.mobi/blog/2022/08/w…
Comments: news.ycombinator.com/item?id=327346…
gerikson says:
This Article was mentioned on lobste.rs
More comments on Mastodon.