Who is the author "JC Shakespeare"?

Knowledge graphs are tricky beasts to create. Trying to extract semantic metadata from documents is a gargantuan task. Mix them together and you have a recipe for disaster.

While yak-shaving for my MSc, I found an interesting-looking research paper authored by one JC Shakespeare.

Screenshot of Google Scholar result for "Tech Media Corruption in the Age of Information by JC Shakespeare".

As you can probably tell from that snippet, there is something a bit hinky going on here. Here's the page that Google Scholar has scraped:

Screenshot showing a quote which ends with "Shakespeare comma Julius Caesar".

It's pretty easy to see what has happened here. The algorithm (whether via simple AI or complex regular expression) "knows" that a typical surname followed by a comma followed by a typical given name is almost certainly a reference.
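To make that concrete, here is a minimal sketch of the sort of naive rule which could produce this result. I have no idea what Google Scholar actually runs, so the `CITATION` pattern and the `guess_author` function are entirely my own invention for illustration.

```python
import re

# Hypothetical heuristic, NOT Google's real code: a capitalised surname,
# a comma, then one or more capitalised words assumed to be given names.
CITATION = re.compile(
    r"\b([A-Z][a-z]+),\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def guess_author(text):
    """Return 'INITIALS Surname' for the first match, or None if nothing matches."""
    match = CITATION.search(text)
    if match is None:
        return None
    surname, given_names = match.groups()
    # Collapse each assumed given name down to its initial.
    initials = "".join(name[0] for name in given_names.split())
    return f"{initials} {surname}"


print(guess_author("Shakespeare, Julius Caesar"))  # JC Shakespeare
print(guess_author("Knuth, Donald"))               # D Knuth
```

The rule does the right thing on a real "Surname, Given" citation and exactly the wrong thing on an attribution line, because nothing in the pattern knows that "Julius Caesar" is the title of a play.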

And so "JC Shakespeare" becomes the author of a delightfully diverse set of papers.

Screenshot of Google Scholar results. Shakespeare has, apparently, written about law, technology, wine, and an article in German.

Of course, Julius Caesar isn't the only play which gets picked up in this way:

Narrative integrity Autobiographical identity by RIl Shakespeare.

Narrative integrity Autobiographical identity and the meaning of the "good life" Mark Freeman and Jens Brockmeier How sour sweet music is When time is broke, and no proportion kept! So is it in the music of men's lives. Shakespeare, Richard Il.

Apprenticeship HVI Shakespeare - The American Dreams.

CHAPTER 2 Apprenticeship Be merry Peter, and feare not thy Master. Fight for the credit of the Prentices. -Shakespeare, Henry VI.

Postmodern Color A Shakespeare - Spirits Hovering Over the Ashes.

Postmodern Color Ye white-lim'd walls! Ye alehouse painted signs! Coal-black is better than another hue, In that it scorns to bear another hue; For all the water in the ocean Can never turn the swan's black legs to white, Although she lave them hourly in the flood. -Shakespeare, Titus Andronicus.

Remember, AI is a great tool. It can be remarkably quick at drawing nearly correct conclusions from a diverse data set. When talking about AI, we usually discuss false positives and false negatives. But we also need to ask "is this the sort of mistake a human would make?"

As it happens, Google has been making this class of mistakes for a few years:

