As part of the Shakespeare Hackday I attended a few weeks ago, we discussed some interesting analysis which can be done on the text. Certain forms of analysis are hampered by the archaic and inconsistent spelling. I wondered if that inconsistency could itself be mined for anything interesting.
For example, in modern UK English we use the word "honour". In modern US English, it loses the "u" to become "honor". So, how was it spelt in Shakespeare's day?
I downloaded the XML representation of all the plays from the Bodleian.
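To count the lines containing "honour" in each play, it's simply a case of running
grep -ci "honour" *.xml
This can be repeated for "honor". Note that -c counts matching lines rather than individual occurrences, so a line which contains the word twice is only counted once.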
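If you want every occurrence rather than every matching line, something like the rough sketch below should do it. The function name count_occurrences is my own invention for illustration, and I'm assuming the downloaded XML files sit in the current directory. It still matches substrings, so "dishonour" and "honourable" are counted, and it will also count any matches inside the XML markup itself.

```shell
# Hypothetical helper: print "<file><TAB><count>" for each file given.
# grep -o prints each match on its own line, so wc -l counts every
# occurrence rather than every matching line; -i ignores case.
count_occurrences() {
  word="$1"
  shift
  for file in "$@"; do
    # tr strips the padding some versions of wc add around the number
    count=$(grep -oi -- "$word" "$file" | wc -l | tr -d '[:space:]')
    printf '%s\t%s\n' "$file" "$count"
  done
}

# Example usage (assumes the plays are in the current directory):
# count_occurrences honour *.xml
# count_occurrences honor *.xml
```

Running it once per spelling gives two per-play columns which can be compared side by side.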
There's no clear consensus! Two of the most popular plays - Midsummer and Romeo & Juliet - exclusively use "honour" - but the rest of the plays are a general mish-mash.
Spelling, back in the 17th century, was... flexible, to say the least! As the pages were typeset, the individuals working on them would adjust the letters on the page to fit with their own preferred method of spelling, or to make words fit on a line, or because that's what they were told to do, or because there were no more printing blocks of that letter available, or because they simply made mistakes.
And, of course, Shakespeare didn't write much of anything down! Really, the only surviving document we are sure was written by him is his Last Will and Testament. It contains neither the word "honour" nor "honor".