Open Source Shakespeare (in MySQL)


My good friend Richard Brent has often complained that my blog has very little Shakespeare content. Despite the domain name, I don't think I've ever blogged about The Big S. For shame! Fear not, my Brentish-Boy, this post is all about Shakespeare. And MySQL....

Ahem...

When I first started shkspr.mobi it was intended to be an easy way to get Shakespeare on your phone. At that time, there were no mobile formatted texts of his plays and sonnets, so I had to create them. Finding Shakespeare's works in a suitable format for conversion wasn't too hard - but it meant lots of crufty code to read text files line-by-line. Yuck.

A few years later, I stumbled across Open Source Shakespeare. The project grew out of Eric Johnson's MA thesis. It's a remarkably good idea with only one minor problem. The database it uses is Microsoft Access.

MS Access, as a database, could best be described as

deformed, crooked, old and sere, ill faced, worse bodied, shapeless everywhere, vicious, ungentle, foolish, blunt, unkind, stigmatical in making, worse in mind
(Comedy of Errors, Act IV, Scene II)

There are a few Open Source Shakespeare projects on GitHub, but they don't seem very practical.

So, naturally, I've decided to create my own version of Shakespeare's works - in MySQL 🙂

This is what it looks like:
Shkspr MySQL
You can download it from GitHub.

I've stripped out a lot of the extraneous stuff from the original version - word counts, etc. So it should be a fairly lean database which is easy to use. I'm not a database professional, so I would be grateful if you could suggest any improvements. Either using this blog's comment form or on GitHub..

There are four tables

Paragraphs

This is where the main body of text is. A typical row will look like this

  • WorkID: hamlet
  • ParagraphID: 639015
  • ParagraphNum: 3427
  • CharID: hamlet
  • PlainText: Has this fellow no feeling of his business, that he sings atngrave-making?
  • Act: 5
  • Scene: 1

Works

This is what translates the "WorkID" into something human readable - plus some extra metadata

  • WorkID: hamlet
  • Title: Hamlet
  • LongTitle: Tragedy of Hamlet, Prince of Denmark, The
  • Date: 1600
  • GenreType: Tragedy

Character

This is what translates the CharID into a human readable name and description

  • charID: hamlet
  • CharName: Hamlet
  • Abbrev: Ham
  • Works: Tragedy of Hamlet, Prince of Denmark, The
  • Description: son of the former king and nephew to the present king

Chapters

This gives the setting for each Act and Scene.

  • WorkID: hamlet
  • ChapterID: 18893
  • Act: 5
  • Scene: 1
  • Description: Elsinore. A churchyard.

What's Next?

The next steps for the project are fairly obvious:

  1. Write some high level example code to show people how to use the database.
  2. Make shkspr.mobi a showcase site which runs off the database.
  3. Fix any bugs and inconsistencies that people find.

You can download the Shakespeare MySQL Database from GitHub.


Share this post on…

11 thoughts on “Open Source Shakespeare (in MySQL)”

  1. Terence, the OSS site itself runs on MySQL, and has since 2003, when I launched the beta version of the site. The download page provides Access and CSV files because those are the most easily-consumed versions of the database. For whatever reason, I've actually never been asked for a mysqldump version of the site -- probably because whenever someone has downloaded the db, they want to use the database in their own personal project, so they'd rather import the data into their own table structure, rather than replicating OSS's.

    I'm glad you're finding the database useful, and I'm also glad to see it up on github.

    Reply
  2. Samuel Pickard says:

    Has this fellow no feeling of his business, that he sings atngrave-making?

    I think that this may be a question for both Terence and Eric then. The text has a new-line character n in it, which really really annoys me far more than is reasonable. Does this text really need formatting in it? Can't I decide how to word wrap the text?

    Reply
    1. Hi Samuel,

      Great question.

      Two points to note,

      1. Shakespeare is traditionally broken down into lines. The allows the reader to see rhyming couplets, get a sense of rhythm, etc. It's also useful for long soliloquies to be able to reference specific lines.
      2. The original DB used "[p]" to show new lines. I wasn't aware of any parsing tools which could easily strip that out and replace it with, e.g. >br/&lt>

      What one could do is create a separate tale which lists where the line breaks should be - then remove them from the text. To be honest, I think it's probably easier for the user to strip out the n is they're not needed.

      T

      Reply
      1. Samuel Pickard says:

        Good point, I'd not thought of referencing specific lines. n is much, much better than <br/>

        Reply
  3. Thanks for this, which I found via Bill Thompson on Twitter (@billt). One thing I notice immediately is that the sql file doesn't have the table definitions. I can guess more or less how the tables should be created but it would be useful if these could be included. A 'mysqldump' should produce a file including all you need to recreated the database elsewhere.

    Reply
    1. ... and another thing (now that I have imported your data).

      You should avoid multi-valued columns such as the 'Works' column of the 'Characters' table. Instead, have a separate table with two columns; CharId and WorkId. This will make it much easier to extract the data.

      I am a database professional (of sorts) but not a great expert on Shakespeare.

      Reply
  4. Sir Tim Berners-Lee proposes "5 Stars of Linked Open Data", the last of which is "link your data to other data to provide context". Accordingly, I'd suggest you add a line (or lines) to your "Works" table, with the URIs of, say, the equivalent English Wikipedia articles, and/or, their DBPedia (data) equivalents.

    Wikipedia has a Shakespeare bibliography whose list of links you may find useful. Lists of links for Male Shakespearean characters and female Shakespearean characters are also listed.

    Reply

What links here from around this blog?

What are your reckons?

All comments are moderated and may not be published immediately. Your email address will not be published.Allowed HTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <p> <pre> <br> <img src="" alt="" title="" srcset="">