Open Source Shakespeare (in MySQL)

database github mobile mysql shakespeare · 11 comments · 500 words · Viewed ~1,764 times

My good friend Richard Brent has often complained that my blog has very little Shakespeare content. Despite the domain name, I don't think I've ever blogged about The Big S. For shame! Fear not, my Brentish-Boy, this post is all about Shakespeare. And MySQL....

Ahem...

When I first started shkspr.mobi it was intended to be an easy way to get Shakespeare on your phone. At that time, there were no mobile formatted texts of his plays and sonnets, so I had to create them. Finding Shakespeare's works in a suitable format for conversion wasn't too hard - but it meant lots of crufty code to read text files line-by-line. Yuck.

A few years later, I stumbled across Open Source Shakespeare. The project grew out of Eric Johnson's MA thesis. It's a remarkably good idea with only one minor problem. The database it uses is Microsoft Access.

MS Access, as a database, could best be described as

deformed, crooked, old and sere, ill faced, worse bodied, shapeless everywhere, vicious, ungentle, foolish, blunt, unkind, stigmatical in making, worse in mind

(Comedy of Errors, Act IV, Scene II)

There are a few Open Source Shakespeare projects on GitHub, but they don't seem very practical.

So, naturally, I've decided to create my own version of Shakespeare's works - in MySQL :-)

This is what it looks like: You can download it from GitHub.

I've stripped out a lot of the extraneous stuff from the original version - word counts, etc. So it should be a fairly lean database which is easy to use. I'm not a database professional, so I would be grateful if you could suggest any improvements. Either using this blog's comment form or on GitHub.

There are four tables

Paragraphs

This is where the main body of text is. A typical row will look like this

WorkID: hamlet
ParagraphID: 639015
ParagraphNum: 3427
CharID: hamlet
PlainText: Has this fellow no feeling of his business, that he sings atngrave-making?
Act: 5
Scene: 1

Works

This is what translates the "WorkID" into something human readable - plus some extra metadata

WorkID: hamlet
Title: Hamlet
LongTitle: Tragedy of Hamlet, Prince of Denmark, The
Date: 1600
GenreType: Tragedy

Character

This is what translates the CharID into a human readable name and description

charID: hamlet
CharName: Hamlet
Abbrev: Ham
Works: Tragedy of Hamlet, Prince of Denmark, The
Description: son of the former king and nephew to the present king

Chapters

This gives the setting for each Act and Scene.

WorkID: hamlet
ChapterID: 18893
Act: 5
Scene: 1
Description: Elsinore. A churchyard.

What's Next?

The next steps for the project are fairly obvious:

Write some high level example code to show people how to use the database.
Make shkspr.mobi a showcase site which runs off the database.
Fix any bugs and inconsistencies that people find.

You can download the Shakespeare MySQL Database from GitHub.

11 thoughts on “Open Source Shakespeare (in MySQL)”

Eric Johnson

Terence, the OSS site itself runs on MySQL, and has since 2003, when I launched the beta version of the site. The download page provides Access and CSV files because those are the most easily-consumed versions of the database. For whatever reason, I've actually never been asked for a mysqldump version of the site -- probably because whenever someone has downloaded the db, they want to use the database in their own personal project, so they'd rather import the data into their own table structure, rather than replicating OSS's.

I'm glad you're finding the database useful, and I'm also glad to see it up on github.

Reply 2012-04-21 11:49
1. Terence Eden
  
  Thanks for the comment - and thank you for creating the original version.
  
  Reply 2012-04-21 14:01
Samuel Pickard

Has this fellow no feeling of his business, that he sings atngrave-making?

I think that this may be a question for both Terence and Eric then. The text has a new-line character n in it, which really really annoys me far more than is reasonable. Does this text really need formatting in it? Can't I decide how to word wrap the text?

Reply 2012-04-23 07:32
1. Terence Eden
  Hi Samuel,
  
  Great question.
  
  Two points to note,
  1. Shakespeare is traditionally broken down into lines. The allows the reader to see rhyming couplets, get a sense of rhythm, etc. It's also useful for long soliloquies to be able to reference specific lines.
  2. The original DB used "[p]" to show new lines. I wasn't aware of any parsing tools which could easily strip that out and replace it with, e.g. >br/&lt>
  What one could do is create a separate tale which lists where the line breaks should be - then remove them from the text. To be honest, I think it's probably easier for the user to strip out the n is they're not needed.
  
  T
  Reply 2012-04-23 07:57
  1. Samuel Pickard
    
    Good point, I'd not thought of referencing specific lines. n is much, much better than <br/>
    
    Reply 2012-04-23 08:03
Joseph Haig

Thanks for this, which I found via Bill Thompson on Twitter (@billt). One thing I notice immediately is that the sql file doesn't have the table definitions. I can guess more or less how the tables should be created but it would be useful if these could be included. A 'mysqldump' should produce a file including all you need to recreated the database elsewhere.

Reply 2012-04-23 11:16
1. Joseph Haig
  
  ... and another thing (now that I have imported your data).
  
  You should avoid multi-valued columns such as the 'Works' column of the 'Characters' table. Instead, have a separate table with two columns; CharId and WorkId. This will make it much easier to extract the data.
  
  I am a database professional (of sorts) but not a great expert on Shakespeare.
  
  Reply 2012-04-23 11:51
Andy Mabbett

Sir Tim Berners-Lee proposes "5 Stars of Linked Open Data", the last of which is "link your data to other data to provide context". Accordingly, I'd suggest you add a line (or lines) to your "Works" table, with the URIs of, say, the equivalent English Wikipedia articles, and/or, their DBPedia (data) equivalents.

Wikipedia has a Shakespeare bibliography whose list of links you may find useful. Lists of links for Male Shakespearean characters and female Shakespearean characters are also listed.

Reply 2012-04-23 11:22
1. Andy Mabbett
  
  Since I wrote that, Wikidata - a linked-, open- data repository sitting alongside Wikipedia - has become available.
  
  So now you can use Wikidata URIs :
  
  The Merchant of Venice == https://www.wikidata.org/entity/Q206400
  
  and not just for works:
  
  Lady Macbeth == https://www.wikidata.org/entity/Q2454065
  
  Reply 2020-02-08 13:32
Richard Morrison

I've been tinkering with this data and have tidied it up a bit (according to my own preferences and those of Ruby on Rails).

Not a finished work, but something to be built upon, perhaps.

See https://github.com/edent/Open-Source-Shakespeare/issues/1

Reply 2012-04-24 00:05
1. Richard Morrison
  
  Did a bit more on this - it's fun!
  
  Romeo just edges Juliet in the stats for "Romeo and Juliet". Can you guess who is next with 10.7% of the lines (paragraphs)?
  
  http://bardofavon.herokuapp.com/works/34/characters
  
  Reply 2012-04-25 22:14