We’ve compiled a small corpus of five novels by Mark Twain, using the open versions available from Project Gutenberg. Each novel comes with four files: the novel text without the Gutenberg header information, plus three files derived from Edinburgh Geoparser output. These are a comma-delimited file of locations in the novel, bounded by the Mississippi (except for Innocents Abroad); a KML file for Google Earth; and a JSON file of the original output from the Geoparser.
For a list of visualization tools please see:
You’ll notice that the place names don’t necessarily match the places in the novel! Can you clean the data using Google Spreadsheets or Open Refine?
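If you would rather script a first pass before moving to a spreadsheet, one approach is to flag rows whose coordinates fall outside the region you expect (for example, a "Cairo" resolved to Egypt rather than Illinois). The sketch below is a minimal example under stated assumptions: the column names `name`, `lat`, and `lon` are hypothetical (check the header of your own file), the sample rows are made up, and the bounding box is a rough illustration, not the bounds used to build the corpus files.

```python
import csv
import io

# Rough bounding box, chosen only for illustration (an assumption,
# not the bounds used to build the corpus files).
BOUNDS = {"south": 29.0, "north": 48.0, "west": -98.0, "east": -82.0}


def split_by_bounds(rows, bounds=BOUNDS, lat_col="lat", lon_col="lon"):
    """Split rows into (inside, outside) lists by latitude/longitude.

    Column names are hypothetical. Rows with missing or non-numeric
    coordinates go to the 'outside' list so they can be reviewed by hand.
    """
    inside, outside = [], []
    for row in rows:
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            outside.append(row)
            continue
        in_box = (bounds["south"] <= lat <= bounds["north"]
                  and bounds["west"] <= lon <= bounds["east"])
        (inside if in_box else outside).append(row)
    return inside, outside


# Made-up sample data standing in for one of the location files.
sample = io.StringIO(
    "name,lat,lon\n"
    "St. Louis,38.63,-90.20\n"
    "Cairo,30.04,31.24\n"
    "Hannibal,39.71,-91.36\n"
)
inside, outside = split_by_bounds(csv.DictReader(sample))
print("keep:", [r["name"] for r in inside])
print("review:", [r["name"] for r in outside])
```

Rows in the "review" list are not necessarily wrong; they are simply the ones worth checking against the novel by hand.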
Things to do with the Geoparsed Files
- View the KML in Google Earth – edit its contents using a simple text editor. This is an XML file, so be careful!
- Take the Comma Delimited file, and load it into a Google Spreadsheet, OR cut and paste its contents into GPS Visualizer.
- Combine the files into new ones, and take them to GPS Visualizer.
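Combining the comma-delimited files can also be scripted. The following is a minimal sketch, not a definitive method: `combine_csv` is a hypothetical helper, the added `novel` column and the `name`, `lat`, `lon` headers are assumptions, and the sample rows are made up. It concatenates several files and tags each row with the novel it came from, so the result can still be filtered after pasting it into GPS Visualizer.

```python
import csv
import io


def combine_csv(sources, out, label_col="novel"):
    """Concatenate CSV inputs into one, tagging each row with its source.

    `sources` maps a label (for example a novel title) to an open text
    file. The union of all column names becomes the header, so files
    with slightly different columns can still be merged.
    """
    rows, fieldnames = [], [label_col]
    for label, handle in sources.items():
        reader = csv.DictReader(handle)
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            row[label_col] = label
            rows.append(row)
    # Missing values are written as empty cells; stray extra cells are dropped.
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Made-up sample data standing in for two of the location files.
tom = io.StringIO("name,lat,lon\nHannibal,39.71,-91.36\n")
huck = io.StringIO("name,lat,lon\nSt. Louis,38.63,-90.20\n")
combined = io.StringIO()
count = combine_csv({"Tom Sawyer": tom, "Huckleberry Finn": huck}, combined)
print(count, "rows written")
print(combined.getvalue())
```

With real files, replace the `io.StringIO` objects with files opened via `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends.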
Things to do with the Texts and the Corpus
We’ve also provided some files that offer statistical information on the corpus, and on the results generated by the Geoparser.
Load these Comma Delimited Files into Density Design RAW!
There are more frequency files below.
The Individual Files
- A Connecticut Yankee in King Arthur’s Court
- The Adventures of Huckleberry Finn
- The Adventures of Tom Sawyer
- Life on the Mississippi
- Innocents Abroad
Preloaded into Voyant at
Hacking – What can you do?
Option 2 – Build your own data set to work with using the Open Access resources above. Use Google Spreadsheets, Open Refine, or MS Excel to clean up your data. Explore visualizing your data using GPS Visualizer (if it’s geospatial!), Density Design RAW, TimeMapper, or Google Charts.
Option 3 – Build your own text corpus using the Open Access resources above or use the Mark Twain Corpus (http://voyant-tools.org/?corpus=1392745215963.3330) and…
- A) Load them into Voyant and explore the visualization tools.
- B) Load them into Zotero (you’ll need your own laptop for this), and explore them using the Zotero plugin Paper Machines.