This is a script for a workshop on using Voyant and TAPoR for a graduate class on research methods.
This workshop will quickly introduce you to computer-assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that includes HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools, including those in Voyant.
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Then learn how to load your own text into Voyant.
- Finally, we will look at TAPoR where you can find other tools.
Remember that the tools listed in TAPoR, like Voyant, are research tools and will often fail, especially when a whole group of people uses them at once. Several versions of Voyant are running, so try another server if one is down. If you need help, go to Hermeneuti.ca and explore the resources there. Here are some useful links:
- TAPoR 2.0 is available at http://tapor.ca
- The latest documentation site for Voyant is at docs.voyant-tools.org
- A web page on getting started with Voyant
- A list of screencasts on Voyant is at docs.voyant-tools.org/videos
- A list of Voyant tools can be found on TAPoR at http://tapor.ca/tools/filterbyattribute?att_val_id=36
- Older Voyant/Voyeur Tools Introduction from CWRC – http://cwrc.cs.ualberta.ca/index.php/General:Voyeur
- Older Quick Guide to Voyant/Voyeur – http://hermeneuti.ca/voyeur/users
- Voyant Tools
- Individual Voyant tool descriptions and links can be found at docs.voyant-tools.org/tools
- The main URL for Voyant Tools is http://voyant-tools.org/, though there are other URLs that can be used:
- http://temp.voyant-tools.org/ the primary server for this workshop, but it’s a temporary server, so please avoid linking to it
- http://voyant-tools.org/ the main server, content here usually persists for a month or longer (especially when accessed regularly)
- http://beta.voyant-tools.org/ a development server, less stable, so avoid linking to it
- http://voyeur.hermeneuti.ca/ a much older version of Voyant, more of a tourist attraction and last ditch solution
2.0 Preparing a text for a question
The first step in text analysis is to assemble a text to fit your question(s). What do you want to ask about? What sort of text would help you ask questions about an issue? How can you use the internet to build a text?
For this workshop, let's assemble a text from the internet.
- Decide on some aspect of popular culture or computing culture well documented on the internet.
- Google keywords associated with the subject you want to study.
- Skim the results and then develop selection criteria for what you want to scrape.
- Scrape a set of texts using Google.
- Copy and paste the texts into a text file. Clean out the navigation information and irrelevant parts.
- Export a text file for text analysis.
For more see Appendix 1: Finding and Preparing an Electronic Text
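If you would rather save web pages as HTML and clean them with a script than by hand, the cleaning step above can be sketched in Python. This is only a minimal illustration using the standard library: the function name `html_to_text` and the choice of tags to skip are assumptions made for this example, and real pages will usually still need some manual tidying.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags that usually hold navigation or code."""

    # Assumed list of tags to drop entirely; adjust for the pages you scrape.
    SKIP = {"script", "style", "nav", "header", "footer"}
    # Block-level tags after which we force a break so words do not run together.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "tr", "blockquote"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()
```

You could call `html_to_text` on each saved page, join the results, and write them to a single plain text file to load into Voyant.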
3.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools.
Go to the Cirrus tool at http://voyant-tools.org/tool/Cirrus and load your text.
There are a number of ways to load a text. You can:
- Provide one or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
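To check the word cloud against raw numbers, you can count high frequency words yourself. The sketch below is not how Cirrus is implemented; it is a simple illustration in Python, and both the function name `top_words` and the very short stop list are assumptions made for this example (Voyant's own stop lists are much longer).

```python
from collections import Counter
import re

# A tiny illustrative stop list, not the one Voyant uses.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was", "i"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Runs of letters, optionally followed by an apostrophe and more letters.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)
```

Comparing the counts returned by `top_words` with the sizes of the words in Cirrus is one way to explore the question about size and frequency.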
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That’s what we will look at next.
Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won’t work well. Try your text in different tools.
4.0 Using a Reading Skin
Voyant Tools can also be composed into “skins” that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Go to Voyant and load your text into the Reading Skin: http://voyant-tools.org
To learn more about using the full Reading Skin, see:
- Getting Started from Voyant Tools Documentation: http://docs.voyant-tools.org/start/
- Voyant/Voyeur Tools Introduction – http://cwrc.cs.ualberta.ca/index.php/General:Voyeur
- Quick Guide to Voyant/Voyeur – http://hermeneuti.ca/voyeur/users
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with the Word Trends settings too.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
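If you are curious what a KWIC (keyword in context) display involves, the idea can be sketched in a few lines of Python. This is an illustration of the general technique, not Voyant's code; the function name `kwic` and the character-based context width are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, match, right) context tuples for each whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        # Take up to `width` characters on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits
```

Printing each tuple on its own line, with the left context right-aligned, gives the familiar concordance layout in which the keyword lines up in a column.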
5.0 Other Stuff
Here are some links to other tools, different corpora and skins for specialized tools:
- TAPoR 2.0 is a searchable database of text analysis tools. You can get an account and then enter comments about the tools you explore.
- Try your text with the RezoViz network analysis tool: http://voyeurtools.org/tool/RezoViz/ (Use Chrome or Safari, not Firefox)
- Try your text with the Correspondence analysis tool: http://temp.voyant-tools.org/?stopList=stop.en.taporware.txt&skin=scatter&incontext=workshop
- Other corpora in Voyant:
- Humanist discussion list (21 years) in a skin with a Correspondence Analysis tool: http://bit.ly/VoyantHumanistScatterStop [temp, main, beta]
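The Correspondence analysis link above selects its stop list and skin through parameters in the URL. As a rough sketch, the Python below pulls those parameters apart and rebuilds the same link on another host. The function names are made up for this example, and whether another server accepts the same parameters or holds the same corpus is an assumption you would need to check.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl


def describe_voyant_url(url):
    """Return the host and a dict of the query parameters in a URL."""
    parts = urlsplit(url)
    return parts.netloc, dict(parse_qsl(parts.query))


def move_to_host(url, new_host):
    """Return the same URL pointed at a different host."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))


if __name__ == "__main__":
    link = ("http://temp.voyant-tools.org/"
            "?stopList=stop.en.taporware.txt&skin=scatter&incontext=workshop")
    print(describe_voyant_url(link))
    print(move_to_host(link, "voyant-tools.org"))
```

Reading the parameters this way makes it easier to see which part of a shared link names the skin and which names the stop list.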
6.0 More Information
- U of Virginia: http://etext.lib.virginia.edu/collections/subjects/
- Gutenberg: http://www.gutenberg.org/
- Arts-Humanities.net: http://arts-humanities.net/
- DRAPier: http://dho.ie/drapier/
Aggregating and Cleaning Texts:
- TAPoRware Aggregator: http://taporware.ualberta.ca/~taporware/otherTools/aggregator.shtml
- TAPoRware Cleaner: http://taporware.ualberta.ca/~taporware/betaTools/webcleaner.shtml
7.0 Other Tools
What other tools are out there? See TAPoR 2.0 for a growing list of tools.
- TAPoRware Tools mentioned above: http://taporware.ualberta.ca/
- AntConc (local App) http://www.antlab.sci.waseda.ac.jp/software.html
- Berkeley Wordseer http://wordseer.berkeley.edu/
- WordHoard (Java Webstart) http://wordhoard.northwestern.edu/
- Many Eyes (Visualization) http://www-958.ibm.com/software/data/cognos/manyeyes/
- HyperCities http://hypercities.ats.ucla.edu/
- R (programming language) http://cran.r-project.org/index.html
- Mathematica http://www.wolfram.com/
- Monk Project http://monkproject.org/
- Seasr http://seasr.org/
- Alchemy API http://www.alchemyapi.com
- Illinois Named Entity Recognizer Demo http://cogcomp.cs.illinois.edu/demo/ner/?id=8