Principles of Voyant Tools

Voyant is a scholarly project focused on the interpretation of texts and the design of tools in the humanities. You too can use Voyant to analyze your own texts, to write essays with embedded hermeneutical panels generated by Voyant, and to adapt the code to create your own versions of the tools.

What you can do with Voyant

Voyant is a type of text analysis tool that you can use across the research cycle. You can:

  • Use it to learn how computer-assisted analysis works. Check out our examples that show you how to do real academic tasks with Voyant.
  • Use it to study texts that you find on the web or texts that you have carefully edited and have on your computer. We don’t keep your texts, except temporarily to make your analysis run better. When you are finished, we discard your texts and the indexes.
  • Use it to add functionality to your online collections, journals, blogs or web sites so others can see through your texts with analytical tools.
  • Use it to add interactive evidence to your essays that you publish online. Add interactive panels right into your research essays (if they can be published online) so your readers can recapitulate your results.
  • Use it to develop your own tools using our functionality and code.

Design Principles

Although text analysis tool developers might choose to highlight different aspects for their purposes (such as stand-alone software as opposed to web-based software), here are some of the primary design principles for Voyant, as gleaned from other tools:

  • modularity: tools should be able to fit together in various configurations
  • generalization: tools should be designed to address a variety of types of text and uses
  • domain sensitivity: tools need to be sensitive to the ways in which textual scholars think of and interact with digital texts
  • flexibility: tools should be able to work with local or network sources in different formats
  • internationalization: tools should allow users to work in different languages
  • performance: tools should be reasonably responsive in order to function in a web-based context
  • separation of concerns: it may be best to separate back-end analytic procedures from front-end interface concerns
  • extensibility: it should be easy to create new tools and adapt existing ones, especially for the purposes of experimentation
  • interoperability: tools should provide public APIs so that they can interact with other tools on the web
  • skinnability: tools should be able to present themselves differently for different user needs and preferences
  • scalability: tools should provide functionality both for a small corpus (like a book) and for a large corpus (like many books)
  • simplicity: at least one view of the tools should be maximally simple in its interface
  • ubiquity: tools should lend themselves to being embedded in content elsewhere on the web
  • referenceability: tools and their results should lend themselves to being referenced and cited as academic resources

Though each has existed before to varying degrees in different tools, Voyant is an attempt to pull these design principles together into a single package. In some cases the principles may in fact be contradictory in practice (for instance, supporting analysis that is both large-scale and immediate), and compromises must be found. Working through those tensions is one of the aspects that make Voyant a worthy intellectual challenge.

HyperPo and TAPoRware are the tools with the strongest affinities to Voyant, but we have devoted considerable thought and attention to improving existing web-based tools in ways further described below.

Scalability. Whereas HyperPo and TAPoRware can readily handle book-length texts for micro-analysis, both reach their practical limits when corpora grow beyond a couple of megabytes. In contrast, Voyant is designed to handle much larger corpora (dozens of megabytes and beyond). There is still a practical (though undefined) limit to the size of corpora for Voyant given that it seeks to enable immediate micro-analysis, but the Voyant architecture is designed with scale in mind. There will always be a tension between indexing speed and retrieval speed: the more time is available for indexing, the faster retrieval tends to be. As such, text analysis tools that require pre-indexing (Philologic, Monk, etc.) will almost always operate faster because pre-processing can be done over the course of hours or even days (building very large relational databases, for instance). In contrast, Voyant seeks to strike a balance between indexing and retrieval speed: ideally both should happen in a timeframe that seems reasonable in a web-based context. The steady growth of computing power and the promise of high-performance computers obviously make the actual capabilities a moving target.
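
The tension between indexing and retrieval can be illustrated with a minimal sketch. This is not Voyant's implementation: the function names, the whitespace tokenization, and the two-document corpus are all assumptions chosen for illustration. Without an index, every query re-reads the whole corpus; with an index, one up-front pass makes each later query a single lookup.

```python
from collections import defaultdict


def scan_count(documents, term):
    """No index: every query re-reads and re-tokenizes the whole corpus."""
    return sum(doc.lower().split().count(term) for doc in documents)


def build_index(documents):
    """Up-front cost: one pass over the corpus recording where each term occurs."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(documents):
        for position, token in enumerate(doc.lower().split()):
            index[token].append((doc_id, position))
    return index


def indexed_count(index, term):
    """With an index: a query is a single dictionary lookup."""
    return len(index.get(term, []))


# Hypothetical two-document corpus, for illustration only.
corpus = ["the whale and the sea", "the sea at night"]
index = build_index(corpus)

# Both strategies agree on the answer; they differ in when the work is done.
print(scan_count(corpus, "sea"), indexed_count(index, "sea"))
```

The more work done in a step like build_index (positions, frequencies, and so on), the longer indexing takes and the cheaper each subsequent query becomes, which is the balance the paragraph above describes.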

Ubiquity. As useful as text analysis tools like HyperPo and TAPoRware may be, we recognize a need to allow content providers and producers (like bloggers) to quickly and easily integrate functionality into their own space. The previous model was limited to users bringing their own texts to our tools; we now also wish to allow users to bring our tools to their texts. In some cases users will wish to have static results, in which case we can provide a mechanism for easily copying and pasting results that can be directly embedded in other content. However, much of the most compelling functionality of Voyant is interactive and requires considerable client-side scripting: our current approach is to provide a tiny snippet of HTML that is essentially an IFRAME containing the necessary HTML elements. This approach allows Voyant code to remain separate from its host while satisfying the security restrictions on cross-site scripting. There are of course other challenges inherent to code embedded elsewhere, including version management (supporting legacy syntax) and caching of data (both the corpus and results).
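
As a rough sketch of what generating such a snippet might look like, the function below builds an IFRAME tag that points at a hosted tool panel. The host name, the URL layout, the tool name, and the corpus parameter are all hypothetical placeholders, not a documented Voyant interface; the point is only that the host page receives a single self-contained tag.

```python
from html import escape
from urllib.parse import quote, urlencode


def build_embed_snippet(base_url, tool, corpus_id, width="100%", height="400px"):
    """Return an HTML iframe tag pointing at a hosted tool panel.

    The URL layout used here is an assumption made for illustration.
    """
    query = urlencode({"corpus": corpus_id})
    src = f"{base_url.rstrip('/')}/tool/{quote(tool)}/?{query}"
    style = f"width: {width}; height: {height}"
    # Escape attribute values so the tag stays well-formed in the host page.
    return f'<iframe style="{escape(style)}" src="{escape(src)}"></iframe>'


# Hypothetical host, tool name, and corpus identifier.
snippet = build_embed_snippet("https://voyant.example.org", "Cirrus", "my-corpus-id")
print(snippet)
```

Because the panel runs inside its own frame, the scripts it loads stay isolated from the host page, which is the separation described above.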

Referenceability. The status of text analysis tools as academic resources has been a point of debate over the years. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can be a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. In any case, we have designed Voyant to be conducive to citation in various ways, including a general citation to Voyant and citations for static or dynamic results. An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research, including the use of text analysis tools, is sure to be useful.

Ultimately, Voyant is an attempt to learn from the strengths and weaknesses of past tools, to recognize current user needs (e.g., working with much larger corpora), and to anticipate future practices (e.g., referencing text analysis tools and results). We believe that the potential for tools in the interpretive process merits continual rethinking of tool design and functionality, and as such, Voyant is of course a work in progress.