Data Mining with Criminal Intent




London’s Old Bailey courthouse saw at least 197,000 trials between 1674 and 1913. The records of these trials have been preserved digitally in an archive called The Proceedings of the Old Bailey, one of the world’s largest online bodies of accurately transcribed historical texts.

While the contents of this online archive are freely and fully searchable, and are undoubtedly an invaluable scholarly resource, the way we search them is not always optimal. We are used to searching for single terms, and the website even suggests that we start by searching for our own names. But digital tools designed to search ‘big data’ can reveal connections and research paths that would otherwise go unnoticed. For an archive of this scope, such tools greatly extend what a search can find.

Data Mining with Criminal Intent (DMCI) used a customized version of Voyant Tools to organize, visualize, and analyze data from the Proceedings of the Old Bailey. Voyant Tools is the creation of McGill’s Stéfan Sinclair (Associate Professor of Digital Humanities) and Geoffrey Rockwell (Professor of Philosophy and Humanities Computing, University of Alberta).

Voyant Tools is one of three digital resources that make DMCI possible. The other two are the UK archive itself, the Proceedings of the Old Bailey, and Zotero, a personal online library for gathering sources of any kind in one searchable interface. Fred Gibbs (Roy Rosenzweig Center for History and New Media, George Mason University) was behind the creation of a specialized version of Zotero that sends the organized search results directly to Voyant Tools as a URL.

Among the interesting results that the data mining turned up was the proximity of the noun ‘poison’ to ‘drank’ and ‘coffee.’ Seeing which words regularly appear close to one another is among the most useful features of Voyant Tools, and in this case it revealed in seconds a historical connection that a human reader would have found only through a long and thorough search.
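To make the idea of words appearing near one another concrete, here is a minimal Python sketch of a collocation count. It is not how Voyant Tools is implemented. The function name `collocates`, the `window` parameter, and the three sample sentences are all invented for illustration, and the sentences are not quotations from the Old Bailey records.

```python
import re
from collections import Counter


def collocates(texts, target, window=7):
    """Count words occurring within `window` tokens of `target` in each text.

    Hypothetical helper for illustration only; not part of Voyant Tools.
    """
    counts = Counter()
    for text in texts:
        # Lowercase the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            # Neighbours on both sides, excluding the target position itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Invented sample sentences (an assumption, not real trial text).
sample_texts = [
    "she drank the coffee not knowing it held poison",
    "the poison was in the coffee he drank",
    "he stole a watch and a silver spoon",
]

counts = collocates(sample_texts, "poison", window=7)
print(counts.most_common(5))
```

Run over thousands of trial transcripts instead of three short sentences, a count like this is what lets a pattern such as ‘poison’ sitting near ‘drank’ and ‘coffee’ surface without anyone reading every record.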

In terms of criminal demography, the project found that women, who once appeared as defendants in numbers equal to men, were outnumbered 10 to 1 by the 1800s. There was, however, a rise in the number of bigamy cases brought against women, concurrent with a softening in the punishment and handling of such cases (the proceedings had previously been very harsh and public).

The broader goal of DMCI was to encourage ‘ordinary working historians’ and scholars in all fields to embrace the research methods made possible by digital tools. Among those learning to use Voyant Tools and Zotero are classes of McGill students taught by Stéfan Sinclair.

DMCI is a collaborative project by the Roy Rosenzweig Center for History and New Media (home of Zotero), TAPoR (home of Voyant Tools), and Old Bailey Online. The project is funded by Digging into Data (a joint initiative of the National Endowment for the Humanities, the National Science Foundation, the Social Sciences and Humanities Research Council, and the Joint Information Systems Committee).

For those looking to read more about the project, visit the Criminal Intent website.

Check out the online PDF of DMCI

Or for a more concise read, the New York Times article on the project:

(Both the PDF and the New York Times article were sources for this write-up)
