What is eResearch in the Arts and Humanities

This is the start of a ‘white paper’ on eResearch in the Arts and Humanities. Comments are most welcome. (I admittedly rely a little too much on Susan Hockey’s wonderful history of the Digital Humanities in ‘A Companion to Digital Humanities’.)1

…by its very nature, humanities computing has had to embrace “the two cultures”, to bring the rigour and systematic unambiguous procedural methodologies characteristic of the sciences to address problems within the humanities that had hitherto been most often treated in a serendipitous fashion (Susan Hockey)

What are the Digital Humanities?

The disciplines and sub-fields that make up the humanities have a long interdisciplinary relationship with computing. Since the Italian Jesuit priest Father Roberto Busa approached Thomas J. Watson of IBM in 1949 for help in indexing some 11 million words of Medieval Latin, numerous humanities scholars have had productive, if at times challenging, relationships with computing. Some of the early computing tasks set by humanities scholars included verifying the authorship of disputed texts, automating the laborious creation of concordances of seminal texts, and encoding and defining document structures for digital publication and analysis. Literary and linguistic studies were the first humanities fields to take up computing; other disciplines followed later, depending on their specific needs and questions and on the capabilities of digital technologies.

The term ‘Digital Humanities’ is an umbrella term that encompasses all the disciplines in the humanities and the meaningful use of computing within them. As a field it is interdisciplinary by nature. Although its definition is hotly disputed, it is generally agreed that ‘humanities computing’ or ‘digital humanities’ is an attitude towards computing that combines theoretical sophistication with applied technical know-how. This balance between the needs of the humanities and the needs of applied computing is the most taxing aspect of the field. Accordingly, the institutional arrangements of the field vary widely, from applied computing centres to full academic departments. Knowledge in the field is communicated through established journals and conferences as well as through a plethora of digital channels.

What is eResearch?

The broader eResearch agenda, largely driven by the need to store and re-use the vast amounts of data produced by modern research, presents another set of challenges and opportunities for the humanities. eResearch, commonly referred to as ‘Cyberinfrastructure’ in the US or ‘eScience’ in Europe, is largely an infrastructure movement to support ‘big science’. It may be understood as a response to the need for large-scale, interdisciplinary and trans-national collaborations that use important data sets and analytical tools to address some of the most pressing questions facing humankind. The planet’s diminishing energy resources, stressed atmosphere and rising temperatures are problems too large to be dealt with by one discipline, one university or indeed one nation state. Large-scale problems require large-scale research collaborations and the infrastructure to support them. Climate data sets, agricultural crop data, emissions measurements and historical data may be combined, collaborated upon and communicated in ways that create new knowledge and, in turn, new approaches.

On a less monumental scale, eResearch enables researchers to address all sorts of problems associated with managing, citing, locating and communicating data. Although the humanities do not face the same ‘data deluge’ as the sciences, they do produce (and need to manage) data in the form of oral interviews, image databases, text resources and other varied accounts of the human condition. Humanities data is often laborious and expensive to produce, yet highly reusable in subsequent research contexts.

What is Data?

For the humanities, the term ‘data’ is rarely used to describe the apparatus of the research process, except perhaps in disciplines that gather data through ‘field work’ in social studies or through empirical archival investigation. However, in the digital domain, where seminal corpora, libraries, literature and language resources are increasingly in digital form, almost any resource that helps scholars understand the human condition may be understood as ‘data’. The records of the Old Bailey, newspapers, parliamentary papers and court records published online are not only digital facsimiles of their originals. They are also, to all intents and purposes, ‘data’ that can be analysed as a whole, compared and contrasted, and used as evidence, much as a scientist uses data. Placing a million books online is a notable exercise in distribution. The more remarkable attribute of a million books in digital form, however, is that when they are treated as data, their contents can be extracted and analysed to yield new knowledge about those books that lies beyond the scope of traditional scholarly labour.

What is architecture?

To take advantage of the computing infrastructures being built under the broader eResearch agenda, the ‘computing architecture’ must be designed to take account of researchers’ working practices. In the humanities, the context of the ‘data’ is important, because it is through context that humanities scholars establish the veracity of a resource and its subsequent meaning. Humanities scholars often require sophisticated anthologies to establish how knowledge ‘came into being’ (and its relationships), so that it can be built upon through monographs and articles. The data must also be citable, so that its original location can be verified. This matters to the humanities as much as repeatability matters to the scientific method. Well-designed humanities architectures combine generic ‘services’ common to humanities practice with tools and services specific to particular disciplines and research questions.

The challenges and opportunities of eResearch in the Arts and Humanities

Perhaps the greatest benefit of eResearch in the arts and humanities, beyond the many useful services and resources already produced, is that it allows humanities scholars to engage with advanced computing and imagine what is possible. We may not always get this right. It is an interdisciplinary experiment in methods and approaches, and in tool development and application. That experiment promises to augment the critical, analytical and speculative skills of the humanities or, if driven by the wrong impulses, to diminish them. eResearch in the arts and humanities is something that the humanities themselves must grasp and lead.

1. Susan Hockey, ‘The History of Humanities Computing’, in A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens and John Unsworth. Oxford: Blackwell, 2004.

Victorian eResearch Review

The State Government of Victoria (Australia) has invested a reasonable sum in eResearch activities here in Victoria over recent years. The Government is now undertaking a review. The discussion paper is available online and poses 36 Key Questions (some of them really hard, such as ‘how can the progress and uptake of eResearch be measured?’).

The document is online and responses are due by March 25th (link).

Leaked climate change emails scientist ‘hid’ data flaws

This is one of the reasons we have eScience and citable, re-usable (and verifiable) data.

Phil Jones, the beleaguered British climate scientist at the centre of the leaked emails controversy, is facing fresh claims that he sought to hide problems in key temperature data on which some of his work was based.

A Guardian investigation of thousands of emails and documents apparently hacked from the University of East Anglia’s climatic research unit has found evidence that a series of measurements from Chinese weather stations were seriously flawed and that documents relating to them could not be produced (link to Guardian).

Open Science and Data

As part of JISC’s ‘Research 3.0 – driving the knowledge economy’ activity
which launches at the end of November, a new Open Science report released
today trails key research trends that could have far-reaching implications for
science, universities and UK society.

The report, written by UKOLN at the University of Bath and the Digital Curation
Centre, identifies openness, predictive science based on massive data
volumes and citizen involvement as important features of tomorrow’s
research practice.

It is hoped that this document will stimulate and contribute to community
discussion in the UK, which is ranked second in the world for its output of
quality research, but also fuel the open science debate on the global stage.

What to do with 30 million books?
(Posted to that wonderful Digital Humanities list, Humanist).

Date: Wed, 14 Oct 2009 18:22:57 +0100
From: Jockers Matthew <mjockers@stanford.edu>
Subject: Possible Text Mining Opportunity at Stanford


As I’m sure many of you already know, Stanford has been closely
involved with Google’s book scanning project, and we (Stanford) are
currently preparing a proposal for the creation of a text mining /
analysis Center on campus. The core assets of the proposed Center
would include all of the Google data (approx. 30 million books) plus
all of our Highwire data and all of our licensed content. We see a
wide range of research opportunities for this collection, and we are
envisioning a Center that would offer various levels of interaction
with scholars. In particular we envision a “tiered” service model
that would, on one hand, allow technically challenged researchers to
work with Center staff in formulating research questions and, on the
other, an opportunity for more technically advanced scholars to write
their own algorithms and run them on the corpus. We are imagining the
Center as both a resource and as a physical place, a place that will
offer support to both internal and external scholars and graduate
students. We are looking at creating fellowship opportunities and
post docs as well as other ways of encouraging and supporting

I am writing to you specifically because I think this will be
something you are interested in but also because at this stage of the
proposal we are looking for some external validation that this corpus
would be of value and that the research it would support would inspire
new questions and new knowledge. I have already polled our Stanford
faculty, and the response (especially in the humanities and social
sciences) has been very enthusiastic. My hope is that you might be
able to send a few words (at most a short paragraph) that I could add
to a section of our proposal that is titled “Scholarly Interest and
Research Potential”.

Hope you are all well and getting your abstracts polished for London
in 2010.


Matthew Jockers
Stanford University