Geo-referencing Digitised Collections

There are a couple of projects underway here at the Centre for e-Research (CeRch) and the Centre for Computing in the Humanities (CCH) on ‘geo-referencing’. Geo-referencing is a way of ‘tagging’ digital collections so that they can be searched by geographical place name or plotted on a map. Dr Claire Grover of the Language Technology Group, School of Informatics, University of Edinburgh, is working on text-mining methods for extracting geographical information from unstructured (i.e. unencoded) text. She is giving a talk here next week. If you would like to come, just send me an email.

There are vast quantities of textual information which people
typically access through standard search queries. Many collections
have added value in metadata associated with texts but this is costly
and time-consuming to generate by hand. Researchers in the field of
natural language processing (NLP) have been working for the past
couple of decades on technologies for information extraction (aka text
mining) that will allow for the automatic extraction of structured
information that currently resides in unstructured text. In this talk
I will describe the NLP system that we have been developing to extract
‘who, where and when’ metadata from textual content. The primary focus
of the system is geo-referencing so that the place names in a text can
be recognised and grounded to a gazetteer entry to provide lat/long
information. In addition the system recognises person names as well as
dates and other temporal expressions.
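
As a rough illustration of what "grounding" a place name to a gazetteer entry means, here is a minimal Python sketch. It is not the Edinburgh system described in the abstract: it replaces NLP-based recognition with a plain dictionary lookup, and the gazetteer contents, the approximate coordinates and the function name `georeference` are all assumptions made up for this example. A real geo-referencing tool would also have to decide between several places that share a name, which this sketch ignores.

```python
import re

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Edinburgh": (55.95, -3.19),
    "London": (51.51, -0.13),
    "Belfast": (54.60, -5.93),
}


def georeference(text, gazetteer=GAZETTEER):
    """Return (place name, character offset, (lat, long)) for each gazetteer name found in text."""
    # Try longer names first so a longer name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), m.start(), gazetteer[m.group(1)]) for m in pattern.finditer(text)]


matches = georeference("The committee met in Edinburgh before travelling to Belfast.")
for name, offset, (lat, lon) in matches:
    print(f"{name} at offset {offset}: lat {lat}, long {lon}")
```

Each match pairs the recognised name with its position in the text and the lat/long taken from the gazetteer entry, which is the kind of record a map or place-name search could then be built on.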

System development was previously funded as part of EDINA’s
GeoCrossWalk project and we are currently refining it further for use
in the GeoDigRef project where we are geo-referencing three digitised
collections: Histpop, parliamentary records from BOPCRIS and metadata
from the British Library’s Archival Sound Recordings. In a parallel
project we are geo-referencing the Stormont Papers. I will discuss the
issues that arise from these different collections and will use them
to illustrate the difficulties in trying to develop a general purpose
tool that can be useful across different text types.
