We’re pleased to be working again with the Imperial War Museums on their ambitious ‘Permanent Digital Memorial.’
The memorial will use data migrated from the Lives of the First World War platform, which holds records for almost 8 million people associated with the First World War. Since 2014, members of the public have been enhancing these records, uploading images and other digital artefacts to capture the wartime experiences of the people who served.
In summer 2019, public access to the Lives of the First World War data will move to the Permanent Digital Memorial, which will also preserve the ‘Life Stories’ for future generations. The public view is being built by Surface Impression, and we are tasked with migrating the huge volume of collected content into its new, permanent home.
Analyse, enhance and migrate
Our CIIM middleware forms a key part of the Imperial War Museums’ infrastructure, so we already have an in-depth understanding of the museums’ data and systems.
An initial goal for us is to agree the core data model for the digital memorial and to give the team at Surface Impression an early view of the underlying JSON data model. This will feed into their initial user consultation and design tasks.
Once the core data model has been agreed, we will explore how the existing data might be enhanced. Our aim is to harness the power of CIIM to ingest, analyse and cleanse the data before it is transferred. We hope to be able to improve it in several ways.
- Make new links between records by finding and matching patterns in the text.
- Correct common faults in the data, such as making date formats consistent across all records. We will apply increasingly precise algorithms to make several rounds of improvements, using CIIM’s reporting tools to inform each iteration.
- Use a visualisation platform to build new controlled vocabularies and apply them in place of uncontrolled tags (for example, place names).
- Extract named entities from text using both CIIM’s internal Natural Language Processing engine (based on the GATE toolkit) and external services (such as GeoNames, Watson and OpenCalais).
- Link to other Imperial War Museums services, such as Collections Online, the War Memorials Register and thesaurus terms. Regular expressions will be used to identify URLs, accession numbers and similar references within the Lives of the First World War data set, so that they can be turned into links to the museums’ other online collections.
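To give a flavour of two of the steps above, here is a minimal Python sketch of date normalisation and regular-expression matching. It is an illustration only, not CIIM code: the function names, the list of date layouts and the accession-number pattern are all assumptions made for the example, and the real data will need patterns tuned over several iterations.

```python
import re
from datetime import datetime

# Assumption: illustrative date layouts, not a list taken from the real data.
DATE_FORMATS = ("%d/%m/%Y", "%d %B %Y", "%d %b %Y", "%Y-%m-%d")


def normalise_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


# Matches http(s) URLs up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical accession-number shape (2-4 capitals, a space, 1-6 digits).
# The real numbering schemes would have to be confirmed against the data.
ACCESSION_PATTERN = re.compile(r"\b[A-Z]{2,4} \d{1,6}\b")


def find_link_candidates(text):
    """Collect URLs and accession-number-like strings found in free text."""
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    urls = [u.rstrip(".,;:)") for u in URL_PATTERN.findall(text)]
    accessions = ACCESSION_PATTERN.findall(text)
    return {"urls": urls, "accession_numbers": accessions}
```

Values that return `None` from `normalise_date` are the ones worth reporting on, since they show which layouts the next round of rules still needs to cover.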
Evaluate and learn
The Imperial War Museums’ Permanent Digital Memorial will be an invaluable source of data for researchers. Our final task in the project will be to report on the improvements we’ve made to the data, and to recommend how it could be further enhanced to support future research. We will also review the effectiveness of our approach and the tools used. Richard Leeming of Golant Media Ventures will assist us in our evaluation.