Technical Discovery: Project Alpha

In this article I will outline the technical discovery work done since the end of October 2019 for The National Archives’ Project Alpha, which was introduced in ‘Building an archive for everyone’:

Working in an archive we have long memories. Taking inspiration from alpha.gov.uk (which led to GOV.UK) we’re aiming to build and test, in public, a prototype of a new website for The National Archives. It will be shaped by user needs, follow service design principles and make the best use of modern technologies.

We’re taking a ‘blank sheet of paper’ approach to Project Alpha, challenging ourselves to look beyond our current technological and cultural limitations, to define what a modern, accessible archive should and could be for all our users.

Digirati are collaborating with The National Archives, as part of an integrated team with National Archives staff, to help with the discovery process for Project Alpha and the building of the Project Alpha prototypes.

This article covers the initial on-boarding and discovery phases of the project, which run from October 2019 to early January 2020, and focuses on the technical side of discovery. Ideation and solution design, followed by a rapid phase of prototyping and user testing – the alpha phase proper, following the GDS Agile Delivery Framework – will begin in mid-January 2020, once the discovery phase ends.

Colour photograph of four rows of six drawers in a wooden filing system.

Photo by Jan Kolar (www.kolar.io) on Unsplash

The Problem

Project Alpha is taking a blank sheet approach to solving some problems that users of archives encounter when they first visit The National Archives website, and websites of similar memory institutions.

Tom Crane, Digirati’s Technology Director, has written a series of Medium.com posts addressing the problems that users who are Baffled by Archives encounter. These include:

Users have a mental model of how websites work and how their information is stored and presented, based on their experience with, for example, search engines like Google, online encyclopedias like Wikipedia, news media, commercial information providers, or library catalogues.
However, this mental model breaks down when they encounter an archive for the first time.
Archives are curated intellectual structures, rather than the output of some algorithm like Google’s PageRank.
These curated structures are deeply and intrinsically hierarchical.
Understanding information at one level of an archival hierarchy often requires understanding of information that is present at other, more general or higher, levels of the hierarchy.
The hierarchy itself often conveys information about the provenance and intellectual history of The National Archives’ collection(s).
This model of the archive as a curated intellectual hierarchy is not necessarily one that users bring with them to The National Archives website.
Nor is it a model that the website conveys clearly to users through user experience and information architecture. As a result:
Users often do not understand how to access the information they need to answer the questions they come to The National Archives with, or even whether The National Archives website is the correct place to ask these questions.
Users may not understand that often the page they are on is a catalogue page about some object in The National Archives’ collection, rather than a page for that object, where they might get direct access to it.

Project Alpha is about helping to solve these problems, and others, through innovative user experience and design combined with cutting-edge technological solutions.

However, the goal of Project Alpha is not to produce a finished public-facing replacement for The National Archives website. Instead, the goal is to identify innovative and genuinely useful approaches to presenting The National Archives’ collection online through rapidly prototyping, testing, and evaluating solutions to the problems that users face when visiting The National Archives.

Process

From the start, Project Alpha has been based around identifying the most pressing and important problems to address in prototyping through two complementary processes:

User research and user experience work based on design thinking
Articulation and understanding of the business goals of The National Archives and insights derived from the knowledge and experience of National Archives staff and the vision outlined in Archives for Everyone

In addition, there is a technical strand of the discovery phase focused on articulating and understanding the possibilities and the limitations afforded by the data sources that exist for The National Archives collections.

‘Technical Discovery’ is primarily about ensuring that when ideation, solution design, and rapid prototyping begin for Project Alpha, we have:

A shared understanding of The National Archives’ collection across the team
A clear understanding of where, across many potential sources, data exists
An understanding of the form in which that data exists
A view of where we can prototype at scale based on real data
A view of where we should prototype on mocked or simulated data

We want to ensure all of the above so that we are ready to begin work immediately, and productively, on design, sketching, and writing code in mid-January 2020.

Context for Technical Discovery

Technical Discovery work is not taking place in a vacuum. In advance of the ideation and solution design phase of Project Alpha, there are still some background goals and assumptions that can help focus technical exploration, without closing off entirely different possibilities that might emerge during solution design.

Goals for Project Alpha

Workshops at The National Archives (30 and 31 October) identified some key user-research-led goals and directions of travel:

How users find us / How users encounter relevant content

(from) Harder – FINDING THINGS IN THE ARCHIVE – (to) Easier:

We will help people find The National Archives on the web. Wherever users start their journey, searching for a historical event or an individual, we will connect them with relevant and useful National Archives content, providing meaningful context to encourage serendipitous exploration.

How users understand ‘National Archives content’

(from) Harder – UNDERSTANDING THINGS IN THE ARCHIVE – (to) Easier:

We will help users to make sense of what they have found and empower them to navigate the rest of their journey confidently.

Help users engage with The National Archives

(from) Less – ENGAGEMENT WITH THINGS IN THE ARCHIVE – (to) More:

We will enable users to become active participants in National Archives activities, through contributions and engagement, and the sharing of experience and expertise.

These goals, along with the business goals and the vision of The National Archives, will be the key factors used to prioritise and assess areas for prototyping in the alpha phase of the project, and have helped shape the investigations carried out in the Technical Discovery phase of the project.

Initial suggestions for technical approach

There are a number of suggestions for how Project Alpha might approach solving these problems. These are not set in stone, but provide useful pointers for further investigation:

Adopt as light a tech stack as possible
Fix the National Archives URLs to make them citable, hackable, logical to traverse, and human-readable
Make smarter use of search input and broader results from National Archives data sources to improve search
Explore the possibility of building a comprehensive domain model from The National Archives’ and other data sources

Existing hypotheses and assumptions

Staff at The National Archives already have a number of hypotheses and assumptions – some well-supported by user testing and other evidence, and some untested at this stage – that might help explain why users struggle, and how they might be helped.

The statements below are loosely paraphrased or adapted from statements made by National Archives staff in the on-boarding workshops.

Making the easy things easier, and supporting people better with the hard things, is the key to success.
We should break free of the constraints imposed by a model of the catalogue as just an online version of a paper catalogue.
The catalogue cannot, and never will, conform to the Google model of information discovery and presentation.
Helping users acquire a better mental model of archives, through visualisation, content enrichment, and cutting-edge user interfaces, will help make things – both easy and hard – easier.
Better and more thorough use should be made of expert assistance, of the type already captured by National Archives staff in detailed research guides.
If we provide a more contextually-rich experience at the moment of discovery, users are more likely to understand what they’ve found, and more likely to continue using The National Archives catalogue.
The catalogue metadata is largely a fixed point about which the project has to orbit. There’s no time or money to re-catalogue many millions of complex archival objects. But better use could be made of the metadata The National Archives already has.
Users often expect to find the things, not the metadata about the things.
Manage users’ expectations, so they clearly understand: what they can do (easily); what they can do (with experience and persistence); and, what they cannot do. Not all tasks will be easy, but the website can build confidence in users that their efforts will be rewarded.

First thoughts on user journeys

Based on The National Archives’ hypotheses and assumptions, and the user journeys and personas identified by the user experience work in Discovery, there are a number of obvious areas that might be explored in Project Alpha:

Mental model: see Tom Crane’s Baffled by Archives post. Can we make the archival model easier for users to understand and navigate, through innovative information architecture, design, and user experience?
Rich record pages: The archival model, in which the information required to understand a specific item is attached to the higher levels of the archival structure, works well for users who understand this model, and who have arrived at an object by descending the hierarchy from the top level. When many users enter directly via Google, or do not understand the archival model, would they be helped if more of the metadata at higher levels in the catalogue were bubbled down into the item record display?
Data enrichment: can we enrich the data already available in the catalogue or other data sources? For example, generating horizontal links between objects linked via common named entities – people, places, dates, organisations – or linking via authorities, or sources of linked open data?
Visualisation: can we use non-verbal or visual sources of information to represent or enrich archival objects, or hierarchies of archival objects? Can we approach this in a way that is accessible to users?
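
To make the ‘rich record pages’ idea in the list above a little more concrete, here is a minimal Python sketch of bubbling metadata down from higher levels of a hierarchy into an item-level display. It is an illustration only, not the project’s implementation: the Record class, its field names, and the example data are all invented, and the rule that more specific levels override more general ones is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """A hypothetical catalogue record; field names are illustrative only."""
    ref: str
    title: str
    parent: Optional["Record"] = None
    metadata: dict = field(default_factory=dict)


def ancestors(record):
    """Return the chain of ancestors, ordered from the top level down."""
    chain = []
    node = record.parent
    while node is not None:
        chain.append(node)
        node = node.parent
    return list(reversed(chain))


def rich_record(record):
    """Merge metadata down the hierarchy; more specific levels win (assumed rule)."""
    merged = {}
    for node in ancestors(record) + [record]:
        merged.update(node.metadata)
    return {
        "ref": record.ref,
        "title": record.title,
        "context": [a.title for a in ancestors(record)],
        "metadata": merged,
    }


# Invented example data, not real catalogue entries.
dept = Record("X", "Example department", metadata={"creator": "Example Office"})
series = Record("X 1", "Example series", parent=dept, metadata={"dates": "1900-1950"})
item = Record("X 1/2", "Example item", parent=series, metadata={"dates": "1914"})

page = rich_record(item)
print(page["context"])   # ['Example department', 'Example series']
print(page["metadata"])  # {'creator': 'Example Office', 'dates': '1914'}
```

A user landing directly on the item would then see the creator inherited from the top level alongside the item’s own dates, without having to walk up the hierarchy first.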

The areas that will actually be explored in Project Alpha will emerge from ideation workshops in January 2020, so the list above is just initial thoughts to help guide exploration of data sources.
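
As one illustration of the ‘data enrichment’ idea in the list above, the following Python sketch links records that share a named entity. It assumes the entities have already been extracted and attached to each record; the record identifiers and entity names are invented, and how entities would actually be extracted or matched against authorities is left open.

```python
from collections import defaultdict
from itertools import combinations

# Invented records with already-extracted named entities. How the entities
# are extracted (and how reliably) is outside the scope of this sketch.
records = {
    "rec-1": {"Jane Example", "London"},
    "rec-2": {"Jane Example", "Example Board"},
    "rec-3": {"London"},
    "rec-4": {"Elsewhere"},
}


def build_entity_index(records):
    """Map each entity to the set of record ids that mention it."""
    index = defaultdict(set)
    for rec_id, entities in records.items():
        for entity in entities:
            index[entity].add(rec_id)
    return index


def horizontal_links(records):
    """Return {record id: {linked record id: shared entities}}."""
    links = defaultdict(lambda: defaultdict(set))
    for entity, rec_ids in build_entity_index(records).items():
        for a, b in combinations(sorted(rec_ids), 2):
            links[a][b].add(entity)
            links[b][a].add(entity)
    return links


links = horizontal_links(records)
print(sorted(links["rec-1"]))   # ['rec-2', 'rec-3']
print(links["rec-1"]["rec-2"])  # {'Jane Example'}
print("rec-4" in links)         # False
```

Records with no shared entities, such as rec-4 here, simply get no horizontal links; nothing in the existing catalogue metadata has to change.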

Data Sources

With the goals and assumptions of the discovery phase in mind, the team began to look at the available sources of data that The National Archives has about its collections. These include:

The catalogue
The document ordering database
Sources for information about digitised objects, which span multiple databases and data sources
Historical Manuscripts Commission data
The document database that drives the Discovery website
The rich set of research guides
Educational materials
Exhibition websites
Blog posts
A/V materials

Team members – from The National Archives and from Digirati – were each assigned a service to:

Explore and understand the data held by that service
Explore and understand whether that data might be useful to Project Alpha
Potentially prototype, or plan to prototype, simple proof-of-concept tools for exposing this data, to be built in the alpha phase of the project
Plan for how this data might be mocked in the event that a mocked version of the service data is all that is required in Project Alpha

Alongside this service understanding phase of Technical Discovery we also began work on:

Understanding how we might normalise URLs for these services, so that we can interconnect them via shared identifiers in Project Alpha
Dev-ops processes for continuous integration and continuous deployment of prototypes developed during Project Alpha

What we have done to date

Since the commencement of the project in October we have:

Set up and configured dedicated Amazon Web Services (AWS) infrastructure for hosting data and for delivering prototypes.
Deployed copies of key National Archives data sources (see above) to the AWS infrastructure, with personally identifying information removed or anonymised as appropriate.
Set up dev-ops processes for quickly and easily deploying new prototypes to AWS.
Developed a simple Flask application for exposing catalogue data from ILDB on the web, as a tool for Digirati staff and others to begin understanding The National Archives’ collection and archival data model.
Developed an algorithmic URL parser to attempt to resolve National Archives identifiers from citable, hackable URLs.
Confirmed that a purely algorithmic approach will not work, due to the structure and format of National Archives catalogue references.
Built and tested an Elasticsearch-based URL parser that successfully implements parts of the proposed new URL scheme for The National Archives’ catalogue, providing a common service that prototypes built during Project Alpha can use to resolve identifiers.
Built and tested an Elasticsearch-based endpoint to return siblings and children of objects at any level of the archival hierarchy.
Begun building a shared understanding of development process and standards across a combined team of National Archives and Digirati developers.
Produced a prioritised backlog of service endpoints to expose National Archives data to proof-of-concept services built during Project Alpha, and assigned these to specific developers for further backlog refinement and exploration.
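
The lookup-based approach to resolving identifiers, and the siblings and children endpoint, can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the code built for the project: a plain dictionary stands in for the Elasticsearch index, and the URL prefix, record ids, and catalogue references are all made up. The point is only that a reference taken from a URL is resolved by looking it up among indexed records rather than by parsing its structure.

```python
from urllib.parse import unquote

# Invented records standing in for documents in a search index.
# Keys are hypothetical internal ids; "ref" is a made-up catalogue reference.
RECORDS = {
    "id-1": {"ref": "AB", "parent": None},
    "id-2": {"ref": "AB 1", "parent": "id-1"},
    "id-3": {"ref": "AB 1/1", "parent": "id-2"},
    "id-4": {"ref": "AB 1/2", "parent": "id-2"},
}


def normalise(text):
    """Decode, lower-case and collapse whitespace so lookups are forgiving."""
    return " ".join(unquote(text).lower().split())


# Lookup table from normalised reference to id, built once from the data.
REF_TO_ID = {normalise(r["ref"]): rec_id for rec_id, r in RECORDS.items()}


def resolve(url_path):
    """Resolve a hypothetical '/catalogue/ref/<reference>' path to an id."""
    prefix = "/catalogue/ref/"
    if not url_path.startswith(prefix):
        return None
    return REF_TO_ID.get(normalise(url_path[len(prefix):]))


def children(rec_id):
    """Ids of records whose parent is rec_id."""
    return sorted(k for k, r in RECORDS.items() if r["parent"] == rec_id)


def siblings(rec_id):
    """Ids sharing a parent with rec_id, excluding rec_id itself."""
    parent = RECORDS[rec_id]["parent"]
    return sorted(
        k for k, r in RECORDS.items() if r["parent"] == parent and k != rec_id
    )


print(resolve("/catalogue/ref/AB%201/2"))  # id-4
print(children("id-2"))                    # ['id-3', 'id-4']
print(siblings("id-3"))                    # ['id-4']
```

Note that the made-up reference in the example contains both a space and a slash, so the whole remainder of the path is treated as a single lookup key rather than being split into segments.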

Next Steps

The next steps, to begin in January 2020, will be driven by areas for investigation and prototyping that emerge from the ideation workshops and from The National Archives’ business goals. The Technical Discovery work to date has laid the groundwork for rapidly building prototypes and deploying them for testing.

Matt McGrattan is Head of Digital Library Solutions at Digirati.

As we progress with Project Alpha, we’re looking to test some of the concepts with people new to The National Archives. If that sounds like you, we’d welcome your help! Register your interest here.