BBC RES data modelling


Presentations and notes from a workshop for BBC RES partners / content providers hosted at the British Library in January 2017. The BBC Research and Education Space (RES) platform is a partnership project between Jisc, Learning on Screen and the BBC that aims to make it easier for teachers, students and academics to discover, access and use material held in the public collections of broadcasters, museums, libraries, galleries and publishers. Content providers are required to publish records in a compatible linked open data format so they can be indexed by the platform. This is often new to cultural heritage organisations, so this workshop was convened to discuss some of the processes involved in turning collection data or articles into linked open data records.

 

Attendees

Mia Ridge (Digital Curator, BL; notes below), Elliot Smith (BBC contractor, worked on PRISM online library catalogue; on RES works with external orgs), Terry Panagoulis (BBC contractor, working on BBC collections into linked data), Phil Carlisle (Historic England, 'builds thesaurus for a living', working on Getty ARCHES project), Corine Deliot (metadata analyst, BL), Richard Palmer (cobbling data for V&A, looking to standardise collections data they publish with common vocabularies; share without having to hand data over to another db), Vicky Phillips (NLW, sharing shipping records), Richard Light (worked with Elliot on Shakespeare stuff), Kay Hanson (People's Collection, Wales - repository for national institutions collections alongside user-generated content)

 

RES WORKSHOP 2017-01-10 / follow up materials / Elliot Smith

 

Presentations from the event:

 

RES'S RULEBASE

The rulebase file which RES uses to map from input RDF to output RDF in its index:

https://github.com/bbcarchdev/spindle/blob/develop/twine/rulebase.ttl

As mentioned at the workshop, this is extended with additional mappings as necessary (e.g. if an organisation uses a class which doesn't map to one of RES's categories).

 

ARTICLE ABOUT HOW RES INDEXES RDF

Elliot Smith wrote an article about how RES indexes the RDF it crawls:

http://res-project.tumblr.com/post/154464075083/what-does-res-do-with-your-data

This might give you some idea of what RES is looking for in a dataset.

 

Below are some links to resources and RDF samples (including a more complex VoID file) which may be useful.

 


 

INDEXES OF LINKED DATASETS/ONTOLOGIES

Sites where you can find existing data and ontologies.

 

LINKED DATASETS

These are existing datasets which can be linked to from other datasets, primarily to serve as objects for your own statements. DBpedia and Geonames are very widely used.

 

ONTOLOGIES

These provide classes and relationships for RDF statements. Most of them have some relevance to galleries, libraries, archives and museums.

 

 

EXAMPLES OF RDF

 

OTHER REFERENCES

 

 


 

Mia's workshop notes

These notes still contain lots of jargon but hopefully provide some additional context for the presentations above.

 

Jake, providing context: 6 or 7 orgs already in RES. Prioritising those with interesting data with clear educational or research potential, already getting data RES compatible.

Elliot's presentation 'Names and things' on converting data into RES-style data they can ingest offered a series of choices. This was a really useful overview and could possibly be converted into a flowchart or decision tree to help people get started with modelling their data as linked open data.

First choice - are you talking about the thing in the world or the image of the thing in the world? RES talks about real-world objects rather than digital artefacts, e.g. the tower itself; the photo of the tower is attached to the tower.

Second choice - is the description a caption for the photo or a description of the tower?

Third choice - how to give things names represented by strings/URIs [URIs as something like accession numbers or shelfmarks]

The convention is to use hash URIs for things in the real world or things that can't be communicated over the web.
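As a rough illustration of why hash URIs suit real-world things: the fragment after the # is never sent to the server, so a client asking about the thing ends up fetching a document that describes it. The sketch below uses only the Python standard library; the example.org URI is made up for illustration and is not a real RES or partner identifier.

```python
from urllib.parse import urldefrag

# Hypothetical hash URI naming the tower itself (a real-world thing).
tower_uri = "http://example.org/collection/tower-123#id"

# A web client strips the fragment before making the request, so what
# it actually fetches is the document describing the tower.
document_url, fragment = urldefrag(tower_uri)

print(document_url)  # http://example.org/collection/tower-123
print(fragment)      # id
```

The thing and the document about it therefore get distinct names without any extra server configuration.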

Literal vs resource - e.g. for labels or captions, either use 'literals' (simple strings) or create a resource and link to its URI.

Only create your own resources when you have something to add, e.g. something more to say about Stamford, Lincolnshire than the Geonames version already does.
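The choices above can be sketched as a handful of RDF statements. This is a minimal illustration, not RES's required modelling: all example.org URIs and the label and description text are invented, the DBpedia URI for Stamford is assumed to follow the usual DBpedia naming pattern, and the properties (rdfs:label, dcterms:description, dcterms:spatial, foaf:depiction) are just one plausible selection. The code builds N-Triples by hand with plain Python so it has no dependencies.

```python
# Namespaces for some widely used vocabularies.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

# Hypothetical URIs, for illustration only.
TOWER = "http://example.org/collection/tower-123#id"  # the thing in the world
PHOTO = "http://example.org/images/tower-123.jpg"     # a digital artefact
# Assumed external URI for the place: linked to, not re-described.
STAMFORD = "http://dbpedia.org/resource/Stamford,_Lincolnshire"


def uri(value):
    """Format a URI as an N-Triples resource."""
    return "<" + value + ">"


def literal(text, lang="en"):
    """Format a plain string as a language-tagged N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


triples = [
    # The label and description are about the tower, so they are literals on it.
    (uri(TOWER), uri(RDFS + "label"), literal("The tower")),
    (uri(TOWER), uri(DCT + "description"), literal("A stone tower.")),
    # The photo is a resource attached to the tower.
    (uri(TOWER), uri(FOAF + "depiction"), uri(PHOTO)),
    # The caption is about the photo, not the tower.
    (uri(PHOTO), uri(RDFS + "label"), literal("Photograph of the tower")),
    # The place is a link to an existing resource rather than a new one.
    (uri(TOWER), uri(DCT + "spatial"), uri(STAMFORD)),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Note that the caption sits on the photo while the description sits on the tower, which is the second choice in practice.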


Terry, on Reconciliation 101
Two ways to reconcile resources - directly reconcile literal values against LOD services via a web service, or build a custom SKOS vocab for your data and reconcile the SKOS concepts against authoritative sources. Can use DBpedia Spotlight (free), TextRazor, OpenCalais.

If your vocab has a hierarchy, SKOS is a good fit.
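The second route (reconciling concepts from your own vocab against an authority) might look something like the sketch below. It is only an outline of the idea: the in-memory AUTHORITY table is a hypothetical stand-in for a real reconciliation service, the example.org concept URIs are invented, and the DBpedia URIs are assumed to follow the usual naming pattern. Matching here is a simple normalised exact match, which is much cruder than what the services named above do.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical authority table standing in for a reconciliation service.
AUTHORITY = {
    "stamford, lincolnshire": "http://dbpedia.org/resource/Stamford,_Lincolnshire",
    "richard iii of england": "http://dbpedia.org/resource/Richard_III_of_England",
}


def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def reconcile(concepts):
    """Match local concept labels against the authority table.

    Returns (matched, unmatched): matched is a list of
    (local URI, skos:exactMatch, authority URI) statements, and
    unmatched lists the local URIs left for manual review.
    """
    matched, unmatched = [], []
    for concept_uri, label in concepts.items():
        target = AUTHORITY.get(normalise(label))
        if target is None:
            unmatched.append(concept_uri)
        else:
            matched.append((concept_uri, SKOS_EXACT_MATCH, target))
    return matched, unmatched


# Invented local SKOS concepts: URI -> preferred label.
concepts = {
    "http://example.org/vocab/places/1": "Stamford,  Lincolnshire",
    "http://example.org/vocab/people/7": "Richard III of England",
    "http://example.org/vocab/periods/3": "Middle Ages",
}

matched, unmatched = reconcile(concepts)
for statement in matched:
    print(statement)
print("Needs manual review:", unmatched)
```

Keeping the unmatched list is deliberate: anything the automatic step cannot place is left for a person to check rather than guessed at.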

 

Discussion

 Question re schema.org

RES is trying to break the catch-22 of no-one using linked data because no-one publishes it and vice versa

Discussion about the internal (e.g. curatorial) need for accuracy triggered by example of 'middle ages' defined as 1066-1484 (year before Richard III). Resistance to publishing rough data and iterating from there can stem from concern that data will be taken at face value and lead to misinterpretation of objects or history.

Discussion of CIDOC-CRM - can be useful but have to buy into it. Can be seen as too abstract, removed from practical work of encoding records.

MR suggested decision-tree model for helping people model things, design templates, based on presentations and discussions today. e.g. are you modelling things in the world, or knowledge/articles. Useful to talk people through things like 'what are you modelling, the thing in the world or the record about the thing' and how that affects fields like 'creator' (e.g. who created the record vs who created the object). Thinking of data modelling as a series of decisions might help break it into achievable parts, particularly if the project is willing to suggest defaults (e.g. long-standing debate on # URIs vs 303 redirects).
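One possible shape for such a decision aid, sketched in Python. The questions and guidance paraphrase the choices from the presentations as recorded in these notes; the data structure, the keys and the advise function are all hypothetical and are not an agreed RES tool.

```python
# Each step is one modelling decision, with guidance per possible answer.
DECISIONS = [
    {
        "key": "subject",
        "question": "Are you modelling the thing in the world or the record about it?",
        "guidance": {
            "thing": "Name the real-world thing; 'creator' is who made the object.",
            "record": "Name the record; 'creator' is who created the record.",
        },
    },
    {
        "key": "text",
        "question": "Is the text a caption for the photo or a description of the thing?",
        "guidance": {
            "caption": "Attach the text to the image.",
            "description": "Attach the text to the thing itself.",
        },
    },
    {
        "key": "value",
        "question": "Is a simple string enough, or is there more to say about the value?",
        "guidance": {
            "literal": "Use a literal.",
            "resource": "Link to an existing URI; mint your own only if you have something to add.",
        },
    },
]


def advise(answers):
    """Return one line of guidance per decision, flagging unanswered ones."""
    advice = []
    for step in DECISIONS:
        choice = answers.get(step["key"])
        if choice is None:
            advice.append("Undecided: " + step["question"])
        else:
            advice.append(step["guidance"][choice])
    return advice


for line in advise({"subject": "thing", "text": "caption"}):
    print(line)
```

Suggested defaults could be added as an extra field on each step, so that an unanswered question falls back to the project's recommendation.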

MR questions: Is there a way of seeing which vocabs and terms are the most commonly used in RES, to help orgs decide how to model their data or help developers see what's there? Elliot says it can be done, but there's a backlog of requests. How to get feedback from developers using the data? Work in progress, early discussions still.