Culture Grid Profile


Culture Grid and Metadata – Object/ Item records

 

[Mia: I'm adding a bit of context cos not everyone will be familiar with the Culture Grid... As an indication of the type of content that's available through the Culture Grid, I've copied this text from some of their about pages: "It contains over 1 million records from over 50 UK collections, covering a huge range of topics and periods.  Records mostly refer to images but also text, audio and video resources and are mostly about museum objects with library, archive and other kinds of collections also included."  So, that's:

 

 

A bit of history …

An application profile of Dublin Core was produced by UKOLN in 2005, in consultation with Knowledge Integration and MLA, for use with the People’s Network ‘Discover’ Service (PNDS).  Conformance to this profile, known as PNDS DCAP, was (in theory at least) mandated for all collections which received MLA funding to make their metadata available to PNDS.  The method of supply was also stipulated: OAI-PMH.

As time progressed, this simple vision became somewhat diluted.  Many projects encountered problems: implementing OAI-PMH, producing well-formed XML, producing valid XML and populating mandatory elements were all issues.  At the aggregator level, it was decided to be as lenient as possible so that the maximum number of records could be ingested.  We also developed ‘data push’ interfaces to allow access to contributors who could not support OAI-PMH.  As a result of ‘political’ pressure, it was decided to incorporate existing data sets that did not conform to PNDS DCAP.  It was also decided to allow some extensions to the profile to support a project (Exploring 20th Century London, E20CL) which wanted to use the aggregation platform but required additional metadata elements.

We therefore reached the position where we had a large corpus of metadata records but the quality of the records was extremely variable.  The decision to use the platform as the national aggregator platform for Europeana meant that there was now a requirement to map some of this metadata to the Europeana Semantic Elements metadata profile (ESE).

 

What we do now

At the moment we have handlers for oai_dc and pnds_dc formatted metadata.  These are invoked on ingest whether the metadata is supplied via OAI-PMH or via upload.  The handlers transform the metadata to our canonical format for storage in a database and also create alternate representations of the metadata (e.g. ESE) as required.  The advantage of creating these alternate representations on ingest, rather than ‘on the fly’, is that it vastly improves system performance when the Grid is presenting search results or acting as an OAI-PMH target.
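As a rough illustration of the ingest flow described above, the following Python sketch dispatches on the supplied format, stores a canonical form and builds an ESE-style alternate at ingest time.  It is not the Grid's actual code: the record shape (element name mapped to a list of values), the names StoredRecord, to_canonical, to_ese and ingest, and the provider value "Example Aggregator" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical parsed record: element name -> list of values.
Record = Dict[str, List[str]]


@dataclass
class StoredRecord:
    canonical: Record                                            # form kept in the database
    alternates: Dict[str, Record] = field(default_factory=dict)  # e.g. {"ese": ...}


def to_canonical(source: Record) -> Record:
    """Normalise a record: strip whitespace and drop empty values."""
    cleaned: Record = {}
    for name, values in source.items():
        kept = [v.strip() for v in values if v.strip()]
        if kept:
            cleaned[name] = kept
    return cleaned


def to_ese(canonical: Record, provider: str) -> Record:
    """Build an ESE-style view: copy dc:/dcterms: elements, add europeana:provider."""
    ese = {k: list(v) for k, v in canonical.items()
           if k.startswith(("dc:", "dcterms:"))}
    ese["europeana:provider"] = [provider]
    return ese


# One handler per supplied metadata format (assumed here to share one normaliser).
HANDLERS: Dict[str, Callable[[Record], Record]] = {
    "oai_dc": to_canonical,
    "pnds_dc": to_canonical,
}


def ingest(fmt: str, source: Record, provider: str = "Example Aggregator") -> StoredRecord:
    """Run the handler for fmt and pre-compute the alternate representation."""
    handler = HANDLERS[fmt]  # raises KeyError for an unsupported format
    canonical = handler(source)
    stored = StoredRecord(canonical=canonical)
    stored.alternates["ese"] = to_ese(canonical, provider)
    return stored
```

Because the ESE view is computed once in ingest, a later search or harvest request only has to read stored.alternates["ese"] rather than transform the record again.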

In addition to these standard handlers, we also have custom handlers for some data providers.  These are used to correct errors in the supplied data and sometimes to add missing values.  The creation of custom handlers is very resource intensive and is not something we wish to continue in future.

One thing we learnt at an early stage was never to throw away any data submitted to us or harvested by us.  Re-harvesting collections is such a hit-and-miss process that we always preserve a copy of the records in their original (raw) format.  This means that we can go back and re-process records if mappings change or if a new transformation is required.  Whilst this is a useful safety net, it is also resource intensive and should only be used as a last resort, not as part of a general strategy for coping with new requirements.
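The idea of keeping the raw records and re-running a transformation over them can be sketched as follows.  This is a minimal in-memory illustration only: the RawStore type and the save_raw and reprocess functions are hypothetical names, and a real store would be a database rather than a dictionary.

```python
from typing import Callable, Dict, List, Tuple

# (collection, record id) -> raw XML exactly as supplied or harvested.
RawStore = Dict[Tuple[str, str], str]


def save_raw(store: RawStore, collection: str, record_id: str, raw_xml: str) -> None:
    """Keep the record as received; transformed forms are stored elsewhere."""
    store[(collection, record_id)] = raw_xml


def reprocess(store: RawStore, collection: str,
              transform: Callable[[str], dict]) -> List[dict]:
    """Re-run a new or changed transformation over one collection's raw records."""
    return [
        transform(raw)
        for (coll, _record_id), raw in sorted(store.items())
        if coll == collection
    ]
```

Re-processing walks every stored record of the collection, which is why it is better treated as a fallback than as a routine step.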

 

Why Change?

Our experience of ingesting more than 1 million metadata records from over 50 collections has highlighted several issues with PNDS DCAP.  A few of these issues are intrinsic to the profile itself but many are related to inconsistent interpretation of the requirements by data providers.  Also, the increasing requirement to create ESE representations of supplied data has highlighted the incompatibilities between these profiles.  The difficulty in transforming PNDS DCAP records into ESE is increasing with each new version of ESE.  Indeed, the fact that Europeana’s requirements are such a moving target is a problem which must be shared by other aggregators.

Amongst the main issues with the current use of the Grid are:

 

Options

If we were to change the recommended profile for submitting metadata to the Grid, there are four main options:

 

1     Adopt ESE

Adopting ESE as the preferred format for submitting metadata would obviously aid onward transmission to Europeana (although the submitted records would still need to be processed to add the europeana:provider element).  Whilst there is certainly a case for making this one of the acceptable formats for data submission, making it the preferred format would have some disadvantages.  These include:

 

2     Adopt a more comprehensive metadata standard

Many people have pointed out the inherent problems in using simple Dublin Core based profiles to convey detailed metadata.  DC based profiles tend to be focussed on resource discovery metadata, whereas some data providers also wish to make more detailed curatorial metadata available.  Based on the principle that it is easier to map from a detailed representation of metadata to a simpler one than vice versa, there is a case for making a more comprehensive metadata format, such as CDWA, CIDOC-CRM or LIDO, the preferred submission format.
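The principle can be illustrated with a small Python sketch.  The detailed record below is invented for the example and its field names are not taken from CDWA, CIDOC-CRM or LIDO; the point is only that flattening to simple DC-style elements is straightforward, whereas the dropped detail cannot be recovered from the result.

```python
from typing import Dict, List

# Hypothetical detailed record; field names are illustrative only.
detailed = {
    "maker": {"name": "A. Maker", "role": "designer"},
    "production_date": {"earliest": "1929", "latest": "1931"},
    "materials": ["earthenware", "enamel"],
}


def to_simple_dc(record: dict) -> Dict[str, List[str]]:
    """Flatten the detailed record into simple DC-style elements (detail is lost)."""
    date = record["production_date"]
    return {
        "dc:creator": [record["maker"]["name"]],               # the role is dropped
        "dc:date": [f'{date["earliest"]}-{date["latest"]}'],   # range becomes free text
        "dcterms:medium": list(record["materials"]),
    }
```

Going the other way, a mapping from the simple form would have to guess the maker's role and parse the date string, which is where most of the effort and error in upward mappings tends to lie.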

The main disadvantages of this approach are:

 

3     Extend PNDS DCAP, making it backwardly compatible

This would, in effect, be formalising the ad hoc arrangements that have already been made to cope with E20CL and similar projects.  The advantage of this approach is that existing metadata supplied in PNDS DCAP format would still be valid against the old profile.  The disadvantage is that it would not allow some of the problems with the original profile, such as the use of dc:identifier, to be addressed easily.

 

4     Produce a new profile and deprecate PNDS DCAP

This option leaves the maximum freedom to address the concerns of current and potential data providers, get real input from practitioners and address the problems found to date.  However, it has the disadvantage of adding further to the complexity of the situation, since there would be an even greater range of profiles to choose from.  For this reason, it is recommended that this option be pursued only if there is real buy-in from the community.

 

Recommendation

So long as there is community support, we feel that producing a new profile would provide the best platform for moving the Grid forward.  PNDS DCAP is around 6 years old and has served its initial purpose.  A forward looking strategy should not necessarily be constrained by it.  We therefore recommend that a new profile be developed in consultation with the community and that the Grid’s ingest processing be updated to add a handler for records in this format.  To complement this, though, we would also recommend that the following policies be adopted:

Assuming that the above recommendations are accepted, we present a ‘straw man’ recommendation for the elements to be included within such a profile, along with some indication of the obligation requirements that should be placed against these elements.  Note that this is NOT intended to be a profile in itself.  We are looking to the community for input in defining the data model and producing XML bindings, schemas, conformance tests, etc.

 

Annex 1       Comparison of PNDS & ESE

PNDS                        ESE                       Comments

Required/ Strongly Recommended Elements
dc:identifier               Additional
dc:title                    Strongly Recommended
dc:description              Recommended
dc:subject                  Recommended
dc:type                     Recommended               specific encoding scheme
dcterms:license             Not Present
dcterms:rightsHolder        Not Present
Not Present                 dcterms:alternative
Not Present                 dc:date
Not Present                 dcterms:created
Not Present                 dcterms:issued

Recommended Elements
dc:creator                  Strongly Recommended
dc:contributor              Strongly Recommended
dc:publisher                Recommended
dc:language                 Recommended
dcterms:spatial             Recommended               specific encoding scheme
dcterms:temporal            Recommended               specific encoding scheme
dcterms:audience            Not Present               specific encoding scheme
dcterms:isPartOf            Recommended
pnsterms:thumbnail          Not Present
Not Present                 dc:coverage
Not Present                 dc:source

E20CL Extension Elements
dc:relation                 Additional                used for related object
e20cl:materials             Not Present               = dcterms:medium?
e20cl:size                  Not Present
e20cl:creditLine            Not Present
e20cl:relatedSubject        Not Present
e20cl:relatedPerson         Not Present
e20cl:relatedOrganisation   Not Present

Optional/ Additional Elements
dc:format                   Additional
dcterms:spatial             Recommended               specific encoding schemes
dcterms:temporal            Recommended               specific encoding schemes
Not Present                 dcterms:extent
Not Present                 dcterms:medium            = e20cl:materials?
Not Present                 dcterms:rights            similar to dcterms:license
Not Present                 dcterms:provenance
Not Present                 dcterms:conformsTo
Not Present                 dcterms:hasFormat
Not Present                 dcterms:isFormatOf
Not Present                 dcterms:isVersionOf
Not Present                 dcterms:hasPart
Not Present                 dcterms:isReferencedBy
Not Present                 dcterms:references
Not Present                 dcterms:isReplacedBy
Not Present                 dcterms:replaces
Not Present                 dcterms:isRequiredby
Not Present                 dcterms:requires
Not Present                 dcterms:tableOfContents

Europeana Extension Elements
Not Present                 europeana:country
Not Present                 europeana:dataProvider
Not Present                 europeana:hasObject
Not Present                 europeana:isShownAt
Not Present                 europeana:isShownBy
Not Present                 europeana:language        different to dc:language
Not Present                 europeana:object
Not Present                 europeana:provider
Not Present                 europeana:rights          different to dc:rights
Not Present                 europeana:type            different to dc:type
Not Present                 europeana:unstored
Not Present                 europeana:uri
Not Present                 europeana:userTag
Not Present                 europeana:year

 

 

Annex 2       Straw Man proposal for data elements in CG profile

 

Element                   Repeatable?  Ordered?  Comments

Required/ Mandatory
dc:identifier*            N            -         only one of the elements marked * should be populated.  Remove dc:identifier altogether?
europeana:isShownAt*      N            -
europeana:isShownBy*      N            -
dc:title                  Y            Y         if multiple titles, first one used in display
dc:description            Y            N
dc:type                   N            -         Align encoding scheme with ESE?
dcterms:license           N            -         not sure if should be mandatory?
dcterms:rightsHolder      N            -         not sure if should be mandatory?

Recommended
dc:creator                Y            N
dc:subject                Y            N
dc:date                   Y            N
dc:contributor            Y            N
dc:publisher              Y            N
dc:language               Y            N
dc:coverage               Y            N
dc:source                 Y            N
dc:relation               Y            N         use for related person, subject, organisation & object via XML attribute
dcterms:spatial           Y            N         which encoding schemes?
dcterms:temporal          Y            N         which encoding schemes?
pnsterms:thumbnail        Y            Y         1st specified is used in hit list display

Optional
dc:format                 Y            N
dcterms:alternative       Y            N
dcterms:audience          Y            N
dcterms:isPartOf          Y            N
dcterms:created           Y            N
dcterms:issued            Y            N
dcterms:extent            Y            N
dcterms:medium            Y            N
dcterms:rights            Y            N
dcterms:provenance        Y            N
dcterms:conformsTo        Y            N
dcterms:hasFormat         Y            N
dcterms:isFormatOf        Y            N
dcterms:isVersionOf       Y            N
dcterms:hasPart           Y            N
dcterms:isReferencedBy    Y            N
dcterms:references        Y            N
dcterms:isReplacedBy      Y            N
dcterms:replaces          Y            N
dcterms:isRequiredby      Y            N
dcterms:requires          Y            N
dcterms:tableOfContents   Y            N
e20cl:materials           Y            N
e20cl:size                Y            N
e20cl:creditLine          Y            N
europeana:dataProvider    Y            N
europeana:isShownAt       Y            N
europeana:isShownBy       Y            N
europeana:object          Y            N
europeana:rights          Y            N
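
To show how the obligations in the straw man might be checked on ingest, here is a minimal Python sketch.  It is an assumption-laden illustration, not a conformance test for any agreed profile: the record shape (element name mapped to a list of values) and the check_record function are hypothetical, dcterms:license and dcterms:rightsHolder are treated as mandatory even though the table questions this, and the "only one of" rule is read as "exactly one of".

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# From the Required/ Mandatory rows of the straw man (license and rightsHolder
# are marked "not sure if should be mandatory?" and are included as an assumption).
MANDATORY = ["dc:title", "dc:description", "dc:type",
             "dcterms:license", "dcterms:rightsHolder"]
# Elements marked * in the table: assumed to mean exactly one must be populated.
ONE_OF = ["dc:identifier", "europeana:isShownAt", "europeana:isShownBy"]
# Elements with "N" in the Repeatable? column of the mandatory rows.
NOT_REPEATABLE = ["dc:type", "dcterms:license", "dcterms:rightsHolder"] + ONE_OF


def check_record(record: Record) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name in MANDATORY:
        if not record.get(name):
            problems.append(f"missing mandatory element {name}")
    populated = [name for name in ONE_OF if record.get(name)]
    if len(populated) != 1:
        problems.append(f"expected exactly one of {ONE_OF}, found {len(populated)}")
    for name in NOT_REPEATABLE:
        if len(record.get(name, [])) > 1:
            problems.append(f"{name} is not repeatable")
    return problems
```

Returning a list of problems rather than raising on the first one fits the lenient approach to ingest described earlier: a record can still be accepted while its shortcomings are reported back to the data provider.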