[Mia: I'm adding a bit of context cos not everyone will be familiar with the Culture Grid... As an indication of the type of content that's available through the Culture Grid, I've copied this text from some of their about pages: "It contains over 1 million records from over 50 UK collections, covering a huge range of topics and periods. Records mostly refer to images but also text, audio and video resources and are mostly about museum objects with library, archive and other kinds of collections also included." So, that's:
An application profile of Dublin Core was produced by UKOLN, in consultation with Knowledge Integration and MLA, in 2005 for use with the People's Network ‘Discover’ Service. Conformance to this profile, known as PNDS DCAP, was (in theory at least) mandated for all collections which received MLA funding to make their metadata available to PNDS. The method of supply was also stipulated to be OAI-PMH.
As time progressed, this simple vision became somewhat diluted. Many projects encountered problems: implementing OAI-PMH, producing well-formed XML, producing valid XML and populating mandatory elements were all issues. At the aggregator level, it was decided to be as lenient as possible to allow the maximum number of records to be ingested. We also developed ‘data push’ interfaces to allow access to contributors who could not support OAI-PMH. As a result of ‘political’ pressure, it was decided to incorporate existing data sets that did not conform to PNDS DCAP. It was also decided to allow some extensions to the profile to support a project (Exploring 20th Century London) which wanted to use the aggregation platform but required additional metadata elements.
We therefore reached the position where we had a large corpus of metadata records but the quality of the records was extremely variable. The decision to use the platform as the national aggregator platform for Europeana meant that there was now a requirement to map some of this metadata to the Europeana Semantic Elements metadata profile (ESE).
At the moment we have handlers for oai_dc and pnds_dc formatted metadata. These are invoked on ingest whether the metadata is supplied via OAI-PMH or via upload. These handlers transform the metadata to our canonical format for storage in a database but also create alternate representations of the metadata (e.g. ESE) as required. The advantage of creating these alternate representations on ingest, rather than ‘on the fly’, is that it vastly improves system performance when the Grid is presenting search results or acting as an OAI-PMH target.
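As a rough illustration of the ingest-time approach described above, here is a minimal Python sketch. It is not the Grid's actual code: the `StoredRecord` class, the dict-based canonical format, the clean-up done by the handlers and the `to_ese` placeholder are all assumptions made for the example. Only the two format names, oai_dc and pnds_dc, come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical canonical format: element name -> list of values.
Canonical = Dict[str, List[str]]


@dataclass
class StoredRecord:
    canonical: Canonical
    # Alternate representations (e.g. "ese") are built once, at ingest time,
    # so that search and OAI-PMH responses never transform on the fly.
    representations: Dict[str, Canonical] = field(default_factory=dict)


def handle_oai_dc(raw: Canonical) -> Canonical:
    """Assumed clean-up: trim whitespace and drop empty values."""
    cleaned: Canonical = {}
    for name, values in raw.items():
        kept = [v.strip() for v in values if v.strip()]
        if kept:
            cleaned[name] = kept
    return cleaned


def handle_pnds_dc(raw: Canonical) -> Canonical:
    """For this sketch, pnds_dc gets the same clean-up as oai_dc."""
    return handle_oai_dc(raw)


# One handler per supported metadata format.
HANDLERS: Dict[str, Callable[[Canonical], Canonical]] = {
    "oai_dc": handle_oai_dc,
    "pnds_dc": handle_pnds_dc,
}


def to_ese(canonical: Canonical) -> Canonical:
    """Placeholder ESE mapping: keep only dc: and dcterms: elements."""
    return {k: v for k, v in canonical.items()
            if k.startswith(("dc:", "dcterms:"))}


def ingest(metadata_format: str, raw: Canonical) -> StoredRecord:
    """Run the handler for the format and pre-compute the ESE form."""
    handler = HANDLERS[metadata_format]  # KeyError if the format is unsupported
    canonical = handler(raw)
    return StoredRecord(canonical, {"ese": to_ese(canonical)})
```

The same `ingest` function would be called whether the record arrived by harvest or by upload. The cost of the extra transformation is paid once per record rather than once per request.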
In addition to these standard handlers, we also have custom handlers for some data providers. These are used to correct errors in the supplied data and sometimes to add missing values. The creation of custom handlers is very resource intensive and is not something we wish to continue in future.
One thing we learnt at an early stage was never to throw away any data submitted to us or harvested by us. Re-harvesting collections is such a hit-and-miss process that we always preserve a copy of the records in their original (raw) format. This means that we can go back and re-process records if mappings change or if a new transformation is required. Whilst this is a useful safety net, it is also resource intensive and should only be used as a last resort, not as part of a general strategy for coping with new requirements.
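The idea of keeping the raw record and re-running a changed mapping over it can be sketched as follows. This is an illustrative Python example only: the in-memory `raw_store` dict, the `harvest` and `reprocess` helpers, the two mapping functions and the sample `<record>` wrapper are all hypothetical. A real store would be a database, and real records would be full OAI-PMH payloads.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def harvest(raw_store: Dict[str, str], record_id: str, payload: str) -> None:
    """Keep the payload exactly as supplied; it is never modified later."""
    raw_store[record_id] = payload


def mapping_v1(payload: str) -> dict:
    """First (hypothetical) mapping: only the title is extracted."""
    root = ET.fromstring(payload)
    return {"title": root.findtext(f"{{{DC}}}title")}


def mapping_v2(payload: str) -> dict:
    """Revised (hypothetical) mapping: the identifier is now needed too."""
    root = ET.fromstring(payload)
    return {
        "title": root.findtext(f"{{{DC}}}title"),
        "identifier": root.findtext(f"{{{DC}}}identifier"),
    }


def reprocess(raw_store: Dict[str, str],
              transform: Callable[[str], dict]) -> Dict[str, dict]:
    """Re-run a mapping over every stored raw record, without re-harvesting."""
    return {rid: transform(payload) for rid, payload in raw_store.items()}
```

Because `reprocess` touches every stored record, its cost grows with the size of the corpus, which is why it works as a safety net rather than a routine strategy.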
Our experience of ingesting more than 1 million metadata records from over 50 collections has highlighted several issues with PNDS DCAP. A few of these issues are intrinsic to the profile itself but many are related to inconsistent interpretation of the requirements by data providers. Also, the increasing requirement to create ESE representations of supplied data has highlighted the incompatibilities between these profiles. The difficulty in transforming PNDS DCAP records into ESE is increasing with each new version of ESE. Indeed, the fact that Europeana’s requirements are such a moving target is a problem which must be shared by other aggregators.
Amongst the main issues with the current use of the Grid are:
If we were to make a change to the recommended profile for submitting metadata to the Grid there are 4 main options:
Adopting ESE as the preferred format for submitting metadata would obviously aid onward transmission to Europeana (although the submitted records would still need to be processed to add the europeana:provider element). Whilst there is certainly a case for making this one of the acceptable formats for data submission, making it the preferred format would have some disadvantages. These include:
A lot of people have pointed out the inherent problems in using simple Dublin Core based profiles to convey detailed metadata. DC-based profiles tend to be focussed on resource discovery metadata whereas some data providers wish to make more detailed curatorial metadata also available. Based on the principle that it is easier to map from a detailed representation of metadata to a simpler one than vice versa, there is a case for making a more comprehensive metadata format, such as CDWA, CIDOC-CRM or LIDO, the preferred submission format.
The main disadvantages of this approach are:
This would, in effect, be formalising the ad hoc arrangements that have already been made to cope with E20CL and similar projects. The advantage of this approach is that existing metadata supplied in PNDS DCAP format would still be valid against the old profile. The disadvantage is that it would not allow some of the problems with the original profile, such as the use of dc:identifier, to be addressed easily.
This option leaves the maximum freedom to address the concerns of current and potential data providers, get real input from practitioners and address the problems found to date. However, it does have the disadvantage that it adds further to the complexity of the situation with there being an even greater range of profiles to choose from. For this reason, it is recommended that this option is only pursued if there is real buy-in from the community.
So long as there is community support, we feel that producing a new profile would provide the best platform for moving the Grid forward. PNDS DCAP is around 6 years old and has served its initial purpose. A forward-looking strategy should not necessarily be constrained by it. We therefore recommend that a new profile be developed in consultation with the community and that the Grid’s ingest processing be updated to add a handler for records in this format. To complement this, though, we would also recommend that the following policies be adopted:
Assuming that the above recommendations are accepted, we present a ‘straw man’ recommendation for the elements to be included within such a profile, along with some indication of the obligation requirements that should be placed against these elements. Note that this is NOT intended to be a profile in itself. We are looking to the community for input in defining the data model and producing XML bindings, schemas, conformance tests, etc.
| PNDS | ESE | Comments |
|---|---|---|
| Required/Strongly Recommended Elements | | |
| dc:identifier | Additional | |
| dc:title | Strongly Recommended | |
| dc:description | Recommended | |
| dc:subject | Recommended | |
| dc:type | Recommended | specific encoding scheme |
| dcterms:license | Not Present | |
| dcterms:rightsHolder | Not Present | |
| Not Present | dcterms:alternative | |
| Not Present | dc:date | |
| Not Present | dcterms:created | |
| Not Present | dcterms:issued | |
| Recommended Elements | | |
| dc:creator | Strongly Recommended | |
| dc:contributor | Strongly Recommended | |
| dc:publisher | Recommended | |
| dc:language | Recommended | |
| dcterms:spatial | Recommended | specific encoding scheme |
| dcterms:temporal | Recommended | specific encoding scheme |
| dcterms:audience | Not Present | specific encoding scheme |
| dcterms:isPartOf | Recommended | |
| pnsterms:thumbnail | Not Present | |
| Not Present | dc:coverage | |
| Not Present | dc:source | |
| E20CL Extension Elements | | |
| dc:relation | Additional | used for related object |
| e20cl:materials | Not Present | = dcterms:medium? |
| e20cl:size | Not Present | |
| e20cl:creditLine | Not Present | |
| e20cl:relatedSubject | Not Present | |
| e20cl:relatedPerson | Not Present | |
| e20cl:relatedOrganisation | Not Present | |
| Optional/Additional Elements | | |
| dc:format | Additional | |
| dcterms:spatial | Recommended | specific encoding schemes |
| dcterms:temporal | Recommended | specific encoding schemes |
| Not Present | dcterms:extent | |
| Not Present | dcterms:medium | = e20cl:materials? |
| Not Present | dcterms:rights | similar to dcterms:license |
| Not Present | dcterms:provenance | |
| Not Present | dcterms:conformsTo | |
| Not Present | dcterms:hasFormat | |
| Not Present | dcterms:isFormatOf | |
| Not Present | dcterms:isVersionOf | |
| Not Present | dcterms:hasPart | |
| Not Present | dcterms:isReferencedBy | |
| Not Present | dcterms:references | |
| Not Present | dcterms:isReplacedBy | |
| Not Present | dcterms:replaces | |
| Not Present | dcterms:isRequiredby | |
| Not Present | dcterms:requires | |
| Not Present | dcterms:tableOfContents | |
| Europeana Extension Elements | | |
| Not Present | europeana:country | |
| Not Present | europeana:dataProvider | |
| Not Present | europeana:hasObject | |
| Not Present | europeana:isShownAt | |
| Not Present | europeana:isShownBy | |
| Not Present | europeana:language | different to dc:language |
| Not Present | europeana:object | |
| Not Present | europeana:provider | |
| Not Present | europeana:rights | different to dc:rights |
| Not Present | europeana:type | different to dc:type |
| Not Present | europeana:unstored | |
| Not Present | europeana:uri | |
| Not Present | europeana:userTag | |
| Not Present | europeana:year | |
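To show how the comparison above might feed into an ESE transformation, here is a small Python sketch. It rests on one reading of the table, which is our assumption for the example: that a PNDS element whose ESE column says "Not Present" has no direct ESE counterpart and cannot simply be copied across. The function name and the dict-based record are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]

# PNDS / E20CL elements that the comparison lists as "Not Present" in ESE.
NOT_IN_ESE = frozenset({
    "dcterms:license",
    "dcterms:rightsHolder",
    "dcterms:audience",
    "pnsterms:thumbnail",
    "e20cl:materials",
    "e20cl:size",
    "e20cl:creditLine",
    "e20cl:relatedSubject",
    "e20cl:relatedPerson",
    "e20cl:relatedOrganisation",
})


def split_for_ese(record: Record) -> Tuple[Record, Record]:
    """Separate elements that can be copied to ESE from those that cannot.

    Returns (carried, held_back). What to do with held_back (drop it, or
    map e.g. e20cl:materials to dcterms:medium) is a policy decision that
    this sketch deliberately leaves open.
    """
    carried = {k: v for k, v in record.items() if k not in NOT_IN_ESE}
    held_back = {k: v for k, v in record.items() if k in NOT_IN_ESE}
    return carried, held_back
```

Reporting the held-back elements, rather than discarding them silently, makes it visible how much supplied metadata is lost on the way to Europeana.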
| Element | Repeatable? | Ordered? | Comments |
|---|---|---|---|
| Required/Mandatory | | | |
| dc:identifier* | N | - | only one of these elements should be populated. Remove dc:identifier altogether? |
| europeana:isShownAt* | N | - | |
| europeana:isShownBy* | N | - | |
| dc:title | Y | Y | if multiple titles, first one used in display |
| dc:description | Y | N | |
| dc:type | N | - | Align encoding scheme with ESE? |
| dcterms:license | N | - | not sure if should be mandatory? |
| dcterms:rightsHolder | N | - | not sure if should be mandatory? |
| Recommended | | | |
| dc:creator | Y | N | |
| dc:subject | Y | N | |
| dc:date | Y | N | |
| dc:contributor | Y | N | |
| dc:publisher | Y | N | |
| dc:language | Y | N | |
| dc:coverage | Y | N | |
| dc:source | Y | N | |
| dc:relation | Y | N | use for related person, subject, organisation & object via XML attribute |
| dcterms:spatial | Y | N | which encoding schemes? |
| dcterms:temporal | Y | N | which encoding schemes? |
| pnsterms:thumbnail | Y | Y | 1st specified is used in hit list display |
| Optional | | | |
| dc:format | Y | N | |
| dcterms:alternative | Y | N | |
| dcterms:audience | Y | N | |
| dcterms:isPartOf | Y | N | |
| dcterms:created | Y | N | |
| dcterms:issued | Y | N | |
| dcterms:extent | Y | N | |
| dcterms:medium | Y | N | |
| dcterms:rights | Y | N | |
| dcterms:provenance | Y | N | |
| dcterms:conformsTo | Y | N | |
| dcterms:hasFormat | Y | N | |
| dcterms:isFormatOf | Y | N | |
| dcterms:isVersionOf | Y | N | |
| dcterms:hasPart | Y | N | |
| dcterms:isReferencedBy | Y | N | |
| dcterms:references | Y | N | |
| dcterms:isReplacedBy | Y | N | |
| dcterms:replaces | Y | N | |
| dcterms:isRequiredby | Y | N | |
| dcterms:requires | Y | N | |
| dcterms:tableOfContents | Y | N | |
| e20cl:materials | Y | N | |
| e20cl:size | Y | N | |
| e20cl:creditLine | Y | N | |
| europeana:dataProvider | Y | N | |
| europeana:isShownAt | Y | N | |
| europeana:isShownBy | Y | N | |
| europeana:object | Y | N | |
| europeana:rights | Y | N | |
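To make the straw man more concrete, the obligations in the Required/Mandatory section could be checked along the following lines. This Python sketch is offered for discussion and is not a conformance test. It assumes that "only one of these elements should be populated" means exactly one of the three starred elements, and it treats dcterms:license and dcterms:rightsHolder as mandatory even though the table itself questions that. It follows the Required/Mandatory rows where the Optional section lists the same Europeana elements as repeatable. The function names and the dict-based record are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]

# The three starred elements: assumed to mean "exactly one must be populated".
LOCATOR_GROUP = ("dc:identifier", "europeana:isShownAt", "europeana:isShownBy")

# Remaining rows of the Required/Mandatory section. The last two are marked
# "not sure if should be mandatory?" in the table; included here by assumption.
REQUIRED = ("dc:title", "dc:description", "dc:type",
            "dcterms:license", "dcterms:rightsHolder")

# Rows with Repeatable? = N in the Required/Mandatory section.
NON_REPEATABLE = LOCATOR_GROUP + ("dc:type", "dcterms:license",
                                  "dcterms:rightsHolder")


def validate(record: Record) -> List[str]:
    """Return a list of human-readable problems; empty means no problems."""
    problems: List[str] = []
    populated = [name for name in LOCATOR_GROUP if record.get(name)]
    if len(populated) != 1:
        problems.append(
            f"expected exactly one of {', '.join(LOCATOR_GROUP)}; "
            f"found {len(populated)}"
        )
    for name in REQUIRED:
        if not record.get(name):
            problems.append(f"missing required element {name}")
    for name in NON_REPEATABLE:
        if len(record.get(name, [])) > 1:
            problems.append(f"{name} must not be repeated")
    return problems


def display_title(record: Record) -> Optional[str]:
    """dc:title is ordered: the first title supplied is the one displayed."""
    titles = record.get("dc:title", [])
    return titles[0] if titles else None
```

Returning a list of problems rather than raising on the first one fits the lenient ingest policy described earlier: the aggregator can still accept a record while reporting every issue back to the data provider.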