The Aggregators' Handbook: a practical guide to adding content to Europeana for prospective and current aggregators

 

 

Case study: Polish Digital Libraries Federation

Marcin Werla

Marcin Werla, Poznań Supercomputing and Networking Center (PSNC)

PSNC, where Marcin leads the Digital Libraries Team (http://dl.psnc.pl/), acts as the hub of the Polish Digital Libraries Federation, aggregating metadata from Poland's regional and institutional digital libraries. It is also the national co-ordinator for Europeana Local, and in late 2009 became the first Europeana Local co-ordinator to deliver metadata to Europeana's central index.

Marcin describes the steps towards integrating the Polish content.

'The first step towards Europeana was the analysis of the metadata that we aggregate to see how consistent it was with the Europeana Semantic Elements [ESE] and its mapping guidelines.

'Our metadata was simple Dublin Core, which mapped to ESE with some normalisation (for example, we used ISO 639-2 for the DC language element).
Another issue was the fields that are specific to ESE, such as 'Europeana type', which refers to the nature of the content: text, image, video or audio.
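The language normalisation mentioned above can be sketched roughly as follows. This is only an illustration: it assumes that providers' language values arrived as free text or two-letter codes and were normalised to ISO 639-2 three-letter codes. The lookup table, the record layout and the function names are hypothetical and are not taken from the Federation's software.

```python
# Hypothetical lookup from provider-supplied language labels to ISO 639-2 codes.
# The labels on the left are illustrative, not taken from the Federation's data.
# "ger" is the ISO 639-2 bibliographic code for German ("deu" is the
# terminology code); which variant to use is an assumption.
ISO_639_2 = {
    "polish": "pol",
    "polski": "pol",
    "pl": "pol",
    "english": "eng",
    "en": "eng",
    "german": "ger",
    "de": "ger",
    "latin": "lat",
    "la": "lat",
}

KNOWN_CODES = set(ISO_639_2.values())


def normalise_language(value):
    """Return an ISO 639-2 code for a language value, or None if unknown."""
    key = value.strip().lower()
    if key in KNOWN_CODES:
        # Already a three-letter code that the table knows about.
        return key
    return ISO_639_2.get(key)


def normalise_record(record):
    """Return a copy of a record with its language values normalised.

    Values that cannot be mapped are kept unchanged so that they can be
    reviewed by hand rather than silently dropped.
    """
    cleaned = dict(record)
    cleaned["language"] = [
        normalise_language(v) or v for v in record.get("language", [])
    ]
    return cleaned
```

Keeping unmapped values in place, rather than discarding them, makes it easier to report the remaining problems back to the provider that supplied the record.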

'It was important to establish a standard that would map precisely to ESE across all the Polish institutions that contribute data to the Digital Libraries Federation. We gave several presentations to our providers to advise them on how to clean and augment their metadata. A group was set up to develop the new metadata schema, and some of our 45 contributing digital libraries had to make minor modifications to their Dublin Core records.

'Cleaning up the metadata was complicated but, once done, allowed for automated transfer. We prepared the mappings for 'type' manually rather than expecting our providers to do it. We then prepared the remaining mappings from Dublin Core plus extras to ESE, and ran the modified data through the system each night to update the information from our content providers.
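A centrally maintained 'type' mapping of the kind described here might look something like the sketch below. The provider-side type values, the record layout and the function names are all hypothetical. The four categories are the ones named in the text; the literal values that ESE expects for them should be taken from the ESE specification.

```python
# Hypothetical mapping from provider dc:type values to the four content
# categories named in the text. The keys are illustrative examples only.
TYPE_MAP = {
    "book": "text",
    "newspaper": "text",
    "manuscript": "text",
    "photograph": "image",
    "map": "image",
    "postcard": "image",
    "film": "video",
    "recording": "audio",
}


def map_type(dc_types, default=None):
    """Return the category of the first dc:type value that has a mapping."""
    for value in dc_types:
        category = TYPE_MAP.get(value.strip().lower())
        if category is not None:
            return category
    return default


def unmapped_types(records):
    """Collect dc:type values that have no mapping yet, for manual review."""
    missing = set()
    for record in records:
        for value in record.get("type", []):
            if value.strip().lower() not in TYPE_MAP:
                missing.add(value)
    return sorted(missing)
```

Running something like `unmapped_types` over each nightly batch would show which new provider values still need a manual mapping decision.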

'We then ran the data through the Europeana Content Checker. We exported XML files from our OAI-PMH interface and uploaded these to the Content Checker. Throughout the process we worked closely with Lizzy Komen, Europeana Local's liaison officer in the Europeana office.
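For readers unfamiliar with OAI-PMH, the sketch below shows how a ListRecords request is formed and how one page of a response can be read. It is a minimal illustration, not the Federation's export tooling, which the text does not describe. The base URL, the sample response and the function names are hypothetical; the verb, argument names and XML namespace come from the OAI-PMH 2.0 protocol. No network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument: it is
        # sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    # Only header identifiers are in the OAI namespace; dc:identifier
    # elements inside the metadata use a different namespace.
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier") if el.text]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# A hand-written, heavily shortened response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    ids, token = parse_page(SAMPLE)
    print(ids, token)
    print(list_records_url("http://example.org/oai", resumption_token=token))
```

A full export would repeat the request with each returned resumption token until the response no longer contains one, saving each page as an XML file.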

'The Content Checker shows how records will display in a test version of the Europeana interface. We were able to share this display and get feedback from our providers. Some were concerned about how their multilingual records were showing; others wanted to check the display of information about the rights in the objects; yet others wanted to be sure that their name would be clearly visible in the metadata.

'The naming issue is important in an aggregation hierarchy. The Polish Digital Libraries Federation aggregates content from around 50 regional and institutional digital libraries, representing in all some 300 individual content holders. It's important that the original content holders as well as the data providers are correctly identified.

'We gave ourselves a month for checking. Two people spent two weeks uploading data and modifying the code, then two weeks getting feedback from all our source libraries. If the display wasn't right, we modified the code and repeated the process until the data display suited all parties.

'The next step was for the Europeana office to test our OAI-PMH interface so they could harvest the records. On 24 November we completed our work in the Content Checker; on 27 November we had confirmation from Europeana that they were ready to start downloading data from the interface.

'Then it was out of our hands: Europeana harvested the 257,000 records and completed the internal processing. This involved normalising the records and indexing them (caching the thumbnails was not possible at that time in our case). The Europeana office then let us know as soon as they were ready to go live, which they did on 11 December 2009.'