Global digital collections | Digital Collections Programme

Deborah Paul presenting on the Global Biodiversity Information Facility

Ben Price and Douglas Russell blogged recently about presentations by Museum colleagues at the Annual Meeting of the Society for the Preservation of Natural History Collections (SPNHC) in Berlin, noting that delegates were passionate about the potential of digitisation to help us illustrate, research and understand our changing world. As well as presenting, we learned a lot from the other presenters and attendees, picking up some themes which are particularly relevant to our Digital Collections Programme (DCP).

DCP ultimately aims to digitise all 80 million of the Museum’s specimens, releasing the resulting data as open data for research and wider use. The clearest message from SPNHC is that we are not alone. Deborah Paul and others talked about the Global Biodiversity Information Facility’s (GBIF) survey of natural history museums and collections worldwide, which is finding that plans or projects to digitise specimens are almost universal, albeit at different stages.

A key question for us and other collections is how to prioritise this digitisation effort. It was good to see an emerging consensus from the prioritisation symposium at SPNHC that scientific research priorities should be a key driver – for example, which collections can fill real gaps in current research, or contribute to our knowledge in relation to key challenges such as climate change or sustainable food crops? Alongside this, there was widespread recognition of something we’ve learned from the first phase of DCP – that practicalities also play a big part. Some specimens are simply more straightforward to digitise in bulk – we’ll tackle microscope slides, for instance, more readily than large mammals.

Microscope slides being scanned for bulk digitisation using SatScan

The next big theme after prioritisation was how to manage the data created by digitisation most effectively, to maximise use and impact. Vince Smith opened a ‘one collection’ symposium at SPNHC, and these themes of data and integration also appeared across other conference sessions. Data created in the course of DCP appears on the Museum’s Data Portal, where it can be viewed and downloaded directly. Data on the Portal is also used by major biodiversity data aggregators such as GBIF, which collect data from a wide variety of sources for broad coverage.

While the Data Portal represents excellent linked open data, continued work is needed by us and others around the world to ensure that data can be used with the greatest flexibility, and that barriers to use by non-experts in data science and IT are reduced as far as possible. Data standards and policies – particularly when these cross institutional or geographic boundaries – help to ensure that data from one source can be used effectively alongside data from elsewhere. In addition, more work is needed on how we can not only publish data but also allow it to be enriched – we saw some great examples at SPNHC, for instance, around the use of text sources to enrich information about specimens.

A Herbarium sheet showing the challenges associated with handwritten labels

Communities are key to resolving these issues. SPNHC itself is one great forum for sharing what we learn as we digitise collections. The broader communities of citizen science, and in particular opportunities for crowdsourcing transcription of data from specimen labels, were another key theme at this year’s meeting. When we create digital images of specimens, such as pressed plants (herbarium sheets), key data is often still ‘locked’ in handwritten labels, which can be very old, small and sometimes hard to decipher – getting the public to help us transcribe what these say really deepens and speeds up the digitisation process. Talks about Notes from Nature and other projects showcased increasingly successful tools and results, harnessing the widest possible participation in science and digitisation.

Finally, there was recognition at SPNHC that we need to get better at understanding the benefits of digital collections, and using this to advocate for the importance of collections in global science. For DCP, as we enter the second phase of our programme we’re thinking about how best to measure what we do, so that we can continue to improve our digitisation practice as well as demonstrating how our digitised collections can be and are being used.

Anyone interested in finding out more from SPNHC can watch the talks online thanks to iDigBio. To find specific talks, refer to the talk times in the full conference programme.

Helen Hardy
Digital Collections Programme Manager