Quantifying “dark data” in fossil collections is a call to arms and heralds a digital revolution
SAN FRANCISCO (September 20, 2018) – Days after a fire tore through Brazil’s National Museum and destroyed specimens of irreplaceable heritage, a team of scientists has quantified the vast number of fossils that sit unstudied in natural history collections. Based on their findings, the team estimates only 3 to 4 percent of recorded fossil locations from across the globe are currently accounted for in published scientific literature. This means any shelved specimens that have never been published or documented digitally remain vulnerable to loss. Researchers from the California Academy of Sciences, University of California Museum of Paleontology (UCMP), and partner institutions are working to preserve these “dark data” in online databases, highlighting the need for underfunded museums around the world to invest in the digital preservation of their collections. The three-year-old project’s preliminary results were published in Biology Letters earlier this month.
“The fossil record offers invaluable insight into our planet’s ecological and evolutionary past,” says co-author Dr. Peter Roopnarine, the Academy’s Curator of Invertebrate Zoology and Geology. “Yet published literature only documents a fraction of the fossils housed in museum collections. Digitizing specimens preserves valuable data and makes it readily accessible to researchers everywhere.”
Fossil-finding long predates the digital age, leaving modern paleontologists with the Herculean task of compiling enough data by hand to address large-scale questions of planetary change. The first digital revolution for fossil collections began in the 1990s, when the scientific community launched several still-growing online databases based on published literature, the most comprehensive being the Paleobiology Database (PBDB).
Today, a second digital revolution is underway. Led by UCMP, ten institutions are digitally cataloging fossil specimens from their collections that have never been cited in published literature. The new database, known as EPICC (Eastern Pacific Invertebrate Communities of the Cenozoic), compiles marine invertebrate fossils that span the past 66 million years and hail from Chile to Alaska.
The study’s co-authors compared the number of locations represented by fossils in the literature-based PBDB to the number of locations tallied in the new EPICC database for the states of Washington, Oregon and California. They found that for every fossil-bearing location recorded in the scientific literature, 23 more exist on shadowy museum shelves. This finding informed the team’s global estimate for all fossil types: Of the fossil-bearing locations known to exist across the globe, only 3 to 4 percent are accounted for in published literature.
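The arithmetic behind that regional ratio can be sketched in a few lines. This is only an illustration, not the study's method: the function name is hypothetical, and the only figure taken from the text is the 23 unpublished locations per published one. The team's global estimate of 3 to 4 percent rests on further analysis that is not described here.

```python
def published_share(unpublished_per_published: float) -> float:
    """Fraction of known locations that appear in the literature.

    Assumes every known location is either published or unpublished,
    so one published location plus N unpublished ones gives 1 / (1 + N).
    """
    return 1.0 / (1.0 + unpublished_per_published)


# 23 unpublished locations for every published one (figure from the text)
share = published_share(23)
print(f"Published share of known locations: {share:.1%}")
```

For the three West Coast states, one published location in every 24 works out to roughly 4 percent, which is in line with the upper end of the global estimate.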
“What this means is that within most of the great museums of the world there are specimens that have not been fully utilized to understand the nature of our planet, how ecosystems responded to climate change in the past, and how they’ll respond moving forward,” says lead author Dr. Charles Marshall, Director of UCMP and Fellow of the Academy. “We need that perspective to forecast the future.”
Modern digital technologies have already allowed the team to harness the collective power of hundreds of thousands of specimens for coherent analysis. The research potential is vast: Teams continue to make new-to-science discoveries by simply delving deeper into their collections. Digitization also supports the enormous upfront investment that museums have already made to collect and steward natural history specimens.
Marshall says the paper’s coincidental publication shortly after Brazil’s National Museum fire is a call to arms. “In the wake of the fire, my reaction was one of heartbreak, dismay, and shock. As scientists, seeing a fire like this is akin to learning your parent’s house has just burnt to the ground. It’s time for government and funding agencies to step up investment in the digitization of natural history collections and preserve our world heritage for decades to come.”
EPICC is a partnership of ten natural history museums united to digitize marine invertebrate fossils found in the eastern Pacific, including the California Academy of Sciences, John D. Cooper Center, National Museum of Natural History, Natural History Museum of Los Angeles County, Paleontological Research Institution, University of Alaska Museum, University of California Museum of Paleontology, University of California Riverside Earth Science Museum, University of Oregon Museum of Natural and Cultural History, and University of Washington Burke Museum. EPICC is funded through the National Science Foundation’s Advancing Digitization of Biological Collections program and affiliated with Integrated Digitized Biocollections (iDigBio).
The Institute for Biodiversity Science and Sustainability at the California Academy of Sciences is at the forefront of efforts to understand two of the most important topics of our time: the nature and sustainability of life on Earth. Based in San Francisco, the Institute is home to more than 100 world-class scientists, state-of-the-art facilities, and nearly 46 million scientific specimens from around the world. The Institute also leverages the expertise and efforts of more than 100 international Associates and 400 distinguished Fellows. Through expeditions around the globe, investigations in the lab, and analysis of vast biological datasets, the Institute’s scientists work to understand the evolution and interconnectedness of organisms and ecosystems, the threats they face around the world, and the most effective strategies for sustaining them into the future. Through innovative partnerships and public engagement initiatives, they also guide critical sustainability and conservation decisions worldwide, inspire and mentor the next generation of scientists, and foster responsible stewardship of our planet.