Defining geosciences research data through metadata reuse:

a case study of PANGEA data repository

Authors

DOI:

https://doi.org/10.5195/biblios.2024.1233

Keywords:

Research data, Research data reuse, Metadata, Research data repository, Data Web Scraping, Geosciences

Abstract

Objective. Research data are factual records used as primary resources for scientific research. Reusing research data metadata offers a new perspective on those records, supporting new tests, hypotheses, and lines of research. This study aims to characterize the types of Geosciences research data by reusing metadata from PANGAEA, the Data Publisher for Earth and Environmental Science (https://www.pangaea.de/). The research question is: “Can the processes of analyzing and manipulating PANGAEA research data metadata be used to define a concept of Geosciences research data?” To address this question, we considered the data specification attributes that data journals use to describe the nature of research data: domain of specialization, accessibility, language, data type, acquisition, source location, specific subject area, and related publications.

Method. The study collected, analyzed, and visualized PANGAEA research data metadata. In total, 426,272 records were downloaded from the repository and compared with the data specifications that data journals use to describe the nature of research data in data papers. The work applied techniques for descriptive analysis, information retrieval, data manipulation, and visualization of Dublin Core metadata, implemented in the Python programming language and in other data manipulation software, including OpenRefine and VOSviewer.
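As an illustration of the kind of descriptive analysis named above, the following minimal sketch tallies the values of one Dublin Core element across a small batch of records, using only Python's standard library. The sample XML, the `records`/`record` wrapper elements, and the `tally_field` helper are assumptions made for this example; they are not taken from the study's own scripts or from the repository's export format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical sample records, invented for illustration only.
SAMPLE_XML = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Example porosity measurements</dc:title>
    <dc:type>Dataset</dc:type>
    <dc:language>en</dc:language>
    <dc:subject>Geochemistry</dc:subject>
  </record>
  <record>
    <dc:title>Example isotope ratios</dc:title>
    <dc:type>Dataset</dc:type>
    <dc:language>en</dc:language>
    <dc:subject>Geochemistry</dc:subject>
  </record>
  <record>
    <dc:title>Example discharge series</dc:title>
    <dc:type>Dataset</dc:type>
    <dc:language>en</dc:language>
    <dc:subject>Hydrology</dc:subject>
  </record>
</records>"""


def tally_field(xml_text: str, field: str) -> Counter:
    """Count how often each value of a Dublin Core element occurs."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter("record"):
        # A record may repeat an element (e.g. several subjects).
        for element in record.findall(DC_NS + field):
            value = (element.text or "").strip()
            if value:
                counts[value] += 1
    return counts


if __name__ == "__main__":
    for name in ("subject", "type", "language"):
        print(name, dict(tally_field(SAMPLE_XML, name)))
```

Tallies of this kind can be lined up against the data specification attributes used by data journals (data type, language, subject area, and so on) before the values are cleaned or visualized in other tools.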

Results. The analysis yielded a detailed examination of the metadata for 137,218 research data records from six Geosciences collections. The Geochemistry collection holds 73,992 records, Atmospheric Sciences 32,314, Paleontology 25,903, Oceanography 22,287, Geophysics 4,175, and Hydrology 834. These six PANGAEA metadata collections support a concept of Geosciences research data as data from studies of the Earth, atmosphere, and oceans across different geo-disciplines. The data come from geochemistry, atmospheric science, paleontology, oceanography, geophysics, and hydrology, and are produced with technologies such as satellites, electron microscopes, climate sensors, ships, and computer modeling. In addition, the data are augmented by other sources related to the study of the Earth and its processes.
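The per-collection figures can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the abstract; the variable and function names are chosen for this example. The six counts sum to 159,505, which is 22,287 more than the reported total of 137,218, a difference equal to the Oceanography count. The abstract does not explain this gap, so the code only reports it and does not assume a cause.

```python
# Record counts per collection, as reported in the abstract.
COLLECTION_COUNTS = {
    "Geochemistry": 73_992,
    "Atmospheric Sciences": 32_314,
    "Paleontology": 25_903,
    "Oceanography": 22_287,
    "Geophysics": 4_175,
    "Hydrology": 834,
}

# Total number of records examined, as reported in the abstract.
REPORTED_TOTAL = 137_218


def collection_shares(counts: dict) -> dict:
    """Return each collection's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


summed = sum(COLLECTION_COUNTS.values())
gap = summed - REPORTED_TOTAL

if __name__ == "__main__":
    print("sum of collections:", summed)   # 159505
    print("reported total:", REPORTED_TOTAL)
    print("difference:", gap)              # 22287
    for name, share in collection_shares(COLLECTION_COUNTS).items():
        print(f"{name}: {share:.1f}%")
```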

Conclusions. Research data metadata are domain-specific objects that serve as valuable research resources regardless of when, why, or by whom they are used, and regardless of the characteristics of the data. Geosciences research data combine laboratory and fieldwork techniques, using technologies such as satellites and climate sensors to study Earth’s processes. The PANGAEA metadata define Geosciences research data as including observations, experiments, and modeling. Such data support replication, reinterpretation, and new research across disciplines, illustrating several facets of data reuse in scientific research.

Author Biographies

Alexandre Ribas Semeler, Federal University of Rio Grande do Sul

I currently work as a data librarian at the Institute of Geosciences of the Federal University of Rio Grande do Sul in Brazil. As an independent researcher and data librarian, I have an interdisciplinary interest in data librarianship. I believe in the fourth paradigm of science (e-science and digital humanities) and see current digital data technologies as major drivers of transformation in academic libraries.

Luana Farias Sales, Instituto Brasileiro de Informação em Ciência e Tecnologia

PhD in Information Science from the Graduate Program at IBICT/UFRJ (2011-2014). Master's in Information Science from the UFF/IBICT agreement (2004-2006), Degree in Library Science and Documentation from the Fluminense Federal University (2003). Productivity scholarship holder Pq-B. Young Scientist of the State of Rio de Janeiro. She is a C&T Analyst at MCTI/IBICT, teaching in the Postgraduate Program in Information Science under the IBICT-UFRJ agreement and at DIECI - Scientific Publishing Division. She is the General Coordinator of the GO FAIR Brazil office.

Adilson Luiz Pinto, Universidade Federal de Santa Catarina

Graduated in Library Science from PUC-Campinas (2000), Master in Information Science from PUC-Campinas (2004) and in Audiovisual Documentation from Universidad Carlos III de Madrid (2006); PhD in Documentation from Universidad Carlos III de Madrid (2007). Member of LEMME Lab and Leader of Metric Studies in Data Librarianship and Geosciences; Editor of the Iberoamerican Journal of Science Measurement and Communication.

Roberta Pereira da Silva de Paula, Instituto Brasileiro de Informação em Ciência e Tecnologia

PhD student in Information Science at PPGCI - IBICT/UFRJ (Start 2020). Master's in Information Science from the IBICT/UFF Agreement (2007). Graduated in Librarianship (2004) and Specialist in Knowledge Organization for Information Retrieval (2005) from UNIRIO. She is currently Head of Library at the Geological Survey of Brazil.

Valquer Cleyton Paes Gandra, Instituto Brasileiro de Informação em Ciência e Tecnologia

Master's student in Information Science at PPGCI IBICT-UFRJ. Postgraduate in UI and UX Digital Product Design from UNOPAR. Bachelor in Library Science from UNIRIO. Postgraduate student in Data Science at UNOPAR. Qualification in Access to Scientific and Technological Health Information from ICICT-FIOCRUZ.

Heloisa Costa, Universidade Federal de Santa Catarina

She has a degree in Library Science from the Federal University of Santa Catarina (UFSC), a specialization in Information Unit Management from the State University of Santa Catarina (UDESC), a PhD and a Master's degree in Information Science from UFSC, in the Postgraduate Program in Information Science (PGCIN-UFSC). She is a substitute lecturer in the Department of Information Science at the Federal University of Santa Catarina. She has experience as a consultant in the management of documentary and bibliographic collections and in the field of Information Science, with an emphasis on the management of information units and document management. She works as a proofreader of documents and academic papers, including ABNT standardization.

Published

2025-02-07

How to Cite

Semeler, A. R., Sales, L. F., Pinto, A. L., Paula, R. P. da S. de, Gandra, V. C. P., & Costa, H. (2025). Defining geosciences research data through metadata reuse: a case study of PANGEA data repository. Biblios Journal of Librarianship and Information Science, (87), e009. https://doi.org/10.5195/biblios.2024.1233