Defining geosciences research data through metadata reuse: a case study of PANGEA data repository
DOI: https://doi.org/10.5195/biblios.2024.1233
Keywords: Research data, Research data reuse, Metadata, Research data repository, Data Web Scraping, Geosciences
Abstract
Objective. Research data are factual records used as primary resources in scientific research. Reusing research data metadata offers a new perspective and supports new tests, hypotheses, and research developments. This study aims to identify the types of Geosciences research data by reusing metadata from the PANGEA Data Publisher for Earth and Environmental Science (https://www.pangaea.de/). The research question is: “Can the processes of analyzing and manipulating PANGEA research data metadata be used to define a concept of Geosciences research data?” To address this question, we considered the data specification attributes that data journals use to describe the nature of research data: domain of specialization, accessibility, language, data type, acquisition, source location, specific subject area, and related publications.
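The eight attributes listed above can only be compared with Dublin Core metadata once each attribute is paired with a Dublin Core element. The abstract does not give that pairing, so the crosswalk in the sketch below (`ATTRIBUTE_TO_DC`), the helper `covered_attributes`, and the example record are all hypothetical. The sketch only shows how such a comparison could flag which attributes a record leaves empty.

```python
# Hypothetical crosswalk from the eight data-specification attributes named in
# the abstract to Dublin Core element names. The pairing is an assumption made
# for illustration; the abstract does not state how the authors matched them.
ATTRIBUTE_TO_DC = {
    "domain of specialization": "subject",
    "accessibility": "rights",
    "language": "language",
    "data type": "type",
    "acquisition": "description",
    "source location": "coverage",
    "specific subject area": "subject",
    "related publications": "relation",
}


def covered_attributes(record: dict[str, list[str]]) -> dict[str, bool]:
    """For each attribute, report whether its mapped DC element has a non-empty value."""
    return {
        attribute: any(value.strip() for value in record.get(element, []))
        for attribute, element in ATTRIBUTE_TO_DC.items()
    }


# Hypothetical record keyed by Dublin Core element name (values are placeholders).
example_record = {
    "title": ["Hypothetical sediment porosity dataset"],
    "subject": ["Geochemistry", "Porosity"],
    "type": ["Dataset"],
    "coverage": ["Central Arctic Ocean"],
    "relation": [],
}

attribute_coverage = covered_attributes(example_record)
missing = [name for name, present in attribute_coverage.items() if not present]
print(missing)
# ['accessibility', 'language', 'acquisition', 'related publications']
```

In this assumed crosswalk two attributes share the `subject` element, so a real comparison would probably need finer-grained fields or keyword rules to tell the domain of specialization apart from the specific subject area.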
Method. The methodology involved collecting, analyzing, and visualizing PANGEA research data metadata. In total, 426,272 records were downloaded from the data repository and compared with the data specifications that data journals use to describe the nature of research data in data papers. The methodology applied techniques and technologies for descriptive analysis, information retrieval, data manipulation, and visualization of Dublin Core metadata. These techniques were implemented in the Python programming language and with other software, including OpenRefine for data manipulation and VOSviewer for visualization.
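The collection and descriptive-analysis steps can be sketched with the Python standard library. The example assumes the metadata arrive as Dublin Core XML in the `oai_dc` format defined for OAI-PMH; the abstract mentions Dublin Core and web scraping but not the exact harvesting route, so that format, the two sample records, and the helper names (`parse_records`, `count_values`, `cooccurrence`) are illustrative assumptions. The sketch turns each record into a dictionary of element values, counts subject terms, and tallies pairs of subjects that appear together, the kind of co-occurrence data a tool such as VOSviewer maps.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Namespace URIs of the OAI-PMH Dublin Core format (oai_dc) and of DC elements.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Two hypothetical records under one wrapper element; a real harvest would
# page through many responses instead of reading a single string.
SAMPLE = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc>
    <dc:title>Hypothetical dataset A</dc:title>
    <dc:subject>Geochemistry</dc:subject>
    <dc:subject>Isotopes</dc:subject>
    <dc:type>Dataset</dc:type>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>Hypothetical dataset B</dc:title>
    <dc:subject>Geochemistry</dc:subject>
    <dc:subject>Oceanography</dc:subject>
    <dc:type>Dataset</dc:type>
  </oai_dc:dc>
</records>"""


def parse_records(xml_text: str) -> list[dict[str, list[str]]]:
    """Convert each oai_dc:dc element into {DC element name: [values]}."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.findall("oai_dc:dc", NS):
        record: dict[str, list[str]] = {}
        for child in dc:
            # Tags look like "{http://purl.org/dc/elements/1.1/}subject".
            element = child.tag.split("}", 1)[1]
            record.setdefault(element, []).append((child.text or "").strip())
        records.append(record)
    return records


def count_values(records: list[dict[str, list[str]]], element: str) -> Counter:
    """Frequency of each non-empty value of one DC element across records."""
    return Counter(v for r in records for v in r.get(element, []) if v)


def cooccurrence(records: list[dict[str, list[str]]], element: str = "subject") -> Counter:
    """Count unordered pairs of values that appear in the same record."""
    pairs: Counter = Counter()
    for r in records:
        terms = sorted({v for v in r.get(element, []) if v})
        pairs.update(combinations(terms, 2))
    return pairs


records = parse_records(SAMPLE)
subject_counts = count_values(records, "subject")
pair_counts = cooccurrence(records)
print(subject_counts.most_common())
# [('Geochemistry', 2), ('Isotopes', 1), ('Oceanography', 1)]
print(pair_counts.most_common())
# [(('Geochemistry', 'Isotopes'), 1), (('Geochemistry', 'Oceanography'), 1)]
```

Sorting the terms inside each record makes every pair order-independent, so ("Geochemistry", "Isotopes") and ("Isotopes", "Geochemistry") are counted as the same link.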
Results. Our analysis examined in detail the metadata for 137,218 research data records from six Geosciences collections: Geochemistry (73,992 records), Atmospheric Sciences (32,314), Paleontology (25,903), Oceanography (22,287), Geophysics (4,175), and Hydrology (834). These six PANGEA metadata collections allow us to discuss a concept of Geosciences research data as data from studies of the Earth, atmosphere, and oceans across different geo-disciplines. The data come from a range of disciplines, including geochemistry, atmospheric science, paleontology, oceanography, geophysics, and hydrology, and are produced with technologies such as satellites, electron microscopes, climate sensors, ships, and computer modeling, among others. In addition, the data are augmented by other sources related to the study of the Earth and its processes.
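The record counts in the Results paragraph can be cross-checked with simple arithmetic. The sketch below sums the six collection counts and relates them to the 426,272 downloaded records. The six counts add up to 159,505, which exceeds the stated total of 137,218 by 22,287, exactly the size of the Oceanography collection. The abstract does not explain this gap (for example, whether some records belong to more than one collection, or whether one collection is counted separately), so the per-collection shares below use the summed counts as the denominator, which is an assumption.

```python
# Collection sizes as reported in the Results paragraph of the abstract.
collection_sizes = {
    "Geochemistry": 73_992,
    "Atmospheric Sciences": 32_314,
    "Paleontology": 25_903,
    "Oceanography": 22_287,
    "Geophysics": 4_175,
    "Hydrology": 834,
}
DOWNLOADED = 426_272      # records downloaded (Method paragraph)
REPORTED_TOTAL = 137_218  # records examined (Results paragraph)

summed = sum(collection_sizes.values())
print(f"sum of collection counts: {summed:,}")                           # 159,505
print(f"minus reported total: {summed - REPORTED_TOTAL:,}")              # 22,287
print(f"reported total / downloaded: {REPORTED_TOTAL / DOWNLOADED:.1%}")  # 32.2%

# Shares use the summed counts as the denominator (an assumption, because the
# summed counts and the reported total disagree).
for name, n in collection_sizes.items():
    print(f"{name:<22}{n:>8,}  {n / summed:6.1%}")
```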
Conclusions. Research data metadata are domain-specific objects that serve as valuable research resources regardless of when or why they are used, the characteristics of the data, or who uses them. Geosciences research data combine laboratory and fieldwork techniques and use technologies such as satellites and climate sensors to study Earth’s processes. PANGEA metadata define Geosciences research data as including observations, experiments, and modeling. Geosciences research data support replication, reinterpretation, and new research across disciplines, illustrating various facets of data reuse in scientific research.
License
Copyright (c) 2025 Alexandre Ribas Semeler, Luana Farias Sales, Adilson Luiz Pinto, Roberta Pereira da Silva de Paula, Valquer Cleyton Paes Gandra, Heloisa Costa
This work is licensed under a Creative Commons Attribution 4.0 International License.