Integration strategy for impact metrics of institutional academic output through a Data Warehouse
a case study with OpenAlex, OpenAIRE, and COAR
DOI:
https://doi.org/10.5195/biblios.2025.1348
Keywords:
Institutional academic output, Persistent identifiers, Responsible metrics, Data Warehouse, Data Vault
Abstract
Objective. This article proposes a strategy for integrating data from multiple sources on academic output, facilitating informed decision-making. The approach is adaptable to various organizations, regardless of the number or type of sources involved.
Method. An integration system was designed based on open-source tools and a scalable hybrid data model. It combines Data Warehouse techniques (Kimball & Ross) to optimize analysis, and Data Vault 2.0 to manage heterogeneity and ensure traceability, enabling flexible integration.
Results. Data from OpenAIRE, OpenAlex, and COAR were integrated into a unified academic publications table, consolidating key metrics such as citations, views, and downloads. The table includes relevant information such as title, DOI, publication type and year, as well as open access status.
Conclusions. Data integration enables a more comprehensive view of the impact of institutional scientific output. This approach supports the implementation of responsible metrics.
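The hybrid model described in the Method and Results can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields, the sample values, the `coar` source label, the SHA-256 hash key, and the "first source wins" rule for descriptive attributes are all assumptions made for illustration. The sketch loads each source into a Data Vault style hub (one hash key per normalized DOI) with one satellite row per source record, then flattens the result into a single publications table that keeps every metric attributed to its source.

```python
import hashlib

METRICS = ("citations", "views", "downloads")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL/scheme prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def hub_key(doi: str) -> str:
    """Hash key derived from the business key (the normalized DOI)."""
    return hashlib.sha256(normalize_doi(doi).encode("utf-8")).hexdigest()


def integrate(sources):
    """Load raw records into a hub (one row per DOI) and per-source satellites."""
    hubs = {}        # hash key -> normalized DOI
    satellites = []  # (hash key, source name, attributes) keeps traceability
    for name, records in sources.items():
        for rec in records:
            key = hub_key(rec["doi"])
            hubs.setdefault(key, normalize_doi(rec["doi"]))
            attrs = {k: v for k, v in rec.items() if k != "doi"}
            satellites.append((key, name, attrs))
    return hubs, satellites


def publication_table(hubs, satellites):
    """Flatten hub and satellites into one analysis-ready row per publication."""
    rows = {key: {"doi": doi} for key, doi in hubs.items()}
    for key, source, attrs in satellites:
        row = rows[key]
        for col, val in attrs.items():
            if col in METRICS:
                # Metrics stay attributed to their source instead of being summed.
                row[f"{col}_{source}"] = val
            else:
                # Assumption: the first source to supply an attribute wins.
                row.setdefault(col, val)
    return sorted(rows.values(), key=lambda r: r["doi"])


# Hypothetical records: every field name and value below is invented.
sources = {
    "openalex": [{"doi": "https://doi.org/10.1234/ABC", "title": "Paper A",
                  "year": 2023, "is_oa": True, "citations": 12}],
    "openaire": [{"doi": "10.1234/abc", "citations": 10,
                  "views": 300, "downloads": 45}],
    "coar": [{"doi": "doi:10.1234/abc", "views": 120, "downloads": 30},
             {"doi": "10.5678/xyz", "title": "Thesis B", "year": 2021,
              "views": 50, "downloads": 5}],
}

hubs, satellites = integrate(sources)
table = publication_table(hubs, satellites)
for row in table:
    print(row)
```

Keeping one column per metric and source, rather than a single summed figure, is one way to preserve the traceability that the Data Vault layer provides; how the published system consolidates overlapping metrics is not stated in the abstract.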
References
Add seeds to your DAG. (2025, April 3). dbt Developer Hub. Retrieved April 4, 2025, from https://docs.getdbt.com/docs/build/seeds
Aghassibake, N., Castello, O. G., Gujilde, P., & Rabun, S. (2023). Visualizing institutional activity using persistent identifier metadata. Information Services & Use, 43(3-4), 335–342. https://doi.org/10.3233/ISU-230218
Albuquerque, P. C. (2024a). PabloDeAlbu/dbt-scholar [Software]. GitHub. https://github.com/PabloDeAlbu/dbt-scholar
Albuquerque, P. C. (2024b). PabloDeAlbu/kedro-scholar [Jupyter notebook]. GitHub. https://github.com/PabloDeAlbu/kedro-scholar
Albuquerque, P. C., Villarreal, G. L., & De Giusti, M. R. (2021, June 22–25). Proposal of a data warehouse for scholarly institutions built on institutional repositories [Conference object]. IX Jornadas de Cloud Computing, Big Data & Emerging Topics, La Plata, Buenos Aires, Argentina. http://sedici.unlp.edu.ar/handle/10915/125161
Albuquerque, P. C., Villarreal, G. L., & De Giusti, M. R. (2022, October 3–7). WebID como base para el desarrollo de una marca personal en repositorios institucionales [Conference object]. XI Conferencia Internacional de Bibliotecas y Repositorios Digitales (BIREDIAL-ISTEC), Costa Rica. http://sedici.unlp.edu.ar/handle/10915/145739
Albuquerque, P. C., Villarreal, G. L., & De Giusti, M. R. (2023, October 18–20). Modelo dimensional para la medición de la producción académica [Conference object]. XII Conferencia Internacional de Bibliotecas y Repositorios Digitales (BIREDIAL-ISTEC), Montevideo, Uruguay. http://sedici.unlp.edu.ar/handle/10915/161906
Apache Superset. (2025). Apache Superset™ is an open-source modern data exploration and visualization platform. Retrieved April 4, 2025, from https://superset.apache.org/
Bollini, A., Knoth, P., Perakakis, P., Rodrigues, E., Shearer, K., Van de Sompel, H., & Walk, P. (2017). Next generation repositories: Behaviours and technical recommendations of the COAR Next Generation Repositories Working Group (Version 2) [Original report]. Zenodo. https://doi.org/10.5281/zenodo.8077381
Cabezas-Clavijo, A., & Torres-Salinas, D. (2021). Bibliometric reports for institutions: Best practices in a responsible metrics scenario. Frontiers in Research Metrics and Analytics, 6, Article e696470. https://doi.org/10.3389/frma.2021.696470
Carletti, E., Rucci, E., & Villarreal, G. L. (2024, October 22–24). HERA 2.0: Más funcionalidad para la evaluación de recursos académicos [Conference object]. XIII Conferencia Internacional de Bibliotecas y Repositorios Digitales (BIREDIAL-ISTEC), Santiago de Chile, Chile. http://sedici.unlp.edu.ar/handle/10915/177287
Ciuciu-Kiss, J. T., & Garijo, D. (2024, May 27). Assessing the overlap of science knowledge graphs: A quantitative analysis [Conference paper]. International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs, Hersonissos, Crete, Greece. In G. Rehm, S. Dietze, S. Schimmler, & F. Krüger (Eds.), Natural scientific language processing and research knowledge graphs, Lecture Notes in Computer Science (Vol. 14770, pp. 171-185). Springer. https://doi.org/10.1007/978-3-031-65794-8_11
Cuartas, G. V., Tirado, A. U., Restrepo-Quintero, D., Gutiérrez, J. O., Pallares, C., Gómez-Molina, H. F., Suárez-Tamayo, M., & Calle, J. (2019). Hacia un modelo de medición de la ciencia desde el Sur Global: Métricas responsables. Palabra Clave, 8(2), Article e068. https://doi.org/10.24215/18539912e068
Data catalog. (2025). Kedro. Retrieved July 22, 2025, from https://docs.kedro.org/en/1.0.0/catalog-data/introduction/
Dhaouadi, A., Bousselmi, K., Gammoudi, M. M., Monnet, S., & Hammoudi, S. (2022). Data warehousing process modeling from classical approaches to new trends: Main features and comparisons. Data, 7(8), Article 113. https://doi.org/10.3390/data7080113
Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021, September). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070
Filtering search results. (2025). OpenAIRE Graph Documentation. Retrieved July 22, 2025, from https://graph.openaire.eu/docs/10.3.0/apis/graph-api/searching-entities/filtering-search-results/
Harder, R. (2024, June). Using Scopus and OpenAlex APIs to retrieve bibliographic data for evidence synthesis: A procedure based on Bash and SQL. MethodsX, 12, Article 102601. https://doi.org/10.1016/j.mex.2024.102601
Hogan, A., Blomqvist, E., Cochez, M., D’Amato, C., Melo, G. de, Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., Ngomo, A. C. N., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), Article 71. https://doi.org/10.1145/3447772
Kimball, R., & Ross, M. (2013). The data warehouse lifecycle toolkit (3rd ed.). John Wiley & Sons.
Linstedt, D., & Olschimke, M. (2015). Building a scalable data warehouse with Data Vault 2.0 (1st ed.). Morgan Kaufmann.
Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N., Schirrwagen, J., & Principe, P. (2019). The OpenAIRE research graph data model (Version 1.3) [Original report]. Zenodo. https://doi.org/10.5281/zenodo.2643199
Öztürk, O., Kocaman, R., & Kanbach, D. K. (2024). How to design bibliometric research: An overview and a framework proposal. Review of Managerial Science, 18, 3333-3361. https://doi.org/10.1007/s11846-024-00738-0
Priem, J., Piwowar, H., & Orr, R. (2022, May 4). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts [Preprint arXiv]. Submitted to the 26th International Conference on Science, Technology and Innovation Indicators (STI 2022), Granada, Spain. arXiv. https://doi.org/10.48550/arXiv.2205.01833
Searching entities. (2025). OpenAIRE Graph Documentation. Retrieved July 22, 2025, from https://graph.openaire.eu/docs/apis/graph-api/searching-entities/
Silva, V. S., Matas, L., Moreira, T., & Segundo, W. C. (2022). An ETL strategy for integrating the LA Referencia platform and VIVO for the Brazilian CRIS. Procedia Computer Science, 211, 111-117. https://doi.org/10.1016/j.procs.2022.10.182
Tomczyńska, A., Ostrowska, S., Protasiewicz, J., & Podwysocki, E. (2023, June 15). Beyond CRIS: A research and higher education information system in Poland [Paper]. EUNIS 2023 Annual Conference, Vigo, Spain. http://hdl.handle.net/11366/2477
Universidad Nacional de La Plata. (2025). OpenAlex. Retrieved April 4, 2025, from https://openalex.org/institutions/i874386039
Use a Jupyter notebook for Kedro project experiments. (2024). Kedro. Retrieved April 4, 2025, from https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html
Works overview: Schema reference for Works entities. (2025). OpenAlex. Retrieved April 4, 2025, from https://docs.openalex.org/api-entities/works/work-object
License
Copyright (c) 2026 Pablo César de Albuquerque, Gonzalo Luján Villarreal

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- The Author retains copyright in the Work, where the term “Work” shall include all digital objects that may result in subsequent electronic publication or distribution.
- Upon acceptance of the Work, the author shall grant to the Publisher the right of first publication of the Work.
- The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons Attribution 4.0 International License or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:
  - Attribution: other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
- The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post online a prepublication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work. Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
- Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
- The Author represents and warrants that:
  - the Work is the Author’s original work;
  - the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
  - the Work is not pending review or under consideration by another publisher;
  - the Work has not previously been published;
  - the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
  - the Work contains no libel, invasion of privacy, or other unlawful matter.
- The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties contained in Paragraph 6 above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.
Revised 7/16/2018. Revision Description: Removed outdated link.



