Experiment of terminology extraction oriented to the construction of a library science thesaurus
DOI:
https://doi.org/10.5195/biblios.2021.969
Keywords:
Automatic term extraction, Manual terminology extraction, Terminology, Thesaurus
Abstract
Objective. This article evaluates two terminology extraction techniques, manual and automated, to assess how effective each process is at obtaining useful terms for the construction of a library thesaurus.
Method. The methodology was exploratory-quantitative and was based on two terminology extraction experiments: (1) manual extraction and (2) automated extraction. The manual extraction was carried out by a professional with multidisciplinary academic training, while the automated extraction was carried out using the WordStat program. Both the manual and the automated extraction processes were based on the same corpus, consisting of 283,585 words from 59 articles on library and information science published in the journal Investigación Bibliotecológica during 2019 and 2020.
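As a rough illustration of what an automated extraction step over a corpus can look like, the following minimal sketch counts repeated word sequences and keeps those that neither start nor end with a stopword. It is not WordStat's method, which the abstract does not describe; the function name `candidate_terms`, the stopword list, the thresholds, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "of", "and", "in", "a", "to", "for", "is", "on", "are"}

def candidate_terms(text, max_n=3, min_freq=2):
    """Return n-grams (1..max_n words) that occur at least min_freq times
    and neither start nor end with a stopword."""
    # Letters only, so accented characters in Spanish text are kept.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}

# Hypothetical sample text, not taken from the study's corpus.
sample = (
    "Knowledge organization systems support information retrieval. "
    "A thesaurus is one of the knowledge organization systems used in "
    "information retrieval."
)
terms = candidate_terms(sample)
for term, freq in sorted(terms.items()):
    print(term, freq)
```

A frequency filter of this kind returns many candidates that are frequent but not terminological, which is why the output of an automated pass still has to be reviewed before any candidate is accepted as a descriptor.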
Results. Manual terminology extraction provided excellent results: 82% of the extracted terms were useful and were established as viable descriptors for the thesaurus. In comparison, automated extraction was a fast process, but only 12% of its terms proved useful and were established as viable descriptors for the thesaurus.
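The percentages reported here can be read as the share of extracted candidates that were accepted as descriptors. The sketch below shows that calculation on hypothetical toy lists; the function name `usefulness_rate` and the data are assumptions for illustration, since the abstract does not give the underlying term counts.

```python
def usefulness_rate(candidates, accepted):
    """Share of extracted candidates that were accepted as descriptors."""
    candidates = set(candidates)
    if not candidates:
        return 0.0
    return len(candidates & set(accepted)) / len(candidates)

# Hypothetical toy data, not the study's actual terms.
extracted = ["thesaurus", "information retrieval", "page", "year", "controlled vocabulary"]
approved = ["thesaurus", "information retrieval", "controlled vocabulary"]

rate = usefulness_rate(extracted, approved)
print(f"{rate:.0%}")  # 3 of 5 candidates accepted: 60%
```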
Conclusions. Each of the terminology extraction techniques was useful, but they differed. Manual extraction required a high investment of human resources and time, and its results were highly effective. In contrast, automated extraction required less human investment and was fast, but its results in this experiment were less accurate and less useful. It is concluded that experimenting with various terminology extraction techniques is important because the terminology base is the cornerstone of any controlled vocabulary.
References
ABBAGNANO, Nicola. Diccionario de filosofía. México: Fondo de Cultura Económica, 1963.
ARNTZ, Reiner y PICHT, Heribert. Introducción a la terminología. Madrid: Fundación Germán Sánchez Ruipérez, 1995.
AUGER, Pierre y ROUSSEAU, Louis-Jean. Metodología de la investigación terminológica. Málaga: Universidad de Málaga, 2003.
BARITÉ, Mario. Garantía literaria y normas para construcción de vocabularios controlados: aspectos epistemológicos y metodológicos. Scire: Representación y organización del conocimiento, v. 15, n. 2, pp. 13-24, 2009.
BENAVENT, Paloma y PARRILLA, Sara. Análisis de la extracción automática de términos con el programa informático ExtraTerm. Fòrum de Recerca, n. 12, pp. 1-10, 2006.
BRÄSCHER, Marisa. Semantic relations in knowledge organization systems. Knowledge Organization, v. 41, n. 2, pp. 175-180, 2014.
CABRÉ, María Teresa. La teoría comunicativa de la terminología, una aproximación lingüística a los términos. Revue française de linguistique appliquée, v. 14, pp. 9-15, 2009.
CABRÉ, María Teresa. La terminología: representación y comunicación elementos para una teoría de base comunicativa y otros artículos. Barcelona: Universitat Pompeu Fabra, 1999.
CURRAS, Emilia. Tesauros: manual de construcción y uso. Madrid: Kaher II, 1998.
CHU, Heting. Information representation and retrieval in the digital age. Medford, New Jersey: Information Today, 2010.
CHUNG, Teresa. A corpus comparison approach for terminology extraction. International Journal of Theoretical and Applied Issues in Specialized Communication, v. 9, pp. 221-246, 2003.
ESTOPÁ, Rosa. Los extractores de terminología logros y escollos. En ALCINA CAUDET, María Amparo, (coord.). Terminología y sociedad del conocimiento. España: Bern: Peter Lang, 2009, pp. 117-146.
GOLUB, Koraljka; TUDHOPE, Douglas; ZENG, Marcia y ŽUMER, Maja. Terminology registries for knowledge organization systems: functionality, use, and attributes. Journal of the Association for Information Science and Technology, v. 65, n. 9, pp. 1901-1916, 2014.
GUINCHAT, Claire y MENOU, Michel. Introducción general a las ciencias y técnicas de la información y de la documentación. París: UNESCO, 1983.
HJØRLAND, Birger. Semantics and knowledge organization. Annual Review of Information Science and Technology, v. 41, n. 1, pp. 367-405, 2007.
HODGE, Gail. Systems of knowledge for digital libraries: beyond traditional authority files. Washington: Council on Library and Information Resources, 2000.
INSTITUTO DE INVESTIGACIONES BIBLIOTECOLÓGICAS Y DE LA INFORMACIÓN. Investigación Bibliotecológica [en línea]. Disponible en: http://rev-ib.unam.mx/ib/index.php/ib (Recuperado el 16 de junio 2020).
INTERNATIONAL FEDERATION OF LIBRARY ASSOCIATIONS AND INSTITUTIONS. Functional requirements for subject authority data (FRSAD). A conceptual model. Washington: IFLA, 2010.
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION. Information and documentation-Thesauri and interoperability with other vocabularies-Part 1: Thesauri for information retrieval. Ginebra, Suiza: ISO, 2011.
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION. ISO 2788:1986. Documentation-Guidelines for the establishment and development of monolingual thesauri. Ginebra, Suiza: ISO, 1986.
KUMBHAR, Rajendra. Library classification in the 21 century. Oxford: Chandos Publishing, 2012.
LAGUENS GARCÍA, José Luis. Tesauros y lenguajes controlados en Internet. Anales de Documentación, v. 9, n. 9, pp. 105-121, 2006.
LANCASTER, Frederick W. El control del vocabulario en la recuperación de la información. Valencia: Universidad de Valencia, 2002.
LÓPEZ MATEO, Coral y OLMO CAZEVIEILLE, Françoise. Metodología para la extracción e identificación de candidatos a términos en el ámbito de la bioquímica. Terminàlia, n. 16, pp. 18-28, 2017.
LUNA TRAIL, Elizabeth, VIGUERAS ÁVILA, Alejandra y BAEZ PINAL, Gloria. Diccionario básico de lingüística. México: Universidad Nacional Autónoma de México, 2005.
LUO, Zhiwei, XIE, Rong, CHEN, Wen y YE, Zatao. Automatic domain terminology extraction and its evaluation for domain knowledge graph construction. Web Intelligence, v. 16, n. 3, pp. 173-185, 2018.
MARQUES CINTRA, Anna, GONÇALVES MOREIRA TÁLAMO, María, LOPES GINEZ DE LARA, Matilda y YUMIKO KOBASHI, Nair. Para entender as linguagens documentárias. São Paulo: Polis, 2002.
NAUMIS PEÑA, Catalina. Los tesauros documentales y su aplicación en la información impresa, digital y multimedia. Buenos Aires: Alfagrama, 2007.
PROVALIS RESEARCH. WordStat: software de análisis de contenido y minería de textos [en línea]. Disponible en: https://provalisresearch.com/es/products/software-de-analisis-de-contenido/ (Recuperado el 16 de junio 2020).
SAGER, Juan. Curso práctico sobre el procesamiento de la terminología. Madrid: Fundación Germán Sánchez Ruipérez: Pirámide, 1993.
SINCLAIR, John y SINCLAIR, Les. Corpus, concordance, collocation. Oxford: Oxford University Press, 1991.
SMIRAGLIA, Richard. Domain analysis for knowledge organization. Nueva York: Chandos, 2015.
STRZALKOWSKI, Tomek. Natural language information retrieval. Kluwer Academic, 1999.
SUÁREZ SANCHEZ, Adriana. Ontologías: fundamentos y aplicaciones, una aproximación desde la perspectiva bibliotecológica. Ciudad de México: Universidad Nacional Autónoma de México, 2018.
VIVALDI, Jorge y RODRÍGUEZ, Horacio. Improving term extraction by combining different techniques. Terminology, v. 7, n. 1, pp. 31-48, 2001.
--
Received-Recibido-Recibido: 2021-04-14
Accepted-Aceptado-Aceitado: 2022-12-09
License
Authors who publish with this journal agree to the following terms:
- The Author retains copyright in the Work, where the term “Work” shall include all digital objects that may result in subsequent electronic publication or distribution.
- Upon acceptance of the Work, the author shall grant to the Publisher the right of first publication of the Work.
- The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons Attribution 4.0 International License or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:
  - Attribution: other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
- The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post online a prepublication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work. Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
- Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
- The Author represents and warrants that:
  - the Work is the Author’s original work;
  - the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
  - the Work is not pending review or under consideration by another publisher;
  - the Work has not previously been published;
  - the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
  - the Work contains no libel, invasion of privacy, or other unlawful matter.
- The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties contained in Paragraph 6 above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.