Text mining in the classification of digital documents

Authors

Marcial Contreras Barrera

DOI:

https://doi.org/10.5195/biblios.2016.309

Keywords:

Text mining, Classification, Automated classifier, Bibliographic material

Abstract

Objective: To develop an automated classifier for bibliographic material using text mining.

Methodology: Text mining is used to build the classifier through a supervised method made up of two phases, learning and recognition. In the learning phase, the classifier learns patterns by analyzing bibliographic records from class Z of the Library of Congress Classification (LC), which covers library science, information science, and information resources. The records were retrieved from the LIBRUNAM database. This phase yields a classifier able to recognize the different LC subclasses. In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records from class Z are selected at random, classified by a cataloguer, and then processed by the automated classifier to measure its precision.

Results: Text mining made it possible to build the automated classifier with a supervised document classification method. Precision was calculated by comparing the topics assigned manually with those assigned automatically, giving a precision of 75.70%.

Conclusions: Applying text mining facilitated the creation of the automated classifier, yielding a useful technology for classifying bibliographic material, with the aim of improving and speeding up the organization of digital documents.
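The abstract does not name the learning algorithm, so the sketch below swaps in multinomial naive Bayes, a common supervised text classifier, purely to illustrate the two phases: `fit` stands for the learning phase over bibliographic records labelled with LC subclasses, and `predict` for the recognition phase. Precision is computed as the percentage of test records whose automatically assigned class matches the cataloguer's, which is our reading of how the 75.70% figure was obtained. All names (`NaiveBayesClassifier`, `agreement_precision`), the labels `Z-libraries` and `Z-bibliography`, and the toy records are hypothetical and are not taken from LIBRUNAM.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a record's text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing.

    Illustrative stand-in only: the article does not say which
    supervised algorithm its classifier uses.
    """

    def fit(self, texts, labels):
        # Learning phase: count records per class and word occurrences per class.
        self.class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Recognition phase: return the class with the highest log-posterior.
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def agreement_precision(manual, automated):
    """Percentage of records whose automated class matches the cataloguer's."""
    matches = sum(m == a for m, a in zip(manual, automated))
    return 100.0 * matches / len(manual)


# Hypothetical toy records and labels (not taken from LIBRUNAM).
train_texts = [
    "library administration and management of public libraries",
    "academic library services and collection development",
    "bibliography of mexican history and literature",
    "national bibliography catalog of printed books",
]
train_labels = ["Z-libraries", "Z-libraries", "Z-bibliography", "Z-bibliography"]
clf = NaiveBayesClassifier().fit(train_texts, train_labels)

# Test records with the class a cataloguer would assign manually.
test_texts = ["management of academic libraries", "bibliography of printed literature"]
manual = ["Z-libraries", "Z-bibliography"]
automated = [clf.predict(t) for t in test_texts]
print(automated, agreement_precision(manual, automated))
# prints: ['Z-libraries', 'Z-bibliography'] 100.0
```

Naive Bayes is chosen here only because it is short and transparent; any supervised classifier that maps record text to a subclass could fill the same two roles, and the agreement measure would be computed the same way.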

Author Biography

Marcial Contreras Barrera, Universidad Nacional Autónoma de México – UNAM

Academic Technician, Computing Subdirectorate (Subdirección de Informática), Production Department, General Directorate of Libraries (Dirección General de Bibliotecas), Universidad Nacional Autónoma de México – UNAM, Mexico.

Published

2016-11-21

How to Cite

Contreras Barrera, M. (2016). Text mining in the classification of digital documents. Biblios Journal of Librarianship and Information Science, (64), 33–43. https://doi.org/10.5195/biblios.2016.309

Issue

No. 64 (2016)

Section

Original