
Challenges Faced by Institutional Repositories in Managing Indian Language Content

Mohanan A, Sreelatha K

Abstract


Institutional Repositories (IRs) play a critical role in preserving and disseminating scholarly output generated within academic institutions. In India’s multilingual environment, managing content in Indian languages presents significant technical and institutional challenges. This paper examines issues related to Optical Character Recognition (OCR), font encoding, metadata creation, user access, and standardization in languages such as Hindi, Malayalam, Sanskrit, Tamil, and Bengali. Using the Institutional Repository of the University of Calicut as a case study, the paper analyzes practical difficulties encountered in archiving theses and scholarly documents in Indic scripts. Many born-digital theses in regional languages rely on non-Unicode fonts, limiting discoverability, indexing, and retrieval. The study outlines strategies adopted to address these problems, including scanning, OCR processing, document cleaning, rasterization, and conversion into Unicode-compliant formats using open-source tools. It also highlights best practices in metadata standardization, copyright management, and repository workflows to improve long-term accessibility and discoverability of Indian language scholarly resources.
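The discoverability problem with non-Unicode fonts can be illustrated with a small check. Text extracted from a thesis typeset in a legacy font usually comes out as Latin-range characters that only look like Malayalam when rendered in that font, so a search index never sees Malayalam code points. The sketch below is not the workflow used in the case study: the function names, the 0.5 threshold, and the sample legacy string are hypothetical. The only fixed fact it relies on is that the Unicode Malayalam block spans U+0D00 to U+0D7F.

```python
import unicodedata

# Unicode Malayalam block (U+0D00..U+0D7F).
MALAYALAM_START, MALAYALAM_END = 0x0D00, 0x0D7F


def script_share(text: str, start: int = MALAYALAM_START, end: int = MALAYALAM_END) -> float:
    """Return the fraction of letters and combining marks that fall in [start, end]."""
    # Keep only letters (L*) and marks (M*); ignore digits, spaces, punctuation.
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    in_block = sum(1 for ch in relevant if start <= ord(ch) <= end)
    return in_block / len(relevant)


def looks_legacy_encoded(text: str, threshold: float = 0.5) -> bool:
    """Flag text expected to be Malayalam but containing mostly non-Malayalam code points.

    The threshold is an illustrative assumption, not a value taken from the paper.
    """
    return script_share(text) < threshold


unicode_sample = "മലയാളം"   # genuine Unicode Malayalam
legacy_sample = "aebmfw"     # hypothetical stand-in for legacy-font extraction output

print(script_share(unicode_sample), looks_legacy_encoded(unicode_sample))
print(script_share(legacy_sample), looks_legacy_encoded(legacy_sample))
```

A repository workflow could run such a check on the extracted text layer of each deposited file and route flagged items to the scanning and OCR path described above, while passing Unicode-compliant files straight to indexing.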

Keywords


Metadata and Retrieval; Setting Service Quality Standards






License URL: https://creativecommons.org/licenses/by/4.0/

Informatics Studies |  ISSN: 2583-8954 (Online), 2320-530X (Print)