Digital Preservation and Language Technology Integration for Full-Text Retrieval of Old Arewa Kingdom Manuscripts
Abstract
The manuscript heritage of the Old Arewa Kingdom in Northern Nigeria forms a significant intellectual archive reflecting centuries of Islamic scholarship, governance, and cultural life in West Africa. These manuscripts appear in multiple scripts and languages, including Arabic, Hausa Ajami, Fulfulde Ajami, Kanuri, and Roman-script Hausa. Although digitisation initiatives have expanded across Nigerian archival institutions, most collections remain accessible mainly as image archives with limited search capability. This study examines the implementation of digital preservation and language technologies required for full-text retrieval in institutions preserving Old Arewa manuscripts. A survey of national, state, and university repositories shows that while basic digitisation tools are widely adopted, key components for searchable repositories remain underdeveloped. Optical Character Recognition (OCR) for Arabic and Ajami scripts, multilingual metadata systems, and Unicode-compliant infrastructures are insufficiently implemented. The study argues that sustainable preservation requires moving from image-based digitisation to integrated digital ecosystems combining Unicode standardisation, AI-assisted text recognition, and multilingual metadata frameworks.
Keywords
License URL: https://creativecommons.org/licenses/by/4.0/
Informatics Studies | ISSN: 2583-8954 (Online), 2320-530X (Print)