India Advances AI Development for Pali Language Through BHASHINI Workshop
India’s Ministry of Electronics and Information Technology (MeitY) has convened a workshop focused on using artificial intelligence to support the preservation and digital accessibility of the Pali language. Held on 10 April 2026 at the University of Delhi, the event brought together government officials, researchers, and language experts to discuss dataset creation, digitisation, and AI model development. According to the official announcement, the initiative forms part of the national BHASHINI programme aimed at expanding multilingual digital services and strengthening support for low-resource and heritage languages.
Focus on AI Support for a Classical Buddhist Language
The BHASHINI Sanchalan/Seva Workshop on Pali Language Preservation and Digital AI Model Development was organised by the Digital India BHASHINI Division under MeitY in collaboration with the Centre for Advanced Studies in Buddhist Studies at the University of Delhi. The programme examined how AI tools can help safeguard and expand access to Pali, an ancient Middle Indo-Aryan language used in early Buddhist texts including the Tipitaka (known in Sanskrit as the Tripitaka).
Pali holds significant historical and literary importance in South and Southeast Asian Buddhist traditions. However, in modern AI development it is considered a low-resource language, meaning that the datasets required to train language models are limited. Addressing this gap requires structured digitisation of texts, curated datasets, and expert linguistic validation.
The workshop aligns with wider national efforts to expand inclusive language technologies through the BHASHINI multilingual AI platform, which aims to enable digital services and knowledge access across India’s diverse linguistic landscape.
Building Language Datasets and Community Participation
Sessions during the workshop addressed the linguistic relevance of Pali as well as the practical requirements for developing AI models. Discussions focused on creating structured datasets, including text corpora, audio recordings, and digitised manuscripts. These resources are necessary for training translation, speech recognition, and language processing systems.
Participants were introduced to the BhashaDaan platform, which allows language experts and speakers to contribute data and support validation processes. The platform is designed to facilitate community participation in building reliable datasets while maintaining quality assurance frameworks for AI development.
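As a rough illustration of what a structured, validated dataset entry could look like, the Python sketch below models a single corpus record and filters for entries that have been reviewed by experts. It is not the BhashaDaan data format or API, which the announcement does not describe; the class name, field names, and the two-reviewer threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
import unicodedata


@dataclass
class CorpusRecord:
    """One entry in a hypothetical Pali corpus (all field names are illustrative)."""
    record_id: str
    text: str                     # romanised Pali text, with diacritics
    source: str                   # e.g. the edition or manuscript it was digitised from
    modality: str = "text"        # "text", "audio", or "manuscript"
    validated_by: list[str] = field(default_factory=list)  # reviewer identifiers

    def __post_init__(self) -> None:
        # Trim whitespace and normalise to Unicode NFC so that a letter typed as
        # base character + combining mark matches its precomposed form.
        self.text = unicodedata.normalize("NFC", self.text.strip())

    def is_validated(self, min_reviewers: int = 2) -> bool:
        # Count distinct reviewers; the threshold of two is an assumption.
        return len(set(self.validated_by)) >= min_reviewers


def validated_subset(records, min_reviewers=2):
    """Return only the records that enough distinct reviewers have approved."""
    return [r for r in records if r.is_validated(min_reviewers)]
```

Normalising the text on entry matters for a language written with diacritics, because the same visible word can otherwise be stored as different byte sequences and be counted as different tokens during training.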
Academic collaboration played a central role in the programme. Scholars from the Centre for Advanced Studies in Buddhist Studies provided subject expertise to ensure linguistic accuracy and appropriate cultural context in the digitisation and modelling processes.
Demonstrating Multilingual AI Applications
The BHASHINI team also demonstrated several language technologies from the platform’s wider ecosystem, covering translation, speech processing, and multilingual content adaptation across digital platforms.
The tools shown were:
- Anuvaad: Text-to-text translation across multiple Indian languages
- Vaanianuvaad: Real-time speech-to-speech and speech-to-text translation
- Lekhaanuvaad: Document translation and digitisation across languages
- Chitraanuvaad: AI-enabled translation and multilingual adaptation of video content
Additional demonstrations included the BHASHINI mobile application, which provides real-time speech understanding and translation capabilities, and a translation plugin designed to enable multilingual functionality on websites and digital platforms. Granthika, another tool showcased during the workshop, supports multilingual processing of parliamentary and institutional documents.
Expanding Multilingual Access Across Digital Public Infrastructure
The workshop also highlighted potential applications of BHASHINI technologies across governance platforms, education systems, and digital public infrastructure. Real-time inferencing, scalable deployment, and application programming interface (API) integration were demonstrated to illustrate how the tools can be incorporated into existing digital services.
These initiatives reflect India’s approach to responsible and inclusive AI development, which emphasises multilingual access to knowledge and digital services for diverse communities. They sit alongside other efforts to apply AI technologies to the preservation of classical, heritage, and minority languages.
By combining academic expertise, community participation, and government-backed digital infrastructure, the initiative aims to expand the availability of language resources and strengthen AI capabilities for historically underrepresented languages.