04/05/23

Pāṇini’s Sanskrit for AI

Exploring a NASA scientist’s 1985 paper on knowledge representation in Sanskrit and AI

“[A method for paraphrasing] Sanskrit in a manner that is identical not only in essence but in form with current work in Artificial Intelligence.”

 – Rick Briggs’ statement on Sanskrit language

It is widely claimed that NASA has been researching Sanskrit for AI for more than two decades; most versions of the claim trace back to this 1985 paper by Rick Briggs. Here is an excerpt:

My history with Sanskrit

My interest in Sanskrit started when I was a child, reciting mantras with my dad in the mornings or religious texts with my mum at the temple in the evenings. I grew up a traditional Hindu-Punjabi in India, learning Devanāgarī, one of the Brahmic family of scripts used across India, Nepal, Tibet, and Southeast Asia. I studied and wrote in Hindi and Punjabi, which gave me the base to read a bunch of other languages. Fast forward to undergrad in Canada: I chose to take two semesters of Sanskrit in my last year, partly because it had been a long time since I had learned something outside of a western-centric perspective, and partly because my parents formally studied Sanskrit in school and I wanted to be able to talk to them about it.

Who’s Pāṇini

Pāṇini (c. 4th century BCE) was an ancient Indian grammarian and linguist who is widely regarded as one of the most important figures in the development of Sanskrit language and grammar. He is best known for his work "Ashtadhyayi" (Eight Chapters), which is considered to be the most comprehensive and systematic exposition of Sanskrit grammar ever written.

Pāṇini's Ashtadhyayi consists of about 3,976 rules and aphorisms (the exact count varies slightly between editions) that define the morphology and syntax of the Sanskrit language. The rules are organized in a highly logical and structured manner, and the work is considered a masterpiece of linguistic analysis and systematization.

Pāṇini's work had a profound influence on the development of not only Sanskrit, but also on other Indian languages, as well as on the study of linguistics and grammar in general. He is widely considered to be one of the greatest grammarians of all time, and his work continues to be studied and admired by scholars around the world.

Why Sanskrit is cool

My old Sanskrit homework demonstrating its grammar complexity

Sanskrit has a unique structure and complexity that sets it apart from other languages. It is a highly inflected language, meaning that words can be modified through the use of prefixes, suffixes, and infixes to convey grammatical information such as tense, case, and gender. This allows for great flexibility in sentence construction and word usage, making Sanskrit an incredibly expressive and poetic language.

In addition to its linguistic complexity, Sanskrit is also known for its philosophical depth. Many important religious and philosophical texts were written in Sanskrit, such as the Upanishads, which explore the nature of reality and the self. Sanskrit is also the language of yoga, and many of the foundational texts and concepts of yoga, such as the Yoga Sutras of Patanjali, were written in Sanskrit. (I’ll write about the commodification of Yoga another time) 

Sanskrit's precision in expressing scientific and mathematical concepts is also widely recognized. The language has precise and consistent definitions for many scientific and mathematical terms, making it a valuable language for study in fields such as computer science, linguistics, and artificial intelligence. For example, the earliest known rules for treating zero as a number in its own right, a concept essential to modern mathematics, were written in Sanskrit by the mathematician Brahmagupta in the 7th century CE.

Why it’s considered good for computer programming

  1. Structured grammar: Sanskrit has a highly structured grammar with a well-defined set of rules for sentence construction, word formation, and sound combinations. This makes it an ideal language for writing computer programs, which also rely on a structured and logical approach.

  2. Concise expression: Sanskrit is known for its ability to convey complex ideas in a concise and precise manner. This is an important quality for programming languages, which must be able to express complex algorithms and processes in a compact and efficient way.

  3. Flexibility: Sanskrit is a highly flexible language that allows for a great deal of word order variation without changing the meaning of a sentence. This is important for programming languages, which often require flexibility in syntax to accommodate different programming styles and approaches.

  4. Unambiguousness: Sanskrit has a highly unambiguous syntax, which is important for programming languages that rely on precise instructions and commands.

“Knowledge Representation in Sanskrit and Artificial Intelligence” 

NASA scientist Rick Briggs published this paper in AI Magazine in 1985. He argues that the widespread belief that natural languages are unsuitable for transmitting logical data is false. He goes on to suggest that Sanskrit, a language with a long literary, philosophical, and grammatical tradition, can also serve as an artificial language. The grammarians of Sanskrit developed a method for paraphrasing the language that is identical in form and essence to current work in artificial intelligence. The article outlines a knowledge representation scheme based on semantic nets, then the method the grammarians used to analyze sentences unambiguously, and demonstrates the parallels between the two.

Deepak Kumar, a computer science professor at Bryn Mawr College, Pennsylvania, notes a trend in AI during the 1960s-1980s where efforts were made to capture the meaning of English. The author discovered this information and another reference to Rick Briggs in a follow-up article to his 1985 paper, which was not theoretical work but a “review of the First National Conference on Knowledge Representation and Inference in Sanskrit (KRIS).”

Briggs' 1985 paper was the origin of the KRIS conference. In December 1986, prominent figures in Indian computer science invited Briggs to Bangalore to discuss his ideas. Although Deepak Kumar did not attend, he co-authored a paper presented at the conference that was deemed significant because it explored the potential use of Sanskrit for NLP, according to Briggs' report.

Sanjana Ramachandran tracked down Kumar and recounts their conversation:

On our call, Kumar explained the ideas that sparked it all. “If I say ‘This pen is black’, any phone or computer will be able to translate that into another language,” he said. “But if I turned around and asked, ‘Now that you know that, what is the colour of this pen,’ it wouldn't be able to answer that. This is because the AI technology in question is only translating, without any conception of the meaning of the sentence.”

“The way machine translation is done right now is purely driven by statistical and machine learning techniques,” Kumar continued. Large amounts of data in the most commonly used languages are generated every second on the internet. With enough parallel data between two languages, ML-based AI can spot patterns to know that “black” in English, for instance, appears as “kaala” in Hindi, without having to know that these are colours or, further, that colours are properties of the entity ‘pen.’

Knowledge representation, on the other hand, is concerned with making machines understand and infer from information. Instead of learning patterns “bottom up” from big data, it aims to code rules so that the machine knows how to interpret information from the “top down.” This kind of rule-based engine belongs to a collection of methods in artificial intelligence called Symbolic AI. “Because Sanskrit had this formal language and formal grammar,” said Kumar, “whatever you say could fit into a knowledge representation system.”
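To make Kumar's pen example concrete, here is a minimal sketch of a "top down" knowledge store in Python. It is only an illustration, not the system from Briggs' or Kumar's papers: the `SemanticNet` class, the relation names `is_a` and `has_property`, and the `colour_of` helper are all hypothetical names made up for this post.

```python
class SemanticNet:
    """A toy semantic net: a set of (subject, relation, object) edges."""

    def __init__(self):
        self.edges = set()

    def tell(self, subject, relation, obj):
        # Record one fact as a labelled edge between two nodes.
        self.edges.add((subject, relation, obj))

    def ask(self, subject, relation):
        # Return every object linked to `subject` by `relation`.
        return sorted(o for s, r, o in self.edges if s == subject and r == relation)


def colour_of(net, entity):
    """Return the properties of `entity` that the net knows to be colours."""
    return [p for p in net.ask(entity, "has_property")
            if "colour" in net.ask(p, "is_a")]


net = SemanticNet()
net.tell("black", "is_a", "colour")            # background knowledge
net.tell("this pen", "has_property", "black")  # "This pen is black"

print(colour_of(net, "this pen"))  # ['black']
```

The point is that the answer comes from stored relationships ("black is a colour", "the pen has the property black") rather than from patterns spotted in parallel text.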



Briggs' and Kumar's papers in the 1980s explored how Sanskrit's algorithmic grammar could be used to build a rule-based engine for processing natural language, in line with Symbolic AI, the prevailing approach at that time. Since then, with advances in technology, data-driven approaches such as deep learning with neural networks have largely displaced Symbolic AI.

Paul Kiparsky explains that in Pāṇini's analysis of Sanskrit, each sentence is treated as a small play that involves an Agent, who is the doer, and other actors such as Recipient, Goal, Instrument, Location, and Source. These six categories (the kārakas) capture the sentence's meaning as a set of relationships to the action, regardless of the specific words used or the order they appear in.
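Here is a rough sketch of that idea in Python, using the sentence "rāmaḥ vanam gacchati" (Rama goes to the forest). It makes some big simplifying assumptions: the words arrive already tagged with their grammatical case, and each case maps to exactly one role, which is only the default pattern for an active-voice sentence. The names `CASE_TO_ROLE` and `role_frame` are mine, purely for illustration.

```python
# Default case-to-role mapping for an active-voice sentence (a simplification).
CASE_TO_ROLE = {
    "nominative": "Agent",
    "accusative": "Goal",
    "instrumental": "Instrument",
    "dative": "Recipient",
    "ablative": "Source",
    "locative": "Location",
}


def role_frame(tagged_words):
    """Build a {role: word} frame from (word, case) pairs, ignoring word order."""
    frame = {}
    for word, case in tagged_words:
        if case == "verb":
            frame["Action"] = word
        else:
            frame[CASE_TO_ROLE[case]] = word
    return frame


# The same sentence in two different word orders.
sentence_a = [("rāmaḥ", "nominative"), ("vanam", "accusative"), ("gacchati", "verb")]
sentence_b = [("vanam", "accusative"), ("gacchati", "verb"), ("rāmaḥ", "nominative")]

print(role_frame(sentence_a))
print(role_frame(sentence_a) == role_frame(sentence_b))  # True
```

Because the roles come from the case endings and not from position, shuffling the words leaves the frame unchanged.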

Example of middle Dhātu (verbs) from my Sanskrit workbook

Pāṇini’s Ashtadhyayi, with its roughly 3,976 rules, provides an outline of how classical Sanskrit words are formed from approximately 2,000 base verbal roots, also known as dhatus. The Ashtadhyayi functions as an algorithm: its rules operate on linguistic units such as phonemes and morphemes, combining dhatus with affixes to derive words and sentences in classical Sanskrit.
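To get a feel for "grammar as an algorithm", here is a tiny Python sketch of three well-known vowel sandhi (sound combination) patterns: similar vowels merging into a long vowel, a + i becoming e, and a + u becoming o. This is a toy of my own, not Pāṇini's actual rule system, which has far more rules, ordering principles, and exceptions. The names `SANDHI_RULES` and `join_words` are hypothetical.

```python
import unicodedata

# Each rule: (allowed final vowels, allowed initial vowels, replacement).
# Rules are tried in order and the first match wins.
SANDHI_RULES = [
    ("aā", "aā", "ā"),  # similar vowels merge into one long vowel
    ("aā", "iī", "e"),  # a/ā + i/ī -> e
    ("aā", "uū", "o"),  # a/ā + u/ū -> o
]


def join_words(first, second):
    """Join two romanised words, applying the first matching vowel rule."""
    # Normalise so that letters like "ā" are single characters.
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    if first and second:
        for finals, initials, result in SANDHI_RULES:
            if first[-1] in finals and second[0] in initials:
                return first[:-1] + result + second[1:]
    return first + second  # no rule applies: plain concatenation


print(join_words("deva", "ālaya"))   # devālaya
print(join_words("mahā", "indra"))   # mahendra
print(join_words("sūrya", "udaya"))  # sūryodaya
```

Even this small table shows the flavour: the rules are data, and a generic procedure consumes them to produce valid forms.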

Final thoughts

Pāṇini gave us an eternal gift with his grammar of Sanskrit, which continues to influence linguistics and computing. His rule notation is often compared to Backus-Naur Form (BNF), the notation used to specify the syntax of programming languages such as C++ and Java, and it has even been suggested that BNF be renamed Pāṇini-Backus Form. The concepts of formal grammar, syntax, and parsing used in computer science have clear precedents in Pāṇini’s work. His work has had a profound impact on the way we think about language and anticipated many of the ideas behind the developments in computing that we see today.

Sanskrit programming languages

Mayank Kumar created Om Lang, a Sanskrit and multilingual programming language that supports 9+ languages (Hindi, Tamil, Kannada, Marathi, Bengali, Telugu, and Malayalam, to name some).

Pt. Prashant Tripathi developed a programming language in Sanskrit called Vedic (taking inspiration from Christopher Nolan’s Interstellar): 

Sample code

Output

I’ll leave you with my favourite Sanskrit verse from the Bhagavad Gita:

Chapter 2, Verse 47—

कर्मण्येवाधिकारस्ते मा फलेषु कदाचन।
मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥ २-४७

In Roman script:

Karmanye vadhikaraste Ma Phaleshu Kadachana,
Ma Karmaphalaheturbhurma Te Sangostvakarmani

The meaning of the verse is—
You have the right to work only but never to its fruits.
Let not the fruits of action be your motive, nor let your attachment be to inaction.



This piece is 5/50 of my 50 days of learning/writing.