Indo-Aryan languages

by George Cardona

Indo-Aryan languages, also called Indic languages, subgroup of the Indo-Iranian branch of the Indo-European language family. In the early 21st century, Indo-Aryan languages were spoken by more than 800 million people, primarily in India, Bangladesh, Nepal, Pakistan, and Sri Lanka.

General characteristics

Linguists generally recognize three major divisions of Indo-Aryan languages: Old, Middle, and New (or Modern) Indo-Aryan. These divisions are primarily linguistic and are named in the order in which they initially appeared, with later divisions coexisting with rather than completely replacing earlier ones.

Old Indo-Aryan includes different dialects and linguistic states that are referred to in common as Sanskrit. The most archaic Old Indo-Aryan is found in Hindu sacred texts called the Vedas, which date to approximately 1500 bce. There is a clear-cut difference between Vedic and post-Vedic Sanskrit in that the former has certain formations that the latter has eliminated. The grammarian Pāṇini (c. 6th–5th century bce) appropriately distinguishes between usage proper to the language of sacred texts (chandas, locative sg. chandasi)—that is, Vedic usage—and what occurs in the spoken language (bhāṣā, locative sg. bhāṣāyām) of his time. Other distinctions are also made within the language, so scholars speak of Classical Sanskrit and Epic Sanskrit. Despite differences in genre, however, the Sanskrit found in such works generally agrees with the language Pāṇini describes. So-called un-Pāṇinian forms not only reflect the influence of vernaculars but also continue a freedom of usage—referred to as ārṣaprayoga (usage of ṛṣis)—already to be seen in aspects of the living spoken language Pāṇini described.

Middle Indo-Aryan includes the dialects of inscriptions from the 3rd century bce to the 4th century ce as well as various literary languages. Apabhraṃśa dialects represent the latest stage of Middle Indo-Aryan development. Though all Middle Indo-Aryan languages are included under the name Prākrit, it is customary to speak of the Prākrits as excluding Apabhraṃśa.

Uncertainties regarding the course of Indo-Aryan migration make it difficult to determine the domain of Proto-Indo-Aryan, the ancestral language of all the known Indo-Aryan tongues, if indeed there was any such single region (see Indo-Iranian languages). All that can be said with certainty is that the Indo-Aryan speakers on the Indian subcontinent first occupied the area comprising most of present-day Punjab state (India), Punjab province (Pakistan), Haryana, and the Upper Doab (of the Ganges–Yamuna Doab) of Uttar Pradesh. The structure of Proto-Indo-Aryan must have been similar to that of early Vedic, albeit with dialect variations.

A wide variety of New Indo-Aryan languages are currently in use. According to the 2001 census of India, Indo-Aryan languages accounted for more than 790,625,000 speakers, or more than 75 percent of the population. By 2003 the constitution of India included 22 officially recognized, or Scheduled, languages. However, this number does not distinguish among many speech communities that could legitimately be considered distinct languages. For example, the Hindi census category includes not only Hindi proper (about 422,050,000 speakers in 2001) but also such languages as Bhojpuri (about 33,100,000), Magahi (about 13,975,000), and Maithili (more than 12,175,000).

Other Indo-Aryan languages that have been officially recognized in the constitution are as follows (the approximate numbers of speakers for each are drawn from the census report of 2001): Asamiya (Assamese, about 13,175,000 speakers), Bangla (Bengali, 83,875,000), Gujarati (46,100,000), Kashmiri (5,525,000), Konkani (2,500,000), Marathi (71,950,000), Nepali (2,875,000), Oriya (33,025,000), Punjabi (29,100,000), Sindhi (2,550,000), and Urdu (51,550,000).

Some of the Indo-Aryan languages are used by relatively few speakers; others are used as the media of education and of official transactions. Hindi written in the Devanāgarī script is one of two official languages of the Republic of India (the other is English). It is widely used as a lingua franca throughout northern India, including Haryana and Madhya Pradesh, and in parts of the South. Asamiya, Bangla, Oriya, Punjabi, Gujarati, and Marathi are the state languages of Assam, West Bengal, Orissa, Punjab, Gujarat, and Maharashtra, respectively. There are other Modern Indo-Aryan languages with large numbers of speakers in India, though they lack official recognition; examples include various languages spoken in Rajasthan (e.g., Marwari, Mewari); several Pahari languages, spoken in Himachal Pradesh and Uttarakhand; and Sindhi, spoken by Sindhis in various parts of India. Each of the major state languages has several dialects in addition to the standard dialect adopted for official purposes, and Hindi has not only dialects but also several varieties according to the mother tongue of the area (e.g., Bombay Hindi and Calcutta Hindi).

Many New Indo-Aryan languages also have official status outside India. Urdu written in Perso-Arabic script is the official language of Pakistan, where it is spoken by most of the population as either a first or a second language. Structurally and historically, Hindi and Urdu are one, although they are now official languages of different countries, are written in different alphabets, and have been developing in divergent manners. The term hindī (also hindvī) is known from as early as the 13th century ce. The term zabān-e-urdū ‘language of the imperial camp’ came into use about the 17th century. In the south, Urdu was used by Muslim conquerors of the 14th century.

Bangla is the official language of Bangladesh, where it has approximately 107 million native speakers—a figure that nearly doubles when those who speak Bangla as a second language are included. Nepali is the official language of Nepal, where there are approximately 11.1 million speakers, and Nepali is also spoken by 3 to 4 million speakers in the Himalayan region west of Nepal. Sinhala (Sinhalese) has approximately 13.5 million speakers in Sri Lanka, where it has been an official language since 1956.

Characteristics of Old Indo-Aryan texts

The most archaic stage of Old Indo-Aryan is represented by the Sanskrit of the Vedas, of which there are four major text groups called saṃhitās: the Ṛgveda (“The Veda Composed in Verses”), the Sāmaveda (“Knowledge of the Chants”), the Yajurveda (“Knowledge of the Sacrifice”), and the Atharvaveda (“Knowledge of the Fire Priest”). The Yajurveda is in turn divided into two main branches, the White (śukla) Yajurveda and the Black (kṛṣṇa) Yajurveda. All of these Vedic texts, however, are represented by different recitational traditions in what are called śākhās (branches) and which Western philologists refer to as recensions (see also Hinduism: Sacred texts).

The texts of the Black Yajurveda contain both verses used in rituals (called mantras) and prose sections that are explanatory in nature and that include legends, mythological explanations of rites and the objects and deities associated with these rites, and other matters, together with etymologies—accounts of the derivations of words—to explain why certain things bear particular names. These texts are known collectively as the Brāhmaṇas. Each Veda has one or more brāhmaṇa connected with it. In addition, there are more philosophical Vedic works, the Upaniṣads (“Sessions”) and the Āraṇyakas (“Books of the Forest”).

Also associated with the Vedas are ancillary works referred to as the six Vedāṅgas (“Limbs of the Veda”). Among these are texts generally referred to as kalpas (procedures), which are in turn made of several standard components. For instance, the principal aim of the components called Śrauta-sūtras (“Revelation sutras”) is to provide instructions about ritual performance. Works on astronomy (jyautiṣa) serve to assist in determining the appropriate times for ritual performances. Metrics (chandoviciti), the earliest work on which is ascribed to Piṅgala, describes metrical patterns, a knowledge of which is necessary for the proper understanding of the Vedic mantras.

The remaining three Vedāṅgas are more linguistic. The niruktas explain the etymology of words found in the Vedas by deriving them from verbal bases, thus showing how their meanings reflect association with particular actions. The earliest and most important of such works is the Nirukta of Yāska, commenting on sets of words in a collection called Nighaṇṭu (“Etymology”). The śikṣā (phonetics) deal with the proper pronunciation of Sanskrit. Details of speech production are also found in works called prātiśākhya, which deal with the classification of sounds into phonological classes and with phonological rules serving to derive the continuously recited versions (saṃhitāpāṭha) of the Vedas from posited analyzed texts (padapāṭha). The most ancient of these works are the Ṛgvedaprātiśākhya and Taittirīyaprātiśākhya, respectively associated with the Ṛgveda and the Taittirīyasaṃhitā (“Recension of the Black Yajurveda”); the Vājasaneyiprātiśākhya is associated with the Vājasaneyisaṃhitā (“Recension of the White Yajurveda”). The first two of these show no influence of Pāṇinian techniques and stand a good chance of being pre-Pāṇinian; the last is fairly certain to be post-Pāṇinian, at least in part.

Grammars (vyākaraṇas) concern the description of speech forms (śabda) considered to be correct (sādhu) through derivation and thereby serve to make understood the usage found in the Vedas. The grammar that was granted the status of a Vedāṅga is that of Pāṇini. This work is referred to in toto as a śabdānuśāsana (means of instruction of correct speech forms); since the core of Pāṇini’s work comprises the eight chapters of sūtras that serve to describe both the current language of his time and features particular to Vedic, it also bears the name Aṣṭādhyāyī (“Collection of Eight Chapters”).

The accepted cultivated speech of the contemporary language that Pāṇini describes in his Aṣṭādhyāyī must have coexisted with more vernacular varieties of speech in which there were features belonging to the Middle Indo-Aryan division of the language group. Several facts support this view. The earliest texts available already show evidence of Middle Indo-Aryan. For example, vikaṭa- ‘deformed,’ found in the Ṛgveda (vocative singular feminine vikaṭe), is to be explained as representing a Middle Indic development of earlier vikṛta-, with -aṭ- instead of -ṛt-. The spoken language Pāṇini describes also reflects Middle Indo-Aryan influence. For example, a word for ‘jackal’ has a mixed paradigm, with forms typical of -ṛ-stems of the type kartṛ- ‘doer’ in the nominative and accusative singular (kroṣṭā, kroṣṭāram, cf. kartā, kartāram) and dual (kroṣṭārau, cf. kartārau) and the nominative plural (kroṣṭāraḥ, cf. kartāraḥ), but an -u-stem in the accusative plural (kroṣṭūn) as well as before consonantal endings (e.g., instrumental-dative-ablative dual kroṣṭubhyām, instrumental plural kroṣṭubhiḥ), and forms of either stem alternatively in forms such as the instrumental singular (kroṣṭrā, kroṣṭunā) and others with vocalic endings (e.g., dative singular kroṣṭre, kroṣṭave). This reflects a Middle Indic development of ṛ to u, and forms such as kroṣṭunā are comparable to Pāli pitunā ‘father’ (instrumental singular), which also is part of a mixed paradigm.

The Pāṇinian commentator Kātyāyana (c. 4th–3rd century bce) knew of the coexistence of Middle Indic forms with earlier ones. There is a Pāṇinian rule that provides that verb bases listed in an appendix to the Aṣṭādhyāyī have the class name dhātu (verbal base, root). Kātyāyana discusses whether one could define verbal bases semantically and thereby possibly do without the verb list. He remarks that even if one defines a verbal base as denoting an action, the roots must be listed in order to preclude the possibility that constituents of terms such as āṇapayati/āṇavayati ‘commands’ be assigned the class name in question; āṇapayati/āṇavayati is a Middle Indic counterpart of Sanskrit ājñāpayati.

Commenting on what Kātyāyana said, Patañjali (mid-2nd century bce) adds the examples vaṭṭati and vaḍḍhati, which correspond to Sanskrit vartate ‘occurs, is’ and vardhate ‘grows’; these forms show the use of the active ending -ti instead of the middle ending -te as well as -ṭṭ- and -ḍḍh- for -rt- and -rdh-. Patañjali also explained that to speak flawless Sanskrit (as described by Pāṇini) one should imitate the correct speakers (called śiṣṭa ‘learned, educated, elite’) of Āryāvarta (‘Country of the Aryans’). Moreover, Patañjali noted that one should study grammar in order to learn not to use incorrect words such as helayaḥ instead of herayaḥ (a phrase used in calling to people) or gāvī instead of gauḥ ‘cow’; gāvī is a Middle Indo-Aryan word. Such evidence lends support to the view that by the 6th or 5th century bce Sanskrit (as a medium of communication between members of a particular social stratum) coexisted with Middle Indo-Aryan dialects, and that depending on the circumstances either the higher or the more vernacular forms of speech were used. Further, the Pāli canon records that the Buddha enjoined his followers to use the vernaculars in communicating his teachings, and the Jaina canon identifies Ardhamāgadhī as the language to be employed for communicating the teachings of Mahāvīra. Similarly, Aśoka used Middle Indo-Aryan, not Sanskrit, in the inscriptions he ordered written throughout his kingdom; Sanskrit does not appear on inscriptions until the early centuries of the Common Era (e.g., Rudradāman’s inscription at Junagarh, about 150 ce). The coexistence of Old Indo-Aryan and Middle Indo-Aryan is thus to be accepted from the Vedic times onward.

The current language Pāṇini describes is very close in structure to the late Vedic found in certain Brāhmaṇa texts. As noted earlier, scholars have recognized other varieties of Sanskrit. Epic Sanskrit is so called because it is represented principally in the two epics, Mahābhārata (“Great Epic of the Bhārata Dynasty”) and Rāmāyaṇa (“Romance of Rāma”). In the latter the term saṃskṛta ‘adorned, cultivated, purified (by grammar)’ is encountered, possibly for the first time with reference to the language. The date of composition for the core of early Epic Sanskrit is considered to be in the centuries just preceding the Common Era.

The term Classical Sanskrit is generally used with reference to the language of major poetic works (kāvya), drama (nāṭaka)—in which both Sanskrit and Prākrits were used—as well as tales such as the Hitopadeśa (“Good Advice”) and Pañca-tantra (“Five Chapters”) and technical treatises on grammar, philosophy, and ritual. Not only was Classical Sanskrit used by the poet Kālidāsa and his predecessors Bhāsa, a dramatist, and Aśvaghoṣa, a Buddhist author, in the first centuries ce, but its use also continued long after Sanskrit had ceased to be a commonly used mother tongue.

Sanskrit remains a language of learned treatises and commentaries. It is also used as a lingua franca among paṇḍitas (traditional scholars) from different areas of India, is recognized in the Eighth Schedule of the constitution of India, and is used by the country’s public broadcasting services, All India Radio and Doordarshan television. Within the census of India, Sanskrit is reported by relatively small numbers of people as their mother tongue; for reasons that deserve further investigation, the number of speakers has fluctuated in recent years: about 2,200; 6,100; 49,750; and 14,150 speakers, respectively, for 1971, 1981, 1991, and 2001.

Grammatical modifications

Linguistic developments in Old Indo-Aryan can be traced from the early Vedic forms of the Ṛgveda through the later saṃhitās on to the late Vedic forms of brāhmaṇa prose and sūtras, culminating in the language described by Pāṇini, which is tantamount to what has been called Classical Sanskrit. (In the remainder of this article, Classical Sanskrit refers to the language of the works noted in the previous paragraphs and also the refined spoken language current in Pāṇini’s time and described in the Aṣṭādhyāyī.)

As noted above, Old Indo-Aryan noun forms were subject to significant linguistic development. For example, the nominative plural form ending in -āsas (e.g., devāsas ‘gods’) was already less frequent than -ās in the Ṛgveda and continued to lose ground later; in the Brāhmaṇas, -ās (e.g., devās) is the normal form. There are numerous other changes evident. For example, the instrumental singular form of -a- stems ends both in -ā and -ena (originally a pronoun ending) in the Ṛgveda, with the latter form predominating; thus, vīryā ‘heroic might’ appears once, and vīryeṇa occurs 10 times. In later Vedic texts, -ena is the usual ending. All the early Vedic forms are expressly classed as belonging to the sacred language (chandas) by Pāṇini.

The verb also shows chronological and dialect differences. For example, the first person plural ending -masi (e.g., bharāmasi ‘we bear’) predominates over -mas in Ṛgvedic but not in the Atharvaveda; -mas becomes the normal ending later. Early Vedic texts distinguish between aorist, imperfect, and perfect tense forms; for example, the third singular active aorist, imperfect, and perfect forms of gam ‘go’ are agan or agamat, agacchat, and jagāma.

In the current language that Pāṇini describes, the aorist was used to speak of an action carried out at a past time and could include the day on which one spoke, as well as to assert simply that the act in question had taken place. The imperfect, on the other hand, was used with reference to an action that took place some time in the past excluding the day on which one spoke. The perfect was used under these conditions and one more: when the speaker was reporting a past act not directly witnessed. This use of these three preterit forms is also attested in narrations in later Vedic texts. In Vedic of all epochs, the aorist is used in the way described.

On the other hand, already in the Ṛgveda, the perfect and imperfect were used in narrating myths. In dialects reflected in certain other Vedic texts, such as the Taittirīyasaṃhitā, the usual form used in such narration is the imperfect. In addition, some perfect forms continued to be used in Vedic with reference to a state reached—e.g., bibhāya ‘is afraid’ (root bhī). Moreover, even such stative perfects as occurred were generally replaced later. For example, to the perfect bibhāya, a new preterit abibhet ‘was afraid’ was created, on the basis of which speakers formed a present bibheti ‘is afraid,’ and this replaced the older stative perfect, which was then shifted to the normal reporting use of perfect forms: bibhāya (also periphrastic bibhayāñ cakāra) ‘was afraid.’

From earliest Indo-Aryan there are also future forms, with -iṣya- and -sya- affixed to verb bases—e.g., dā-sya-ti ‘will give,’ kar-iṣya-ti ‘will do, make.’ In the current language Pāṇini describes, a future formation, originally composed of an agent noun of the type kar-tṛ- ‘doer’ followed, except in the third person, by forms of the verb as ‘be’ (e.g., kartāsmi [from kartā asmi] ‘I will do’), was used to refer to an action performed at a future time excluding the day on which one spoke. This formation occurs in early Vedic, but only rarely.

Early Vedic had a verb category that later went out of use: the injunctive, which was formally a form with secondary endings lacking the augment, a prefixed vowel—e.g., vadhīs instead of avadhīs ‘you slew’ (2nd sg. aorist). The injunctive could be used to denote a general truth. A general truth could also be signified by the subjunctive, which is characterized by the vowel a affixed to the present, aorist, or perfect stem. Later Sanskrit retained the injunctive only in negative commands of the type mā vadhīs ‘do not slay.’ The subjunctive also diminished slowly until it was no longer used; for Pāṇini the subjunctive belonged to sacred literature. The functions of the subjunctive were taken over by the form called optative and the future form.

Noun forms incorporated into the verb system are numerous in early Indo-Aryan. Ṛgvedic has forms with affixes -ya and -tva functioning as future passive participles (gerundives)—e.g., vāc-ya- ‘to be said,’ kar-tva- ‘to be done.’ The Atharvaveda has, additionally, forms with -(i)tavya (parentheses indicate optional components of a form), as in hiṃs-itavya- ‘to be injured,’ and -anīya, as in upa-jīv-anīya- ‘to be subsisted upon.’ By late Vedic, the type with -tva had been eliminated; Pāṇini recognized kārya-, kartavya-, karaṇīya- ‘to be done’ as the standard types.

In Indo-Aryan, from earliest Vedic down to New Indo-Aryan, particular forms—called absolutives (or gerunds) for Old and Middle Indo-Aryan—are used to denote the prior act of two or more actions performed (usually) by one agent: ‘having done…, he did…’—for example, pibā niṣadya ‘sit down (niṣadya ‘having sat down’) and drink.’ Ṛgvedic dialects use -tvī, -tvā, -tvāya, and -(t)ya to form absolutives, but these were later reduced to two: -tvā with a simple verb (e.g., kṛ-tvā ‘after doing, making’) or one compounded with the negative particle (e.g., akṛ-tvā ‘without doing, making’), and -ya with a verb compounded with a preverb (a preposition-like form), as in ni-ṣadya.

Early Indo-Aryan also used various case forms of action nouns in the capacity of what are generally called infinitives—e.g., dative singular -tave (dā-tave ‘to give’), and ablative-genitive singular -tos (dā-tos), both from a noun in -tu, which also supplies the accusative singular -tum (dā-tum). There are other types in early Vedic, but the nouns in -tu are particularly important; in late Vedic the accusative -tum and the genitive -tos (construed with īś ‘be able, capable’) became the norm. In the language Pāṇini describes, forms in -tum and dative singular forms of action nouns are equivalent variants: bhoktuṃ gacchati/ bhojanāya gacchati ‘he is going out to eat.’

That some forms fell into disuse in the course of Indo-Aryan is natural. The modifications noted above represent both chronological and dialectal modifications. Such change was recognized by Indian grammarians; e.g., Patañjali noted that perfect forms of the type ca-kr-a ‘you did’ (2nd person plural) were not in use at his time; instead, a nominal (participial adjective) form with a complex suffix -tavat was used—e.g., kṛ-tavant-as (nom. pl. masc.). Indian grammarians also recognized the existence of different dialects. Pāṇini noted forms used by northerners (gen. pl. udīcām) and easterners (prācām), as well as various dialectal uses described by grammarians who preceded him.

Phonological modifications

Earlier documents also afford evidence for dialect variation in the realm of phonology; e.g., the early Vedic of the Ṛgveda is a dialect in which the Indo-European l sound was for the most part replaced by r—prā ‘fill,’ pūr-ṇa- ‘full.’ This change accords with Iranian—e.g., Avestan pərəna- ‘full.’ These forms contrast with Latin plenus and Gothic fulls, with l. Other dialects kept l and r distinct.

There are also doublets that have both r and l in words with Indo-European r: rohita-/lohita- ‘red.’ The variant with l can be assumed to belong to an eastern dialect. This variation accords with Middle Indo-Aryan evidence and the fact that such l forms become more numerous in the 10th book (maṇḍala) of the Ṛgveda, which is demonstrably more recent than the most ancient parts of the Ṛgveda, dating from a time when the Indo-Aryans had progressed farther east than their posited original location on the subcontinent. The development of retroflex ḷ- and ḷh- (sounds produced by curling the tip of the tongue upward toward the hard palate) from the retroflex sounds ḍ (nīḷa- ‘resting place, nest,’ īḷe ‘I praise, invoke,’ from nīḍa-, īḍe) and ḍh (mīḷha- ‘reward, prize,’ ūḷha- ‘transported,’ from mīḍha-, ūḍha-) when occurring between vowels is another feature characteristic of some dialects, including the major dialect of the Ṛgveda.
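The intervocalic development of ḍ and ḍh described above behaves like a simple rewrite rule, and it can be sketched in a few lines of Python. The fragment below is an illustration only: the function name rigvedic_lateral, the vowel inventory, and the decision to work on romanized strings are all assumptions made for the example, not a formalization drawn from the grammatical tradition.

```python
import re
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so that letters with diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)


# Assumed vowel inventory for romanized Sanskrit (single letters only;
# the diphthongs ai and au are covered by their component letters).
# Vocalic ḷ is left out to keep it distinct from the retroflex lateral.
VOWELS = nfc("aāiīuūṛṝeo")
RETROFLEX_D = nfc("ḍ")
RETROFLEX_L = nfc("ḷ")

# ḍ or ḍh, but only when a vowel stands on both sides.
INTERVOCALIC = re.compile(
    f"(?<=[{VOWELS}]){RETROFLEX_D}(h?)(?=[{VOWELS}])"
)


def rigvedic_lateral(word: str) -> str:
    """Replace intervocalic ḍ with ḷ and intervocalic ḍh with ḷh."""
    return INTERVOCALIC.sub(lambda m: RETROFLEX_L + m.group(1), nfc(word))


if __name__ == "__main__":
    for form in ["nīḍa", "īḍe", "mīḍha", "ūḍha", "daṇḍa"]:
        print(form, "->", rigvedic_lateral(form))
```

With the forms cited above, nīḍa, īḍe, mīḍha, and ūḍha come out as nīḷa, īḷe, mīḷha, and ūḷha. A word such as daṇḍa is left unchanged by this sketch because its ḍ follows a consonant rather than a vowel.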

There is also evidence of dialectal differences in the accentual system of Old Indo-Aryan. In the earliest system attested a syllable has three basic tones: high (udātta), low (anudātta), and a combined tone (svarita) that starts high and drops to low. For example, the first and second syllables of agní- ‘fire, Agni’ are respectively low and high, and the syllable of svàr- ‘heaven, sun’ has a combination of these two pitches. Some svarita syllables result from historical changes that affected still earlier sequences with high and low pitches; e.g., nadyàs (nom. pl.) ‘rivers’ developed from earlier nadíyas.

Other tonal variations resulted from contextual modifications. Thus, a basic low-pitched syllable was pronounced at an extralow level if the following syllable was high-pitched or svarita. In addition, the first mora or first half of a svarita could be pronounced at a higher level than that of a basic high tone. But not all dialects raised the first part of a svarita syllable to such a level, and there were additional dialectal differences in just how a svarita was pronounced. Moreover, in some dialects the svarita was altogether eliminated, replaced by a simple high tone.

The accentual system in which only high and low tones contrasted, known traditionally as the bhāṣika system, is best represented in the Śatapatha Brāhmaṇa (“Vedic Exegesis of a Hundred Paths”). This development may plausibly be considered to represent an early step in the gradual elimination of pitch contrasts. The current language Pāṇini describes, however, still had a system of three basic pitch levels. According to one view prevalent in Western descriptions, Classical Sanskrit had a predictable accentual pattern: if the next to last syllable was heavy—that is, had a long vowel or a short vowel preceding a consonant cluster—it received the accent, while if not, the syllable preceding this one was accented.
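The predictable pattern attributed to Classical Sanskrit in that view amounts to a two-way decision on the weight of the next to last syllable. The Python sketch below illustrates only that decision. The function name classical_accent_index is hypothetical, the caller is assumed to supply syllable weights already worked out, and the treatment of words of one or two syllables (accent on the first syllable) is an assumption, since the rule as stated does not cover them.

```python
from typing import Sequence


def classical_accent_index(heavy: Sequence[bool]) -> int:
    """Return the 0-based index of the accented syllable.

    heavy[i] is True when syllable i is heavy, i.e. has a long vowel or
    a short vowel preceding a consonant cluster.
    """
    count = len(heavy)
    if count == 0:
        raise ValueError("a word needs at least one syllable")
    if count == 1:
        # Assumption: a monosyllable carries the accent itself.
        return 0
    penult = count - 2
    if heavy[penult]:
        # Heavy next to last syllable: it receives the accent.
        return penult
    # Otherwise the syllable before it is accented. In a two-syllable
    # word there is none, so fall back to the first syllable (assumption).
    return max(penult - 1, 0)


if __name__ == "__main__":
    # bha-vā-mas: the next to last syllable has a long vowel.
    print(classical_accent_index([False, True, False]))
    # bha-rā-ma-si: the next to last syllable is light.
    print(classical_accent_index([False, True, False, False]))
```

Both calls return 1, so the accent falls on vā and on rā respectively under this view. Deriving the weights from a transliterated word would require a syllabifier, which is deliberately left out here.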

Classical Sanskrit

Classical Sanskrit represents a development of one or more such early Old Indo-Aryan dialects. At this stage, the archaisms noted above have been eliminated. For all this simplification, Classical Sanskrit is considerably more complex than Middle Indo-Aryan. In addition to the vowels a, i, and u (in both long and short varieties), it has ṛ and ḷ used as vowels. Clusters of dissimilar consonants occur freely, except in final word position, and the system of sound modification, called sandhi, is fully operative. Moreover, in its grammatical system Classical Sanskrit maintains the dual number, seven cases in addition to the vocative form (which marks the one addressed), and complex alternations. For example, the nominative singular form agni-s ‘fire’ corresponds with the genitive singular agne-s ‘of fire,’ the nominative plural agnay-as ‘fires,’ and the instrumental plural agni-bhis ‘with, by means of fires,’ with differing vowels in the second syllable. There are also separate sets of nominal (noun) and pronominal (pronoun) endings. For example, the nominative plural of deva- ‘god’ is devās but the corresponding form of ta- ‘this, that’ is te. Similarly, the masculine singular dative, ablative, and locative and the genitive plural forms of deva- and ta- differ as follows: devāya, devāt, deve, and devānām as opposed to tasmai, tasmāt, tasmin, and teṣām. Some nominals have forms with pronominal endings—e.g., ekasmai, parasmai, dative singular masculine-neuter of eka- ‘one’ and para- ‘other.’

The verb system of Classical Sanskrit also maintains complex alternations. In the present tense of the type bhav-a-ti ‘becomes, is,’ the stem (bhav-a-) remains unchanged throughout the paradigm except for lengthening of the -a- to -ā- before v and m (1st dual bhavāvas ‘we two are,’ 1st plural bhavāmas ‘we are’). But other verbs have vowel alternation—e.g., as-mi ‘I am,’ s-vas ‘we two are,’ s-mas ‘we are’; e-mi ‘I go,’ i-vas ‘we two go,’ i-mas ‘we go’; juho-mi ‘I offer an oblation,’ juhu-vas ‘we two offer an oblation,’ juhu-mas ‘we offer an oblation.’ A distinction is observed between active and mediopassive endings: as-mi ‘am,’ as-ti ‘is,’ jan-ay-a-ti ‘engenders’ with the active endings -mi and -ti, but ās-e ‘am seated,’ ās-te ‘is seated,’ jā-ya-te ‘is born,’ stū-ya-te ‘is praised,’ with the mediopassive endings -e and -te. Mediopassive verb forms are used for the passive, reflexive, and other meanings.

Classical Sanskrit also has a rich system of nominal and verbal derivatives. Compound words are of the following kinds: copulative (dvandva) compounds such as mātāpitarau ‘mother and father’ (also elliptic pitarau ‘parents’); the type such as rāja-puruṣa- ‘king’s servant,’ in which the first member is equivalent to a case form; the type nīlotpala- ‘blue (nīla-) lotus (utpala-),’ in which the constituents are coreferential; the type bahu-vrīhi ‘much-rice,’ in which the object denoted is other than that of any of the members of the compound (bahur vrīhir yasya ‘he who has much rice’); and adverbial compounds (avyayībhāva) of the type upāgni (upa-agni) ‘near the fire.’

In addition, there are derivatives with affixes that in the Sanskrit grammatical tradition are called taddhita and serve to form what Western grammarians call secondary derivatives. Examples include aupagava- ‘offspring of Upagu,’ bhrāṣṭra- ‘prepared in a frying pan,’ dādhika- ‘prepared in yogurt,’ and dantya- ‘dental.’ Also of this type are what in Western grammar are called comparatives and superlatives, formed with the suffixes -tara-, -īyas-, and -tama-, -iṣṭha-—for example, priya-tara- ‘very dear, dearer,’ gar-īyas- ‘very heavy, heavier,’ priya-tama- ‘most dear, dearest,’ and gar-iṣṭha- ‘most heavy, heaviest,’ from the adjectives priya- and guru-.

It is noteworthy that Old Indo-Aryan allowed such derivatives to be formed from elements other than adjectives, including finite verb forms—e.g., natarām ‘not…(for an additional reason),’ natamām ‘all the more not,’ jayatitarām ‘is exceedingly victorious.’ Pronouns have derivatives equivalent to case forms; e.g., tatas ‘from that, thence,’ yatas ‘from which, whence,’ kutas ‘from which, whence?’ and tatra ‘in that, there,’ yatra ‘in which, where,’ and kutra ‘in which, where?’ are equivalent to ablative and locative forms such as tasmāt, yasmāt, kasmāt and tasmin, yasmin, kasmin. These can also be used without a noun.

The derivative verbal systems include the causative, the desiderative (‘desire to, wish to’), and the intensive (‘do repeatedly, intensely’). The first has an affix -i-/-ay- or, after certain roots (particularly those in -ā), -pi-/-pay-—e.g., gam-ay-a-ti ‘has go,’ kār-ay-a-ti ‘has do,’ sthā-pay-a-ti ‘sets in place,’ arp-ay-a-ti ‘causes to reach.’ The desiderative is formed with -sa- and reduplication (repetition of a part of the root): dī-dṛk-ṣa-te ‘desires to see’ (root dṛś). The desiderative also has an agent noun in -u: dī-dṛk-ṣ-u ‘who wishes to see.’ The intensive generally involves reduplication, with a suffix -ya- and medial inflection—e.g., pā-pac-ya-te ‘cooks repeatedly, cooks intently.’

Characteristics of Middle Indo-Aryan

The Sanskrit word prākṛta, whence the term Prākrit, is a derivative from prakṛti- ‘original, nature.’ Grammarians of the Prākrits generally consider the original from which these derive to be the Sanskrit language as described by grammarians going back to Pāṇini. Most modern scholars consider prākṛta to refer to the “natural” languages, the vernaculars, as opposed to Sanskrit, the polished language of the elite (śiṣṭa). This viewpoint is mentioned also by an earlier commentator, Nami Sādhu (11th century), and there is linguistic evidence in its favour. Some forms in the Prākrits are found in Vedic but not in Classical Sanskrit. As Classical Sanskrit is not directly derivable from any single Vedic dialect, so the Prākrits cannot be said to derive directly from Classical Sanskrit.


The most archaic literary Prākrit is Pāli, the language of the Buddhist canon (c. 5th century bce) and of the later stories and commentaries of Theravāda Buddhism. Pāli represents essentially a western Middle Indo-Aryan dialect, though there are sufficient easternisms in the canon to have led some scholars to the plausible view that the canon as it exists today is a recast of an original in an eastern dialect. To the Buddhist literature also belongs the Gāndhārī Dhammapada (“Way of Truth”), the only literary text written in a dialect of the northwest. The Niya documents, official documents written in Prākrit dating from the 3rd century ce, also belong to the northwest.

The earliest inscriptional Middle Indo-Aryan is that of the Aśokan inscriptions (3rd century bce). These are more or less full translations from original edicts issued in the language of the east (from the capital Pāṭaliputra in Magadha, near modern Patna in Bihār) into the languages of the areas of Aśoka’s kingdom. There are other Prākrit inscriptions up to the 4th century ce. Literary Prākrits other than Pāli were also used in independent works and in dramas along with Sanskrit.

According to Prākrit grammarians, as well as theoreticians of poetics such as Daṇḍin (c. 6th–7th century), Māhārāṣṭrī (‘[speech form] from the Mahārāshtra country’) is the Prākrit par excellence. It is the language of kāvyas (poetic works) such as the Rāvaṇavaha (“The Slaying of Rāvaṇa”; also called Setubandha, “The Building of the Bridge [to Laṅkā]”) from no later than the 6th century ce. Māhārāṣṭrī is also the language of lyrics in Rājaśekhara’s Karpūramañjarī (named after its heroine, Karpūramañjarī, c. 9th–10th century), the only extant drama written completely in Prākrit, and of verses recited by women in the classical drama of Kālidāsa (3rd–4th century) and his successors, though not earlier. Śaurasenī is the literary dialect used for conversation between higher personages other than the king and his captains in the drama, while other dialects are used by lower personages.

The language of the early Jaina canon, the final version of which was made in the 5th or 6th century ce, is called Ardhamāgadhī (‘half Māgadhī’); Jainas also used another literary dialect, called Jaina Māhārāṣṭrī by modern scholars, in noncanonical works. The oldest poetic work in this language is Vimala Sūri’s Paumacariya (c. 3rd century), a Jaina Rāmāyaṇa. Of other Prākrit dialects mentioned by grammarians and poeticists, Paiśācī (or Bhūtabhāṣā, both meaning ‘language of demons’) is noteworthy; it is said to be the language of the original Bṛhatkathā of Guṇāḍhya, source of the Sanskrit book of stories Kathāsaritsāgara (“Ocean of Rivers of Tales”).

Buddhist works were also written in a language that has been called Buddhist Hybrid Sanskrit. Among these works is the Mahāvastu (“Great Story”), the core of which is thought to date from the 2nd century bce. This language is a Middle Indo-Aryan dialect of indeterminate origin and steadily became more Sanskritized in prose sections of later works. The view once maintained—that Buddhist Hybrid Sanskrit represents the result of translations from Middle Indic into imperfect Sanskrit—has been refuted on the basis of comparable linguistic features found in inscriptions.

The most advanced stage of Middle Indo-Aryan, Apabhraṃśa, was also used as a literary language. That there was literary creation in Apabhraṃśa by the 6th century is clear from an inscription of King Dharasena II of Valabhī, in which he praises his father as being adept in Sanskrit, Prākrit, and Apabhraṃśa composition. Moreover, in the fourth act of Kālidāsa’s drama Vikramorvaśīya (“Urvaśī Won Through Valour”), Apabhraṃśa is used. Because Kālidāsa probably lived in the 3rd or 4th century, literary composition in Apabhraṃśa would be earlier than Dharasena’s time, although not all scholars accept that these passages are authentic. There is a great deal of later literature, all poetry, in Apabhraṃśa, for the most part Jaina works—e.g., Paumacariu (8th–9th century; “The Life of Pauma” [Pauma is an epithet of Rāma]) of Svayambhū, Harivaṃśapurāṇa (10th century; “Genealogy of Hari [Vishnu]”) of Puṣpadanta, and Sanatkumāracariu of Haribhadra (12th century).

Phonological modifications

Middle Indo-Aryan is generally characterized by the reduction of the complexities seen in Old Indo-Aryan. The vowel system was reduced by the merger of ṛ (and ḷ) sounds with other vowels and the change of the diphthongs ai and au to the monophthongs e and o—e.g., Pāli accha- ‘bear’ (Sanskrit ṛkṣa-), iṇa- ‘debt’ (Sanskrit ṛṇa-), uju- ‘straight’ (Sanskrit ṛju-), pucchati ‘asks’ (Sanskrit pṛcchati), mettī- ‘friendship’ (Sanskrit maitrī-), orasa- ‘legitimate’ (Sanskrit aurasa-). Moreover, -aya- and -ava- commonly contracted to -e- and -o-; e.g., Pāli jeti ‘conquers’ (Sanskrit jayati), odhi- ‘limit’ (Sanskrit avadhi-).
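
The regular part of these vowel changes can be written as a handful of string rewrites. The following Python sketch is only an illustration: the function name `contract_vowels` is hypothetical, the input is assumed to be a romanized Sanskrit form in precomposed Unicode, and the rules are applied without exception even though the contractions were common rather than universal. It does not model the treatment of ṛ, whose outcome (a, i, or u) differs from word to word, or the consonant-cluster change seen in mettī-.

```python
# Illustrative rewrite rules (an assumption-laden sketch, not a full grammar):
# the contractions -aya- > -e- and -ava- > -o-, and the monophthongization
# of the diphthongs ai > e and au > o.
RULES = (
    ("aya", "e"),
    ("ava", "o"),
    ("ai", "e"),
    ("au", "o"),
)


def contract_vowels(sanskrit_form: str) -> str:
    """Apply each rewrite rule in order to a romanized Sanskrit form."""
    result = sanskrit_form
    for source, target in RULES:
        result = result.replace(source, target)
    return result


if __name__ == "__main__":
    for form in ("jayati", "avadhi", "aurasa"):
        print(form, "->", contract_vowels(form))
```

Applied to forms cited above, the sketch turns jayati into jeti, avadhi into odhi, and aurasa into orasa.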

Final consonants were deleted, with the exception of -m, which developed to an -ṃ sound (traditionally pronounced as ŋ, a sound like that of the ng in sing) before which a vowel was shortened (Pāli bhāriyaṃ ‘wife’; Sanskrit bhāryām). Together with the trend toward replacing variable consonant stems by unchanging stems in -a-, this change had serious consequences for the grammar. Consonant stems steadily disappeared and were transformed to stems ending in vowels; e.g., Sanskrit śarad- ‘autumn,’ sarit- ‘stream,’ and sarpis- ‘butter’ correspond with Pāli sarada-, saritā, and sappi-.

Consonant clusters were also modified in Middle Indo-Aryan—e.g., Pāli khetta- ‘field’ (Sanskrit kṣetra-), Pāli dakkhiṇa- ‘right, south’ (Sanskrit dakṣiṇa-), aggi- ‘fire’ (Sanskrit agni-), puṇṇa- ‘full’ (Sanskrit pūrṇa-), and taṇhā- ‘thirst’ (Sanskrit tṛṣṇā-). The shortening of vowels before modified consonant clusters led to the use of short ĕ and ŏ sounds, which were unknown in Old Indo-Aryan except in particular Vedic recitations—e.g., Pāli sĕmha- ‘phlegm’ (Sanskrit śleṣman-), ŏṭṭha- ‘lip’ (Sanskrit oṣṭha-).

The above phenomena are not restricted to Pāli; they are pan-Middle Indo-Aryan. Differences between Pāli and Aśokan on the one hand and other Prākrits on the other include the retention of voiceless stops (i.e., p, t, k) between vowels in Pāli and Aśokan dialects; other Middle Indo-Aryan dialects modify them. The extreme development appears in literary Māhārāṣṭrī, in which unaspirated stops (pronounced without an accompanying audible puff of breath) other than retroflexes (ṭ, ḍ) and labials (p, b) were deleted, aspirated stops (pronounced with an audible puff of breath) were replaced by h, retroflexes (pronounced by curling the tongue upward toward the hard palate) became voiced, and labials were replaced by v—e.g., loa- ‘world’ (Sanskrit loka-), loaṇa- ‘eye’ (Sanskrit locana-), sāhā- ‘branch’ (Sanskrit śākhā-), paḍhai ‘recites, reads’ (Sanskrit paṭhati), and savaha- ‘oath, curse’ (Sanskrit śapatha-).
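
The Māhārāṣṭrī treatment of stops between vowels is regular enough to be sketched as a small rule table. The Python below is a hedged illustration, not a description of any established tool: the names `tokenize` and `maharastri` are hypothetical, and the input is assumed to be a romanized Sanskrit form without the stem hyphen. Two rules go beyond the statement above and are included only so that the cited examples come out as given: the merger of ś and ṣ with s, and the change of n to ṇ between vowels. The palatal aspirates ch and jh are not covered by this sketch.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to precomposed Unicode so each romanized letter is one character."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aāiīuūeo"))
ASPIRATES = {nfc(s) for s in ("kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh")}

# Outcomes between vowels, following the description in the text.
DELETED = {"k", "g", "c", "j", "t", "d"}    # unaspirated, not retroflex or labial
TO_H = {"kh", "gh", "th", "dh", "ph", "bh"}  # aspirated stops become h
VOICED = {nfc("ṭ"): nfc("ḍ"), nfc("ṭh"): nfc("ḍh")}  # retroflexes become voiced
TO_V = {"p", "b"}                            # labials become v


def tokenize(word: str) -> list:
    """Split a romanized word into sounds, keeping aspirates (kh, th, ...) whole."""
    word = nfc(word)
    tokens, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in ASPIRATES else 1
        tokens.append(word[i:i + step])
        i += step
    return tokens


def maharastri(word: str) -> str:
    """Apply the intervocalic stop changes to a romanized Sanskrit form."""
    tokens = tokenize(word)
    out = []
    for i, tok in enumerate(tokens):
        between_vowels = (
            0 < i < len(tokens) - 1
            and tokens[i - 1] in VOWELS
            and tokens[i + 1] in VOWELS
        )
        if tok in (nfc("ś"), nfc("ṣ")):
            tok = "s"  # assumption: sibilant merger, needed for sāhā-, savaha-
        elif between_vowels:
            if tok in DELETED:
                tok = ""
            elif tok in TO_H:
                tok = "h"
            elif tok in VOICED:
                tok = VOICED[tok]
            elif tok in TO_V:
                tok = "v"
            elif tok == "n":
                tok = nfc("ṇ")  # assumption: not stated above, needed for loaṇa-
        out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    for form in ("loka", "locana", "śākhā", "paṭhati", "śapatha"):
        print(form, "->", maharastri(form))
```

Whether a sound stands between vowels is decided on the original form rather than on the partly rewritten one, so the deletion of one stop does not trigger a change in its neighbours.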

Essentially on the same level are the dialects of Jaina texts, but in these a y glide noted by grammarians occurs when a consonant is elided: vayaṇa- ‘face’ (Sanskrit vadana-); sayala- ‘whole’ (Sanskrit sakala-). In Śaurasenī, on the other hand, voiceless stops (e.g., p, t, k) between vowels are voiced (e.g., become b, d, g, respectively)—e.g., ido ‘hence,’ tadhā ‘thus,’ with voiced -d- and -dh- for voiceless -t- and -th- (Sanskrit itaḥ, tathā). Though Pāli and Aśokan are at an earlier level of development with respect to these changes, they share with the rest of the Middle Indo-Aryan dialects the replacement of voiced aspirated sounds between vowels by h: lahu- ‘light, unimportant’ from laghu-, dahati ‘puts’ (Sanskrit dadhāti). Similarly, they share the change of ty-, dy-, dhy- to c-, j-, jh- and, comparably, of intervocalic clusters -ty-, -dy-, -dhy- to -cc-, -jj-, -jjh-: Pāli cajati ‘lets loose’ (Sanskrit tyajati), Pāli jotati ‘shines’ (Sanskrit dyotate), Pāli jhāyati ‘meditates, thinks about’ (Sanskrit dhyāyati), Pāli paṭicca ‘originating’ (Sanskrit pratītya), Pāli ajja ‘today’ (Sanskrit adya), Pāli majjha- ‘middle’ (Sanskrit madhya-). Pāli and Aśokan, however, retain an initial y-, changed to j- in most other Prākrits—e.g., the pronoun ya- (feminine yā-), opposed to ja-.
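
The palatalization of ty, dy, and dhy described above can be stated as two small sets of rewrites, one for word-initial position and one for position between vowels. The following Python sketch is offered as an illustration under stated assumptions: the function name `palatalize` is hypothetical, the input is a romanized Sanskrit form in precomposed Unicode, and no other Middle Indo-Aryan change (such as the loss of mediopassive endings or the simplification of pr-) is modelled.

```python
import re

VOWELS = "aāiīuūeo"

# Word-initial outcomes: ty- > c-, dy- > j-, dhy- > jh-
INITIAL = (("dhy", "jh"), ("ty", "c"), ("dy", "j"))
# Outcomes between vowels: -ty- > -cc-, -dy- > -jj-, -dhy- > -jjh-
MEDIAL = (("dhy", "jjh"), ("ty", "cc"), ("dy", "jj"))


def palatalize(word: str) -> str:
    """Rewrite ty, dy, dhy in a romanized form according to position."""
    for source, target in INITIAL:
        if word.startswith(source):
            word = target + word[len(source):]
            break
    for source, target in MEDIAL:
        # Lookbehind and lookahead restrict the rewrite to intervocalic position.
        pattern = f"(?<=[{VOWELS}]){source}(?=[{VOWELS}])"
        word = re.sub(pattern, target, word)
    return word


if __name__ == "__main__":
    for form in ("tyajati", "dhyāyati", "adya", "madhya"):
        print(form, "->", palatalize(form))
```

For the forms cited above, the sketch yields cajati, jhāyati, ajja, and majjha. Forms such as pratītya need further changes before they match the attested Pāli.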

The deletion of stop consonants noted above resulted in vowel sequences within words that were unknown to Old Indo-Aryan. Similarly, the extent of sandhi modification was restricted in Middle Indo-Aryan. The Middle Indo-Aryan vowels ī and ū do not change to y and v before dissimilar vowels in compounds—e.g., Māhārāṣṭrī rattīandhaa- ‘dark of night’ (Sanskrit rātryandhaka-). In addition, the first of two contiguous vowels in different words is subject to deletion—e.g., Pāli manas’icchasi (from manasā icchasi) ‘you wish in your mind.’

Middle Indo-Aryan shows evidence of dialectal differentiation. The earliest documents that allow one to determine roughly the dialect distribution are Aśoka’s inscriptions. These represent three major dialect areas: east, as in the inscriptions of Jaugaḍa, Dhauli, and Kālsī; west, in Girnār; and northwest, in Mānsehrā and Shāhbāzgaṛhī. Characteristic of the east dialect area is final -e, corresponding to -o in the west and -aḥ in Sanskrit; in the east dialect area l also regularly corresponds to r of the west and of Sanskrit.

Moreover, in the east dialect area there is a tendency to insert a vowel within consonant clusters, while in the west and northwest one of the consonants is assimilated to the other without an intervening vowel. For example, Sanskrit rājñaḥ ‘of the king’ corresponds with Girnār rañño, Shāhbāzgaṛhī raño, Jaugaḍa lājine. Northwest stands apart in retaining three spirant sounds, ś, ṣ, s, which merge to s elsewhere. Aśoka’s eastern dialect, from the Magadha country, shows an s sound for Old Indo-Aryan ś, ṣ, s rather than the ś sound typical of literary Māgadhī.

Grammatical modifications

In its grammatical system, Middle Indo-Aryan also reduced complexities. The dual number no longer exists as a separate category; corresponding to Sanskrit dvābhyām ‘by two,’ Prākrit has dohi(ṃ) (Pāli dvīhi), with the ending -hi(ṃ) equivalent to the instrumental plural -bhis of Old Indo-Aryan. Among other changes is the replacement of the dative case by the genitive except in particular usages—e.g., the use of forms corresponding to the Old Indo-Aryan dative to denote a purpose.

In Middle Indo-Aryan, nominal and pronominal forms are no longer strictly segregated; e.g., Aśokan vijitamhi ‘in the kingdom’ (also vijite) has a pronominal ending -mhi that derives phonetically from Old Indo-Aryan -smin.

In the verb system, the contrast between active (3rd sing. -ti) and mediopassive (3rd sing. -te) endings was obliterated. Further, the Old Indo-Aryan distinction between aorist, imperfect, and perfect forms was eliminated. With few exceptions, the sigmatic aorist (an aorist form with s) provides the only productive finite preterite forms of early Middle Indo-Aryan—e.g., Aśokan ni-kkhamisu ‘they set out’ (Sanskrit nir-a-kramiṣur). In later Prākrits verbally inflected preterites were generally eliminated, except in Ardhamāgadhī; in their place was used the past participle. For example, in Śaurasenī devi uva-visa, mahārāo vi ā-ado ‘sit down, my queen, the king also has arrived,’ the past participle ā-ado (Sanskrit ā-gataḥ) agrees with mahā-rāo ‘king’ (Sanskrit mahā-rājaḥ) in number and gender. If the verb is transitive, the participle agrees with the direct object, and the agent is denoted by an instrumental form: in Jaina Māhārāṣṭrī, teṇa vi savvaṃ siṭṭhaṃ ‘he has told everything,’ teṇa ‘by him’ refers to the agent, and siṭṭhaṃ ‘told’ (Sanskrit śiṣṭam) agrees with the neuter singular form savvaṃ (Sanskrit sarvam). When no object is denoted, the verb is in the neuter singular. Old Indo-Aryan used both the participial construction and the finite verb; thus, Prākrit so vi teṇa samaṃ gao ‘he also went with him’ could correspond with Sanskrit so’pi tena saha gataḥ or so’pi tena sahāgamat (saha agamat). The Middle Indo-Aryan development eliminated the latter construction.

Alternations of the Sanskrit type as-mi, s-mas were eliminated in Middle Indo-Aryan; the predominant type of present tense was formed from an unchanging vowel stem, as in Pāli e-ti, e-nti ‘go(es).’

Nominal forms of the verb system are of the same types as Old Indo-Aryan—e.g., the Pāli future passive participle (gerundive) kātabba- (Sanskrit kartavya-) ‘to be done,’ Śaurasenī karaṇia-; Ardhamāgadhī, Jaina Māhārāṣṭrī, and Māhārāṣṭrī karaṇijja- ‘to be done.’ The infinitive is commonly formed on the present tense stem, not on the root as in Old Indo-Aryan. Thus, Pāli pappotum is formed on the present pappoti; Sanskrit prāptum contains āptum, formed on the root āp, not on the present stem āp-no- (3rd sing. present indicative prāpnoti).

Some grammatical features show dialectal variation; e.g., the Aśokan dative singular form is -āya in the western dialects (Girnār atthāya ‘for the purpose of’) but -āye in the east (Kālsī, Dhauli aṭṭhāye).


As noted above, the most advanced development of Middle Indo-Aryan is seen in Apabhraṃśa. Sound changes that are typical of Apabhraṃśa include the replacement of the vowel sound a by u in final syllables; e.g., karahu ‘you all do, make,’ corresponds with karaha (karadha) in other Prākrits. From stems in -aya- develop forms in -aü and nasalized -aũ (nasalization is here indicated by a tilde [~]): bhaḍāraü ‘honoured one, king’ (Prākrit bhaṭṭārayo), haũ ‘I’ (Aśokan hakaṃ). Nasalization also appears in environments in which earlier m occurred between vowels—e.g., gāũ ‘village’ (from an earlier base gāma-, Sanskrit grāma-).

Numerous other sound changes are evident, among them the development of -s(s)- between vowels into h: tahŏ ‘of him’ (Prākrit tassa, Sanskrit tasya); hohinti ‘will be’ (compare Pāli hossati [3rd sing.]).

Apabhraṃśa contractions, such as -aya- changing to -aü and -iya to -ī, foreshadow New Indo-Aryan, in which the development was extended—e.g., Apabhraṃśa pāṇiü ‘water’ (Old Indo-Aryan pānīyam), Gujarati pāṇī, Hindi pānī.

In other points Apabhraṃśa also presaged New Indo-Aryan. Contracted forms are reflected in the New Indo-Aryan opposition of masculine, neuter, and feminine nouns—thus, Apabhraṃśa -aü, -aũ, -ī, Gujarati -o, -ũ, -ī (gayo, gayũ, gaī ‘went’), Hindi -ā, -ī (gayā, gaī). The case system of Apabhraṃśa is also at a more advanced level of disintegration than that of earlier Middle Indo-Aryan, with the instrumental and locative plurals being identical in form (-ahĩ or -ehĩ for -a- stems) and instrumental singular forms also being used as locatives.

In the Apabhraṃśa verb system, present tense stems in -a predominate. Apabhraṃśa verb endings differ from those of other Prākrits. Particularly interesting is the third person plural type karahĩ ‘they do,’ which coexists with karanti. The form karahĩ, corresponding to the third person singular karaï ‘he does,’ is formed on the model of the pair karaũ (1st person singular, ‘I do’) and karahũ (1st person plural, ‘we do’). Here again Apabhraṃśa comes close to New Indo-Aryan. Moreover, Apabhraṃśa has some causative formations that do not occur elsewhere in Middle Indo-Aryan but are known from New Indo-Aryan—e.g., bham-āḍ-a-i ‘causes to turn,’ Gujarati bhamāṛe che ‘causes to turn around,’ and pais-ār-a-i ‘causes to enter,’ Gujarati pɛsāre che ‘causes to enter, to penetrate.’

Also noteworthy are syntactic usages that closely parallel those present in New Indo-Aryan. The present participle is used as a conditional—e.g., jivă̇ tivă̇ tikkhā levi kar jaï sasi chollijjantu | to jaï gorihe muhkmali sarisima kāvi lahantu ‘if somehow the moon had its sharp rays taken away and [it] were then fashioned, then it might gain some similarity in the world to the lotus face of my beautiful lady,’ where the phrases jaï sasi chollijjantu ‘if the moon were fashioned’ and sarisima lahantu ‘would gain similarity’ contain present participle forms used in stating a contrary-to-fact conditional. In Sanskrit the conditionals atakṣiṣyata and alapsyata would be used.

The Apabhraṃśa gerundive in -iv(v)a or -ev(v)a can be used as an infinitive—e.g., pi-eva-e laggā ‘began to drink.’ This is the Gujarati construction pi-vā lāgyo ‘began to drink,’ in which pi-vā is an inflected form of pi-vũ—that is, a verbal noun corresponding etymologically to the Apabhraṃśa gerundive.

Influences on Old and Middle Indo-Aryan

Middle Indo-Aryan shows evidence of the influence of linguistically more advanced vernaculars on literary compositions. The Prākrits of elegant literary compositions must have been artificial, different in many respects from the vernaculars current at the time, though reflecting languages that were current at some former time. The Old Indo-Aryan and Middle Indo-Aryan stages, then, present a picture of concurrent vernaculars with dialects and literary languages influenced by the vernaculars. It is impossible to compartmentalize the different stages as beginning and ending at any definite date.

The literary languages borrowed words and suffixes from earlier languages. There are Prākritisms (i.e., forms of earlier Prākrits) in Apabhraṃśa—e.g., the genitive singular ending -ssa instead of -hŏ and 2nd person plural verb forms terminating in -ha instead of -hu. All the literary Prākrits had recourse to Sanskrit as a source for borrowing words. Words that were incorporated into the Prākrits from Sanskrit with no change in form are called saṃskṛta-sama ‘identical with the Sanskrit (form)’ or tat-sama ‘identical with that’ and are contrasted with words termed saṃskṛta-bhava (tad-bhava) ‘whose origin is in Sanskrit’ (literally, ‘located in Sanskrit’)—that is, words that the grammarians can derive from Sanskrit by using certain rules. Another class of words, called deśya (or deśī) ‘belonging to the area, country,’ includes items that the grammarians cannot derive easily from Sanskrit and that are supposed to have been in use in particular areas from early times.

Many or most of the deśya words are indeed derivable from earlier Indo-Aryan, but some are of Dravidian origin—e.g., akka ‘sister’ (Telugu akka), attā ‘father’s sister’ (Telugu atta), appa ‘father’ (Telugu appa), ūra ‘village’ (Telugu uru), pulli ‘tiger’ (Telugu puli). Whether borrowing from Dravidian occurred in prehistoric times and is reflected in the Ṛgveda remains a source of scholarly debate.

Another object of debate is whether any borrowing that might have taken place at such an early time would have occurred in a situation where Dravidians were a substrate group that transferred features from their speech to that of superiors whose language they used, or in a situation of equality, so that bilinguals affected each other’s languages. Such borrowing definitely took place in later Sanskrit. It is not always certain that borrowing proceeded from Dravidian to Indo-Aryan, however, because Dravidian languages freely borrowed from Indo-Aryan. Thus, some scholars claim that Sanskrit kaṭu ‘sharp, pungent’ is from Dravidian, but others claim that it is a Middle Indo-Aryan form deriving from an earlier *kṛt-u ‘cutting’ (root kṛt; an asterisk [*] preceding a form indicates that it is not attested but has been reconstructed as a hypothetical form).

Whatever the judgment on any individual word, it is clear that Indo-Aryan did borrow from Dravidian, and this phenomenon is important in considering a group of sounds that sets Indo-Aryan apart from the rest of Indo-European—the cacuminal, or retroflex, stops. The influence of Dravidian may be considered as contributing to the extension of these sounds beyond their limited occurrence in inherited Indo-European items such as nīḍa ‘nest’ (from Proto-Indo-Aryan *nizḍa-, Proto-Indo-European *ni-sd-o-), mīḍha- ‘reward’ (from Proto-Indo-European *misdho-), stīr-ṇa- ‘spread out’ (from Proto-Indo-European *stṝ-no-), dviṭ ‘hating’ (nominative singular, from earlier *dviṣ-s), where retroflex consonants developed by regular phonetic developments from inherited Indo-European terms.

Such developments led to contrasts between retroflex—or at least retracted—stops and dental consonants, as in sīdati ‘is sitting down,’ vidhavā- ‘widow,’ agnicit (nominative singular) ‘one who has set up ritual fires.’ Moreover, retroflex stops developed in Middle Indo-Aryan dialects through sound changes; as noted earlier, kaṭa- developed from earlier kṛta-, and, in eastern dialects, aṭṭha- developed from artha-. As also noted, Old Indo-Aryan Sanskritic speech communities interacted with speakers of Middle Indo-Aryan vernaculars, from which they borrowed terms with retroflex stops. They then maintained the terms, as Old Indo-Aryan had also developed contrastive retroflex consonants. When, as a result of close contact, Dravidian words with retroflex consonants were borrowed, they too could be taken into Indo-Aryan without changing the retroflex consonants to dentals. The Munda languages (or, more generally, the Austroasiatic languages) are also a source of some borrowing into Indo-Aryan—e.g., Sanskrit jambāla- ‘mud’ (Santali jobo).

In the 7th century ce, the philosopher Kumārila mentioned not only Dravidian but also Persian and Greek as sources of foreign words. Such borrowing can be traced back to early times. In the 6th century bce the Achaemenid emperor Darius I counted Gandhāra as a province of his kingdom, and Alexander the Great penetrated into northern India in the 4th century bce. From Iranian come words such as that meaning ‘inscription, writing, script’; in the northwest inscriptions of Aśoka the word is dipi (Old Persian dipi), and Sanskrit has lipi-, the form in other Aśokan versions and in Pāli. Also from Persian is Sanskrit kṣatrapa- ‘satrap’—Old Persian xšassa-pāvan-. Of Greek origin are such mathematical and astronomical terms as Sanskrit kendra ‘centre’ (Greek kéntron), jāmitra ‘diameter’ (diámetron), and horā ‘hour’ (hṓra). Yavana ‘foreigner,’ originally the Greek word for Ionian, is known from as early as the time of Pāṇini. Later, Arabic words such as taślī ‘trigon’ came into Sanskrit.

The modern Indo-Aryan stage

The division of the Indian subcontinent into linguistic states and even into countries (Pakistan, Bangladesh, and India) is a recent phenomenon (see table). Even after independence from Britain was achieved and partition had taken place, Bombay state existed until it was split into Gujarāt and Mahārāshtra states in 1960. The division of Punjab into Punjab and Haryana states in 1966 occurred as a result of Punjabi agitation for a separate linguistic state. Before independence, under British rule (entrenched from the 18th century), there were princely states within dialect areas; under Mughal rule (16th–18th centuries), Persian was the language of the court and of the courts of justice, and this practice continued in the latter function for a time under the British. Although Hindi–Urdu may have been a lingua franca, the great dialectal diversity of earlier times continued.

Some of the modern Indo-Aryan languages have literary traditions reaching back centuries, with enough textual continuity to distinguish Old, Middle, and Modern Bengali, Gujarati, and so on. Bengali can trace its literature back to Old Bengali caryā-padas, late Buddhist verses thought to date from the 10th century; Gujarati literature dates from the 12th century (Śālibhadra’s Bharateśvara-bāhubali-rāsa) and from a period when the area of western Rājasthān and Gujarāt is believed to have had a literary language in common, called Old Western Rajasthani. Jñāneśvara’s commentary on the Bhagavadgītā in Old Marathi dates from the 13th century and early Maithili from the 14th century (Jyotīśvara’s Varṇa-ratnākara), while Assamese literary work dates from the 14th and 15th centuries (Mādhava Kandalī’s translation of the Rāmāyaṇa, Śaṅkaradeva’s Vaiṣṇavite works). Also of the 14th century are the Kashmiri poems of Lallā (Lallāvākyāni), and Nepali works have also been assigned to this epoch. The work of Jagannāth Dās in Old Oriya dates from the 15th century.

Amīr Khosrow used the term hindvī in the 13th century, and he composed couplets that contained Hindi. In early times, however, other dialects were predominant in the midlands (Madhyadeśa) as literary media, especially Braj Bhasa (e.g., Sūrdās’ Sūrsāgar, 16th century) and Awadhi (Rāmcaritmānas of Tulsīdās, 16th century). In the south, in Golconda (Andhra, near Hyderābād), Urdu poetry was seriously cultivated in the 17th century, and Urdu poets later came north to Delhi and Lucknow. Punjabi was used in Sikh works as early as the 16th century, and Sindhi was used in Ṣūfī (Islāmic) poetry of the 17th–19th centuries. In addition, there is evidence in late Middle Indo-Aryan works for the use of early New Indo-Aryan; e.g., provincial words and verses are cited.

The creation of linguistic states has reinforced the use of certain standard dialects for communication within a state in official transactions, in teaching, and on the radio. In addition, attempts are being made to evolve standardized technical vocabularies in these languages. Dialectal diversity has not ceased, however, resulting in much bilingualism; for example, a native speaker of Braj Bhasa uses Hindi for communicating in large cities such as Delhi.

Moreover, the attempt to establish a single national language other than English continues. This search has its origin in national and Hindu movements of the 19th century down to the time of Mahatma Gandhi, who promoted the use of a simplified Hindi–Urdu, called Hindustani. The constitution of India, which took effect in 1950, stressed the use of Hindi, providing for it to be the official national language after a period of 15 years during which English would continue in use. When the time came, however, Hindi could not be declared the sole national language; English remains a co-official language. Though Hindi can claim to be the lingua franca of a large population in North India, other languages such as Bengali have long and great literary traditions—including the work of Nobel Prize winner Rabindranath Tagore—and equal status as intellectual languages, so that resistance to the imposition of Hindi exists. This resistance is even stronger in Dravidian-speaking southern India. The use of English as an official language entails problems, however, because with the use of state languages for education, the level of English competence is declining. Another danger faced is the agitation for more separate linguistic states, threatening India with linguistic fragmentation hearkening back to earlier days.

Characteristics of the modern Indo-Aryan languages

The trends noted in Middle Indo-Aryan continue in New Indo-Aryan. The Middle Indo-Aryan vowel sequences ai and au were changed to single vowels during the development of New Indo-Aryan, final vowels were shortened and deleted, and ḍ and ḍh sounds between vowels were replaced by the sounds ṛ and ṛh. The noun cases were further reduced, and the introduction of nominal (noun) forms into the verb system became more pronounced.

Literary languages tend to become somewhat removed from the usual standard colloquial. Literary, or High, Hindi, for example, tends to replace some of the Perso-Arabic vocabulary with Sanskritic items, whereas literary Urdu makes great use of Perso-Arabic words. The gap is formalized in Bengali, in which a distinction is made between the highly Sanskritic language Sadhu-Bhasa and the colloquial standard called Calit-Bhasa.


[Note: The forms of the words given below reflect actual pronunciation, rather than being transliterated versions of the standard orthographies. For New Indo-Aryan the symbols ə, pronounced as the a in English “sofa,” and a are used for the sounds earlier transcribed as a and ā, respectively; e.g., Gujarati karũ “I do” and māro “beat” are now written kərũ and maro. This practice permits certain contrasts to be made among sounds that are significant in the description of dialectal features. In Kashmiri words, a is short, opposed to ā.]

Vowels in sequence contracted in early New Indo-Aryan; e.g., Old Indo-Aryan aśīti became Middle Indo-Aryan asīi, Hindi and Punjabi əssī, and Bengali aši “80.” Further, ai and au sounds changed to e and o, and aũ to ũ, while iu developed into ī. The diphthongs ai and au were retained well into the New Indo-Aryan period and are still pronounced in some areas; e.g., Braj Bhasa kərəũ “I do,” kərəi “he does.” Middle Indo-Aryan -ḍ- and -ḍh- developed into the flaps ṛ and ṛh; e.g., Prākrit sāḍiā “woman’s garment,” Kashmiri, Lahnda, Hindi, Gujarati, Bhojpuri, Bengali, Oriya saṛī “sari”; and Prākrit paḍh- “recite, read,” Sindhi pəṛh-əṇu, Lahnda pəṛh-əṇ, Hindi, Punjabi pəṛh-na, Gujarati pəṛh-vũ, Marathi pəṛh-ṇə “study.”

Stress is not generally contrastive in New Indo-Aryan as it is, for example, in English (e.g., noun “éxport,” verb “expórt”), though different areas have different rules for placing major emphasis on a given syllable. For example, in Hindi, in which vowel length is pertinent, gilá “swallowed” has major stress on the last syllable, gīla “wet,” on the first. In Gujarati, on the other hand, vowel length is not pertinent; the stress position depends on which vowels occur in contiguous syllables and on the structure of the syllables, whether open or closed; e.g., júno “old,” but dukán “store.” In Bengali each syllable of a word receives about equal stress.

The sounds that most clearly distinguish Indo-Aryan from the rest of Indo-European are the voiced aspirate stops (gh and the like, pronounced with an accompanying audible puff of breath) and the retroflexes (ṭ and so on, pronounced by curling the tongue upward toward the hard palate). In the outlying New Indo-Aryan areas, however, the sound system is reduced. Sinhalese has no aspirated stops, Assamese has no retroflexes, and Kashmiri has no voiced aspirates. The geographic position of these languages doubtless contributed to these losses: Sinhalese coexists with Tamil, Assamese is surrounded by Tibeto-Burman languages, and Kashmiri is on the border of the Iranian area.

New Indo-Aryan shows evidence of early dialect distribution; this is discernible by considering sound changes proper to each group. The eastern group (Assamese, Bengali, Oriya) has three important changes. Long and short i and u merged; e.g., Assamese nila, Oriya niḷɔ (ɔ is similar to the o of “coffee” in some English dialects), Bengali nil “blue-black” but Sanskrit nīla; Assamese dhuli, Bengali dhulo, Oriya dhuḷi “dust” but Hindi dhūl and Sanskrit dhūli. The vowel sound a of Middle Indo-Aryan was replaced by ɔ in Bengali and Oriya and ɒ (similar to the o of “hot” in southern British English) in Assamese in initial position and open syllables; e.g., Bengali mɔron, Oriya mɔrɔn, Assamese mɒrɒn “death”; Sindhi mərəno “mortal, death,” Sinhalese mərəṇə, Gujarati, Marathi mərəṇ (compare Sanskrit maraṇa-). Moreover, in this group a vowel is affected by the quality of the vowel in a following syllable. For example, in Bengali ami kori “I do,” the verb root has o followed by i in the next syllable, but tumi kɔro “you do” has an ɔ sound; similarly, ami kini “I buy” but tumi keno. As a result of vowel assimilation also, Assamese has an ɔ sound instead of ɒ representing Middle Indo-Aryan a: Assamese xɔhur, Bengali šošur “husband’s father” (compare Hindi səsur, Prākrit sasura-, Sanskrit śvaśura-).

Assamese and Bengali are set off from Oriya. In the former two, Middle Indo-Aryan ḍ and ḍh merge medially to ḍ (then ṛ) with a subsequent development to r in Assamese; e.g., Oriya daṛhi, Bengali daṛi, Assamese dari “beard”; Hindi, Gujarati daṛhī, Prākrit dāḍhiā. Assamese is also distinguished from Bengali by several developments, among them the merger of Assamese retroflex sounds with dental sounds; e.g., Assamese ut “camel” but Bengali uṭ, Oriya oṭɔ, Sindhi uṭhu, Lahnda, Pahari uṭṭh, and so on. Assamese also has s for earlier c and ch sounds and a z sound for j and jh; e.g., Assamese kas “glass,” Bengali kac; Assamese azi “today,” Oriya aji, Bengali, Hindi aj. In addition, Assamese replaced an s sound initially by x and between vowels by h—xɔhur.

Particular sound changes also characterize languages of the northwest. In this group, an older voiceless stop (e.g., t) became voiced (e.g., d) after a nasal sound; in other areas, the voiceless stop is retained: Kashmiri dand, Punjabi dənd, Sindhi ḍəndu “tooth” (the ḍ in Sindhi is an imploded stop; see below) but Assamese, Bengali, Hindi, Gujarati, Marathi dãt, Sinhalese dətə (Sanskrit danta-). Moreover, in the northwest group a voiced stop (e.g., d) preceded by a nasal was assimilated to the latter, resulting in two nasals, which were subsequently reduced to one in some areas; in the rest of New Indo-Aryan, the vowel preceding the nasal was nasalized. Thus, Kashmiri don “churning stick,” Sindhi ḍənu “tribute,” Punjabi dənn “fine,” Lahnda ḍənn “force,” Kumauni dan “roof” contrast with Assamese dãr “pole,” Bengali dãṛ “oar,” Hindi dãḍ “oppression, fine,” and others; all forms derive from Old Indo-Aryan daṇḍa- “stick, staff, club, royal power, fine, punishment.”

In the sequence of a short vowel followed by two consonants, Pahari differs from the rest of the northwest group and agrees with the rest of New Indo-Aryan. In the northwest this sequence either remained unchanged or the cluster was simplified without lengthening of the vowel; other languages generally simplified the cluster and lengthened the vowel: Punjabi bhətt, Sindhi bhətu, Lahnda bhət, Kashmiri batɨ “cooked rice, food” but Nepali, Kumauni, Hindi, Assamese, Bengali, Gujarati, Marathi bhat.

Dardic occupies a special position. The sibilant sounds did not all merge here. For example, Kashmiri, a Dardic tongue, has šurah “16” with š rather than s, as in most other Indo-Aryan languages, and sat “7” with s. Further, voiced aspirated stops merged with unaspirated stops in Dardic; e.g., Kashmiri gur “horse” but Hindi ghoṛa; Kashmiri dɔd “milk” but Hindi dūdh.

One major feature distinguishing Sindhi from the rest of the northwest group is the development of a series of imploded stops (also called suction stops and recursive stops), for b, ḍ, j, and g. Implosive stops also occur in the Sindhi vicinity; for example, Kacchi has imploded b. Another feature that distinguishes Sindhi from other northwest languages, including Kacchi, is the retention of the Middle Indo-Aryan final short vowels; e.g., Sindhi əkhi “eye” but Hindi ãkh (Middle Indo-Aryan akkhi-).

Punjabi is distinguished from other members of the northwest group by its tonal system, having low (ˋ), mid (¯), and high (´) tones. Initial voiced aspirated stops of earlier Indo-Aryan appear in Punjabi as voiceless stops with low tone on the following vowel; e.g., Punjabi kòṛa but Hindi ghoṛa; Punjabi tàī “2 1/2” but Hindi ḍhaī. Non-initially, a voiced aspirate became unaspirated and the preceding vowel received high tone; thus, Punjabi dū́d “milk” but Hindi dūdh, and Punjabi láb “profit” but Hindi labh.
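
The two Punjabi correspondences above amount to a positional rule, which the following Python sketch restates. It is a simplified illustration, not a full account of Punjabi tone: the input is a pre-split list of segments of the Hindi-type form, only the aspirates attested in the examples (gh, dh, bh) are listed, the tone is returned as a word-level label rather than being marked on a particular vowel, and the function name punjabi_reflex is hypothetical.

```python
# Illustrative only: voiced aspirates and Punjabi tone, as described in the text.
# Each voiced aspirate maps to (voiceless stop, voiced unaspirated stop).
VOICED_ASPIRATES = {
    "gh": ("k", "g"),
    "dh": ("t", "d"),
    "bh": ("p", "b"),
}


def punjabi_reflex(segments):
    """Return (new_segments, tone) for a list of segments of the older form."""
    result = list(segments)
    if result and result[0] in VOICED_ASPIRATES:
        # Initial voiced aspirate: voiceless stop, low tone on the following vowel.
        result[0] = VOICED_ASPIRATES[result[0]][0]
        return result, "low"
    tone = "mid"
    for i in range(1, len(result)):
        if result[i] in VOICED_ASPIRATES:
            # Non-initial voiced aspirate: unaspirated, high tone on the preceding vowel.
            result[i] = VOICED_ASPIRATES[result[i]][1]
            tone = "high"
    return result, tone


print(punjabi_reflex(["gh", "o", "ṛ", "a"]))  # (['k', 'o', 'ṛ', 'a'], 'low')
print(punjabi_reflex(["d", "ū", "dh"]))       # (['d', 'ū', 'd'], 'high')
print(punjabi_reflex(["l", "a", "bh"]))       # (['l', 'a', 'b'], 'high')
```

Words with no voiced aspirate come back unchanged with the label "mid". Words containing both an initial and a later aspirate are not covered by the examples in the text, and the sketch simply stops after the initial one.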

Gujarati, Marathi, and Konkani in the west and southwest differ from the languages of the midlands in that, as in the east, there is no contrast between long and short i and u vowels. The i of Gujarati and Marathi vis “20” is pronounced like the ee of English “teeth,” the i of Gujarati iccha and Marathi iččha “wish” like the i of “pitch,” but such a difference is not contrastive, as it is in Hindi (gīla “wet”: gila “swallowed”). Gujarati has certain features that, in turn, set it apart from the other languages of this group. In addition to e and o sounds, it has the open vowels ɛ, ɔ; e.g., cɔthũ “fourth” (Middle Indo-Aryan cauttha), bɛs-vũ “to sit” (Middle Indo-Aryan baisai “sits”). Moreover, Gujarati has murmured vowels, generally developed from vowels followed by h; e.g., kɛh che “says” (h represents murmuring of the vowel), Old Gujarati kahai chai. Marathi and Konkani have two series of affricate sounds; e.g., č (pronounced as the ch in English “chat”; the equivalent of c in some other languages) and c (pronounced as the ts of “rats”).

There was clearly mutual influence of Indo-Aryan languages at an early time, together with movement of groups of speakers (compare the position of Pahari). Thus, while Punjabi səcc “true” is the expected form comparable to Middle Indo-Aryan sacca- (Old Indo-Aryan satya-), Hindi səc “true” does not represent the expected outcome. The item səc must come from the Punjabi area.


Like Middle Indo-Aryan, New Indo-Aryan distinguishes only two numbers—singular and plural. Unlike Middle Indo-Aryan, the New Indo-Aryan languages differ in the degree to which gender distinctions are made. Three genders are retained in the west and southwest (Gujarati, Marathi, Konkani), and this is true also of Sinhalese. Unlike Gujarati, Marathi, and Konkani, in which every noun, whether it denotes an animate being or not, has a particular gender that is unpredictable, Sinhalese restricts masculine and feminine gender to animates and neuter to inanimates. The eastern group (Assamese, Bengali, Oriya) has no grammatical gender distinctions, and two genders are distinguished elsewhere.

Over a large area of New Indo-Aryan the noun has only two cases: direct and oblique. A lack of distinction between direct and oblique cases in the plural is typical of several languages, including forms in Hindi, Gujarati, Marathi, and Bhojpuri. Direct forms are used independently, oblique forms before postpositions (words or word elements following a noun that function similarly to English prepositions) and other affixes; the combination of stem and postposition serves the function of inflected case forms of earlier Indo-Aryan. Thus, to denote an object (direct or indirect) Hindi uses the postposition ko, which occurs in direct object constructions normally only with nouns denoting animate beings; e.g., ləṛke-ko dekh-ta hɛ “He sees the boy,” ləṛke-ko miṭhaī do “Give a sweet to the boy.” Other postpositions are mẽ “in,” pər “on,” se “from, with, by means of.” A large group of postpositions are linked to the noun with the affix ka (oblique form ke, feminine kī), which also is used to form adjectives (possessives); e.g., ləṛke-ke sath gəya “He went with the boy,” ləṛke-ke pas hɛ “The boy has it” (literally, “It is by the boy”). Many such postpositions represent old nominal (noun) forms. Other New Indo-Aryan languages have systems similar to that of Hindi, though the forms of the postpositions differ.

Though the nominal (noun) system of Punjabi is very close to that of Hindi, it has separate ablative (indicating separation and source) and locative (indicating place) forms in the singular and plural, respectively, for nouns such as koṭha “house”; e.g., koṭhiõ “from the house,” koṭhĩ “in the houses.” Some languages have a fuller case system than that noted above; e.g., Bengali has a genitive singular ending, a genitive plural ending, and a locative case. Similarly, Kashmiri has nominative, dative, ablative, and agentive cases. Not all such case forms are inherited from Middle Indo-Aryan. In addition to case endings, these languages also use postpositions; e.g., Kashmiri garājas-andar “in the garage,” with -andar after the dative ending -as.

Adjectives behave generally in the same way as nouns but have a syntactic restriction. In Hindi the possessive is in the oblique (non-nominative) form, as is the noun after which it occurs; but in the plural, only the noun has the oblique form. Further, the formation of comparatives and superlatives with derivative affixes has been eliminated. To a Sanskrit sentence such as ime amū-bhyaḥ āḍhya-tarāḥ “These (people) are richer than those,” in which the comparative āḍhya-tara occurs construed with the ablative form, corresponds a Hindi sentence ye un-se əmīr hɛ̃, in which no comparative affix is used—literally, “These are rich from (i.e., in comparison with) those.” Comparable constructions with a postposition meaning “from” occur elsewhere in New Indo-Aryan.

The pronominal system of New Indo-Aryan formally resembles the Middle Indo-Aryan stage more than its noun system. For example, Gujarati hũ “I,” mɛ̃ “I” (agentive), əme “we” (also agentive) are directly comparable to Apabhraṃśa haũ, maĩ, amhaĩ. The number distinctions of the Middle Indo-Aryan pronoun have been replaced, however, by distinctions of familiarity and politeness. For example, Hindi and Bengali have a three-way distinction: Hindi ap, Bengali apni “you” are polite or honorific forms; Hindi tum, Bengali tumi are informal forms; and Hindi tū, Bengali tui are used only for inferiors and small children. (Hindi and Bengali differ, however, in the plural forms of these.) In Gujarati, on the other hand, tũ is a very familiar pronoun, whereas təme is used generally, covering the approximate domains of Hindi ap and tum; ap, if used, strikes the hearer as fawning. Marathi has a similar system. Southwestern languages also make a distinction in the 1st person plural between inclusive and exclusive, the exclusive excluding the person spoken to. In the form of the relative pronoun and the 3rd person pronoun, languages differ in the degree to which gender distinctions are made, thus contrasting with Old and Middle Indo-Aryan, in which these forms had three genders. For example, Marathi has masculine, feminine, and neuter for the relative pronoun, while Bengali has animate and inanimate.

New Indo-Aryan languages differ in the degree to which finite verb forms have been replaced by nominal (noun) forms. In Bengali a contrast is made between continuous or actual present (English “be . . . -ing”) and non-continuous or habitual present; e.g., ami kaj kor-i “I work” (literally, “I do work”), with the ending -i, contrasts with ami kaj kor-ch-i “I am working,” in which ch intervenes between the root and the ending. Hindi has a similar contrast but uses nominal forms; e.g., mɛ̃ kam kər-ta hũ “I work,” mɛ̃ kam kər rəh-a hũ “I am working.” Both contain the finite form hũ of the auxiliary; but kər-ta and rəh-a are nominal forms, the latter the past of rəh- “stay.” Gujarati has both types, the present tense using finite verb forms, the imperfect employing nominal forms; e.g., hũ kam kərũ chũ “I work, am working” and hũ kam kər-to hə-to “I was working, used to work.” Even in areas in which finite forms are not used in the present, they occur in the imperative forms and what may be called the subjunctive; e.g., Hindi tum kam kər-o “work,” mɛ̃ əndər aũ “May I come in?”

The person–number system of the New Indo-Aryan verb accords with the use of pronouns. For example, the forms ja-o, kər-o in Gujarati təme kyã jao cho “Where are you going?” and šũ kəro cho “What are you doing?” are historically plurals but are used with reference to one person addressed by the pronoun təme. Similarly, in Hindi, in which a person distinction is not made in the plural, ap kəhã ja rəhe hɛ̃, ap kya kər rəhe hɛ̃, equivalent in meaning to the Gujarati sentences, have the plural form rəhe hɛ̃. Bengali has completely given up any number distinction in verb forms: ami/amra kori “I/we do.” In the 3rd person a distinction is made between ordinary and honorific: še (ordinary)/tini kɔren, plural tara/tãra kɔren. Other languages (e.g., Hindi) also have honorific forms, for which the plural is used.

In the formation of the future there are again regional differences. Some retain the future in -s- (Gujarati hũ kər-iš, 3rd person e kər-š-e) or -h- (e.g., eastern dialects of Braj Bhasa, cəlihəõ “I will go”). Characteristic of the Eastern languages and of Bihari (including Bhojpuri, Magahi, Maithili) is the suffix -b-; e.g., Bengali jabe “will go.” All of these are finite forms. On the other hand, in Hindi and adjoining areas, the future is inflected for gender.

A similar contrast between the use of verbal and nominally inflected forms also appears in the past tense forms. The predominant pattern in New Indo-Aryan is that of Middle Indo-Aryan: forms are used that are etymologically participles.

The New Indo-Aryan languages retain the passive and causative forms. The causative is conservative in retaining both the affixes that appear in Middle Indo-Aryan and vowel alternation. The passive is also formed by affixation in some areas. But many languages also have a compound formation involving the verb ja “go” and an auxiliary (hɛ̃); e.g., Hindi yəhã hindī bol-ī ja-t-ī hɛ̃ “Hindi is spoken here.”

There are other auxiliaries, which, like hɛ̃, can occur with any verb in the language; e.g., the verb “can,” Hindi sək-, Gujarati šək. A characteristic feature of New Indo-Aryan, however, is the use of certain verbs, variously called vector verbs or compound verbs, in restricted contexts and with particular semantics. For example, one can say mər gə-ya “He died,” bhūl gə-ya “He forgot,” bol uṭh-a “He blurted out” in Hindi, using the verbs ja “go” (masculine singular past gə-ya), uṭh “stand up.” This phenomenon is pan-Indo-Aryan and still requires investigation.

The examples cited above also illustrate the normal word order in New Indo-Aryan languages: subject (including agential forms), object (with attributive adjectives preceding), verb (together with auxiliaries). Adverbials can precede the full sentence or occur after the subject, with slight differences in emphasis; e.g., Hindi mɛ̃ kəl aũga, or kəl mɛ̃ aũga “I will come tomorrow (kəl).” Relative clauses normally precede correlatives: Hindi jo admī kəl tumhare ghər-mẽ tha vo kɔn hɛ “Who (kɔn) is the man (admī) who (jo) was in your house yesterday?” A notable exception to the normal final position for verbs occurs in Kashmiri, in which the verb usually occurs in second position after the subject; thus, to Hindi vo kha rəha hɛ “he is eating” corresponds Kashmiri su chu kh́avān with the auxiliary chu after the subject.


The two most important sources of non-Indo-Aryan vocabulary in New Indo-Aryan are Persian (including Arabic items introduced through Persian), the court language of the Mughals, and English. The Perso-Arabic vocabulary permeates every aspect of New Indo-Aryan vocabulary, especially in the midlands (Uttar Pradesh through the Punjab). There are, of course, Hindi-Urdu words proper to Islām: Hindi kuran “Qurʾān,” ʿīd (name of a holy day), nəmaz (certain prayers), məsjid “mosque,” as well as the word for “religion,” məźhəb. In addition, there are numerous Perso-Arabic military and administrative terms (kila “fort,” səvar “horseman,” ədalət “court of justice”); architectural and geographic terms (imarət “building,” məkan “house,” məhəl “palace,” duniya “world,” ilaka “province”); words having to do with learning and writing (kələm “pen,” kitab “book,” ədəb “literature, good manners”) and with apparel (jeb “pocket,” moja “socks,” rumal “handkerchief”) and anatomy (khūn “blood,” gərdən “neck,” dil “heart,” bazu “arm,” sər “head”). Indeed some of the most common vocabulary is of this origin: tārīkh “date,” vəkt “time,” sal “year,” həfta “week,” umər “age,” admī “man,” ɔrət “woman,” and others. Even the grammatical apparatus of postpositions and conjunctions reflects Perso-Arabic influence; e.g., -ke bad “after,” əgər “if,” məgər “but,” ya “or.”

The colloquial language used by any Hindu or Muslim communicating in Hindi-Urdu will contain a large number of such words. There have been efforts to polarize the two, and at times champions of Indo-Aryan have tried to replace Perso-Arabic vocabulary with Sanskritic words. The style that tends toward eliminating all but the most common Perso-Arabic words may be called High Hindi, written in the Devanāgarī script, as opposed to High Urdu, which retains Perso-Arabic of long standing, uses Persian and Arabic for learned vocabulary and is written in the Perso-Arabic script.

English continues to be a source of borrowing, and it is rare to hear a conversation on any technical subject among speakers of any Indian language in which English words are not liberally used. Among loanwords from English are names of conveyances such as Hindi rel-gaṛi “railroad-train” and ṭɛksī “taxi”; profession names such as injinīr “engineer,” jəj “judge,” ḍaktər “Western doctor,” pulis “police”; and terms of educational administration such as kaləj “college” and yunivərsiṭī “university.” Like words of Perso-Arabic origin, English words are susceptible to replacement in India by Sanskritic ones.

Of much lesser magnitude are New Indo-Aryan borrowings from other languages, among them Portuguese and Turkic. From the latter, the word urdū came to be used as the name of a language. From Portuguese come such Hindi words as ənənnas “pineapple,” paũ “(Western style) bread,” kəmīz “(Western) shirt,” kəmra “room,” and girja “(Christian) church.”

Writing systems

Ancient India had two main scripts in which Indo-Aryan languages were written. Kharoṣṭi, used in the northwest, is of Aramaic origin and is written from right to left; Brāhmī, of North Semitic origin, is written from left to right and appears earliest on Aśokan inscriptions in areas other than the northwest. Most scripts of New Indo-Aryan are developments of the Brāhmī. The Devanāgarī (or simply Nāgarī), used for writing Sanskrit documents in North India, is the script of Hindī and Marāṭhī as well as Nepālī. Gujarātī uses a more cursive derivative. Devanāgarī also is used, mainly among Hindus, for Kashmirī, which has, in addition, a traditional script called Sarada, which is not now in common use. The Perso-Arabic script is used instead. Also usually written in Perso-Arabic are Urdū and Sindhī (for which the Devanāgarī also is used in schools in India), whereas Punjabi is written in Perso-Arabic in Pakistan and also has a script of its own, Gurmukhi (“From the Teacher’s Mouth”), used in the sacred writings of the Sikhs. In the east, the scripts used for Bengali and Assamese are closely related; and that of Oṛiyā, related to the other two, is highly cursive like that of neighbouring Dravidian languages. Such is also the case with Sinhalese.

The traditional alphabets are both over-explicit and not clear enough with regard to accurate representation of the spoken word. As systems in which a consonant symbol with no other accessory symbol accompanying it stands for the syllable consisting of the consonant followed by short a, they require previous knowledge of items for correct interpretation; Hindī kərta is written ka-ra-tā in the Devanāgarī, and, to pronounce it properly, one must know that the word has only two syllables. Although Bengali has only the spirant sound š, the alphabet has symbols for ś, ṣ, and s, as in Old Indo-Aryan; conversely, verb forms such as kori and kɔren, which differ in their root vowel, are written ka-ri and ka-re-na, both with the same initial symbol. And, though syllabic ṛ was lost as early as Middle Indo-Aryan, the scripts have a separate symbol for this. Script reform has been suggested; it has even been proposed that all Indo-Aryan languages adopt a Latin (roman) alphabet with diacritics, but chances for this are poor. (See also alphabet.)
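
The point about the inherent vowel can be made concrete with the Devanāgarī spelling of kərta. The Python sketch below is an illustration only: it reads the written form symbol by symbol, supplying short a after every bare consonant sign, and so arrives at the three written syllables rather than the two spoken ones. The lookup tables cover only the four symbols needed for this one word, and the function name naive_syllables is hypothetical.

```python
# Illustrative only: a symbol-by-symbol reading of Devanagari that always
# supplies the inherent short a. Tables cover just the word used below.
CONSONANTS = {"\u0915": "k", "\u0930": "r", "\u0924": "t"}  # क, र, त
VOWEL_SIGNS = {"\u093e": "\u0101"}  # the sign ा, read as long ā


def naive_syllables(word):
    """Read each consonant sign as consonant + a unless a vowel sign follows it."""
    syllables = []
    for ch in word:
        if ch in CONSONANTS:
            # A bare consonant sign stands for the consonant plus short a.
            syllables.append(CONSONANTS[ch] + "a")
        elif ch in VOWEL_SIGNS and syllables:
            # A vowel sign replaces the inherent a of the preceding consonant.
            syllables[-1] = syllables[-1][:-1] + VOWEL_SIGNS[ch]
        else:
            raise ValueError(f"symbol not covered by this sketch: {ch!r}")
    return syllables


word = "\u0915\u0930\u0924\u093e"  # करता, spoken as kərta
print("-".join(naive_syllables(word)))  # ka-ra-tā
print(len(naive_syllables(word)))       # 3
```

Nothing in the written form signals that the a after r is silent, so a reader must already know the word in order to pronounce it with two syllables.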

