Mandarin IPA

The International Phonetic Alphabet (IPA) is the universal sound key for languages: one symbol, one sound, no overlap. For Mandarin specifically, the IPA matters more than for most languages, because the standard way of writing Mandarin sounds in Roman letters, pinyin, is a Romanisation rather than a phonetic alphabet. Pinyin makes deliberate compromises so that every syllable can be typed on a keyboard with the 26 Latin letters; the IPA does not.

The result is that pinyin has letters doing jobs they would never do in any European language. The pinyin q is not the English /kw/. The pinyin x is not the English /ks/. The pinyin c is not the English /k/ or /s/. The pinyin i sounds completely different after z, c, s, zh, ch, sh, r than it does after any other consonant. None of these are mistakes; they are choices a 1958 committee made to keep pinyin typeable. The IPA disambiguates immediately. Read q as /tɕʰ/ and you know exactly what to do.

This page covers the 21 initial consonants of standard Mandarin in IPA, the vowels and finals with the pinyin spellings that mislead, the four tones in IPA tone-letter notation, the sandhi rules, and the erhua /ɚ/ ending. For more on pinyin itself, including the writing-system context and the typing system, see the pinyin page.

The 21 initials

Mandarin initials, the consonants that begin a syllable, are organised by place of articulation and by aspiration. Each table that follows covers one place of articulation; within a table, the rows differ by manner (unaspirated stop, aspirated stop, nasal, fricative, and so on).

Bilabials and labiodental

IPA | Pinyin | Example | Note
/p/ | b | bā 八 | Unaspirated. Halfway between English b and p. Like the p in "spy".
/pʰ/ | p | pā 趴 | Aspirated. Strong puff of air. Like the p in "pie".
/m/ | m | mā 妈 | Same as English m.
/f/ | f | fā 发 | Same as English f.

The pinyin b is /p/ in IPA, not /b/. This is the first surprise. English uses voicing to distinguish "b" from "p" (the vocal cords vibrate during b, not during p). Mandarin uses aspiration: both consonants are voiceless, but one is followed by a puff of air and the other is not. Pinyin chose the letter b for the unaspirated one to match the English speaker's intuition that it sounds like a softer p. In IPA, the unaspirated voiceless stop is /p/ and the aspirated voiceless stop is /pʰ/.

Dental and alveolar

IPA | Pinyin | Example | Note
/t/ | d | dà 大 | Unaspirated t, like the t in "stop".
/tʰ/ | t | tā 他 | Aspirated t, like the t in "top".
/n/ | n | nǐ 你 | Same as English n.
/l/ | l | lái 来 | Same as English l. Clear, not the dark "ll" of "ball".
/ts/ | z | zài 在 | Unaspirated "ts". Like the "ds" in "kids" but voiceless.
/tsʰ/ | c | cài 菜 | Aspirated "ts". Like the "ts" in "cats" with a puff after.
/s/ | s | sān 三 | Same as English s.

The pinyin c is /tsʰ/ in IPA, a single sound: aspirated "ts". It is not /k/ as in English "cat", not /s/ as in English "city", and not two separate consonants. It is one affricate (a stop followed immediately by its release as a fricative). The fastest way for an English speaker to produce it: say "cats" and clip off the "ca". You are left with "ts". Release it into the vowel a with a puff of air, and you have /tsʰa/.

Retroflex

IPA | Pinyin | Example | Note
/ʈʂ/ | zh | zhū 猪 | Unaspirated retroflex affricate. English "j" with tongue curled back.
/ʈʂʰ/ | ch | chī 吃 | Aspirated retroflex affricate. English "ch" with tongue curled back.
/ʂ/ | sh | shū 书 | Retroflex fricative. English "sh" with tongue curled back.
/ʐ/ | r | rì 日 | Retroflex voiced fricative or approximant. Between English r and the s in "pleasure".

Retroflex consonants curl the tongue tip back so it points to or touches the area just behind the alveolar ridge (the bump behind the upper teeth). To produce /ʈʂ/, /ʈʂʰ/, /ʂ/: curl the tongue back as if you are trying to lick the roof of your mouth, then produce an English "j", "ch", or "sh" from that position. The result is darker, hollower, and unmistakably Mandarin.

The pinyin r is /ʐ/ in IPA, not the English r (/ɹ/). Standard Mandarin /ʐ/ is the voiced version of /ʂ/ (sh). It sounds halfway between English r and the s in "pleasure", with the tongue tip curled back. English speakers often substitute their own approximant; this is recognisable as foreign but understood. Some northern Chinese speakers produce pinyin r as something closer to a true retroflex approximant /ɻ/, blurring the line.

Alveolo-palatal

IPA | Pinyin | Example | Note
/tɕ/ | j | jī 鸡 | Unaspirated alveolo-palatal affricate. English "j" with the tongue flat and forward.
/tɕʰ/ | q | qī 七 | Aspirated alveolo-palatal affricate. English "ch" with the tongue flat and forward.
/ɕ/ | x | xī 西 | Alveolo-palatal fricative. English "sh" with the tongue flat and forward.

This is the row that confuses English speakers most. The pinyin q is /tɕʰ/ in IPA. It is not the English /kw/ as in "queen". It is an aspirated affricate produced with the tongue flat against the area where the alveolar ridge meets the hard palate, lips spread (not rounded). The closest English starting point: say "cheese" and pay attention to where your tongue is. Now make the tongue even flatter and further forward. That is /tɕʰ/.

The pinyin x is /ɕ/ in IPA. It is not the English /ks/ as in "box". It is a soft "sh" sound with the tongue tip down behind the lower teeth and the tongue body raised toward the hard palate. The fastest English starting point: say "she" and pay attention to the tongue. Now lower the tongue tip behind the lower teeth and let the tongue body do the work. That is /ɕ/.

The pinyin j is /tɕ/, the unaspirated partner of q. Like the English "j" in "jeep" but with the tongue forward and flat. It is not voiced like the English j; both /tɕ/ and /tɕʰ/ are voiceless. The difference between them is the puff of air after.

Velar

IPA | Pinyin | Example | Note
/k/ | g | gē 哥 | Unaspirated k. Like the k in "sky".
/kʰ/ | k | kē 棵 | Aspirated k. Like the k in "kite".
/x/ | h | hē 喝 | Velar fricative. Harsher than English h.

The pinyin h is /x/ in IPA, the sound of the Spanish j and of the ch in German Bach. It is a velar fricative: the back of the tongue rises toward the soft palate and produces audible friction, whereas English /h/ is made at the glottis with no constriction in the mouth. Many Mandarin speakers (especially southern) realise it closer to English /h/, so substitution is tolerated. The standard target is the harsher /x/.

The three sibilant series, side by side

The hardest single thing about Mandarin initials for adult learners is keeping the three sibilant series straight. Each series has an unaspirated affricate, an aspirated affricate, and a fricative.

Series | Unaspirated affricate | Aspirated affricate | Fricative | Tongue position
Dental | z /ts/ | c /tsʰ/ | s /s/ | Tongue tip behind upper front teeth.
Retroflex | zh /ʈʂ/ | ch /ʈʂʰ/ | sh /ʂ/ | Tongue tip curled back toward the palate.
Alveolo-palatal | j /tɕ/ | q /tɕʰ/ | x /ɕ/ | Tongue flat and forward, tongue body raised to palate.

The series are mutually exclusive in their finals. The alveolo-palatal series (j, q, x) only ever combines with i, ü, or finals starting with i or ü. The dental and retroflex series never combine with the true /i/, with ü, or with finals starting with either; they take the remaining finals and the buzzy "i" (see below). This complementary distribution is what lets pinyin reuse the letter i without ambiguity: written after j, q, or x it can only be /i/, and written after z, c, s, zh, ch, sh, or r it can only be the buzzy vowel.
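The initial tables in this section can be collected into a single lookup. The following is a minimal Python sketch, not a complete pinyin parser: the names `INITIALS`, `split_initial`, and `initial_ipa` are hypothetical, and the input is assumed to be a lowercase pinyin syllable with its tone mark already removed.

```python
# Pinyin initial -> IPA, copied from the tables in this section.
INITIALS = {
    "b": "p", "p": "pʰ", "m": "m", "f": "f",
    "d": "t", "t": "tʰ", "n": "n", "l": "l",
    "z": "ts", "c": "tsʰ", "s": "s",
    "zh": "ʈʂ", "ch": "ʈʂʰ", "sh": "ʂ", "r": "ʐ",
    "j": "tɕ", "q": "tɕʰ", "x": "ɕ",
    "g": "k", "k": "kʰ", "h": "x",
}


def split_initial(syllable: str) -> tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, rest).

    Two-letter initials (zh, ch, sh) are tried before one-letter ones.
    Syllables with no listed initial (for example "ai") return ("", syllable).
    """
    for length in (2, 1):
        head = syllable[:length]
        if head in INITIALS:
            return head, syllable[length:]
    return "", syllable


def initial_ipa(syllable: str) -> str:
    """Return the IPA for the syllable's initial, or "" if it has none."""
    initial, _ = split_initial(syllable)
    return INITIALS.get(initial, "")
```

Trying the two-letter initials first is what keeps zh, ch, and sh from being misread as z, c, and s followed by h.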

Vowels and finals

A final is what follows the initial: a vowel, a diphthong, or a vowel with a nasal coda. Mandarin has about 35 finals built from a small set of vowel sounds.

Monophthongs

IPA | Pinyin (after most initials) | Pinyin (after j/q/x/y) | Example | Approximation
/a/ | a | a | bā 八 | The "a" in "father".
/o/ | o | - | bō 波 | A rounded back vowel, lips rounded.
/ə/ | e | - | gē 哥 | The schwa of "the". Not English "eh".
/i/ | i | i | jī 鸡 | The "ee" in "see".
/u/ | u | - | wū 屋 | The "oo" in "boot", lips rounded.
/y/ | ü (after n, l) | u (after j, q, x, y) | nǚ 女, jū 居 | French "tu". See below.

The pinyin e alone is /ə/ in IPA, a schwa, not English "eh". The word gē (older brother) is /kɤ/ to /kə/, not "geh". The pinyin o alone is a rounded back vowel /o/, only appearing after b, p, m, f and in a few other contexts. The vowel /y/ (the umlauted u) is the front-rounded vowel of French tu and German Tür: say "ee", then round the lips while keeping the tongue forward. After j, q, x, and y the dots are dropped in pinyin spelling (ju is /tɕy/, not /tɕu/), but the sound is still /y/.

The buzzy "i"

The pinyin i stands for three different sounds, depending on what precedes it: one ordinary vowel and two buzzy variants.

  • After j, q, x, y and after the other non-sibilant initials (b, p, m, d, t, n, l): /i/, the standard "ee" vowel. ji is /tɕi/.
  • After the dental sibilants z, c, s: /ɹ̩/ or /ɨ/ in some transcriptions. A buzzy continuation of the consonant, with the tongue staying in the dental position. zi is /tsɹ̩/, not /tsi/.
  • After the retroflex sibilants zh, ch, sh, r: /ʐ̩/ or /ɻ̩/ in some transcriptions. A buzzy retroflex vowel, with the tongue staying curled. shi is /ʂʐ̩/ or /ʂɻ̩/, not /ʃi/.

English has no obvious equivalent for the buzzy i. The closest English analogy is the vocalic ending of "shh" when you shush someone: a held continuation of the fricative, with no clear vowel quality. Once you hear it, it stops being confusing.
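The choice between the plain and buzzy readings of a written i depends only on the initial, so it can be sketched as a small lookup. This is an illustrative Python sketch: the function name `ipa_for_pinyin_i` is hypothetical, it covers only the bare final i (not compound finals such as ia or in), and of the alternative transcriptions mentioned above it picks /ɹ̩/ and /ʐ̩/.

```python
SYLLABIC = "\u0329"  # combining vertical line below, marks a syllabic consonant

DENTAL_SIBILANTS = {"z", "c", "s"}
RETROFLEX_INITIALS = {"zh", "ch", "sh", "r"}


def ipa_for_pinyin_i(initial: str) -> str:
    """Return the IPA for a bare pinyin final "i" after the given initial."""
    if initial in DENTAL_SIBILANTS:
        return "ɹ" + SYLLABIC   # buzzy dental vowel, as in zi
    if initial in RETROFLEX_INITIALS:
        return "ʐ" + SYLLABIC   # buzzy retroflex vowel, as in shi
    return "i"                  # ordinary "ee" vowel, as in ji
```

For example, `ipa_for_pinyin_i("sh")` gives the vowel of shi, and `ipa_for_pinyin_i("j")` gives the ordinary /i/ of ji.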

Diphthongs

IPA | Pinyin | Example | Approximation
/ai/ | ai | ài 爱 | "eye"
/ei/ | ei | gěi 给 | "ay" in "say"
/au/ | ao | hǎo 好 | "ow" in "now"
/ou/ | ou | gǒu 狗 | "oh" in "go", without glide

Nasal endings

IPA | Pinyin | Example | Approximation
/an/ | an | sān 三 | "ahn"
/ən/ | en | rén 人 | "un" in "fun"
/aŋ/ | ang | fāng 方 | "ahng"
/əŋ/ | eng | gēng 更 | "ung"
/oŋ/ | ong | tóng 同 | "oong" with rounded lips

The mismatches: where pinyin lies

A handful of common pinyin spellings mask what they sound like in IPA. These are the spellings most likely to mislead an adult learner.

  • Pinyin ian is /iɛn/ in IPA, not /ian/. The a is opened to an /ɛ/ before the n. The word tiān (day) is /tʰiɛn/, rhyming with English "yen", not "yan".
  • Pinyin ui is /uei/ in IPA. The e is pronounced in speech but suppressed in spelling. The word guī (rule) is /kuei/, sounding like "gway" not "gwee".
  • Pinyin iu is /iou/ in IPA. Same logic: the o is suppressed in spelling, pronounced in speech. liú (flow) is /liou/, sounding like "lyoh" not "lyoo".
  • Pinyin un is /uən/ in IPA after most consonants. The e is suppressed in spelling. dūn is /tuən/, sounding like "dwun" not "doon". (After j, q, x, and y, a written un stands for ün, /yn/.)
  • Pinyin ong is /oŋ/ in IPA, with rounded lips. Not the "ong" of English "long".
  • Pinyin ü appears as plain u after j, q, x, y. The pinyin ju is /tɕy/, not /tɕu/. There is no /tɕu/ in standard Mandarin, which is what licenses the spelling shortcut.

These six spellings cause the bulk of pronunciation errors among self-taught Mandarin learners. The IPA shows you what is actually said.
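The six spellings above amount to a small rewrite table from pinyin finals to IPA. Here is a hedged Python sketch of that table: `MISLEADING_FINALS` and `misleading_final_ipa` are hypothetical names, the function returns `None` for anything outside the six cases rather than guessing, and it assumes the initial and final have already been separated and stripped of tone marks.

```python
from typing import Optional

# Pinyin final -> IPA for the spellings listed above.
MISLEADING_FINALS = {
    "ian": "iɛn",
    "ui": "uei",
    "iu": "iou",
    "un": "uən",
    "ong": "oŋ",
}

# After these letters a written u stands for ü.
U_MEANS_UMLAUT_AFTER = {"j", "q", "x", "y"}


def misleading_final_ipa(initial: str, final: str) -> Optional[str]:
    """Return the IPA for one of the six misleading spellings, else None."""
    if initial in U_MEANS_UMLAUT_AFTER and final.startswith("u"):
        # Only the bare final is covered here: ju is /tɕy/.
        return "y" if final == "u" else None
    return MISLEADING_FINALS.get(final)
```

Returning `None` for uncovered finals is a deliberate choice in this sketch: a partial table that silently passed pinyin spellings through would look like IPA without being IPA.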

The four tones in IPA

Mandarin is tonal: the pitch contour of a syllable is part of the word, not a layer of intonation on top. Get the tone wrong and you have said a different word.

IPA represents tone with tone letters, in which a stroke against a vertical staff traces the pitch on a scale from 1 (low) to 5 (high), or with numeric superscripts. Below is each Mandarin tone in tone-letter notation, with the pinyin diacritic and a Chao tone-number representation (the system the Chinese linguist Yuen Ren Chao introduced, on a 1-to-5 scale where 5 is highest).

Tone | Name | Pinyin mark | IPA tone letter | Chao numbers | Description
1 | High level | mā / ˉ | ˥ | 55 | Pitch starts high, stays high.
2 | Rising | má / ˊ | ˧˥ | 35 | Pitch rises from mid to high.
3 | Dipping | mǎ / ˇ | ˨˩˦ | 214 | Pitch starts low, drops further, rises slightly.
4 | Falling | mà / ˋ | ˥˩ | 51 | Pitch falls sharply from high to low.
5 | Neutral | ma (no mark) | (depends) | (varies) | Short, light, pitch depends on the preceding tone.

The third tone in citation form is the famous dip: starts at 2 on the scale, drops to 1, rises to 4. In connected speech, the rising tail often disappears and the tone simplifies to a low half-third (just the 21 part). Native speakers do this automatically; learners who try to produce the full dipping contour every time can sound unnatural.

The fifth tone (neutral) is not a separate pitch contour. It is the absence of a tonal target: the syllable is short, light, and its pitch is whatever falls naturally after the preceding tone. After a first tone, the neutral falls to a low pitch; after a third tone, it rises slightly. It appears on grammatical particles (的 de, 了 le, 吗 ma), the second syllable of reduplicated words (妈妈 māma, 谢谢 xièxie), and many suffixes.

A full Mandarin word in IPA looks like:

  • ma1 mother: /ma˥/
  • ma2 hemp: /ma˧˥/
  • ma3 horse: /ma˨˩˦/
  • ma4 scold: /ma˥˩/
  • ma0 question particle: /ma/ (neutral, pitch depends on context)

Or with the Chao numbers: /ma55/, /ma35/, /ma214/, /ma51/.
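The two notations are interchangeable, which a short Python sketch can make concrete. The names `TONES` and `with_tone` are hypothetical, and the sketch assumes the convention used in the examples above: tone 0 is the neutral tone and carries no mark.

```python
# Tone number -> (IPA tone letters, Chao digits), from the table above.
TONES = {
    1: ("˥", "55"),
    2: ("˧˥", "35"),
    3: ("˨˩˦", "214"),
    4: ("˥˩", "51"),
}


def with_tone(ipa: str, tone: int, chao: bool = False) -> str:
    """Append tone letters (or Chao digits) to a toneless IPA syllable.

    Tone 0 (neutral) adds nothing, because its pitch depends on context.
    """
    if tone == 0:
        return ipa
    letters, digits = TONES[tone]
    return ipa + (digits if chao else letters)
```

Calling `with_tone("ma", 3)` appends the dipping tone letters, and `with_tone("ma", 3, chao=True)` produces the same syllable in Chao digits.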

Tone sandhi in IPA

Tone sandhi rules are the changes in tone that happen when certain tones meet in connected speech. The pinyin is normally written with the citation tones, but the actual IPA realisation shifts.

Third plus third becomes second plus third. When two third tones meet, the first becomes a second tone:

  • nǐ hǎo (你好, hello): citation /ni˨˩˦ xau˨˩˦/, actual /ni˧˥ xau˨˩˦/.

The character 一 (yī, one) changes tone depending on what follows:

  • Alone or at the end of a phrase: /i˥/.
  • Before a fourth tone or neutral tone: becomes second tone, /i˧˥/. So yí gè /i˧˥ kɤ˥˩/.
  • Before any other tone (1st, 2nd, 3rd): becomes fourth tone, /i˥˩/. So yì zhāng /i˥˩ ʈʂaŋ˥/.

The character 不 (bù, not) changes from fourth to second tone before another fourth tone:

  • Before any tone other than fourth: stays fourth, /pu˥˩/. So bù chī /pu˥˩ ʈʂʰʐ̩˥/.
  • Before a fourth tone: becomes second, /pu˧˥/. So bú duì /pu˧˥ tuei˥˩/.

These are all automatic for native speakers. Learners do not need to memorise them as rules for life, only to be exposed to enough connected speech that the changes happen without conscious thought.
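For readers who find rules easier to check in code, the three sandhi rules above can be sketched as one pass over a phrase. This Python sketch makes several assumptions that are not in the text: the function name `apply_sandhi` is hypothetical, the input is a list of (character, citation tone) pairs with 0 for the neutral tone, each rule looks only at the citation tone of the next syllable, and cases the rules above do not mention (such as runs of three or more third tones) get no special treatment.

```python
def apply_sandhi(syllables):
    """Return (character, spoken_tone) pairs for a phrase.

    syllables: list of (character, citation_tone); tone 0 means neutral.
    """
    result = []
    for index, (char, tone) in enumerate(syllables):
        # Citation tone of the following syllable, or None at the end.
        if index + 1 < len(syllables):
            next_tone = syllables[index + 1][1]
        else:
            next_tone = None

        if char == "一" and tone == 1 and next_tone is not None:
            # Second tone before fourth or neutral, otherwise fourth.
            tone = 2 if next_tone in (4, 0) else 4
        elif char == "不" and tone == 4 and next_tone == 4:
            tone = 2
        elif tone == 3 and next_tone == 3:
            tone = 2
        result.append((char, tone))
    return result
```

The sketch keys on the characters 一 and 不 rather than on the pinyin yi and bu, since other words share those spellings but do not follow these rules.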

Erhua: /ɚ/

The character 儿 (ér) is /ɚ˧˥/ on its own (a rhotacised mid vowel with rising tone). As a suffix, it does not become a separate syllable; it modifies the end of the syllable it attaches to. This is erhua (儿化), the rhotacisation of finals especially characteristic of northern Mandarin and the Beijing dialect.

The IPA realisation depends on what the final ends in. The tongue curls back at the end of the syllable, and the rhotacisation can swallow or modify the preceding sound:

  • 花 huā /xua˥/, flower, becomes 花儿 huār /xuɚ˥/.
  • 一点 yìdiǎn /i˥˩ tiɛn˨˩˦/, a little, becomes 一点儿 yìdiǎnr /i˥˩ tiɚ˨˩˦/. The n is swallowed.
  • 这边 zhèbiān /ʈʂɤ˥˩ piɛn˥/, this side, becomes 这边儿 zhèbiānr /ʈʂɤ˥˩ piɚ˥/.
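The three examples above share one pattern: a final n is dropped, and the last vowel is replaced by /ɚ/. The Python sketch below encodes only that pattern and should not be read as a full account of erhua. The function name `erhua` is hypothetical, the input is assumed to be a toneless IPA syllable, and any syllable that does not end in /a/, /ɛ/, or one of those plus /n/ is rejected rather than guessed at.

```python
def erhua(ipa: str) -> str:
    """Rhotacise a toneless IPA syllable, for the pattern shown above only."""
    # A final n is swallowed by the r-colouring.
    stem = ipa[:-1] if ipa.endswith("n") else ipa
    if not stem or stem[-1] not in "aɛ":
        raise ValueError(f"pattern not covered by this sketch: {ipa!r}")
    # The last vowel is replaced by the rhotacised vowel.
    return stem[:-1] + "ɚ"
```

The tone is left out on purpose: in each example above, the tone of the base syllable carries over unchanged to the rhotacised form.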

Standard Putonghua (the official spoken standard) tolerates erhua but does not require it. Learners can recognise it when they hear it and use it sparingly or not at all without sounding wrong. Heavy erhua is a marker of northern, especially Beijing, speech.

Why the IPA is essential for Mandarin

For most languages, the IPA is a useful tool. For Mandarin, it is closer to essential, because pinyin is built on compromises that mislead anyone who reads it as if it were English spelling. The pinyin q read as English "kw" is unrecognisable. The pinyin x read as English "ks" is unrecognisable. The pinyin i in shi read as "ee" is wrong. None of these problems exist in IPA: /tɕʰ/, /ɕ/, and /ʂʐ̩/ all unambiguously specify what to say.

Treat the IPA as the spec, and pinyin as the input method. The pinyin page covers the practical, typeable system you will actually write Mandarin in. This page covers the underlying sound system the pinyin is trying to encode. Adult learners who hold both in their heads at once stop being surprised by Mandarin pronunciation traps. They have the answer key.