Pinyin

Pinyin is the most useful single tool a Mandarin learner has and the thing most beginners misunderstand. It is not the Chinese alphabet. Chinese does not have an alphabet. Chinese is written in characters (hanzi), and pinyin is the Romanisation system invented to write down the sounds of Mandarin using the Latin letters you already know.

Hanyu Pinyin (literally "Han-language spelled sounds") was developed by a committee led by Zhou Youguang and officially adopted by the People's Republic of China in 1958. The ISO made it the standard for Romanising Mandarin in 1982 and the UN followed in 1986. Today it is the system used to teach Mandarin pronunciation almost everywhere, the system used to type Chinese characters, and the system the rest of this site uses for every Mandarin word.

This page covers the four tones, the 21 initials, the finals, the tone-mark placement rules, the sandhi rules, the small lies pinyin tells about the letter u, and the historical alternatives.

What pinyin is, and what it is not

Pinyin is a Romanisation, not a phonetic alphabet. A phonetic alphabet (like the IPA) tries to capture sounds neutrally. A Romanisation uses the 26 Latin letters even where they do not map cleanly, so it is typeable on a keyboard.

That is why pinyin has q, x, and c doing jobs they would never do in English. The mapping is internally consistent, but it is a code you have to learn. Pinyin is not the writing system of Chinese (native speakers read books in hanzi), the tones are not stress marks, and the same letter can represent different sounds depending on what comes after it.

The four tones

Mandarin is a tonal language. The pitch contour of a syllable is part of the word, just like the consonants and vowels. Get the tone wrong and you have said a different word, not the same word with an accent.

The textbook example is ma, four (or five) different words depending on tone:

Pinyin   Tone      Character   Meaning
mā       1st       妈          mother
má       2nd       麻          hemp
mǎ       3rd       马          horse
mà       4th       骂          scold
ma       neutral   吗          question particle

The four full tones:

  1. First, high level. Flat line: ā, ē, ī, ō, ū. Pitch starts high and stays high, like singing a steady note. Example: mā (mother).
  2. Second, rising. Acute accent: á, é, í, ó, ú. Pitch rises from the middle to the top, like the English intonation of a question ("really?"). Example: má (hemp).
  3. Third, dipping. Caron: ǎ, ě, ǐ, ǒ, ǔ. Pitch starts low, drops further, and rises slightly. In connected speech it often just stays low. Example: mǎ (horse).
  4. Fourth, falling. Grave accent: à, è, ì, ò, ù. Pitch falls sharply from high to low, like an irritated English "no!". Example: mà (scold).

The fifth tone, normally called the neutral tone (轻声 qīngshēng, "light sound"), has no mark and is pronounced short, light, and unstressed. It has its own section below.

If you are coming from a non-tonal language, the conceptual leap is this: tone is not a layer on top of a word. It is the word. The pitch contour is as much part of "mā" as the m and the a are. Mixing up mā and mà is, to a Mandarin ear, the same kind of mistake as mixing up "bat" and "bath" in English.

The 21 initials

An initial is the consonant at the start of a syllable. Mandarin has 21, in groups that share an articulation point.

Initial   IPA      Closest English approximation
b         /p/      Unaspirated p, between English b and p. Like the p in spy.
p         /pʰ/     Aspirated p, strong puff of air. Like the p in pie.
m         /m/      Same as English m.
f         /f/      Same as English f.
d         /t/      Unaspirated t, between English d and t. Like the t in stop.
t         /tʰ/     Aspirated t. Like the t in top.
n         /n/      Same as English n.
l         /l/      Same as English l.
g         /k/      Unaspirated k, between English g and k. Like the k in sky.
k         /kʰ/     Aspirated k. Like the k in kite.
h         /x/      Harsher than English h. Closer to Scottish loch or German Bach.
j         /tɕ/     Like English "j" in jeep, but with the tongue forward and flat.
q         /tɕʰ/    Aspirated j. English "ch" with tongue forward, lips spread (cheese, not chocolate).
x         /ɕ/      English "sh" with the tongue tip down behind the bottom teeth.
zh        /tʂ/     English "j" with the tongue curled back.
ch        /tʂʰ/    Aspirated zh. English "ch" with the tongue curled back.
sh        /ʂ/      English "sh" with the tongue curled back.
r         /ʐ/      Between English "r" and the s in "pleasure". Not the English approximant.
z         /ts/     Unaspirated "ts". Like the "ds" in kids.
c         /tsʰ/    Aspirated "ts". Like the "ts" in cats, with a puff of air after.
s         /s/      Same as English s.

Two things in this table to internalise.

The b/p, d/t, g/k pairs are aspirated vs unaspirated, not voiced vs voiceless. In English, b and p differ mainly in voicing. In Mandarin both are voiceless; the difference is the puff of air after. Hold a piece of paper in front of your mouth and say "spy" and "pie": the paper barely moves on "spy" but flutters on "pie". Mandarin b is the "spy" consonant, Mandarin p is the "pie" consonant. Same logic for d/t and g/k.

j, q, x are alveolo-palatal (tongue forward and flat). zh, ch, sh, r are retroflex (tongue curled back). z, c, s are dental (tongue tip behind the upper front teeth). The problem children for English speakers are q, x, c, and the zh/ch/sh/r set. The fix is to slow down and pay attention to where the tongue actually goes, rather than reaching for the nearest English equivalent.

The finals

A final is what follows the initial: a single vowel, a diphthong, or a vowel with a nasal ending. There are around 35, spelled with combinations of a, o, e, i, u, ü, plus n or ng for the nasal ones.

The basic vowels:

  • a as in father
  • o as in sore, lips rounded
  • e alone as the schwa of the ("uh"), not English "e"
  • i as in see, except after z, c, s, zh, ch, sh, r where it is a buzzy continuation of the consonant
  • u as in boot, lips rounded
  • ü as in French tu or German Tür: "ee" with rounded lips. Not an English sound.

Two surprises. The letter e alone is a schwa; "le" is "luh", not "lay". And i after the dental and retroflex consonants (zi, ci, si, zhi, chi, shi, ri) is not "ee" at all; it is a buzzy vowel that holds the position of the consonant. "Shi" is more like "shrr" than "shee".

Three of the basic vowels (i, u, ü) can act as a glide at the start of a final, sliding into the main vowel. That gives most of the long list:

  • ai ("eye"), ei ("ay"), ao ("ow"), ou ("oh")
  • an ("ahn"), en ("un"), ang ("ahng"), eng ("ung"), ong ("oong", rounded)
  • ia ("yah"), ie ("yeh"), iao ("yow"), iu ("yoh"), ian ("yen"), in ("een"), iang ("yahng"), ing ("eeng"), iong ("yoong")
  • ua ("wah"), uo ("waw"), uai ("why"), ui ("way"), uan ("wahn"), un ("wun"), uang ("wahng"), ueng ("wung")
  • üe ("yweh"), üan ("ywen"), ün ("yoon" with rounded lips)
  • er, a syllable on its own, the only final with an r

A few of these spellings hide what they really sound like.

  • iou is written iu. The o is suppressed in spelling, but pronounced. liu is "lyoh", not "lyoo".
  • uei is written ui. The e is suppressed but pronounced. gui is "gway", not "gwee".
  • uen is written un after a consonant. dun is "dwun", not "doon".
  • ian is pronounced "yen", not "yan". Tian, mian, nian all rhyme with "yen".
  • ong is closer to "oong", with rounded lips.

These are the spellings most likely to mislead a fresh learner. The rest are what they look like once you have the basic vowel values.
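The three abbreviated spellings (iu, ui, un) can be undone mechanically. The following is a minimal sketch, not a full pinyin parser: the function name expand_final and the table ABBREVIATED are made up for illustration, and the input is assumed to be a single toneless, lowercase syllable. It leaves un alone after j, q, x, and y, because there the spelling stands for ün rather than uen.

```python
# Abbreviated spellings and the fuller forms they stand for (illustrative table).
ABBREVIATED = {"iu": "iou", "ui": "uei", "un": "uen"}


def expand_final(syllable: str) -> str:
    """Spell out the suppressed vowel in iu, ui, and un after a consonant."""
    for short, full in ABBREVIATED.items():
        # Require something before the final: the abbreviation only applies after an initial.
        if syllable.endswith(short) and len(syllable) > len(short):
            initial = syllable[: -len(short)]
            # After j, q, x, y the spelling "un" is really ün, not uen.
            if short == "un" and initial in ("j", "q", "x", "y"):
                return syllable
            return initial + full
    return syllable


for s in ["liu", "gui", "dun", "jun", "ma"]:
    print(s, "->", expand_final(s))
```

Reading the expanded form aloud (liou, guei, duen) is a quick way to check that the hidden vowel is actually being pronounced.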

The ü problem

The ü (u with two dots) is a separate vowel from u. It is the front-rounded vowel of French (tu) and German (Tür), absent from standard English. The closest English approximation: say "ee" and round your lips, tongue forward.

The ü appears with its dots after n and l, because plain u after those consonants is a different sound:

  • nǚ (女, woman) is not nú (奴, slave).
  • lǜ (绿, green) is not lù (路, road).

After j, q, x, and y, the ü is written as plain u with the dots dropped. The reasoning: j, q, x, y cannot combine with the true u sound at all; the only u sound they can combine with is ü, so there is no ambiguity.

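This is one of the small lies pinyin tells. When you read ju, qu, xu, yu, you should pronounce them jü, qü, xü, yü. Not "joo", "choo", "shoo", "yoo". Same for jue, quan, xun, yuan: all ü-finals dressed as u-finals. A u written directly after j, q, x, or y is really ü. The one thing to watch is a u that comes later in the syllable, as in jiu or you: that u belongs to the finals iu and ou and keeps its ordinary value.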
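The dropped dots can be restored with a one-line check. This is a small sketch under the assumption that the input is one toneless, lowercase syllable; restore_umlaut is a hypothetical helper name. It only rewrites a u written directly after j, q, x, or y, so jiu and you pass through unchanged, since their u belongs to the finals iu and ou.

```python
def restore_umlaut(syllable: str) -> str:
    """Make the hidden ü explicit: ju -> jü, xun -> xün, yuan -> yüan."""
    if len(syllable) >= 2 and syllable[0] in "jqxy" and syllable[1] == "u":
        return syllable[0] + "ü" + syllable[2:]
    # Anything else (lu, nü, jiu, you, ...) is already spelled as it sounds.
    return syllable


for s in ["ju", "qu", "xun", "yuan", "jiu", "you", "lu"]:
    print(s, "->", restore_umlaut(s))
```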

Tone mark placement

Tone marks go on a vowel. When the final has more than one vowel, you need a rule for which gets the mark.

The priority order: a > o = e > i = u > ü. If there is an a, the mark goes on the a. If no a but there is an o or e, the mark goes there. Otherwise, the mark goes on the i, u, or ü. When i and u sit next to each other (as in liu or gui), the tone mark goes on the second of the two.

Worked examples:

  • liú (流, flow): i and u together, mark on the second (u).
  • guī (规, rule): u and i together, mark on the second (i).
  • xiào (笑, laugh): priority rule picks a.
  • jué (觉, feel): no a, mark on e.
  • lǚ (旅, travel): single vowel ü.
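The placement rule is simple enough to write as code. The sketch below is one way to do it, assuming the input is a single toneless syllable plus a tone number from 1 to 5 (5 meaning neutral); the names mark_index and add_tone are invented for this example. It relies on Python's standard unicodedata module to combine a letter and its accent into one character.

```python
import unicodedata

# Combining diacritics for tones 1-4. Tone 5 (neutral) is written without a mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_index(syllable: str) -> int:
    """Return the index of the vowel that carries the tone mark."""
    # Priority: a first, then o or e.
    for group in ("a", "oe"):
        for i, ch in enumerate(syllable):
            if ch in group:
                return i
    # Only i, u, ü remain: take the last one, which covers iu and ui.
    for i in range(len(syllable) - 1, -1, -1):
        if syllable[i] in "iuü":
            return i
    raise ValueError(f"no vowel found in {syllable!r}")


def add_tone(syllable: str, tone: int) -> str:
    """Turn a toneless syllable plus a tone number (1-5) into marked pinyin."""
    if tone not in (1, 2, 3, 4, 5):
        raise ValueError("tone must be between 1 and 5")
    # Normalise first so that ü is a single character, not u + combining dots.
    syllable = unicodedata.normalize("NFC", syllable)
    if tone == 5:
        return syllable
    i = mark_index(syllable)
    marked = syllable[: i + 1] + TONE_MARKS[tone] + syllable[i + 1 :]
    # NFC merges the base letter and the combining mark where a single character exists.
    return unicodedata.normalize("NFC", marked)


for syllable, tone in [("liu", 2), ("gui", 1), ("xiao", 4), ("jue", 2), ("lü", 3)]:
    print(syllable, tone, "->", add_tone(syllable, tone))
```

The loop at the end runs the five worked examples above through the function.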

Tone sandhi

Tone sandhi is the change in tone that happens when certain tones meet in connected speech. The pinyin is normally written with the original tones, but the actual pronunciation shifts. Three rules to know from the start.

1. Third + third becomes second + third. Two third tones in a row: the first is pronounced as a second tone. The classic example is nǐ hǎo (你好, hello), written with two third tones but pronounced ní hǎo. The spelling does not change; the pronunciation does. Longer strings group similarly.

2. The character 一 (yī, one) changes tone depending on what follows. Alone or at the end of a phrase, it is yī (first). Before first, second, or third tone, it becomes yì (fourth). Before fourth or neutral, it becomes yí (second): 一个 yí ge, 一定 yí dìng, 一样 yí yàng. For 一, most modern teaching materials write the sandhi tone, which is what you actually say.

3. The character 不 (bù, not) changes from fourth to second tone before another fourth tone. bù chī (not eat) stays fourth; bù hǎo (not good) stays fourth; bú shì (is not) and bú duì (not correct) shift to second.

All three sandhi rules are automatic for native speakers. You do not need to memorise them as rules for life; you need to be exposed to them often enough that they happen without thinking.
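The three rules can be expressed as a short function over tone numbers. This is a simplified sketch, not a model of real speech: apply_sandhi is a hypothetical name, tones are given as 1 to 5 with 5 for neutral, and each syllable is passed as a (character, citation tone) pair. It looks only at the next syllable, so a run of three third tones comes out as 2-2-3, which is only one of the groupings heard in practice, and it does not handle cases where 一 keeps its first tone mid-phrase (for example when counting).

```python
def apply_sandhi(syllables):
    """Map citation tones to spoken tones.

    syllables: list of (hanzi, citation_tone) pairs, tones 1-5 (5 = neutral).
    Returns the list of spoken tones.
    """
    spoken = []
    for i, (char, tone) in enumerate(syllables):
        # Citation tone of the following syllable, or None at the end of the phrase.
        next_tone = syllables[i + 1][1] if i + 1 < len(syllables) else None
        if char == "一" and next_tone is not None:
            # Second tone before fourth or neutral, fourth tone otherwise.
            tone = 2 if next_tone in (4, 5) else 4
        elif char == "不" and next_tone == 4:
            tone = 2
        elif tone == 3 and next_tone == 3:
            tone = 2
        spoken.append(tone)
    return spoken


print(apply_sandhi([("你", 3), ("好", 3)]))   # nǐ hǎo
print(apply_sandhi([("一", 1), ("定", 4)]))   # yī dìng
print(apply_sandhi([("不", 4), ("是", 4)]))   # bù shì
```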

The neutral tone

The neutral tone (轻声 qīngshēng) is short, light, and unstressed. It has no tone mark and its actual pitch depends on the tone before it. Common contexts:

  • Grammatical particles: 的 de, 了 le, 吗 ma, 呢 ne, 吧 ba.
  • The second character of a doubled word: 妈妈 māma, 爸爸 bàba, 谢谢 xièxie.
  • Noun suffixes: 子 zi (桌子 zhuōzi, table; 椅子 yǐzi, chair).
  • Directional and aspectual complements: 起来 qǐlai, 进去 jìnqu.

If a syllable sounds too short and quiet to belong to any of the four full tones, it is neutral. Leave the mark off.

Pinyin against the alternatives

Pinyin is the standard, but you will meet the older systems in proper nouns, library catalogues, and Taiwan.

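Wade-Giles dominated the English-speaking world from the late 19th century until the 1980s. It is behind older spellings like Mao Tse-tung (Mao Zedong) and Chou En-lai (Zhou Enlai). Other familiar old spellings look similar but come from elsewhere: Peking (Beijing) and Tsingtao (Qingdao) are postal romanisations, and Chiang Kai-shek (Jiang Jieshi) reflects a Cantonese pronunciation. Taiwan still uses Wade-Giles-derived spellings for many place names: Kaohsiung, Hsinchu, Taichung. The system marks aspiration with apostrophes, which were often dropped in print.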

Bopomofo, officially zhuyin fuhao (注音符号), is a phonetic system of 37 symbols derived from Chinese character components. It is taught in Taiwanese primary schools and shares no spellings with pinyin, though the underlying sound inventory is the same.

Yale Romanisation was developed in the US during WWII to teach Mandarin to military personnel. It was the standard in American universities for decades and has since been replaced by pinyin.

Pinyin is now the standard in mainland China, for international communication, and for the UN and ISO. Taiwan officially adopted it in 2009 for everything that interfaces with the outside world. If you are learning Mandarin in 2026, you learn pinyin.

Erhua and the er suffix

The character 儿 (ér) has two functions. It can be a syllable on its own (儿子 érzi, son) or a suffix that softens and rhotacises the end of another word. The second use is erhua (儿化), a feature of northern Mandarin and particularly Beijing.

When 儿 is suffixed, it does not become its own syllable. It changes the ending of the syllable it attaches to, written in pinyin by adding r:

  • 花 huā (flower) becomes 花儿 huār.
  • 一点 yīdiǎn (a little) becomes 一点儿 yīdiǎnr.
  • 这边 zhèbiān (this side) becomes 这边儿 zhèbiānr.

The tongue curls back at the end of the syllable, often swallowing a final n or i. Diǎnr sounds more like "dyar" than "dyan-r". Standard Putonghua tolerates erhua without requiring it. Recognise it when you hear it; do not feel obliged to use it unless you are learning a northern variety.

Typing Chinese with pinyin

Pinyin is the everyday typing system for Mandarin. Chinese characters are entered on phones and computers via an Input Method Editor (IME) that takes pinyin and offers matching characters.

You type wo and the IME shows candidates: 我 (I), 沃 (fertile), 卧 (recline), most frequent first. You press a number key or tap to select. Modern IMEs predict the next character from context, so typing wohen offers the bigram 我很 directly. Tone marks are almost always omitted; the IME works it out from context and frequency. To type ü after n or l, most IMEs accept v as a substitute: nv for nü, lv for lü.

A native speaker types pinyin all day. For a learner, this means your pinyin practice is also your future typing practice.
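The lookup step of an IME can be illustrated with a toy table. This sketch is not how any real input method is implemented: the CANDIDATES table is tiny and its entries beyond wo are illustrative, with the ordering for them chosen arbitrarily, and the function names are hypothetical. It shows the two behaviours described above, ranked candidates chosen by number and v accepted as a stand-in for ü.

```python
# Toy candidate table keyed by toneless pinyin. Illustrative data only;
# a real IME ranks candidates with large frequency and context models.
CANDIDATES = {
    "wo": ["我", "沃", "卧"],
    "hen": ["很", "恨"],
    "wohen": ["我很"],
    "nü": ["女"],
    "lü": ["绿", "旅"],
}


def normalise(keys: str) -> str:
    """Lowercase the keystrokes and map the substitute key v to ü."""
    # Pinyin has no letter v, so every v can safely be read as ü.
    return keys.lower().replace("v", "ü")


def candidates(keys: str) -> list:
    """Return the candidate list for the typed pinyin, or an empty list."""
    return CANDIDATES.get(normalise(keys), [])


def pick(keys: str, choice: int = 1) -> str:
    """Select a candidate by its 1-based position, like pressing a number key."""
    return candidates(keys)[choice - 1]


print(candidates("wo"))
print(pick("wo", 2))
print(candidates("nv"))
```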

What pinyin is not, again

Pinyin is a Romanisation with deliberate compromises. The most counterintuitive:

  • q is an aspirated alveolo-palatal affricate, not English "q". The letter q was free, so it was used.
  • x is an alveolo-palatal fricative, not English "x". Same logic.
  • c is an aspirated dental "ts", not English "c".
  • i is not always "ee". After z, c, s, zh, ch, sh, r it is a buzzy continuation vowel.
  • e alone is a schwa, not English "e".
  • u directly after j, q, x, y is ü, not "oo".

None of these are mistakes. They are choices a 1958 committee made because the alternative was diacritics, apostrophes, or extra letters that would have made the system harder to type. Pinyin syllables follow a strict shape: at most one initial + one final + one tone. No consonant clusters, no syllable-final consonants except n, ng, and r. Once you have the rules, you can read pinyin and produce the correct sound every time.
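Because the shape is so regular, a syllable can be split into initial and final with a simple prefix match. The sketch below assumes a single toneless, lowercase syllable; split_syllable is a name chosen for this example. It uses the 21 initials from the table earlier on this page, checks the two-letter ones first, and treats y and w as part of the final because they are not among the 21. A syllable with no initial, such as ai, comes back with an empty initial.

```python
# The 21 initials, with two-letter initials first so that "zh" wins over "z".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f",
    "d", "t", "n", "l",
    "g", "k", "h",
    "j", "q", "x",
    "r", "z", "c", "s",
]


def split_syllable(syllable: str):
    """Split a toneless syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    # No initial from the list: the whole syllable is the final.
    return "", syllable


for s in ["zhuang", "xue", "shi", "ai", "er"]:
    print(s, "->", split_syllable(s))
```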

The single most important thing

Mandarin has a notoriously high time-to-fluency for English speakers, and the most common reason is that learners spend too long on characters and not enough on pronunciation. Characters look hard and exotic, so learners assume they must be the main effort. But they do not help you say a word and they do not help you understand a spoken word. They are a separate skill on top of the spoken language.

The fastest route to basic competence is to spend the first six months on pinyin and tones, with characters as a parallel low-priority effort. Pronounce the four tones reliably and you can be understood; fail at them and no vocabulary will save you, because Mandarin words depend on tone for their identity. Master pinyin first. The characters will come faster once you can hear and produce the language they encode.
