Differences in word order between PinyinAbcSort and the ABC Chinese-English Dictionary
Link to the GitHub repo: http://github.com/alfons/PinyinAbcSort
In the esteemed ABC Chinese–English Comprehensive Dictionary published by the University of Hawai‘i Press, word order follows a Western alphabetical principle, quite strictly so: entries are sorted primarily by the base letters of the Latin alphabet, with Pīnyīn diacritics merely considered as tie-breakers when the base spellings are otherwise identical.
By contrast, the PinyinAbcSort algorithm sorts words by the base letters of the Latin alphabet while fully respecting each diacritic as a core feature, so that tones shape the order from the very beginning rather than serving as secondary tie-breakers. As a result, the two orderings differ noticeably. For example, a side-by-side comparison:
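As a minimal sketch of the two ordering principles described above (not the actual PinyinAbcSort code, and not the dictionary's full collation rules), the following Python compares a base-letters-first key with a key that ranks each tone-marked letter as soon as it is encountered. The function names, and the assumption that unmarked letters come first and tones rank 1 to 4, are illustrative only.

```python
import unicodedata

# Combining tone marks mapped to tone numbers 1-4; 0 means "no tone mark".
# Ranking the tones in numeric order is an assumption made for this sketch.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_letters(word):
    """Decompose a Pinyin word into (base_letter, tone) pairs."""
    pairs = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in TONE_MARKS and pairs:
            base, _ = pairs[-1]
            pairs[-1] = (base, TONE_MARKS[ch])
        elif unicodedata.combining(ch) and pairs:
            # Any other mark (e.g. the diaeresis of ü) stays part of the base letter.
            base, tone = pairs[-1]
            pairs[-1] = (base + ch, tone)
        else:
            pairs.append((ch, 0))
    return pairs


def base_first_key(word):
    """Base letters decide the order; tones only break ties."""
    pairs = split_letters(word)
    return ([base for base, _ in pairs], [tone for _, tone in pairs])


def tone_first_key(word):
    """Each letter is compared together with its tone, left to right."""
    return split_letters(word)


words = ["màn", "mà", "mǎi", "mā"]
print(sorted(words, key=base_first_key))  # ['mā', 'mà', 'mǎi', 'màn']
print(sorted(words, key=tone_first_key))  # ['mā', 'mǎi', 'mà', 'màn']
```

With the base-letters-first key, "mà" precedes "mǎi" because "ma" is shorter than "mai". With the tone-aware key, "mǎi" precedes "mà" because the third tone is ranked before the fourth at the second letter.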
2. Philosophy of Word List Design
Alphabetization isn’t neutral; it’s a design decision. Languages that use an extended Latin alphabet generally have their own conventions for the treatment of the extra letters. For Pīnyīn, too, there are two roads: one prioritizes ease of lookup for users familiar with English-language dictionaries; the other preserves phonological fidelity by respecting the tone marks of Chinese.
- The ABC dictionary approach is Westernized: it prioritizes base-letter order à la English.
- The PinyinAbcSort approach reflects a more Sinophone-conscious logic, aligning with how tones carry semantic weight in Mandarin.
This distinction raises several questions:
- Does the order reflect how Chinese is actually pronounced?
- Should sorting reflect technical accuracy or academic tradition?
- What’s easier or more natural to search for?
3. Personal Commentary
Language is highly personal. The way we speak is part of our identity. Therefore, I don’t think the choice between the two roads can be made by academic reasoning alone. Here is my personal commentary on why I designed PinyinAbcSort the way I did.
“Children learn what they live. Put kids in a class and they will live out their lives in an invisible cage, isolated from their chance at community; interrupt kids with bells and horns all the time and they will learn that nothing is important; force them to plead for the natural right to the toilet and they will become liars and toadies; ridicule them and they will retreat from human association; shame them and they will find a hundred ways to get even.” — John Taylor Gatto, in his book Dumbing Us Down.
My extreme, extraordinary, and uncompromising passion for Hànyǔ Pīnyīn does not stem from academic study, nor from an early acquaintance with the works of John DeFrancis, Zhōu Yǒuguāng, Yīn Bǐnyōng, and others. These figures are among the most prominent pioneers of Hànyǔ Pīnyīn, and without their vision and perseverance, it would not exist. But I only discovered their names, books and writings after my own path had already taken shape, after I had become deeply and irrevocably committed to the fruits of their labour, Hànyǔ Pīnyīn.
In June 2024, I made a personal decision:
From now on I will use Hànyǔ Pīnyīn as a complete replacement for Chinese characters. For me this concerns diary-style writing and casual, leisure reading (novels, interview transcripts, subtitles, etc.).
Where does my passion come from?
My passion comes from nearly two decades of being shamed, ridiculed, corrected, and condescended to as a “lǎowài”, a perpetual foreigner deemed incapable of grasping one fundamental truth: that Mandarin has four tones.
I’ve spent thousands of hours, and thousands of dollars, trying to learn Mandarin in every imaginable setting. And yet I failed spectacularly, miserably, and almost completely, and nearly every teacher I encountered sang the same refrain:
“Your pronunciation is wrong. Chinese has four tones. You need to learn this first.”
Ironically, the very script they use for teaching Chinese, and rely on themselves — Chinese characters — offers little to no tonal guidance. In contrast, Hànyǔ Pīnyīn not only represents the four tones — it excels at doing so. And yet, most teachers I’ve met cannot read Hànyǔ Pīnyīn with ease. They shy away from it, downplay its significance, try to talk me out of it — even though it encodes tone with precision and elegance.
Anyone who has read the official standard (GB/T 16159–2012) or any major scholarly treatment of Hànyǔ Pīnyīn will know how much care is given to the spelling of names, and how meticulously tone rules are applied in personal and place names. And yet, many of the very pioneers of Pīnyīn spell their own names without diacritics. I find this unacceptable.
Chinese has four tones. This much I have learned. Hànyǔ Pīnyīn spells them out. This is my gospel.
The reason why “PinyinAbcSort” can’t be spelled “PīnyīnAbcSort”
While “PīnyīnAbcSort” would be linguistically accurate, it’s not fit for technical use. Diacritics have to be percent-encoded in URLs, are awkward or disallowed in many identifiers, package names, and repository names, and may cause issues in older or non-ASCII-safe systems. Sad, but acceptable as an exception due to technical limitations. Confucius says: blind passion fades, passion that’s guided by reason endures.
Confucius also says: passion burns like fire, few add tone marks.
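To make the technical limitation concrete, here is a small Python sketch of what happens to the tone-marked spelling in a URL path, and of how the ASCII spelling can be derived from it. The helper name strip_diacritics is hypothetical and not part of PinyinAbcSort.

```python
import unicodedata
from urllib.parse import quote


def strip_diacritics(text):
    """Drop all combining marks after canonical decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


name = "PīnyīnAbcSort"

# In a URL path, every non-ASCII character is percent-encoded as UTF-8 bytes.
print(quote(name))             # P%C4%ABny%C4%ABnAbcSort

# The ASCII-only spelling survives unchanged.
print(strip_diacritics(name))  # PinyinAbcSort
print(quote(strip_diacritics(name)))  # PinyinAbcSort
```

Note that this helper strips every combining mark, so it would also turn ü into u; that is acceptable for a project name but not for Pīnyīn text in general.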