Pīnyīn Tone Vowel Coloring

Today I spent almost the entire day, from 7:30 am to 10:20 pm, working on my coloring of vowels. Fun. Borderline crazy. Total obsession. But I guess that’s the only way to get things done.

And I have to say, it’s b-e-a-u-t-i-f-u-l! A total beauty. I love it.

The most difficult part was getting the text to copy. That’s probably something everyone takes for granted. Copy, Paste, Print. But getting “copy the whole page into the clipboard” to work cost me around 8 hours on its own, and I had to leave it at copying the whole page. I tried many approaches, as well as a few well-established third-party solutions (such as highlightjs and quilljs), but to no avail. At times I felt a bit like Bilbo in Mirkwood – with some Elves somewhere having a picnic, singing merry tunes all along.

On the upside, it now copies nicely into rich text apps like Apple Pages and TextEdit – and maybe even Microsoft Word (who knows?) – and there the colored text can be printed or edited further.

You can play with Pīnyīn Tone Vowel Coloring here: alfonsgrabher.com/pinyin-colors

Pīnyīn Vowel Tone Mapping

“While Chinese writing is traditionally understood as logographic and syllable-based, Hànyǔ Pīnyīn is written using the Latin alphabet, letter by letter.”

If this fact is taken seriously, it opens up many new possibilities and areas for creative expression, for example by casting a fresh light on tone coloring as a tool for education and language study.

In Mandarin, tones are pitch patterns realized on vowels. The vowel is where the tone truly “lives,” as it sustains the sound long enough for the pitch to rise, fall, dip, or remain level. Consonants, by contrast, are typically too short or voiceless to carry pitch and play little to no role in tone perception.

Since tones are phonetically manifested on vowels, it seems reasonable — in an experimental or pedagogical context — to apply visual tone indicators, such as color, specifically to vowels rather than to entire syllables. This approach emphasizes the part of the syllable that actually conveys tonal information and may enhance learners’ sensitivity to tone production and perception.
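To make the idea concrete, here is a minimal Python sketch of vowel-only coloring. It is not the code behind the interactive page; the function names, the HTML output format, and the color palette are all assumptions made for illustration (the palette is a placeholder, not necessarily Pleco’s). It treats each run of adjacent vowels as one unit and colors it by the tone mark found inside it.

```python
import re
import unicodedata

# Combining tone marks as they appear after NFD decomposition.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

# Placeholder palette (an assumption); 0 stands for the neutral tone.
TONE_COLORS = {0: "gray", 1: "red", 2: "green", 3: "blue", 4: "purple"}

# A run of vowels: each base vowel may be followed by combining marks.
VOWEL_RUN = re.compile(r"(?:[aeiouAEIOU][\u0300-\u036f]*)+")


def tone_of(run):
    """Return the tone (1-4) marked inside a decomposed vowel run, or 0."""
    for ch in run:
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 0


def color_vowels(text):
    """Wrap every vowel run of plain Pinyin text in a colored <span>."""
    decomposed = unicodedata.normalize("NFD", text)

    def wrap(match):
        run = match.group(0)
        color = TONE_COLORS[tone_of(run)]
        composed = unicodedata.normalize("NFC", run)
        return f'<span style="color:{color}">{composed}</span>'

    # Input is assumed to be plain text without HTML markup of its own.
    return VOWEL_RUN.sub(wrap, decomposed)


print(color_vowels("hǎo"))
```

Consonants stay untouched, so only the part of the syllable that carries the pitch receives a color.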

I played with these ideas focusing on the pedagogical and perceptual side – how learners can visually and intuitively grasp tones. To show how this might look in practice, I put together three visually distinct examples, using Pleco’s standard tone colors for simplicity:

Style 1: Colored vowels inside soft colored rectangles

Style 2: Colored vowels with a soft glow

Style 3: Vowels inside large, colored rectangles

To my mind, this is clean, intuitively readable, and – dare I say – pedagogically brilliant. Simply by coloring the vowels, we visually encode the essence of tone in a way that’s instantly graspable and clearly highlights the tonal components of each word.

Getting the pronunciation and tones right is one of the primary goals for beginner learners of Mandarin. Coloring – and thereby highlighting – the vowels, which carry the tones, feels almost revolutionary. Here are two arguments in favor of what I call “Pīnyīn Vowel Tone Mapping”, or “Pīnyīn Vowel Tone Coloring” (still looking for a name):

Vowel-centric highlighting

We’re staying faithful to the phonetic reality: tones live on the vowel, and we’re giving that fact a visual identity.

Color-coded tonal categories

With a little bit of practice, vowel colors provide an instant way to relate to tone, helping with tone perception and possibly even improving the ability to remember tones.

To conclude this post, instead of just presenting static example text, I’ve turned this idea into an interactive experience. You can try it out here:

Link: alfonsgrabher.com/pinyin-colors

One Latin alphabet, two roads: Sorting differences with PinyinAbcSort

Differences in word order between PinyinAbcSort and the ABC Chinese-English Dictionary

Link to the GitHub repo: http://github.com/alfons/PinyinAbcSort

In the esteemed ABC Chinese–English Comprehensive Dictionary published by the University of Hawai‘i Press, word order follows a Western alphabetical principle, quite strictly so: entries are sorted primarily by the base letters of the Latin alphabet, with Pīnyīn diacritics merely considered as tie-breakers when the base spellings are otherwise identical.

By contrast, the PinyinAbcSort algorithm sorts words by the base letters of the Latin alphabet while fully respecting each diacritic as a core feature, allowing tones to shape the order from the very beginning rather than serving merely as secondary tie-breakers. As a result, the sorting is quite different. For example, a side-by-side comparison:
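The two roads can be modelled in a few lines of Python. This is only a simplified sketch of the two principles described above, not the code from the repo and not the dictionary’s full rule set: the function names are made up for illustration, case is ignored, and ü is treated like u.

```python
import unicodedata

# Combining tone marks as they appear after NFD decomposition.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def letters_with_tones(word):
    """Split a word into (base_letter, tone) pairs; tone 0 means unmarked."""
    pairs = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in TONE_MARKS and pairs:
            base, _ = pairs[-1]
            pairs[-1] = (base, TONE_MARKS[ch])
        elif not unicodedata.combining(ch):
            pairs.append((ch, 0))
        # Other combining marks (such as the dots of ü) are ignored here.
    return pairs


def base_first_key(word):
    """Road 1: compare all base letters first; tones only break ties."""
    pairs = letters_with_tones(word)
    return ([base for base, _ in pairs], [tone for _, tone in pairs])


def tone_aware_key(word):
    """Road 2: compare letter by letter, each letter together with its tone."""
    return letters_with_tones(word)


words = ["bà", "bān", "bā", "bàn"]
print(sorted(words, key=base_first_key))  # ['bā', 'bà', 'bān', 'bàn']
print(sorted(words, key=tone_aware_key))  # ['bā', 'bān', 'bà', 'bàn']
```

In the first ordering every “ba” comes before every “ban”, whatever the tone. In the second, the tone on the “a” is decided before the following “n” is even looked at, so “bān” lands ahead of “bà”.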

2. Philosophy of Word List Design

Alphabetization isn’t neutral—it’s a design decision. Languages which use an extended Latin alphabet generally have their own conventions for treatment of the extra letters. Here, too, one road prioritizes ease of lookup for users familiar with English language dictionaries; the other preserves phonological fidelity by respecting the tone marks of Chinese.

  • The ABC dictionary approach is Westernized: prioritizing base-letter order à la English.
  • The PinyinAbcSort approach reflects a more Sinophone-conscious logic, aligning with how tones carry semantic weight in Mandarin.

This distinction raises questions such as:

  • Does the order reflect how Chinese is actually pronounced?
  • Should sorting reflect technical accuracy or academic tradition?
  • What’s easier or more natural to search for?

3. Personal Commentary

Language is highly personal. The way we speak is part of our identity. Therefore, I don’t think a choice of roads can be made by mere academic reasoning. Here is my personal commentary on why I designed PinyinAbcSort the way it is.

“Children learn what they live. Put kids in a class and they will live out their lives in an invisible cage, isolated from their chance at community; interrupt kids with bells and horns all the time and they will learn that nothing is important; force them to plead for the natural right to the toilet and they will become liars and toadies; ridicule them and they will retreat from human association; shame them and they will find a hundred ways to get even.” — John Taylor Gatto, in his book Dumbing Us Down.

My extreme, extraordinary, and uncompromising passion for Hànyǔ Pīnyīn does not stem from academic study, nor from an early acquaintance with the works of John DeFrancis, Zhōu Yǒuguāng, Yīn Bǐnyōng, and others. These figures are among the most prominent pioneers of Hànyǔ Pīnyīn, and without their vision and perseverance, it would not exist. But I only discovered their names, books and writings after my own path had already taken shape, after I had become deeply and irrevocably committed to the fruits of their labour, Hànyǔ Pīnyīn.

In June 2024, I made a personal decision:

From now on I will use Hànyǔ Pīnyīn as a complete replacement for Chinese characters. For me this concerns diary-style writing and casual, leisure reading (novels, interview transcripts, subtitles, etc.).

Where does my passion come from?

My passion comes from nearly two decades of being shamed, ridiculed, corrected, and condescended to as a “lǎowài”, a perpetual foreigner, deemed incapable of grasping one fundamental truth: that Mandarin has four tones.

I’ve spent thousands of hours, and thousands of dollars, trying to learn Mandarin in every imaginable setting. And yet, I failed spectacularly, miserably and almost completely, and nearly every teacher I’ve encountered sang the same refrain:

“Your pronunciation is wrong. Chinese has four tones. You need to learn this first.”

Ironically, the very script they use for teaching Chinese, and rely on themselves — Chinese characters — offers little to no tonal guidance. In contrast, Hànyǔ Pīnyīn not only represents the four tones — it excels at doing so. And yet, most teachers I’ve met cannot read Hànyǔ Pīnyīn with ease. They shy away from it, downplay its significance, try to talk me out of it — even though it encodes tone with precision and elegance.

Anyone who has read the official standard (GB/T 16159–2012) or any major scholarly treatment of Hànyǔ Pīnyīn will know how much care is given to the spelling of names, how meticulously tone rules are applied in personal and place names. And yet, many of the very pioneers of Pīnyīn often spell their own names without diacritics. I find this unacceptable.

Chinese has four tones. This much I have learned. Hànyǔ Pīnyīn spells them out. This is my gospel.

The reason why “PinyinAbcSort” can’t be spelled “PīnyīnAbcSort”

While “PīnyīnAbcSort” would be linguistically accurate, it’s not fit for technical use. Diacritics can break URLs, are awkward or disallowed in many identifiers and package names, and may cause issues in older or non-ASCII-safe systems. Sad, but acceptable as an exception due to technical limitations. Confucius says: blind passion fades, passion that’s guided by reason endures.
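For the curious, this is roughly how a toned name can be reduced to an ASCII-safe one. It is a small sketch using Python’s standard unicodedata module; the helper name strip_diacritics is made up for this example. Note that it also flattens ü to u, which loses information.

```python
import unicodedata


def strip_diacritics(name):
    """Drop all combining marks, e.g. tone marks, from a string."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("PīnyīnAbcSort"))            # PinyinAbcSort
print(strip_diacritics("PīnyīnAbcSort").isascii())  # True
```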

Confucius also says: passion burns like fire, few add tone marks.

Here’s to the alphabet

I’ve been on GitHub for 17 years now and never bothered to upload or contribute anything. That changed last weekend, or at least, I decided to upload one of my many projects. 😅

This one was quite a wild ride, actually. I wasn’t sure I needed the algorithm to begin with, but I thought, well, “nice to have.” In the beginning it looked super straightforward and easy, then for a long time it looked super complicated, and in the end the final version looks so simple that it almost seems to have been jotted down in a minute.

Four days, and I guess 40 hours of work later, here’s my repo:

https://github.com/alfons/PinyinAbcSort

PinyinAbcSort – Sort Hànyǔ Pīnyīn in alphabetical order (fast)

Description:

This project implements sorting Pīnyīn words into alphabetical word order, based on the rules outlined by John DeFrancis in ABC Chinese-English Dictionary, Page xiii, Reader’s Guide, I. Arrangement of Entries.

The sorting algorithm compares words letter by letter, not syllable by syllable. This approach reflects the fact that Hànyǔ Pīnyīn is written using the Latin alphabet — the key insight and algorithm design choice behind this implementation.

The ordering rules are:

  1. Alphabetical order: Base characters (a–z), compared letter by letter
  2. u before ü, U before Ü
  3. Tones: 0 < 1 < 2 < 3 < 4
  4. Case: lowercase and mixed-case before uppercase
  5. Separators: apostrophe < hyphen < space
  6. Since no rules for numbers 0–9 were given, they were added first. All other characters are appended according to their Unicode value.
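The rules above can be sketched as a Python sort key. This is one possible reading, not the implementation from the repo. Several points are assumptions where the rules are silent: letters are compared together with their ü flag and tone, case is only used as a final tie-breaker, digits sort before separators and separators before letters, and only the ASCII apostrophe is recognised. The name pinyin_sort_key is made up for this example.

```python
import unicodedata

TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}  # rule 3
DIAERESIS = "\u0308"                                               # rule 2
SEPARATORS = {"'": 0, "-": 1, " ": 2}                              # rule 5

# Character groups (assumed order): digits, separators, letters, the rest.
DIGIT, SEPARATOR, LETTER, OTHER = 0, 1, 2, 3


def pinyin_sort_key(word):
    """Build a key of per-character entries [group, rank, is_ü, tone]."""
    keys = []
    cases = []  # 0 = lowercase, 1 = uppercase; used only to break ties
    for ch in unicodedata.normalize("NFD", word):
        last_is_letter = bool(keys) and keys[-1][0] == LETTER
        if ch in TONE_MARKS and last_is_letter:
            keys[-1][3] = TONE_MARKS[ch]
        elif ch == DIAERESIS and last_is_letter:
            keys[-1][2] = 1
        elif ch in "0123456789":
            keys.append([DIGIT, int(ch), 0, 0])          # rule 6: digits first
            cases.append(0)
        elif ch in SEPARATORS:
            keys.append([SEPARATOR, SEPARATORS[ch], 0, 0])
            cases.append(0)
        elif ch.isascii() and ch.isalpha():
            keys.append([LETTER, ord(ch.lower()) - ord("a"), 0, 0])  # rule 1
            cases.append(int(ch.isupper()))              # rule 4
        else:
            keys.append([OTHER, ord(ch), 0, 0])          # rule 6: by code point
            cases.append(0)
    return (keys, cases)


print(sorted(["lǜ", "lù", "lu", "lú"], key=pinyin_sort_key))
# ['lu', 'lú', 'lù', 'lǜ']
```

Because the key is built per letter rather than per syllable, no syllable segmentation is needed, which matches the letter-by-letter design described above.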

Credits:

  • John DeFrancis (1911–2009): Original Pīnyīn alphabetical word order, in passionate acknowledgment of the advocates of writing reform Lú Zhuàngzhāng (卢戆章, 1854–1928), Lǔ Xùn (鲁迅, 1881–1936), Máo Dùn (Shěn Yànbīng, 茅盾, 沈雁冰, 1896–1981), Wáng Lì (王力, 1900–1986), Lǚ Shūxiāng (吕叔湘, 1904–1998), and Zhōu Yǒuguāng (周有光, 1906–2017).
  • Mark Swofford of Banqiao, Taiwan: summarised the rules on the internet, and pointed out where to find them.
  • Alfons Grabher: Idea, concept, prompting, testing, and driving the development of PinyinAbcSort.
  • Grok (xAI), ChatGPT 4o: Coding the implementation with flair and precision.

Status update and things learned

I’ve been completely obsessed with work for the past few months, writing a book about Chinese grammar using romanisation (Pīnyīn), and writing software to be able to actually write in Chinese Pīnyīn. I was working to the point of madness, with only a short one-week holiday in Taiwan to sort out a visa issue, debounce, and catch up with good friends.

Turns out, both my new book and new software are far from finished. What could have finished, or “ended” to use the right term, is my health. So I’m taking a step back, taking the time to write a blog post, and summarising what I’ve learned:

1. Going to bed early is important

As Dr. Neal Barnard once pointed out in his book about hormonal health, going to bed at 10pm seems right for him. If he sets his mind on going to bed at 11:30pm, then the next thing he knows it’s half past midnight and his whole next day is messed up.

2. Eating early is important

My grandmother used to have her last meal of the day at around 5:00pm, for which she usually had more of a snack, not a meal. Usually that was a slice of old, dark rye bread, with butter on it so thin… well as a kid I always doubted it was worth the effort to go and fetch the butter from the pantry and also the work it took to clean the butter knife. Butter usually sticks to the knife and you need a sponge with liquid soap, and then you need to clean the sponge properly or your hands will be all sticky and smell like you touched a cow. As a kid I was thinking a lot about that.

In hindsight, and from decades of personal experiments, I find a small meal at 4pm helps my sleep and recovery the most.

3. Good things come to those who wait

Growing a book and a piece of software so fast, adding to them 4 to 10 hours a day, naturally creates a large, messy code base. And now, of course, I ran into software performance issues. How could I not? The profiler says that my software will perform poorly on slower systems.

And for the book I need to go through all chapters AGAIN and clean up the mess I left behind, too. Making it better, and better.

Maybe it’s also a form of obsessing over details, or trying to achieve perfection. Just for example, I went through 26 iterations for the app’s icon. On the upside, the names for the software and the book came to me naturally, and I’m really happy, excited, and satisfied with them.

Trying to achieve perfection? I say, what else do we have in life, if not the pursuit of happiness, striving for meaning, beauty, harmony, and satisfaction with our creations? Or is this the talk of middle age? To me it really feels good to think about a difficult, structural problem, sort it out, and make the solution beautiful. And there’s the age-old question: What have the Romans REALLY ever done for us?

There’s more. But I think this is a good time to end this blog post, or diary page. I’m very happy I didn’t use ChatGPT, or any LLM, for this blog post, for anything, not even for a spell-check. THIS feels amazing, too. It’s all me. My achievements, my mistakes, my own expression of beauty, of coherent, accomplished thinking and writing.

Wish you a great day, and take good care of your health!

New book by Alfons

Testing line heights, fonts and trim sizes. My new book on Hànyǔ Pīnyīn is getting along well. It will still take some time, though. So many things to do. Still getting up around 6-7 am, a bit of somatic movement practice, and then working every free minute until late, around 11-12 pm.

However, creating and working on a self-chosen, self-defined project, as it presents itself, as it is brought to life, organic unfolding—it gives life meaning and purpose, I can definitely feel that. And I do enjoy that feeling.

Input efficiency for novel Pinyin input scheme

Ok, now it’s two days later. Two days with hardly any sleep and constant staring at my computer screen. Which is beautiful, btw, this nano texture screen on my MacBook, I love it so much.

And a few walks in between, to think about how to fix the terrible problems I kept running into. Well… sometimes we need to slow down, so that solutions may come to us. I guess solutions, by their very nature, don’t like running targets either.

I’ve been working on the number-based input helper for about 3-4 months now, and I think it’s beautiful, too. I mean my frontend, which I can’t show you just yet.

It’s just… this entire idea of typing text using numbers seems flawed to me. I guess in this sense I’m pretty old school. In my mind and heart numbers and words are from two different realms. Nevertheless, it’s the most efficient input method.

But since I want to type text without using numbers I had to come up with an input scheme that can do just that.

This is why I developed my Twin Tone input scheme. The idea behind this scheme is that there are no double vowels (twins) in the Chinese language, at least not in the romanised script called Pinyin.

This input scheme allows for writing Chinese (using Pinyin) without having to type numbers. There’s only one drawback: it’s terribly inefficient. A lot more characters have to be typed than result in actual text (depending on the text, there’s close to a 50% overhead).
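The overhead figure is simply the number of extra keystrokes relative to the characters that end up in the text. Here is a minimal sketch of that arithmetic. Since the Twin Tone key combinations are not disclosed in this post, the example uses ordinary number-based input (“ni3hao3”) as a stand-in, and the function name input_overhead is made up for illustration.

```python
import unicodedata


def input_overhead(typed, output):
    """Extra keystrokes as a fraction of the characters in the final text."""
    # Normalise to NFC so that a letter with a tone mark counts as one character.
    n_out = len(unicodedata.normalize("NFC", output))
    return (len(typed) - n_out) / n_out


# 7 keystrokes produce 5 characters: 2 extra keystrokes, i.e. 40% overhead.
print(f"{input_overhead('ni3hao3', 'nǐhǎo'):.0%}")  # 40%
```

Run over a longer sample text, the same function gives the average overhead of whichever scheme is being tested.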

That’s why I was trying hard to find an optimised, streamlined scheme. And to solve this problem, I used statistics. I had to find out which letter combinations are permissible for use in a Pinyin typing helper scheme.

The result of this work was my optimised Twin Tone input scheme. But the original design was very hard to implement from a software developer’s perspective. Therefore I created a much simpler variant, which I named Twin Tone Variant (TTV). It’s easier to implement, and also much easier to learn from a user’s perspective.

Turns out, this works very, very well. I’m very happy with it. And despite the slightly larger overhead, it’s more convenient to use than the number-based input, since with TTV-Input the fingers can stay on the home row of the keyboard.

So.

That’s my contribution from a teacher’s and inventor’s perspective. I blurred the input letter combination since it was so much work to find this scheme. Maybe in the future billions of people will use it. I probably should get it patented. But I have no idea how.

If you’re a business angel, investor, or startup incubator, I would like to team up with you, with me being Educational Director or Head of Innovation of a new startup focusing on Chinese language related projects, all without Chinese characters.

Furthermore, I would like to start a publishing company in Taiwan (for one, to have a legal basis for the work that follows, and secondly, to get temporary residency and a work permit for Taiwan). But I’m open to other countries as well. Any help, a pointer in the right direction, or suggestions for partnering up are very welcome. 🙏😊