Night mode colors for Chinese Pinyin transcription software

I probably shouldn’t work into the wee hours on my hobby project, but in case I do, I finally completed colored pinyin-vowels for night mode. I’m very happy with it 😄 It looks like this:

Also, sometimes I’m quite fond of a transcription. My favourite transcription of the day:

青年一口气讲完,吁了一口气,道:“我是在机场中遇到她,她知道我有事要到日本来,所以才托我传达这句口讯的!”

Qīngnián yīkǒuqì jiǎngwán, xūle yī kǒu qì, dào: “Wǒ shì zài jīchǎng zhōng yùdào tā, tā zhīdào wǒ yǒu shì yào dào Rìběn lái, suǒyǐ cái tuō wǒ chuándá zhè jù kǒuxùn de!”

I assume that the first use of “yīkǒuqì” (一口气) is to say “he finished speaking in one breath”, and thus is a noun phrase functioning adverbially and written together. To my mind, however, the second instance of 一口气 is quite different, “he let out one breath, he sighed”, and thus I separated the cardinal number, classifier, and noun. Chinese Pinyin, which uses word boundaries, raises many such questions. These questions don’t seem to arise when writing in one continuous string without word boundaries, without any whitespace between words, as in Simplified Chinese.

I’m just a language learner, and it still might be wrong, but I feel like I’m about to start to get the hang of it, really close! 🥳🤩

Also, I played a bit with ChatGPT’s Sora app, to create a cover for the transcribed novel. Really, just a hobby, but fun nevertheless!

However, what is a lot more time consuming (weeks and weeks and weeks), is my creation of a Chinese Pinyin-English dictionary for the Amazon Kindle. Argh savage! This is nothing short of an endless nightmare, due to many software issues with the Amazon Kindle and the complexity of the dictionary raw data.

One would think a trillion-dollar company like Amazon would put more resources into their e-reader… but I understand that for Amazon the Kindle is primarily a storefront, and the reading function is secondary. I’m not the first to complain. Therefore, my entirely novel and out-of-the-ordinary Pinyin-English dictionary might never be fully functional, but progress is made nonetheless!

Transcription Software Update

After almost 5 months of super intense, daily work on half a dozen apps and algorithms (for being able to use Chinese Pinyin, to write and read in Pinyin, to work with text written in Pinyin, and to transcribe Hanzi to Pinyin) I’m kinda burned out.

Now I’m actually using my half finished software suite with its half finished apps, mainly for reading and transcribing text from Chinese characters into Chinese Pinyin. I’m quite enjoying it. I’m reading 3 books in parallel!

The only new thing I did on the software side: I added an (optional) function to completely remove the Chinese characters from the transcription table in my transcription software. This was unexpected, but a big relief. I kinda enjoyed the Chinese characters on the tiles, but once they were gone… SO MUCH BETTER! It just feels and looks so much lighter, and cleaner.

The main screen seems to be quite well designed now, too:

Oh, and I added a dark mode, now that I actually use my own software for transcription, and sometimes late into the evenings:

However, there are quite a few adjustments I still need to make. I guess it will cost me two full days to get the main screen finished. And then I can finally work on adding more rules, like formatting of proper nouns, or adding an alternative, better POS tag system.

But at the moment, I don’t have that in me anymore. I’m exhausted. Now I’ll continue reading, and transcribing. Wish you a good day, and all the best with your own projects!

I find the tone marks quite charming

I need to shift my focus more towards Feldenkrais/Somatics again. However, in other news, I finally got a first version of a Pīnyīn dictionary for my Amazon Kindle Paperwhite to work. It’s far from perfect, but it works.

I was trying to read a bit in Chinese Pīnyīn, and even though I now can read everything, I still only understand about 10-20%, depending on the story (with the exception of Peppa Pig, where I understand 70-90%). Chinese is really very different from German, or English.

However, I noticed that I started to miss the diacritics (tone marks) from Hànyǔ Pīnyīn, when I’m reading in English. The tone marks, I find them quite charming. They are not as prominent as in Vietnamese, but also not as sparse as in French. I’ll post a few examples below.

A short remark on spelling and orthography

Some of these examples are older transcriptions of mine, some newer. The older ones were done before I had a working version of my transcription software, and I still had to use ChatGPT, DeepSeek, Google Translate etc. to obtain the Pīnyīn. Pīnyīn that is produced by large language models usually has quite a lot of mistakes. These mistakes usually fall into three categories:

  1. misspellings — wrong letters, wrong tone marks, wrong uppercase/lowercase,
  2. wrong word boundaries — wrongly taken apart words, or wrongly fused words,
  3. word hallucinations — dropped, completely altered, or added words.

Large language models rely on their training data, and I suspect that, in terms of Pīnyīn orthography, quite big chunks of it are problematic or outright wrong. Even with the official rules (GB/T 16159-2012) front-loaded and reasoning switched on, LLMs usually get the Pīnyīn wrong, and quite stubbornly so. As of now, the only way to reliably produce correct Pīnyīn is to

  • use decent transcription software,
  • know the rules by heart,
  • have practice applying them.

Therefore, disclaimer: I checked the examples several times for spelling mistakes (and fixed all that I’ve found), but there still might be a few. I still have to create a spellchecker (since there is none on the market), and get better at spotting mistakes myself.

🤓 Example 1 – a rare, untranslated novel by the famous writer Ni Kuang (倪匡), from the renowned Wisely Series (衛斯理系列)

Ní Kuāng »Mílù«, Dì-yī zhāng: Qíguài qǐngjiǎn sān dù chūxiàn

Ànxiàle bàngōngzhuō páng, yī xìliè ànniǔ zhōng de yī gè, luòdì chángchuāng qián de chuānglián, jiù zìdòng xiàng liǎng páng fēnle kāilái. Chuāng bōli mǒ de yīchén-bùrǎn, chuānglián yī lākāi, jiù kěyǐ kàndào dàbàn gè chéngshì de jǐngsè.

Wáng Yīhéng de bàngōngshì, zài zhè zhuàng yǐ tā de míngzì mìngmíng de dàshà de dǐnglóu, qīshíbā céng gāo. Tā de bàngōngzhuō, jiù miànduìzhe nà yī fú gāodá sì gōngchǐ, kuān shí’èr gōngchǐ de dà chuāng.

Wáng Yīhéng hěn xǐhuān zuò zài bàngōngzhuō hòu, tòuguò zhège chuāngzi, xīnshǎng zhège Yàzhōu dà chéngshì de jǐngsè, tóngshí xīnzhōng duì zìjǐ duì zhège dà chéngshì yǒu jídà de yǐngxiǎnglì ér zì’ào.

Wáng Yīhéng de shìxiàn, cóng chuāng wài shōu huílái, yòu luò zài miànqián nà zhāng qíguài de qǐngjiǎn shàng. Tā xíguàn de wánnòngzhe jīnzhì de chāixìndāo, yòng dāojiān qīng qiāozhe nà fèn qíguài de qǐngjiǎn.

Qǐngjiǎn néng shǐ Wáng Yīhéng gǎndào qíguài, dāngrán bù shì méiyǒu lǐyóu de. Zhè díquè shì yī fèn qíguài de qǐngjiǎn, Wáng Yīhéng yě bù shì dì-yī cì shōudào tā le.

Shì dì-sān cì le. [..]

🤓 Example 2 – from a novel, a cute love story by Mòbǎofēibǎo (墨宝非宝), which was turned into the widely known TV-series Go Go Squid! (亲爱的,热爱的)

Mòbǎofēibǎo »Mìzhī dùn yóuyú«, Dì-yī zhāng

Xiǎng zhīdào shénme jiào yījiàn-zhōngqíng ma?

Jiùshì zhè yī miǎo.

Jiùshì xiànzài, tā duì miànqián gézhe yī gè guìtái de nánrén yījiàn-zhōngqíng le.

Tóng Nián dītóu, shǒuzhǐ zài jiànpán shàng pīli-pālā de qiāo ya qiāo, míngmíngxiǎng yào jiànlì yī gè xīn mìmǎ, kě, nǎozǐ lǐ què zài pīnmìng de huíxiǎng, qián yī miǎo, tā duì zìjǐ shuō “bāoyè” liǎng gè zì de shíhòu, zìjǐ jiūjìng yǒu méiyǒu duì tā xiào? Hǎoxiàng zuǐjiǎo yǒu shàngyáng? Háishì chún dāi?

Hǎo bù róngyì wánchéngle xīn mìmǎ de shèzhì, tā chōu guòlái yī zhāng zhǐtiáo, chāo shàng zhànghào hé mìmǎ.

“En…… bāoyè cóng shíyī diǎn kāishǐ suàn shíjiān, liù diǎn jiéshù. Zhèlǐ tōngcháng qī diǎn guān diàn, dào qī diǎn yě méi wèntí,” tā bǎ zhǐtiáo fàng zài zhuō shàng, yòng zìjǐ rènwéi zuì kě’ài hǎotīng de shēngyīn, zhuāngzuò wēnróu tǐtiē, hái shāo dài diǎn ruǎn méngméng de yǎnshén, duì tā shuō, “Ā, zhèlǐ,” tā zhǐle zhǐ shēnhòu de guìzi, “hái yǒu fāngbiànmiàn hé yǐnliào, nǐ rúguǒ è le, kěyǐ suíshí jiào wǒ, wǒ kěyǐ gěi nǐ shāo rèshuǐ pàomiàn de.”

Bèi tā fàngdiàn de nánrén, sìhū méi tài rènzhēn tīng, suíbiàn de diǎnle tóu, jiāng zhǐtiáo cóng guìtái shàng názǒu.

Duō yī yǎn dōu bù kàn mòshēng nǚhái……

Juéduì shì gè hǎo nánrén! [..]

p.s. I was thinking some more about the correct Pīnyīn for the title, Mì (honey) zhī (juice) dùn (stewed) yóuyú (squid) and came to the conclusion that the honey and juice must be a compound word, even though not in the dictionary, as compound words often aren’t, and thus be written as »Mìzhī dùn yóuyú«.

🤓 Example 3 – the beginning of Book 1 of the famous Three-Body Problem book series (三体) by Liú Cíxīn (刘慈欣), which was turned into a Chinese TV series and a Netflix series

Liú Cíxīn »Sān Tǐ«, Dì-yī zhāng: Kēxué Biānjiè

Wāng Miǎo juéde, lái zhǎo tā de zhè sì gè rén shì yī gè qíguài de zǔhé: liǎng míng jǐngchá hé liǎng míng jūnrén, rúguǒ nà liǎng gè jūnrén shì wǔjǐng hái suàn zhèngcháng, dàn zhè shì liǎng míng lùjūn jūnguān.

Wāng Miǎo dì-yī yǎn jiù duì lái zhǎo tā de jǐngchá méiyǒu hǎogǎn. Qíshí nà míng chuān jǐngfú de niánqīngrén hái xíng, jǔzhǐ hěn yǒu lǐmào, dàn nà wèi biànyī jiù ràng rén tǎoyàn le. Zhè rén zhǎngde wǔdàsāncū, yī liǎn héngròu, chuānzhe jiàn zāngxīxī de píjiā kè, húnshēn yānwèi, shuōhuà cūshēngdàsǎng, shì zuì lìng Wāng Miǎo fǎngǎn de nà lèi rén.

“Wāng Miǎo?” Nà rén wèn, zhíhūqímíng lìng Wāng Miǎo hěn bù shūfu, kuàngqiě nà rén tóngshí hái zài diǎnyān, tóu dōu bù tái yīxià. Bù děng Wāng Miǎo huídá, tā jiù xiàng pángbiān nà wèi niánqīngrén shìyìle yīxià, hòuzhě xiàng Wāng Miǎo chūshìle jǐngguānzhèng, tā diǎnwán yān hòu jiù zhíjiē xiàng wū lǐ chuǎng.

“Qǐng bùyào zài wǒ jiālǐ chōuyān.” Wāng Miǎo lánzhùle tā.

“Ò, duìbuqǐ, Wāng jiàoshòu. Zhè shì wǒmen Shǐ Qiáng duìzhǎng.” Niánqīng jǐngguān wēixiàozhe shuō, tóngshí duì xìng Shǐ de shǐle gè yǎnsè.

“Chéng, nà jiù zài lóudào lǐ shuō ba.” Shǐ Qiáng shuōzhe, shēnshēn de xīle yī dàkǒu, shǒu zhōng de yān jīhū rán xiàqù yībàn, zhīhòu jìng bùjiàn tǔchū yān lái. “Nǐ wèn.” Tā yòu xiàng niánqīng jǐngguān piānle yīxià tóu.

“Wāng jiàoshòu, wǒmen shì xiǎng liǎojiě yīxià, zuìjìn nǐ yǔ ‘Kēxué Biānjiè’ xuéhuì de chéngyuán yǒuguo jiēchù, shì ba?”

“ ‘Kēxué Biānjiè’ shì yī gè zài guójì xuéshùjiè hěn yǒu yǐngxiǎng de xuéshù zǔzhī, chéngyuán dōu shì zhùmíng xuézhě. Zhèyàng yī gè héfǎ de xuéshù zǔzhī, wǒ zěnme jiù bù néng jiēchù le ne?”

“Nǐ kànkàn nǐ zhège rén!” Shǐ Qiáng dàshēng shuō, “Wǒmen shuō tā bù héfǎ le ma? Wǒmen shuō bù ràng nǐ jiēchù le ma?” Tā shuōzhe, gāngcái xījìn dùzi lǐ de yān dōu pēndào Wāng Miǎo liǎn shàng.

“Nà hǎo, zhè shǔyú gèrén yǐnsī, wǒ méi bìyào huídá nǐmen de wèntí.”

“Hái shá dōu chéng yǐnsī le, xiàng nǐ zhèyàng yī gè zhùmíng xuézhě, zǒng gāi duì gōnggòng ānquán fùzé ba.” Shǐ Qiáng bǎ shǒu zhōng de yāntóu rēngdiào, yòu cóng yābiǎnle de yānhé lǐ chōuchū yī gēn.

“Wǒ yǒuquán bù huídá, nǐmen qǐngbiàn ba.” Wāng Miǎo shuōzhe yào zhuǎnshēn huí wū.

“Děngděng!” Shǐ Qiáng lìshēng shuō, tóngshí cháo pángbiān de niánqīng jǐngguān huīle yīxià shǒu, “Gěi tā dìzhǐ hé diànhuà, xiàwǔ qù zǒu yī tàng.”

“Nǐ yào gān shénme!” Wāng Miǎo fènnù de zhìwèn, zhè zhēngchǎo yǐndé línjūmen yě tànchū tóu lái, xiǎng kànkan chūle shénme shì.

“Shǐ duì! Nǐ shuō nǐ—” niánqīng jǐngguān shēngqì de jiāng Shǐ Qiáng lādào yībiān, xiǎnrán tā de cūsú bùzhǐ shì ràng Wāng Miǎo yīrén bù shìyìng.

“Wāng jiàoshòu, qǐng bié wùhuì.” Yī míng shàoxiào jūnguān jímáng shàngqián, “Xiàwǔ yǒu yī gè zhòngyào huìyì, yào qǐng jǐ wèi xuézhě hé zhuānjiā cānjiā, shǒuzhǎng ràng wǒmen lái yāoqǐng nín.”

“Wǒ xiàwǔ hěn máng.”

“Zhè wǒmen qīngchu, shǒuzhǎng yǐjīng xiàng chāodǎo zhōngxīn lǐngdǎo dǎ le zhāohu. Zhè cì huìyì shàng bù néng méiyǒu nín, shízài bù xíng, wǒmen zhǐyǒu bǎ huìyì yánqī děng nín le.”

Shǐ Qiáng hé tā de tóngshì méi zài shuōhuà, zhuǎnshēn xiàlóu le, liǎng wèi jūnguān kànzhe tāmen zǒu yuǎn, sìhū dōu zhǎngchūle yī kǒu qì.

“Zhè rén zěnme zhèyàngr.” Shàoxiào xiǎoshēng duì tóngshì shuō.

🤓 Example 4 – the first page from a book about modern life in China, taken from the famous Chinese digital reading and publishing platform Dòubàn Yuèdú (豆瓣阅读)

Dàshāntóu »Dīsú! Dìngyuè le« Dì-yī zhāng: Shìjiè shì yī gè wǔtái

Xiǎo Mài cízhí le. Gōngsī zhǐyǒu tā yī gè yùnyíng, jiéjiàrì yě yào shàngbān, zìyuàn jiābān méiyǒu gōngzī. Cóngzǎodàowǎn, jiǎnjí shìpín, zuò xuānchuán tú, gēngxīn tuīsòng, shénme dōu shì tā de huó.

Wán yóuxì rènshi de péngyou shuō: “Zhè bù ná nǐ dāng niúmǎ ma? Bùnéng yǒu núxìng! Nǐ děi fǎnkàng a.”

Xiǎo Mài tóuyī cì chángshì fǎnkàng le.

Nà yī tiān, tā zhèng yīrú-jìwǎng, lèi de gēn gǒu yīyàng nǔlì gōngzuò, shàngsi tūrán fālái xiāoxi. Xiǎo Mài de xīn dùnshí bèi nǐngjǐn.

Xiǎo Mài chàndǒuzhe yídòng shǔbiāo, shuāngjī, bìshang yǎnjing, shēn hūxī, zài zhēngkāi, bìxū zuò hǎo xīnlǐ zhǔnbèi zài miànduì.

Shàngsi fālái de bù shì “zhège zài gǎigǎi”, yě bù shì fúwùqì jiétú hé xīn rènwu, ér shì: “Nǐ shàngzhōu yǒu yī tiān méi xiě rìbào, nà tiān jìxiào jiù méi gěi nǐ suànle ò ~ xīqǔ jiàoxun, xià cì zhùyì ba ~”

Díquè, yī fān jìlù, Xiǎo Mài fāxiàn zìjǐ shàng Zhōuwǔ wàngle xiě rìbào. Dāngtiān tā xiàdìngjuéxīn, zhōumò juébù lái gōngsī, yúshì pīnle mìng gànhuó, yīzhí máng dào shí diǎn cái zǒurén. Qīng wán dàibàn rènwu de kuàilè mábìle shénjīng, xiǎngbudào lèjíshēngbēi, bǎ rìbào zhè shìr wàng de yīgān’èrjìng.

Xiǎo Mài dīngzhe shàngsi jùmò de bōlànghào, dīngle hěn jiǔ, hěn jiǔ.

Jiǎrú shì píngshí, Xiǎo Mài dàgài zhǐ huì yī nù zhīxià nù yīxià. Kěshì, zhè yī tiān, bù zhīdào nǎ lái de yǒngqì, tā juédìng yìngqì yī huí, fǎnkàng yī cì.

Xiǎo Mài xiān tíchū cízhí, zài tíchū jiāxīn, bǎichū bù gěi tā zhǎng gōngzī jiù zǒurén de jiàshi. Xiǎo Mài zìrèn yī gè rén néng dǐng sān tóu shēngchǎnduì de lǘ, cóng bù rěshì, jìn gōngsī yǐlái hái méi zhǎngguo xīn. Zhāo xīnrén yě yào chéngběn, dànfán shàngsi jīngshen zhèngcháng, kěndìng huì dāying.

Hěn kuài, tā jiù wèi zìjǐ de fǎnkàng fùchūle dàijià.

Shàngsi bǎ Xiǎo Mài jiàoqù, quànshuō Xiǎo Mài bùyào gǎnqíngyòngshì. Xiǎo Mài jiěshì zìjǐ méi gǎnqíngyòngshì, zhǐshì rènwéi zìjǐ de fùchū pèideshàng gèng gāo de shōurù. Zuìhòu, méi tánchéng.

Xiǎo Mài bèi jùjué de cuòshǒubùjí, tài yìwài le, yìwài de yǒudiǎn shēngqì. Tā yànbùxià zhè kǒuqì, píngjiè zhè qiāng nùhuǒ, yīgǔzuòqì zǒuwánle cízhí liúchéng. Děngdài lízhí zhè duàn shíjiān lǐ, Xiǎo Mài bùzài zìyuàn jiābān le, dàodiǎn jiù huíjiā. Shàngsi gěi tā fā xiāoxi zài yě bù jiā bōlànghào le. Xiǎo Mài huídào jiā, jiǔwéi de chīshàng wǎnfàn, yībiān wán hǎojiǔ méi shàngxiàn de yóuxì, yībiān hé péngyou liáotiān.

“Wǒ jiù bù míngbai le,” Xiǎo Mài dàizhe ěrjī, kuàisù cāozuò juésè fàng jìnéng, “Wèishénme qíngyuàn zài zhāorén yě bù gěi wǒ jiā gōngzī ne?” [..]

🤓 Example 5 – a rewrite of the subtitles from the Peppa Pig TV-Series stories, Episode 9, Daddy Pig lost his glasses

Xiǎo Zhū Pèiqí »Bàba de yǎnjìng bùjiàn le!«

Zhū bàba cháng dàizhe yǎnjìng. Tā dàishang yǎnjìng cái néng kàn de qīngchu. Dànshì dāng Zhū bàba zhāidiào yǎnjìng zhīhòu, jiù shénme dōu kànbuqīng le. Suǒyǐ, duì Zhū bàba lái shuō, zhīdào zìjǐ de yǎnjìng zài nǎr shì hěn zhòngyào de. Kěshì, yǒude shíhou Zhū bàba huì zhǎobudào tā de yǎnjìng…

Zhū bàba yībiān huánkàn yībiān shuō: “Suǒyǒu de dōngxi kàn qǐlai dōu hěn móhu!” Zhū māma wèn: “Pèiqí, Qiáozhì, nǐmen jiàndàoguo Zhū bàba de yǎnjìng ma?” Pèiqí hé Qiáozhì yīqǐ huídá: “Méiyǒu, Zhū māma.”

Zhū māma jiēzhe shuō: “Ái yā, Zhū bàba méiyǒu yǎnjìng jiù shénme dōu kànbudào.” Méiyǒu yǎnjìng, Zhū bàba gēnběn kànbuliǎo shénme dōngxi. Tā tànxī shuō: “Zhè zhēnshi lìngrén wúyǔ, wǒ shénme dōu kànbudào le!”

Zhū māma wèn: “Nǐ hái jìde zuìhòu bǎ yǎnjìng fàng zài nǎr le ma?” Zhū bàba huídá shuō: “Yǎnjìng zhè dōngxi, bù yòng de shí, wǒ yībān dōu fàng zài kǒudài lǐ a. Zěnme xiànzài bù zài le?”

Pèiqí jiànyì shuō: “Nàme, Zhū bàba, wǒmen kěyǐ bāng nǐ yīqǐ qù zhǎo yǎnjìng ba!” Zhū māma diǎntóu zànchéng: “Hǎo zhǔyi, Pèiqí.”

Pèiqí hé Qiáozhì kāishǐ zhǎo Zhū bàba de yǎnjìng le. Pèiqí bǎ bàozhǐ jǔ qǐlai, kànkan xiàmiàn, dànshì Zhū bàba de yǎnjìng bù zài zhèr. Qiáozhì zhǎole diànshìjī de shàngmiàn, dànshì Zhū bàba de yǎnjìng yě bù zài nàr. Ránhòu, tāmen zài chúfáng lǐ zhǎo, zài yùshì lǐ zhǎo, zài lóushàng zhǎo lái zhǎo qù, shénme dìfang dōu zhǎo le. Zuìhòu Pèiqí shuō: “Zhè yǎnjìng zhēnde tài nán zhǎo le.”

Zhū bàba wúnài de shuō: “Ái yā, wǒ xiànzài gāi zěnme bàn?”

Jiù zài zhège shíhou, Pèiqí zuòchūle yī gè zhòngdà fāxiàn. Tā zhǐzhe Zhū bàba zuò de dìfang, dàshēng hǎn dào: “Nǐmen kuài kàn! Yǎnjìng jiù zài zhèr!” Zhū māma xiàole xiào, shuō: “Ó, yuánlái tāmen yīzhí zài nǐ de pìgu xiàmiàn!”

Qiáozhì wèn: “Zěnme huì dào nàr qù de?” Zhū bàba huídá shuō: “Wǒ yě xiǎng zhīdào!” Tīng dào zhège, dàjiā dōu xiàole qǐlai.

Pīnyīn Transcription Table Design

“These are the hardest problems humans have ever solved in engineering.” – gorklon rust.

I think about this quote by Elon Musk often. I don’t know what it really means, though. I can’t imagine the problems they face at SpaceX. I struggle just to push my 200-pound galvanized steel Feldenkrais table a few inches. I can’t begin to imagine what it takes to push 5,000 metric tons of steel into orbit.

Furthermore, I often think of a quote by Jony Ive, the former Apple lead designer. He said something similar, that they had to solve incredibly hard design problems. And in the end, Apple product users don’t even know that there was anything to solve, because the seemingly simple solution doesn’t show the complexity of the original problem.

A friend once told me to always compare yourself up, never down. Always strive upwards, always reach for the stars. I guess that’s what I’m doing here. Anyways, what I wanted to share is this:

After having pushed pixels around for what doesn’t seem like days, but, like, forever, and then some, and having thought long and hard about how to solve this-and-that problem during several long walks and showers, I finally came up with a design I can agree with.

Looking very much forward to the final implementation, so that I can start using it myself. Not so much looking forward to the implementation work itself; I’m starting to feel tired of day-in-day-out software engineering. On the upside, the vibe coding part of it is quite fun. For once, I’m the one who knocks, lol…

Absolute, utter madness – it’s a thin line between genius

I’m starting to be afraid of my project, and of myself. I just wanted to write a little tool that’s less annoying than Google Translate, or ChatGPT, or DeepSeek, when it comes to transcribing Chinese characters into Hànyǔ Pīnyīn. I just didn’t want to see the same rookie mistakes made over and over again, stubbornly made without any hope for improvement, not any time soon anyway.

So I sat down to write this little tool. And with xAI’s Grok, a bit of ChatGPT, and a couple of amazing open source libraries (such as rakutenMA and LibreTranslate), it’s actually quite convenient to lay down a bit of code.

But, alas. Lo and behold. What a monster I have created! Hundreds of hours of work, thousands of lines of code. Have a look at a snapshot of the user interface:

This is how it looks now. You drop your text written in Chinese characters in the top left, and get Hànyǔ Pīnyīn on the right, and all the tools you need to manipulate and re-arrange the final text. Terribly beautiful. A complex task made easy.

There are still a few quirks, and quite a few rules to implement and to improve, to make it (almost) compliant with GB/T 16159-2012, but I already love to use it, and love to use it over anything else.

However… it’s a lost cause from the start: modern text segmentation and tokenisation tools operate with 70 to 98% accuracy. Some sentences come out better than others, before I try to fix the mistakes. But it will never be 100%.
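To put a number on how well a sentence was segmented, one common approach is to compare the tool's word spans against a hand-corrected reference. The following is a minimal sketch, assuming both segmentations are available as token lists; the function names and the example tokens are hypothetical and not taken from my software.

```python
def word_spans(tokens):
    """Return the set of (start, end) character spans covered by each token."""
    spans, pos = set(), 0
    for token in tokens:
        spans.add((pos, pos + len(token)))
        pos += len(token)
    return spans


def segmentation_f1(predicted, reference):
    """Word-level F1 score of a predicted segmentation against a reference."""
    if "".join(predicted) != "".join(reference):
        raise ValueError("both segmentations must cover the same text")
    pred, ref = word_spans(predicted), word_spans(reference)
    correct = len(pred & ref)  # words with identical start and end
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the tool fuses 一口气, the reference keeps it apart.
predicted = ["吁", "了", "一口气"]
reference = ["吁", "了", "一", "口", "气"]
score = segmentation_f1(predicted, reference)
```

A single wrongly fused word already drags the score of this short clause down to 0.5, which is why short sentences can look much worse than long ones.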

And some years later, when Baidu, Alibaba and Google, etc, finally jump on the Pīnyīn-train, my software will be obsolete. Yet, I started this, fully aware of its transitoriness. A little bit crazy. But beautiful.

A novel Pīnyīn transcription helper

O. M. G. – in my obsessive spree (that rhymes!) I moved on to create yet another tool in the field of Hànyǔ Pīnyīn! Now I have like 7 projects parallel in the cooking, er, kitchen… in the making! …is what I’m saying!

This one was a really tough cookie. I had many problems to solve. It took me the better part of the entire month. Next to sleeping – and a daily short session of self-Feldenkrais for self-preservation – there was not much else I did. But here I am, here we are!

My transcription tool, even in its early prototype stage, is already on par with Google Translate, ChatGPT and DeepSeek (DeepSeek, of all large language models!) in terms of transcription quality. That is not very difficult, because they are very sloppy in this regard; nobody seems to care about Pīnyīn orthography, not even the machinese.

I use the made-in-Japan rakutenMA for segmentation (which works in any browser and is purely JavaScript-based), paired with some simple lookups from the Creative Commons dictionary CC-CEDICT (which is second by a wide margin to the great ABC Dictionary by the University of Hawaii Press, but it’s all we have for now, and for that I’m grateful), a frequency-based table for single-character Chinese morphemes, and a smoothly working user interface (the overall design still needs to be created).
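The lookup part of such a pipeline can be sketched roughly like this. It is a minimal illustration in Python, not my actual code: the tokens are assumed to come from a segmenter such as rakutenMA, and the two tiny tables are hypothetical stand-ins for CC-CEDICT (which stores readings as numbered syllables like `ji1 chang3`, with `u:` for ü) and for the single-character frequency table.

```python
# Hypothetical stand-in for CC-CEDICT: word -> numbered pinyin.
WORD_READINGS = {"机场": "ji1 chang3", "知道": "zhi1 dao4", "日本": "Ri4 ben3"}
# Hypothetical frequency table: most common reading per single character.
CHAR_READINGS = {"我": "wo3", "她": "ta1", "到": "dao4", "的": "de5"}

TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one numbered syllable such as 'chang3' to 'chǎng'."""
    syllable = syllable.replace("u:", "ü")
    if not syllable or not syllable[-1].isdigit():
        return syllable
    body, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return body  # neutral tone carries no mark
    lower = body.lower()
    # Placement: 'a' or 'e' if present, the 'o' of 'ou', else the last vowel.
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return body
        idx = vowels[-1]
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if body[idx].isupper():
        marked = marked.upper()
    return body[:idx] + marked + body[idx + 1:]


def word_to_pinyin(numbered):
    """Join the syllables of one word, adding an apostrophe before a, o, e."""
    out = ""
    for raw in numbered.split():
        if out and raw[0].lower() in "aoe":
            out += "\u2019"
        out += mark_syllable(raw)
    return out


def transcribe(tokens):
    """Dictionary lookup per token, falling back to per-character readings."""
    words = []
    for token in tokens:
        reading = WORD_READINGS.get(token)
        if reading is None and all(ch in CHAR_READINGS for ch in token):
            reading = " ".join(CHAR_READINGS[ch] for ch in token)
        words.append(word_to_pinyin(reading) if reading else token)
    return " ".join(words)
```

With these toy tables, `transcribe(["我", "知道", "她", "到", "日本"])` gives `wǒ zhīdào tā dào Rìběn`. Everything beyond this (polyphonic characters, proper nouns, the GB/T 16159-2012 rules) is where the real work starts.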

Next, I will need to focus on my actual work as a Feldenkrais teacher, for the last week of the month, and produce two Feldenkrais videos. Despite my crazy work hours due to my software projects, I still need to pay the rent. The upcoming two videos will probably be the slowest and deepest I’ve produced so far in the past 15 years; I hope to be able to get closer to live-class experience in my videos.

For the theme of the upcoming video, I’ve made my pick in the beginning of the month already (dealing with lingering shoulder pain due to sitting in front of the computer too much). Looking forward to teaching and filming!

And then, next month, I will start implementing the official government rules (GB/T 16159-2012) for Hànyǔ Pīnyīn into my new transcription software, making it the best in the world, second to none!

Also, I’m planning to wrap my colorful Pīnyīn Colors app into a macOS app, or maybe even an iPad app, register and pay for an Apple developer account (walk of shame), and put it on the Apple App Store (for free, as most of my work is, so is my plan).

Ok, now, no time to proof-read, ok, but just one run through, no time Toulouse, I need to get on with it!

Hyperworking Chinese

What a crazy past 2-3 months! Almost every day I was working for up to 10 or even 14 hours on my projects about Hànyǔ Pīnyīn, the official romanization system for Standard Mandarin Chinese. I’ve produced quite a few projects:

  • it started with a grammar book, which then led to
  • several apps,
  • several typing and input helpers,
  • including a unique, highly innovative typing helper to write in Pīnyīn without numbers and selection boxes,
  • a dictionary filter and dictionary interface,
  • I even put Pīnyīn diacritics on a font,
  • and worked on several algorithms.

Algorithms, algorithms everywhere

For example, my algorithm for coloring vowels (or vowel groups) in HTML containers: what started as a lagging, heavy, cumbersome-to-use prototype (as shown in a previous post), is now a blazingly fast algorithm that can handle tens of thousands of words, on a large canvas.
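The basic idea of coloring vowel groups by tone can be sketched in a few lines. This is a simplified Python illustration, not the fast algorithm described above: it wraps each run of vowels in a `<span>` whose class names (`tone-0` to `tone-4`) are hypothetical, and it reads the tone off the diacritic via Unicode decomposition.

```python
import html
import re
import unicodedata

# Combining marks after NFD decomposition: macron, acute, caron, grave.
TONE_BY_COMBINING_MARK = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

_VOWELS = "aeiouüāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ"
VOWEL_GROUP = re.compile("[" + _VOWELS + _VOWELS.upper() + "]+")


def tone_of(vowel_group):
    """Return 1-4 for a tone-marked vowel group, 0 for neutral tone."""
    for ch in unicodedata.normalize("NFD", vowel_group):
        if ch in TONE_BY_COMBINING_MARK:
            return TONE_BY_COMBINING_MARK[ch]
    return 0


def colorize(text):
    """Wrap every vowel group in a span carrying its tone as a CSS class."""
    pieces, last = [], 0
    for match in VOWEL_GROUP.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))
        group = match.group()
        pieces.append(f'<span class="tone-{tone_of(group)}">{group}</span>')
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

For example, `colorize("Rìběn")` produces `R<span class="tone-4">ì</span>b<span class="tone-3">ě</span>n`. Because the tone lives in a class name, a night mode only needs a second set of CSS colors for the same five classes.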

As a surprising side-effect, this unleashed the potential to play with Chinese words and text in an artistic way, in a way hardly ever done before, because the means simply didn’t exist before.

Here’s a list of all 8-letter Chinese words that contain syllables (or words) that sound like »yào« — or at least, all the words listed in the CC-CEDICT dictionary. Sorted with PinyinAbcSort, my algorithm that can sort Hànyǔ Pīnyīn (fast).
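Sorting Pīnyīn alphabetically is less trivial than it sounds, because plain string sorting puts tone-marked vowels after "z". A minimal sketch of one possible sort key follows, in Python. This is not the PinyinAbcSort implementation; the ordering rules here are assumptions: letters first, ü after u, then tones 1 to 4, neutral tone last, with spaces, apostrophes and hyphens ignored.

```python
import unicodedata

# Combining marks after NFD decomposition: macron, acute, caron, grave.
TONE_BY_COMBINING_MARK = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def pinyin_sort_key(word):
    """Sort key: base letters first, tone marks only as a tie-breaker."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in TONE_BY_COMBINING_MARK:
            if tones:
                tones[-1] = TONE_BY_COMBINING_MARK[ch]
        elif ch == "\u0308":  # diaeresis: keep ü distinct from u
            if letters and letters[-1] == "u":
                letters[-1] = "ü"
        elif ch.isalpha():
            letters.append(ch)
            tones.append(5)  # unmarked letters sort after tones 1-4
        # spaces, apostrophes and hyphens are ignored
    return ("".join(letters), tuple(tones))


words = ["yàoshi", "yāo", "yào", "yáo", "yàodiǎn"]
ordered = sorted(words, key=pinyin_sort_key)
```

Here `ordered` comes out as `yāo, yáo, yào, yàodiǎn, yàoshi`: the three readings of "yao" stay together in tone order, and longer words follow.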

This must be the Austrian Viennese blood boiling inside of me, this innovative, unconventional play with language and words, a genetic feature that suddenly, unexpectedly, rose up to get hold of me and my senses.

And I can’t stop. It’s like writing a novel. The story unfolds. The story pushes me, the writer, forwards, manically, unceasingly, like in a fever dream. But instead of plots and chapters, I produce product designs and prototypes.

But to what end? Where does this lead? This is becoming far too big for me.