Chengyu Wordle Is Not a Love Story

According to The New York Times article “Wordle Is a Love Story”, Wordle began as a sweet love story: Josh Wardle built the game for his partner. It became super popular and now has its own Wikipedia page.

Wordle brag on social media

I first saw it on my Twitter timeline when people kept posting their Wordle results. I was honestly annoyed because they were just bragging and the results didn’t really tell much. At that time, I hadn’t played Wordle yet, so I pretty much ignored them. They kind of pollute the social media timelines, but at least they're better than ads. 🤷‍♂️

Due to peer pressure, I started playing it.

I also came across a few “forks” of Wordle:

I'm like, “Hey, that's pretty cool, but maybe I could try something else?”.

On 21 January 2022, I tweeted:

Wordle, but for Chinese idioms.

#idea

This was timely, because Chinese New Year was coming up.

I looked up some data from a few Google searches and realised that this is kind of doable.

Finding the dataset

For Wordle, one of the rules of the game is that each guess must be a valid 5-letter word. So Wordle needs a huge list of valid 5-letter words in its database.

For Chinese idioms, I would need the same thing, and I found a huge list of idioms in the pwxcoo/chinese-xinhua repository. It’s the Chinese Xinhua Dictionary Database (中华新华字典数据库) with a total of 31,648 idioms, of which roughly 29,502 are 4-character idioms.

That’s a lot.

I don’t think anyone will be able to remember that many idioms. I probably can filter out some common or well-known ones but I don’t really know how to do that.

A little googling led me to another dataset in the thunlp/THUOCL repository. It’s THUOCL, which stands for THU (Tsinghua University) Open Chinese Lexicon (清华大学开放中文词库).

It has roughly 8000+ idioms and includes word frequency statistics in the form of a DF (Document Frequency) value.

To be honest, I don’t know exactly how the frequency statistics are done. This is really not my domain of expertise, so I think I’ll just use whatever dataset is available and focus on the game coding side of things instead 😂.

Building the game

Generally speaking, the idea is not new.

pinyincaichengyu.com, showing the “How to play” section

There’s already the very popular 拼音猜成语 (Pīnyīn cāi chéngyǔ) made by Li Zhong, which uses pinyin input to guess Chinese idioms. It uses the same QWERTY keyboard as Wordle, but with more than five letter tiles. It validates each word individually, generally the pinyin spelling of each Chinese character, so you can’t just randomly type invalid pinyin and throw in all the vowels 😂.

I had a different idea.

Instead of constructing the pinyin to form an idiom, I feel that it would be better if there’s more focus on the Chinese characters themselves instead of letters of the alphabet.

JinGen joked that the hypothetical keyboard would be massive 🤣:

Wordle mock-up, with a massive Chinese keyboard

Of course, Chinese keyboards don’t really look like that. On iOS, there are two kinds: the pinyin keyboard and handwriting recognition.

I don’t know, I find them too much work, just to play a game.

In Wordle, a player taps 5 keys to get a 5-letter word. For Chinese characters, a player needs to tap quite a lot of keys just to form one character. For example, the word “pinyin” takes 6 alphabetical letters to form 2 Chinese characters (拼音).

So I want one tap = one character. But JinGen’s mockup above is ridiculous.

I thought about this. 🤔

Instead of showing all characters like JinGen’s mockup, why not a subset of them? Maybe roughly 20 characters? (QWERTY keyboard has 26)

iOS keyboard suggestions interface

Instead of the full keyboard, why not emulate the suggestions a.k.a. Predictive Text at the top of the keyboard?

So for every idiom, I could generate a list of 20 characters. It’ll be minimum 20 characters and they could form a minimum of 6 idioms, because there are 6 chances. And it’s not just minimum 6 possible idioms, it’s minimum 6 possible “high-frequency” idioms, which could later form a few more “low-frequency” idioms.
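To make the idea concrete, here is a minimal sketch of how such a key list could be built. It is not the game's actual code: the function name `buildKeys`, its parameters and the tiny sample list are all hypothetical, and it assumes the idiom list is already sorted from most to least frequent (for example by a DF value). It keeps adding idioms until there are enough distinct characters and enough playable idioms.

```javascript
// Hypothetical sketch: build the on-screen keys for one game.
// Assumes `idioms` is sorted by descending frequency.
function buildKeys(hiddenIdiom, idioms, minKeys = 20, minIdioms = 6) {
  const keys = new Set(hiddenIdiom); // iterating a string yields its characters
  const playable = [hiddenIdiom];

  for (const idiom of idioms) {
    // Stop once both minimums are satisfied.
    if (keys.size >= minKeys && playable.length >= minIdioms) break;
    if (idiom === hiddenIdiom) continue;
    for (const char of idiom) keys.add(char);
    playable.push(idiom);
  }

  return { keys: [...keys], playable };
}

// Tiny illustrative run with lowered minimums.
const sample = buildKeys(
  '一心一意',
  ['三心二意', '一心一意', '全心全意', '万事如意', '心想事成'],
  6,
  3
);
```

A real game would probably shuffle the keys before showing them, otherwise the hidden idiom's characters would always come first.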

At this point, I have no idea if this would work at all.

I coded it anyway 🤷‍♂️.

The first prototype looks like this:

Chengyu Wordle first prototype

Not bad.

I use the HTML ruby annotation element to display the pinyin above the Chinese characters. The specifications may have changed slightly but Hui Jing has explained them in depth.

Later on, I found that I’m not the first person who came up with this idea. There’s another Android-based game that shows a list of characters as the keyboard:

[疯狂猜成语安卓版 (Fēngkuáng cāi chéngyǔ ānzhuō bǎn)](http://www.ddooo.com/softdown/44315.htm)

It has 24 keys and even has an image illustration as hints 🤯.

That’s quite impressive if the game really has illustrations for every idiom. I mean, imagine hiring an artist to draw hints for thousands of idioms 🤯.

Anyway, I wasn’t deterred from this and decided to keep going with my implementation.

Adapting to the viewports

Wordle is actually responsive for different viewport dimensions.

Responsive Wordle, in 3 viewport sizes
  • The letter tiles will shrink in smaller viewports.
  • The letter tiles will grow in larger viewports, but there’s a maximum width and height, so that the letters won’t be too big.
  • The keyboard will not shrink in smaller viewports because it’s where people type. No one likes small keypads.
  • The keyboard will grow in larger viewports but there’s a maximum width and height too.
  • In larger viewports, the keyboard will always stick to the bottom and there’ll be blank spaces around the letters board.

Wordle uses a combination of CSS Flexbox and JavaScript code for this magic.

I managed to re-implement this with pure CSS.

A few tricks were used, such as the aspect-ratio CSS property combined with max-height on the tile rows.

Responsive Chengyu Wordle, in 5 viewport sizes

Unlike Wordle, the keyboard doesn’t need to be exactly 3 rows or match what a software keyboard looks like on mobile operating systems. In even smaller viewports, with smaller heights, the keyboard becomes a one-liner and horizontally scrollable!

Crazily enough, it even works on Apple Watch! 🤯

Chengyu Wordle inside a browser on Apple Watch

Keyboard shortcuts

Despite showing a software keyboard, Wordle also works with the hardware keyboard, which is usually meant for desktop players.

I think there are some Chinese (hardware) keyboards out there but I’ve never used them. I always use my MacBook’s QWERTY keyboard.

I added some basic keyboard shortcuts.

Pressing a letter will match the first letter of the pinyin. For example, pressing ‘Q’ will choose ‘qián’.

If the letter has many matches, pressing right or left arrow will cycle through them. For example, for ‘J’, the arrow keys will cycle through ‘jiàn’ ‘jìn’ and ‘jīng’.
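The matching logic can be sketched roughly like this. It is only an illustration under assumptions: the helper names (`stripTones`, `matchKeys`, `cycle`) and the sample key objects are made up, and the characters paired with each pinyin are just examples.

```javascript
// Strip tone marks so that, for example, "è" matches the "E" key.
function stripTones(pinyin) {
  return pinyin.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Return the indexes of all keys whose pinyin starts with the pressed letter.
function matchKeys(keys, letter) {
  const wanted = letter.toLowerCase();
  return keys
    .map((key, index) => (stripTones(key.pinyin)[0] === wanted ? index : -1))
    .filter((index) => index !== -1);
}

// Move through the matches with the arrow keys, wrapping around at both ends.
// `step` is +1 for the right arrow and -1 for the left arrow.
function cycle(position, total, step) {
  return (position + step + total) % total;
}

// Illustrative keys only.
const keys = [
  { char: '前', pinyin: 'qián' },
  { char: '见', pinyin: 'jiàn' },
  { char: '进', pinyin: 'jìn' },
  { char: '精', pinyin: 'jīng' },
  { char: '恶', pinyin: 'è' },
];

const jMatches = matchKeys(keys, 'J'); // indexes of jiàn, jìn and jīng
```

Pressing the right arrow on the last match wraps back to the first one, and the left arrow on the first match wraps to the last.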

Pretty neat.

Beta testing

Since I have no idea if this would work at all, I asked around for people to help beta test the game.

Surprisingly, the feedback has been quite interesting. Somehow the game is either too easy or too difficult for the players. There’s hardly a middle ground here 😅. Players either know a large vocabulary of idioms or just one or two of them 😂.

Along the testing period, there were some small issues.

Pinyin

The pinyin for some characters was wrong. My initial attempt was quite hacky, so I solved it by using the whole pinyin library, currently maintained by hotoo (闲耘™).

I was honestly surprised that pinyin mapping doesn’t work one-to-one, but one-to-many. 🤔

Pinyin mapping, from individual pinyin characters to many Chinese characters

However, more than a week later, someone reported a bug to me:

Player feedback on Twitter Direct Message about a wrong pinyin. The message includes a screenshot and a message “hi this word should be read as ‘mou’ not ‘liao’”

I was quite surprised to see this and it turns out this is a heteronym.

Some Chinese characters are spelled and written the same way, but pronounced differently based on certain context.

ZDIC web site showing the multiple pinyin for 缪

I looked again at the documentation for the pinyin package and realised that it serves a different dictionary for the web version compared to its Node version. The web version is simpler and smaller in file size. The author recommends doing this pinyin conversion in the back-end instead of the front-end 🤔.

Hmm, for this game, I don’t plan to do this on the server-side though.

So I tried a few ways to somehow force the library to use the Node version, but to no avail. I ended up using a different library called pinyin-pro, currently maintained by Zhou Li Xiang. It seems to have the whole dictionary and is able to handle heteronyms too!

After some testing and a bug fix, I managed to build a pretty good multiple-pinyin interface.

The pinyin shown on the keyboard will be based on the whole idiom context instead of per-character.

Technically, pinyin('缪') returns (from pinyin-pro library) or liáo (from the previous pinyin library). But pinyin('未雨绸缪') should correctly return wèi yǔ chóu móu (notice that it’s móu now). So the pinyin on the keyboard should show móu.

Chengyu Wordle showing keys “罗” (luó), “缪” (móu) and “平” (píng)

But for some games, it’s possible that one character exhibits different pronunciations for different idioms. An example would be the character ‘恶’ in the idioms 恶性循环 (è xìng xún huán) and 痛深恶绝 (tòng shēn wù jué). In this special case, the keyboard will show both pinyin readings.

Chengyu Wordle showing keys “蹈” (dǎo), “恶” (è or wù) and “规” (guī)
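One way to arrive at that “è or wù” label is to collect every reading a character takes across the game's playable idioms. The sketch below is an assumption about the approach, not the game's real code: `collectReadings` is a hypothetical helper, and the per-idiom pinyin arrays are assumed to come from a context-aware pinyin conversion done elsewhere.

```javascript
// Hypothetical input: each playable idiom paired with its context-aware pinyin,
// one syllable per character.
function collectReadings(idioms) {
  const readings = new Map(); // character → Set of pinyin readings
  for (const { text, pinyin } of idioms) {
    [...text].forEach((char, i) => {
      if (!readings.has(char)) readings.set(char, new Set());
      readings.get(char).add(pinyin[i]);
    });
  }
  return readings;
}

const readings = collectReadings([
  { text: '恶性循环', pinyin: ['è', 'xìng', 'xún', 'huán'] },
  { text: '痛深恶绝', pinyin: ['tòng', 'shēn', 'wù', 'jué'] },
]);

// Key label for 恶: every distinct reading, in the order first seen.
const label = [...readings.get('恶')].join(' or ');
```

Characters with a single reading across all idioms simply end up with a one-item set, so the same code produces the ordinary single-pinyin labels too.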

Then what happens to the pinyin on the game tiles? Well, they’ll dynamically change based on the whole constructed idiom as-you-type! 🤯

Idiom definitions

I added an extra link that points to the definition or explanation of the hidden idiom. It’s shown in the results dialog when a player has won or lost the game.

The first candidate was CC-CEDICT. It’s a project to create an online, downloadable public-domain Chinese-English dictionary.

Idiom definition on CC-CEDICT

Unfortunately, it doesn’t seem to have all the idiom entries, and mostly returns not-found results 😢.

Then, one of the testers found Baidu Hanyu (百度汉语).

Idiom definition on Baidu Hanyu

Seems better, but some idiom entries were missing too 😅.

Then, I found ZDIC from Li Zhong’s tweet, as he’s also adding idiom definitions on pinyincaichengyu.com.

Idiom definition on ZDIC

In the end, I use both Baidu and ZDIC links for the sake of completeness.

I’ve even included the definitions into the game itself.

Chengyu Wordle’s dialog showing the player has won with inline idiom definition

Chinese font

During the beta test period, the game used the Ma Shan Zheng font from Google Fonts. It looks pretty nice.

One of the testers found a bug:

Ma Shan Zheng font bug, incorrectly rendering a Chinese character

The character “丼” is rendered as “弗”?!?

This is really weird. It’s probably due to the font not supporting all glyphs, so I researched a little and found that it’s really difficult to find a good custom (and free) Chinese font. Even when I found one, the web font file sizes are seriously huge! 😅

There are some very detailed articles on this:

As a quick fix, I removed the font. Will probably revisit this later.

Start date for daily idioms

On the original Wordle, a new word is available every day. It’s not timezone-specific and actually relies on the user’s device timezone (local timezone).

This is an interesting thought. Timezone specificity is not important here for this game, unlike an app or a web product. All players can start playing the next new word around the clock instead of everyone around the globe waiting for that specific hour in the day, which could be 4AM or something. It might seem “unfair” for some players that some could play a new word earlier than players in the other timezone, but it doesn’t really matter as long as players only post the results without spoilers. For the more technical folks, they could actually skip to next day or any days by changing their operating system’s clock 😆.

From the implementation side, it could be a list of words and every word is assigned to specific dates. The question is how are they assigned? Off the top of my head, I think that the first day of the year could be the first word, 2nd day is the 2nd word and so on, but that means it’s limited to 365 words (year 2022)? Should it overflow to next year, so 1st January 2023 will serve the 366th word?

At the time I was building Chengyu Wordle, it was already a few days after 1st January 2022, so setting the first idiom to day 1 would mean no one would be able to play it 😅.

My curiosity led me to dig into Wordle’s code.

Wordle’s JavaScript source code, showing a hard-coded start date and the game index calculation

Aha! The variable Ha is a hard-coded start date! It’s 19 June 2021! (Months are zero-indexed in JavaScript)

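function Na(e, a) {
  // Days between two dates, with both snapped to local midnight
  var s = new Date(e)
    , t = new Date(a).setHours(0, 0, 0, 0) - s.setHours(0, 0, 0, 0);
  // 864e5 is the number of milliseconds in a day
  return Math.round(t / 864e5)
}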

There’s another function Na that accepts two date values, which are supposedly the start date and today’s date, and returns the index. So if today’s date is 19 June 2021, it returns 0 (first word). 20 June 2021 returns 1 (second word) and so on.

function Da(e) {
  var a, s = Ga(e);
  return a = s % La.length,
  La[a]
}

The next function Da is a surprising find for me. It will cycle back to 0 once the index reaches the last word of the list. For example, if the game has a list of 100 words, what happens on the 101st day? Well, this logic will serve back the first word!

I think this is a pretty neat solution. It solves the problem of what if you’ve missed the last few days of words and thought that you can’t play them anymore. Once the list of words is exhausted, you’ll go back to the beginning of the list again ♻️.
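For readability, here is the same logic rewritten with descriptive names. This is my own de-minified sketch rather than Wordle's code as shipped: `daysBetween`, `wordOfTheDay` and the placeholder word list are invented names, and it assumes today's date is never earlier than the start date.

```javascript
const MS_PER_DAY = 864e5; // 24 * 60 * 60 * 1000

// Whole days between two dates, ignoring the time of day (local timezone).
function daysBetween(start, today) {
  const from = new Date(start).setHours(0, 0, 0, 0);
  const to = new Date(today).setHours(0, 0, 0, 0);
  // Math.round absorbs the odd 23- or 25-hour day around daylight saving changes.
  return Math.round((to - from) / MS_PER_DAY);
}

// Pick the word of the day, wrapping back to the start of the list when it runs out.
function wordOfTheDay(words, start, today = new Date()) {
  return words[daysBetween(start, today) % words.length];
}

const start = new Date(2021, 5, 19); // 19 June 2021; months are zero-indexed
const words = ['first', 'second', 'third']; // placeholder list
```

With this three-word list, 19 June 2021 serves 'first', 20 June serves 'second', and 22 June wraps around to 'first' again.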

Chengyu Wordle started without this daily game feature. To compensate, I added a button that allows players to play a random idiom. A few days into testing, I added the daily game feature and set day 1 to 27 January 2022.

Mathematically speaking, with 8000+ idioms in Chengyu Wordle, if a player plays one idiom per day, it’ll take more than 21 years to finish all the idioms. 🤯

Evaluation rules

Guessing words on Wordle will show the letters in coloured tiles.

  • Green = The letter is in the word and in the correct spot.
  • Yellow = The letter is in the word but in the wrong spot.
  • Gray = The letter is not in the word in any spot.

They might look simple at first, but oh boy, almost everyone fell into this trap.

It’s explained in this page written by Alex Selby: The best strategies for Wordle:

I think the colour scoring rules work like this, but would be happy to be corrected.

Let's refer to your guess as testword, and the secret word as hiddenword. Then I believe the scoring rules are these:

  • First determine all greens, and cross out these letters in testword and hiddenword.
  • Then from left to right in testword see which letters correspond to a letter in hiddenword. If you find one, then it's a yellow, but you need to cross it off in hiddenword.
  • Remaining positions are scored as black.

The second point means that you can't reuse letters in hiddenword, so for example the total number of green 'T's and yellow 'T's in testword can't exceed the number of 'T's in hiddenword.

So if hiddenword is HOTEL then the testword SILLY would score BBYBB: the second 'L' does not score because only one 'L' can, and the earlier one takes precedence.

But if hiddenword were DAILY then the same testword SILLY would score BYBGG. This time it's the first 'L' that doesn't score, because even though it is earlier than the 'L' in fourth position, greens take priority over yellows.

The trap lies in repeated letters in a word.

Instead of just a single loop to mark which letter is green or yellow, the logic needs to keep track of repeated letters, loop through again and either mark some letters yellow or not.
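The rules quoted above can be sketched as a two-pass function. This is a minimal illustration of those rules, not the code from Wordle or from Chengyu Wordle; the name `evaluate` and the state strings are my own, and it assumes both words have the same length.

```javascript
function evaluate(testword, hiddenword) {
  const guess = [...testword];
  const hidden = [...hiddenword];
  const states = guess.map(() => 'gray');
  const unmatched = new Map(); // letter → count of non-green occurrences in hiddenword

  // Pass 1: mark greens, and count the hidden letters that were not matched exactly.
  hidden.forEach((letter, i) => {
    if (guess[i] === letter) {
      states[i] = 'green';
    } else {
      unmatched.set(letter, (unmatched.get(letter) || 0) + 1);
    }
  });

  // Pass 2: left to right, a non-green letter is yellow only while
  // unmatched copies of it remain in hiddenword.
  guess.forEach((letter, i) => {
    if (states[i] === 'green') return;
    const left = unmatched.get(letter) || 0;
    if (left > 0) {
      states[i] = 'yellow';
      unmatched.set(letter, left - 1);
    }
  });

  return states;
}

// The two examples from the quote, plus a repeated-character idiom.
const hotel = evaluate('SILLY', 'HOTEL'); // B B Y B B
const daily = evaluate('SILLY', 'DAILY'); // B Y B G G
const chuchu = evaluate('楚楚可怜', '清清楚楚'); // example guess with two 楚
```

Because the counts are taken per letter, a guess with two copies of a character scores two yellows only when the hidden word also has two unmatched copies, which is the case with 清清楚楚.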

Initially my code handled the above cases and I was quite proud. Until someone reported another bug:

Chengyu Wordle bug, showing only one character in yellow tile, instead of two

On the second row, both “楚” (chǔ) characters should be yellow because the hidden idiom is 清清楚楚 (qīng qīng chǔ chǔ)!

I was quite dumbfounded, got fed up and started writing tests.

Chengyu Wordle tests, in JavaScript, for idiom states; testing every possible cases for highlighting the green, yellow and gray tiles

My logic failed when there were repeated letters in both the testword and hiddenword. And now with these tests, this will finally be fixed once and for all!

Finishing touches

Chengyu Wordle banner, for the social media previews

It’s almost done, and I added a little sprinkle of finishing touches to the game:

  • Metacrap: All the titles, descriptions, icons and social media preview images.
  • Idiom IDs: Besides allowing players to play a random idiom, each idiom is also assigned its own unique ID that’s shareable. So if someone shares the results, a link below the results will include the ID that goes to this exact idiom instead of the idiom of the day.
  • Hints: I added an “I’m stuck” button at the bottom of the keyboard, after seeing some folks struggle hard at the game. Every time it’s tapped, it’ll show a different hint. Technically, the hints should be good enough to get unstuck. 😉
  • Hard mode: While some players are having a hard time playing the game, some actually think that it’s too easy. So, I introduced “Hard mode”, which works a bit differently from Wordle’s. In Wordle’s “Hard mode”, any revealed hints must be used in subsequent guesses. In Chengyu Wordle, the minimum number of keys increases from 20 to 40, thus increasing the number of possible idiom guesses in a game.
  • Localisation: The game is available in both English and Simplified Chinese. The strings for Simplified Chinese were hastily translated via Google Translate, so help is pretty much needed. 🙏
  • Old browsers support: I use Vite to bundle the code and needed to use @vitejs/plugin-legacy to create production builds for legacy browsers. For some reason, there are a lot of folks using really old Android phones with old browsers, as the errors are tracked via Bugsnag.
  • Analytics: I took this chance to try a few privacy-friendly analytics like Panelbear, Pirsch and Plausible. In the end, I went with Plausible for its better user interface and features.

The launch

Chengyu Wordle on iPhone screenshot

I officially launched Chengyu Wordle on 27 January 2022, just one week after my first tweet and 4 days before Chinese New Year! 🚀

It’s open-sourced on GitHub, with complete documentation on the dataset, technical setup and localisation files.

I’ve also made sure to attribute to other similar attempts by other talented folks!

A few days later, I made a pull request for Chengyu Wordle to be added to Wordles of the World, the most comprehensive list of Wordle-like games and resources online.

Later on, I added dark mode and made this pretty cool 3D-looking mock with Morflax Things.

Chengyu Wordle in light and dark mode on two iPhones mock-up

Chengyu Wordle got quite a bit of media coverage too! Some are very small mentions while some are quite prominent 😎.

Honestly I haven't felt so excited in a long while!

The craziest ever mention has got to be this:

Facebook post by Lee Hsien Loong, with the content “Many Singaporeans have joined in the fun of playing Wordle - an online word puzzle game, where the goal is to guess a five-letter word within six tries. This game has become such a viral sensation that it was recently purchased by The New York Times. I’ve even noticed some ministries and agencies using the green, yellow and grey squares to liven up government messaging! I was pleasantly surprised to find out that there are Chinese and Malay versions of the game. Give it a go – a simple but fun way to keep the mind engaged. Original https://www.powerlanguage.co.uk/wordle/ Chinese: https://cheeaun.github.io/chengyu-wordle/ Malay: https://www.projecteugene.com/katapat.html – LHL”

On 3 February 2022, the Prime Minister of Singapore, Lee Hsien Loong, made a post on Facebook about how Singaporeans have joined in the fun of playing Wordle, linked to BBC’s article, and mentioned the original Wordle, the Chinese version (mine) and the Malay version! 🤯🤯🤯

When this happened, I was just finishing up my dinner and got quite a bunch of notifications from friends tagging me and congratulating me 😅.

Traffic shot up, from 7,000 visitors to 15,000 on the day itself and grew to 25,000 on the next day! 📈

Visitors chart shot up on 3 February 2022 when Lee Hsien Loong mentioned the game on Facebook. Chart is from Plausible.

Learnings

Wordle tiles in 3D

I had a lot of fun building this game.

From trying to reverse engineer the mechanics of Wordle, to listening to how Josh Wardle made a prototype back in 2013. Instead of rebuilding the exact same clone, somehow I tried to build a different variant of it, with its own set of challenges that I didn’t know much about. I spent a lot of time polishing every single detail while trying to preserve the whole unobtrusive nature of Wordle. Perhaps similar to the original author, I did not expect this to become huge at all, though mine is still smaller scale compared to the original Wordle.

By building a variant of Wordle, I felt like I was part of a whole family of developers who have done the same thing and experienced the same thing as I did. The author of Subwaydle mentioned that he came across Chengyu Wordle and Nerdle, which made him want to make his own.

I see a lot of people posting their Chengyu Wordle results on Twitter and Facebook, together with their Wordle results. On Facebook, I saw a woman posting about playing Chengyu Wordle and it took her 3 kids on dictionaries, 4 clues, 45 minutes and a translation to traditional Chinese to solve their first play. Even a teacher replied to me that the game is very popular among teachers, very well received and shared among their Facebook groups.

This is not the first time I’ve created a web game. Chengyu Wordle is the third one after my first web game, Pentagoo, and my second, Bubble Wrap. Neither got popular but both were very fun to build too. They gave me a good foundation on how to build games, though the funny thing is I’m actually not very good at playing them. 🤣

Yeah, I built Chengyu Wordle, but I can’t play it because I can't actually read or write Chinese 😅. I can speak Mandarin though, and the pinyin was actually there to help me, which in turn helps others too. 🤷‍♂️

But heck, who cares, building a game is fun!

All of this happened within just two weeks.

Wow, what a ride.