CHIDEOD

Data packages for current and future me

tl;dr: I show why it is worthwhile to put my Chinese-related datasets in packages and how I went about it. Introduction: I don’t know if I’m very late to the party, but in this final sprint towards a finished dissertation, I keep finding myself juggling multiple datasets when using them in R. This is usually paired with readr::read_csv() or related functions, but the drawback is that I have

Guanguan goes the Chinese Word Segmentation (II)

tl;dr: This double blog post is first about the opening line of the Book of Odes, and then about how to deal with Chinese word segmentation and my current implementation of it. So if you’re only interested in the computational part, look at the next one. If, on the other hand, you want to know more about my views on the translation of guān guā

Bridging phonology, meaning, and written form across time: introducing CHIDEOD, a database of Chinese literary ideophones

With an open source database of ideophones we can address a multitude of issues regarding Chinese ideophones.