coding

Tidy collostructions

tl;dr In this post I look at the family of collexeme analysis methods originated by Gries and Stefanowitsch. Since they use a lot of base R and love using vectors, there is a hurdle to clear if you are used to the rectangles of tidy data. I first give an overview of what the method tries to do, and then at the end show the

Guanguan goes the Chinese Word Segmentation (II)

tl;dr This two-part blog post is first about the opening line of the Book of Odes, and then about how to deal with Chinese word segmentation and my current implementation of it. So if you’re only interested in the computational part, look at the next one. If, on the other hand, you want to know more about my views on the translation of guān guā

Guanguan goes the Chinese Word Segmentation (I)