R

Data packages for current and future me

tl;dr I show why it is worthwhile to put my Chinese-related datasets in packages and how I went about it. Introduction: I don’t know if I’m very late to the party, but in this final sprint towards a finished dissertation, I keep finding myself juggling multiple datasets when using them in R. This usually involves readr::read_csv() or related functions, but the drawback is that I have…

Rbootcamp 2019

tl;dr Below you will find what we did during the Rbootcamp for Lexical Semanticists. Between this paragraph and the contents there is a bit of my own #Rstory. Warning: many R-puns. Intro: Somewhere at the beginning of this semester, I was asked to teach the other people in my lab “the basics of R”, so that they could see what kind of benefits…

Tidy collostructions

tl;dr In this post I look at the family of collexeme analysis methods introduced by Gries and Stefanowitsch. Since they use a lot of base R and love vectors, there is a hurdle to overcome if you are used to the rectangles of tidy data. I first give an overview of what the method tries to do, and then at the end show the…

Guanguan goes the Chinese Word Segmentation (II)

tl;dr This two-part blog is first about the opening line of the Book of Odes, and then about how to deal with Chinese word segmentation and my current implementation of it. So if you’re only interested in the computational part, look at the next one. If, on the other hand, you want to know more about my views on the translation of guān guā…

Guanguan goes the Chinese Word Segmentation (I)

tl;dr This two-part blog is first about the opening line of the Book of Odes, and then about how to deal with Chinese word segmentation and my current implementation of it. So if you’re only interested in the computational part, look at the next one. If, on the other hand, you want to know more about my views on the translation of guān guā…

Mapping the terminology for ideophones

Goal: The goal of this short update is to use the R package lingtypology (see its tutorial) to create a map that shows which terminology relating to ideophones is used for which languages. Now, I know that the data isn’t complete yet; it is an ongoing cataloguing project. You can find more recent versions of this map on my GitHub…