Rstats

Robert A. Amezquita

6 minute read

Tibbles are beautiful constructs that support a special column type, the list column, which captures bundles of (untidy) data within the structure of a data frame. They are especially great when you want to do cool things like apply a function to multiple datasets (kept in the aforementioned list column) and generate a new column of (say, tidier) data. One application is running a quick t-test on several datasets, possibly even varying the contrasts at each iteration.
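As a minimal sketch of that idea, assuming the dplyr, tidyr, and purrr packages are installed, the code below nests three made-up datasets into a list column and runs one t-test per dataset. The column names (`dataset`, `group`, `value`) and the simulated values are hypothetical and chosen only for illustration; this is not necessarily how the post itself structures the analysis.

```r
library(dplyr)  # also re-exports tibble() and %>%
library(tidyr)
library(purrr)

# Hypothetical toy data: three datasets, each with a control and a treatment group
set.seed(42)
df <- tibble(
  dataset = rep(c("a", "b", "c"), each = 20),
  group   = rep(rep(c("ctrl", "trt"), each = 10), times = 3),
  value   = rnorm(60)
)

results <- df %>%
  # Collapse each dataset into one row; `data` becomes a list column of tibbles
  nest(data = c(group, value)) %>%
  mutate(
    # One Welch two-sample t-test per nested dataset (a list column of htest objects)
    test = map(data, ~ t.test(value ~ group, data = .x)),
    # Extract a tidy numeric column of p-values from the htest objects
    p_value = map_dbl(test, "p.value")
  )

results
```

Keeping the `htest` objects in their own list column means other statistics (the estimate, the confidence interval) can be pulled out later without rerunning the tests.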

2 minute read

While working on a new R package, one of the TODOs on my list was testing it on a new version of R. However, upgrading R is a somewhat dreaded process, as it involves (re)installing all your old packages. While tools like packrat manage a project's package dependencies, they don't seem to help with R upgrades. Another solution is to simply copy the old package library into the new R version's library, but this risks breakage, since packages built under one version of R may not work correctly under another.
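One common workaround, sketched here and not necessarily the approach the post settles on, is to record the names of your installed packages in the old R version and reinstall them fresh in the new one. The helper names `save_package_list()` and `missing_packages()` and the file path are hypothetical. Note that `install.packages()` only reinstalls packages available from your configured repositories (typically CRAN); anything installed from GitHub or Bioconductor would need separate handling.

```r
# Hypothetical helper: run in the OLD R version to record user-installed packages.
# priority = "NA" keeps packages without an assigned priority, which skips
# base and recommended packages that ship with R itself.
save_package_list <- function(path) {
  pkgs <- unique(rownames(installed.packages(priority = "NA")))
  saveRDS(pkgs, path)
  invisible(pkgs)
}

# Hypothetical helper: run in the NEW R version to find recorded packages
# that are not yet installed there.
missing_packages <- function(path) {
  recorded <- readRDS(path)
  setdiff(recorded, rownames(installed.packages()))
}

# Usage sketch (the file path is an assumption):
# old R:  save_package_list("~/r-packages.rds")
# new R:  install.packages(missing_packages("~/r-packages.rds"))
```

Because the packages are rebuilt or redownloaded for the new R version, this avoids the version mismatch that can come with copying the old library wholesale, at the cost of a longer install.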

8 minute read

This is the second part of a series of posts working with NFL quarterback data, following up on some initial cleanup. Here, I'll focus on how I like to format data for optimal tidiness: the tidy (also known as long) format.

A Small Example

Typically, when we get a dataset, we'll see it as a series of columns (variables) with values across many rows (each an observation). This format, the wide format, is certainly amenable to human parsing, and it also implies a relationship between a single observation's values across multiple variables.
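A minimal sketch of the wide-to-long reshape, assuming the tidyr package (version 1.0 or later) is installed. The player labels and yardage numbers are made up for illustration and are not real quarterback statistics. `pivot_longer()` is tidyr's current function for this reshape and supersedes the older `gather()`, which the original post may well use instead.

```r
library(tidyr)

# Hypothetical toy data in wide format: one row per quarterback, one column per season
# (player labels and yardage values are made up for illustration)
wide <- data.frame(
  player   = c("qb_a", "qb_b"),
  yds_2015 = c(4000, 3500),
  yds_2016 = c(4200, 3100)
)

# Long (tidy) format: one row per player-season observation
long <- pivot_longer(
  wide,
  cols         = starts_with("yds_"),  # the season columns to stack
  names_to     = "season",             # old column names go here...
  names_prefix = "yds_",               # ...with the "yds_" prefix stripped
  values_to    = "yards"               # old cell values go here
)

long
```

The two-row, three-column wide table becomes a four-row table with `player`, `season`, and `yards` columns, so each row is now a single observation rather than a bundle of them.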