Last year I was awarded a Project TIER (Teaching Integrity in Empirical Research) fellowship, and last week my work on the fellowship wrapped up with a meeting with the project leads, other fellows from last year, as well as new fellows for the next year. In a nutshell Project TIER focuses on reproducibility. Here is a brief summary of the project’s focus from their website: As part of the fellowship, beyond continuing working on integrating reproducible data analysis practices into my courses with the use of literate programming via R Markdown and version control via git/GitHub, I have also created templates two GitHub repositories that follow the Project TIER guidelines: one for use with R and the other with Stata.

Continue reading

Check out my guest post on the Simulation-based statistical inference blog: Teaching computation as an argument for simulation-based inference If you are interested in teaching simulation-based methods, or if you just want to find out more why others are, I highly recommend the posts on this blog. The page also hosts many other useful resources as well as information on upcoming workshops as well.

Continue reading

A few weeks ago I gave a two-hour Introduction to R workshop for the Master of Engineering Management students at Duke. The session was organized by the student-led Career Development and Alumni Relations committee within this program. The slides for the workshop can be found here and the source code is available on GitHub. Why might this be of interest to you? The materials can give you a sense of what’s feasible to teach in two hours to an audience that is not scared of programming but is new to R.

Continue reading

This post is about ggplot2 and dplyr packages, so let’s start with loading them: library(ggplot2) library(dplyr) I can’t be the first person to make the following mistake: ggplot(mtcars, aes(x = wt, y = mpg)) %>% geom_point() Can you spot the mistake in the code above? Look closely at the end of the first line. The operator should be the + used in ggplot2 for layering, not the %>% operator used in dplyr for piping, like this:

Continue reading

The other day on the isostat mailing list Doug Andrews asked the following question: Doug asked for a top 10 list, and a few people have already chimed in with great suggestions. I thought those not on the list might also have good ideas, so, with Doug’s permission, I’m reposting the question here. Here is my top 10 (ok, 12) list: (Links go to vignettes or pages I find to be quickest / most useful references for those packages, but if you know of better resources, let me know and I’ll update.

Continue reading

Somehow almost an entire academic year went by without a blog post, I must have been busy… It’s time to get back in the saddle! (I’m using the classical definition of this idiom here, “doing something you stopped doing for a period of time”, not the urban dictionary definition, “when you are back to doing what you do best”, as I really don’t think writing blog posts are what I do best…)

Continue reading

I have gotten several requests for the R syntax I used to analyze the ranked-choice voting data and create the animated GIF. Rather than just posting the syntax, I thought I might write a detailed post describing the process. Reading in the Data The data is available on the Twin Cities R User Group’s GitHub page. The file we are interested in is 2013-mayor-cvr.csv. Clicking this link gets you the “Display” version of the data.

Continue reading

Author's picture

Citizen Statistician

Learning to swim in the data deluge