Part of the reason we have been somewhat quiet at Citizen Statistician is that it’s DataFest season, which means a few weeks (months?) of all-consuming organization followed by a weekend of super fun data immersion and exhaustion… Each year that I organize DataFest I tell myself, “Next year, I’ll do [blah] to make my life easier.” This year I finally did it! Read about how I’ve streamlined registrations, registration confirmations, and the dissemination of information before the event in my post “Organizing DataFest the tidy way” on the R Views blog.

Last year I was awarded a Project TIER (Teaching Integrity in Empirical Research) fellowship, and last week my work on the fellowship wrapped up with a meeting with the project leads, the other fellows from last year, and the new fellows for next year. In a nutshell, Project TIER focuses on reproducibility. Here is a brief summary of the project’s focus from their website: “For a number of years, we have been developing a protocol for comprehensively documenting all the steps of data management and analysis that go into an empirical research paper.”

A few weeks ago I gave a two-hour Introduction to R workshop for the Master of Engineering Management students at Duke. The session was organized by the program’s student-led Career Development and Alumni Relations committee. The slides for the workshop can be found here, and the source code is available on GitHub. Why might this be of interest to you? The materials give a sense of what’s feasible to teach in two hours to an audience that is not scared of programming but is new to R.

In one of our previous posts (Halloween: An Excuse for Plotting with Icons), we gave a quick tutorial on plotting with icons in ggplot2. A reader, Dr. D. K. Samuel, asked in a comment how to use multiple icons. His comment read:

“...can you make a blog post on using multiple icons for such data

year,crop,yield
1995,Tomato,250
1995,Apple,300
1995,Orange,500
2000,Tomato,600
2000,Apple,800
2000,Orange,900

it will be nice to use icons for each data point.”
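Here is a minimal sketch of one way to do this with ggplot2 and grid, using the reader's data. To keep it runnable without image files, it draws stand-in icons in code (coloured discs); the helper make_icon() and the colours are hypothetical. With real icon PNGs you would read each file with png::readPNG() and wrap it in grid::rasterGrob() instead. Each data point gets its own annotation_custom() layer, and the icon is looked up by crop name.

```r
library(ggplot2)
library(grid)

# The reader's data from the comment
crops <- data.frame(
  year  = c(1995, 1995, 1995, 2000, 2000, 2000),
  crop  = c("Tomato", "Apple", "Orange", "Tomato", "Apple", "Orange"),
  yield = c(250, 300, 500, 600, 800, 900)
)

# Hypothetical helper: a stand-in "icon", i.e. a filled disc in one colour,
# stored as a character matrix of colours with transparent corners.
# With a real icon file, use rasterGrob(png::readPNG("tomato.png")) instead.
make_icon <- function(fill, n = 21) {
  centre <- (n + 1) / 2
  i <- row(matrix(0, n, n))
  j <- col(matrix(0, n, n))
  inside <- (i - centre)^2 + (j - centre)^2 <= (centre - 1)^2
  rasterGrob(ifelse(inside, fill, "transparent"), interpolate = FALSE)
}

# One icon per crop, looked up by name below
icons <- list(
  Tomato = make_icon("tomato"),
  Apple  = make_icon("forestgreen"),
  Orange = make_icon("orange")
)

# geom_blank() trains the y scale on the data without drawing anything;
# the x limits are set by hand to leave room around 1995 and 2000
p <- ggplot(crops, aes(x = year, y = yield)) +
  geom_blank() +
  scale_x_continuous(breaks = c(1995, 2000), limits = c(1993, 2002)) +
  labs(x = "Year", y = "Yield")

# Add one annotation_custom() layer per row, choosing the icon by crop
for (k in seq_len(nrow(crops))) {
  p <- p + annotation_custom(
    icons[[crops$crop[k]]],
    xmin = crops$year[k] - 0.5, xmax = crops$year[k] + 0.5,
    ymin = crops$yield[k] - 20, ymax = crops$yield[k] + 20
  )
}
p
```

Because annotation_custom() does not train the scales, geom_blank() and the explicit x limits are what keep every icon inside the panel; the icon box sizes (half a year wide, 40 yield units tall) are an arbitrary choice for this data.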

In my course on the GLM, we are discussing residual plots this week. Given that Halloween is also this Saturday, it seems like the perfect time to code up a residual plot made of ghosts. The process I used to create this plot is as follows. First, find an icon that you want to use in place of the points on your scatterplot (or dot plot). I used a ghost icon (created by Andrea Mazzini) obtained from The Noun Project.
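The remaining steps are not part of this excerpt, so what follows is only a sketch of one way to build such a plot, not the code from the post. It assumes a simple linear model of mpg on wt from the built-in mtcars data, and it replaces the Noun Project ghost with a crude hypothetical ghost drawn pixel by pixel (make_ghost()), so it runs without an image file. Each residual is drawn with annotation_custom() at its (fitted value, residual) position.

```r
library(ggplot2)
library(grid)

# Assumed model: mpg regressed on weight in the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)
res <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Hypothetical stand-in for the ghost icon: a pale rounded head, a square
# body, a notched hem and two dark eyes, as a character matrix of colours.
# The eye positions assume the default n = 21.
# With a real icon file, use rasterGrob(png::readPNG("ghost.png")) instead.
make_ghost <- function(n = 21) {
  centre <- (n + 1) / 2
  i <- row(matrix(0, n, n))
  j <- col(matrix(0, n, n))
  head <- i <= centre & (i - centre)^2 + (j - centre)^2 <= (centre - 1)^2
  body <- i > centre
  hem  <- i >= n - 1 & (j %% 4) < 2
  pix <- matrix("transparent", n, n)
  pix[(head | body) & !hem] <- "grey95"
  pix[i %in% 7:8 & j %in% c(7:8, 14:15)] <- "black"
  rasterGrob(pix, interpolate = FALSE)
}
ghost <- make_ghost()

# geom_blank() trains the scales on the residuals; the ghosts are annotations
p <- ggplot(res, aes(x = fitted, y = resid)) +
  geom_blank() +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey70") +
  labs(x = "Fitted values", y = "Residuals") +
  theme_dark()

# Icon half-width and half-height in data units
w <- diff(range(res$fitted)) / 40
h <- diff(range(res$resid)) / 20

# Place one ghost at each (fitted, residual) pair
for (k in seq_len(nrow(res))) {
  p <- p + annotation_custom(
    ghost,
    xmin = res$fitted[k] - w, xmax = res$fitted[k] + w,
    ymin = res$resid[k] - h,  ymax = res$resid[k] + h
  )
}
p
```

The dark theme is just a choice so the pale ghosts stand out; the dashed line at zero is the usual reference line for a residuals-versus-fitted plot.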

This post is about the ggplot2 and dplyr packages, so let’s start by loading them:

library(ggplot2)
library(dplyr)

I can’t be the first person to make the following mistake:

ggplot(mtcars, aes(x = wt, y = mpg)) %>%
  geom_point()

Can you spot the mistake in the code above? Look closely at the end of the first line. The operator should be the + used in ggplot2 for layering, not the %>% operator used in dplyr for piping, like this:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
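A quick way to see the difference, as a sketch assuming a reasonably recent ggplot2: the + version builds a plot, while the %>% version fails, because the pipe passes the ggplot object to geom_point() as its first argument, mapping, which must be created by aes(). The exact error text varies between ggplot2 versions, so the sketch only checks that an error occurs.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe (re-exported from magrittr)

# Correct: layers are added to a ggplot object with +
good <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Mistaken: %>% calls geom_point(<ggplot object>), so the plot lands in
# geom_point()'s `mapping` argument and ggplot2 rejects it
bad <- tryCatch(
  ggplot(mtcars, aes(x = wt, y = mpg)) %>%
    geom_point(),
  error = function(e) e
)

inherits(bad, "error")   # TRUE: the piped version fails
conditionMessage(bad)    # ggplot2's explanation (wording varies by version)
```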

The other day on the isostat mailing list, Doug Andrews asked the following question: “Which R packages do you consider the most helpful and essential for undergrad stat ed? I ask in great part because it would help my local IT guru set up the way our network makes software available in our computer classrooms, but also just from curiosity.” Doug asked for a top 10 list, and a few people have already chimed in with great suggestions.

Citizen Statistician

Learning to swim in the data deluge