Ten years after Ioannidis alleged that most scientific findings are false, reproducibility – or lack thereof – has become a full-blown crisis in science. Flagship journals like Nature and Science have published hand-wringing editorials and revised their policies in the hopes of heightening standards of reproducibility. In the statistical and data sciences, the barriers to reproducibility are far lower, given that our analysis can usually be digitally encoded (e.g., scripts, algorithms, data files, etc.


Last year I was awarded a Project TIER (Teaching Integrity in Empirical Research) fellowship, and last week my work on the fellowship wrapped up with a meeting with the project leads, other fellows from last year, and new fellows for the next year. In a nutshell, Project TIER focuses on reproducibility. Here is a brief summary of the project’s focus from their website: For a number of years, we have been developing a protocol for comprehensively documenting all the steps of data management and analysis that go into an empirical research paper.


Check out my guest post on the Simulation-based statistical inference blog: Teaching computation as an argument for simulation-based inference. If you are interested in teaching simulation-based methods, or if you just want to find out more about why others are, I highly recommend the posts on this blog. The page also hosts many other useful resources as well as information on upcoming workshops.


A few weeks ago I gave a two-hour Introduction to R workshop for the Master of Engineering Management students at Duke. The session was organized by the student-led Career Development and Alumni Relations committee within this program. The slides for the workshop can be found here and the source code is available on GitHub. Why might this be of interest to you? The materials can give you a sense of what’s feasible to teach in two hours to an audience that is not scared of programming but is new to R.


In my course on the GLM, we are discussing residual plots this week. Given that it is also Halloween this Saturday, it seems like a perfect time to code up a residual plot made of ghosts. The process I used to create this plot is as follows: Find an icon that you want to use in place of the points on your scatterplot (or dot plot). I used a ghost icon (created by Andrea Mazzini) obtained from The Noun Project.


Are you looking for a way to celebrate World Statistics Day? I know you are. And I can’t think of a better way than supporting the African Data Initiative (ADI). I’m proud to have met some of the statisticians, statistics educators and researchers who are leading this initiative at an International Association for Statistical Education Roundtable workshop in Cebu, the Philippines, in 2012. You can read about Roger and David Stern’s projects in Kenya here in the journal Technology Innovations in Statistics Education.


The LA Times reported today, along with several other sources, that the California Department of Justice has initiated a new “open justice” data initiative. On their portal, the “Justice Dashboard”, you can view Arrest Rates, Deaths in Custody, or Law Enforcement Officers Killed or Assaulted. I chose, for my first visit, to look at Deaths in Custody. At first, I was disappointed with the quality of the data provided. Instead of data, you see some nice graphical displays, mostly univariate but a few with two variables, addressing issues and questions that are probably on many people’s minds.



Citizen Statistician

Learning to swim in the data deluge