One of the themes of this blog is to make statistics relevant and exciting to students by helping them understand the data that’s right under their noses.   Or inside their ears.  The iTunes library is a great place to start.

For a while, iTunes made it easy to get your data onto your hard drive in a convenient, analysis-ready form. Then they made it hard. Then (10.7) they made it easy again. Now, in 11.0, it is once again ‘hard’. Prior to version 11.0, these instructions would do the trick: Open up iTunes, Control-click on the “Music” library, and choose Export. A tab-delimited text file with all of your iTunes library data appears.

Now, iTunes 11.0 provides only an XML library. This is a shame for us teachers, since the data is now one step further removed from student access. In particular, it’s a shame because the data structure is not terribly complex; a flat file should do the trick. (If you want the XML file, select File>Library>Export.)

But all is not lost. With one simple workaround, you can get your data. First, create a smart playlist that has all of your songs. I did this by including in the list all songs added before today’s date. Now Control-click on the name of the playlist, and choose Export. Save the file wherever you wish, and you now have a tab-delimited file. (It does take a few minutes, if your library is anything near the size of my own. Not bragging.)
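Once the file is saved, reading it into something analyzable takes only a few lines. The sketch below uses Python's standard `csv` module and a tiny made-up sample in place of a real export. The column names in the sample (`Name`, `Artist`, `Genre`, `Plays`) are assumptions about what the export header contains; check them against your own file.

```python
import csv
import io


def read_tracks(stream):
    """Return one dict per track from a tab-delimited export with a header row."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Tiny made-up sample standing in for a real export file.
# The column names here are assumptions, not a guaranteed iTunes header.
sample = io.StringIO(
    "Name\tArtist\tGenre\tPlays\n"
    "Song A\tArtist 1\tJazz\t12\n"
    "Song B\tArtist 2\tRock\t\n"
)

tracks = read_tracks(sample)
print(len(tracks), "tracks; columns:", list(tracks[0].keys()))
```

For a real export, something like `open("Music.txt", encoding="utf-16", newline="")` can be passed to `read_tracks` instead of the sample. Both the file name and the encoding are guesses here: if UTF-16 fails on your file, try `"utf-8"`.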

So now we can finally get to the main point of this post, which is that almost all of the datasets I give my students, whether they are in intro stats or higher, have a small number of variables. And even when they don’t, the questions almost all involve only a small number of variables.

But if students are to become Citizen Statisticians, they must learn to think more like scientists.  They must learn how to pose questions and create new measures.  I wonder what most of our students would do, when confronted with the 27 or so variables iTunes gives them.  Make histograms?  Fine, a good place to start.  But what real-world question are they answering with a histogram? Do students really care about the variability in length of their song tracks?

One interesting question to ask students to explore is whether their listening habits have changed over some period of time. Now I know younger students won’t have much time to look back across. But I think this is a meaningful question, and one that’s not answered in an obvious way. More precisely, answering it requires thinking about what it means to have a ‘listening habit’, and asking how such a habit might be captured in the given data.

I’m not sure what my students would think of. Or, frankly, what I would think of. At the very least, answering any such question will require wrestling with the date variables. A basic question might be how many songs I’ve added per year. This isn’t that easy in many software packages, because I have to loop over the years and count the number of entries. Another question that I might want to answer: What proportion of songs remain unplayed each year? (In other words, am I wasting space storing music I don’t listen to?) Has the mix of genres changed over time, or are my tastes relatively unchanged?
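To make the per-year questions concrete, here is a minimal sketch in Python. It rests on several assumptions: that each track is a dict with `Date Added` and `Plays` fields, that dates look like `3/14/09 2:15 PM` (the format can vary with system settings, so `DATE_FORMAT` may need changing), and that "unplayed each year" means "added in that year and never played". The sample tracks are invented.

```python
from collections import Counter
from datetime import datetime

# Assumed date layout, e.g. "3/14/09 2:15 PM"; adjust to match your export.
DATE_FORMAT = "%m/%d/%y %I:%M %p"


def year_added(track, date_format=DATE_FORMAT):
    """Year in which a track was added to the library."""
    return datetime.strptime(track["Date Added"], date_format).year


def play_count(track):
    """Number of plays, treating a blank field as zero."""
    raw = (track.get("Plays") or "").strip()
    return int(raw) if raw else 0


def songs_added_per_year(tracks):
    """Count of tracks by the year they were added."""
    return Counter(year_added(t) for t in tracks)


def unplayed_share_by_year(tracks):
    """Proportion of tracks added in each year that have never been played."""
    total, unplayed = Counter(), Counter()
    for t in tracks:
        year = year_added(t)
        total[year] += 1
        if play_count(t) == 0:
            unplayed[year] += 1
    return {year: unplayed[year] / total[year] for year in sorted(total)}


# Invented sample data, only to show the shape of the input.
tracks = [
    {"Name": "Song A", "Date Added": "3/14/09 2:15 PM", "Plays": "12"},
    {"Name": "Song B", "Date Added": "11/2/09 9:00 AM", "Plays": ""},
    {"Name": "Song C", "Date Added": "1/5/10 7:30 PM", "Plays": "0"},
    {"Name": "Song D", "Date Added": "6/20/10 10:05 AM", "Plays": "3"},
]

print(dict(songs_added_per_year(tracks)))
print(unplayed_share_by_year(tracks))
```

The same grouping idea extends to the genre question: count (year, genre) pairs instead of years alone.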

Speaking of genres…unless you’ve been really careful about your genre field, you’re in for a mess. I thought I was careful, but here’s what I’ve got (as seen from Fathom):

If you want to see the kinds of questions other people have asked, download the free software SuperAnalyzer (this links to the Mac version via CNET). Below is a graph that shows the growth of my library over time, for example. (Thanks to Anelise Sabbag for pointing this out to me during a visit to U of Minnesota last year, and to Elizabeth Fry and Laura Ziegler for their endorsements of the app.)

And the most common words in the titles:
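A tally like this is also easy to approximate by hand. The following is a rough Python sketch, not a description of how SuperAnalyzer does it: it lowercases each title, splits it into words, drops a small hand-picked stopword list (an arbitrary choice), and counts what is left. The titles are stand-in data.

```python
import re
from collections import Counter

# Small, arbitrary stopword list; extend it to suit your own library.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to"}


def common_title_words(titles, n=10):
    """Return the n most frequent non-stopword words across all titles."""
    words = []
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)


# Stand-in titles; with a real export these would come from the Name column.
titles = [
    "Love Me Do",
    "All You Need Is Love",
    "The Long and Winding Road",
    "Love in Vain",
]

print(common_title_words(titles, 3))
```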

So let me know what you want to do with your iTunes library. Or what your students have done.  What was frustrating? Impossible? Easier than expected?