October 2, 2017



Reading posts

  • Review the existing commentary in a discussion thread before posting

  • Avoid submitting duplicate questions

  • Check that you aren't answering a question that someone else already answered earlier in the thread


Homework due dates

The following penalties apply for most assignments (please note that weekends count as days):

Homework notes

  • Some (but not full) leniency applies to homework 1 submission times and pull request submissions

  • The above policy will be strictly enforced on subsequent assignments

  • Be sure you read and understand what each question asks for

  • Do not wait until the evening on the last day to start (you get close to 2 weeks for a reason!)

Finish up Tidy Data

Tidy data demo

Follow along in RStudio

Data Adventure

In recap…

  • We've learned a lot up until this point!

  • We can take a dataset, reshape it, transform it in various ways, and create interesting figures

  • Remaining topics in the pipeline

    • Obtaining data

    • Importing data

    • Data cleanup

  • These remaining topics can be a little dry, and in some ways, they're better learned by practicing
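As a reminder of the reshape-and-transform steps recapped above, here is a minimal sketch. It assumes the tidyr and dplyr packages are installed; the `wide` table and its values are made up for illustration and are not from the course data. Plotting is left out to keep the example short.

```r
library(tidyr)
library(dplyr)

# Made-up wide data: one column per year (hypothetical values)
wide <- tibble::tibble(
  country = c("A", "B"),
  `2015` = c(10, 20),
  `2016` = c(12, 18)
)

# Reshape: gather the year columns into key/value pairs (long form)
long <- gather(wide, key = "year", value = "cases", `2015`, `2016`)

# Transform: convert year to integer, then average cases per country
summary_tbl <- long %>%
  mutate(year = as.integer(year)) %>%
  group_by(country) %>%
  summarise(mean_cases = mean(cases))
```

The long form has one row per country-year pair, which is the shape most plotting and summarising functions expect.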

And then what?

  • Once we do all this cleaning and preparing, we move into the Exploratory Data Analysis phase

[Exploratory data analysis is using] visualisation and transformation to explore your data in a systematic way… [by]:

  1. [Generating] questions about your data

  2. [Searching] for answers by visualising, transforming, and modelling your data

  3. [Using] what you learn to refine your questions and/or generate new questions

Chapter 7.1, R for Data Science

  • More on that next class

Importing data files

  • Datasets are typically distributed using one of several standardized file formats

  • Flat files

    • comma-separated values (CSV), tab-separated values (TSV), fixed-width values
    • Use readr package
  • Proprietary software files

    • SPSS, Stata, and SAS files (read using haven package)
    • Excel files (both .xls and .xlsx) (read using readxl package)
  • Databases (interfaces available using the DBI package)
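
The import options listed above can be sketched roughly as follows. This is a minimal example, assuming the readr package is installed; the small table is made up and written to a temporary file so the code is self-contained. The file names in the commented lines (`survey.sav` and so on) are hypothetical, and the database line assumes the RSQLite backend, which the slides do not name.

```r
library(readr)

# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ana,90.5", "2,Ben,82"), csv_path)

# read_csv() can guess column types; col_types makes them explicit
scores <- read_csv(csv_path, col_types = cols(
  id = col_integer(),
  name = col_character(),
  score = col_double()
))

# Same data as a tab-separated file, read back with read_tsv()
tsv_path <- tempfile(fileext = ".tsv")
write_tsv(scores, tsv_path)
scores_tsv <- read_tsv(tsv_path, col_types = "icd")

# Other formats follow the same pattern (file names below are hypothetical):
# readr::read_fwf(...)                            # fixed-width files
# haven::read_sav("survey.sav")                   # SPSS
# haven::read_dta("survey.dta")                   # Stata
# haven::read_sas("survey.sas7bdat")              # SAS
# readxl::read_excel("survey.xlsx", sheet = 1)    # Excel (.xls or .xlsx)
# con <- DBI::dbConnect(RSQLite::SQLite(), "survey.sqlite")  # assumes RSQLite
# DBI::dbReadTable(con, "responses"); DBI::dbDisconnect(con)
```

Stating `col_types` up front is optional, but it makes an import fail loudly when a file does not match what you expect instead of silently guessing a different type.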

Dataset for extended activity and demos