October 2, 2017

## Announcements

• Review the existing commentary in a discussion thread before posting

• Avoid submitting duplicate questions

• Be mindful that you aren't answering a question that was already answered by someone else earlier in the thread

## Announcements

### Homework due dates

The following penalties apply for most assignments (please note that weekends count as days):

### Homework notes

• Some (but not complete) leniency was granted on homework 1 submission times and pull requests

• The above policy will be strictly enforced on subsequent assignments

• Be sure you read and understand what each question asks for

• Do not wait until the evening on the last day to start (you get close to 2 weeks for a reason!)

## Tidy data demo

Follow along in RStudio

## In recap…

• We've learned a lot up until this point!

• We can take a dataset, reshape it, transform it in various ways, and create interesting figures

• Remaining topics in the pipeline
  • Obtaining data
  • Importing data
  • Data cleanup

• These remaining topics can be a little dry, and in some ways they are best learned through practice

## And then what?

• Once we do all this cleaning and preparing, we move into the Exploratory Data Analysis phase

[Exploratory data analysis is using] visualisation and transformation to explore your data in a systematic way… [by]:

1. [Generating] questions about your data
2. [Searching] for answers by visualising, transforming, and modelling your data
3. [Using] what you learn to refine your questions and/or generate new questions

(Chapter 7.1, R for Data Science)

• More on that next class

## Importing data files

• Datasets are typically distributed using one of several standardized file formats

• Flat files
  • Comma-separated values (CSV), tab-separated values (TSV), fixed-width files
  • Read using the readr package
• Proprietary software files
  • SPSS, Stata, and SAS files (read using the haven package)
  • Excel files (both .xls and .xlsx) (read using the readxl package)
• Databases (interfaces available using the DBI package)
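As a rough illustration of the flat-file readers listed above, the sketch below reads the same two-column data stored as CSV, TSV, and fixed-width text using readr's `read_csv()`, `read_tsv()`, and `read_fwf()`. It assumes readr is installed. The file contents, column names, and temporary paths are hypothetical and are created on the fly only so that the example is self-contained.

```r
# A minimal sketch of importing flat files with readr.
# Assumption: the readr package is installed. The data, column names,
# and temporary file paths below are hypothetical and exist only for this example.
library(readr)

# Write small example files to temporary paths
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
fwf_path <- tempfile(fileext = ".txt")

writeLines(c("name,score", "alice,90", "bob,85"), csv_path)
writeLines(c("name\tscore", "alice\t90", "bob\t85"), tsv_path)
# Fixed-width layout: name fills characters 1-7, score fills characters 8-9
writeLines(c("alice  90", "bob    85"), fwf_path)

# Comma-separated values; col_types = "cd" means (character, double)
csv_data <- read_csv(csv_path, col_types = "cd")

# Tab-separated values, same column types
tsv_data <- read_tsv(tsv_path, col_types = "cd")

# Fixed-width values: there is no delimiter, so column positions
# must be supplied explicitly (surrounding whitespace is trimmed by default)
fwf_data <- read_fwf(
  fwf_path,
  col_positions = fwf_widths(c(7, 2), c("name", "score")),
  col_types = "cd"
)

print(csv_data)
```

Fixed-width files carry no delimiter, so `read_fwf()` needs the column positions (here built with `fwf_widths()`). For CSV and TSV files, omitting `col_types` lets readr guess each column's type and report the specification it chose.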