October 2, 2017
Review the existing commentary in a discussion thread before posting
Avoid submitting duplicate questions
Check that you aren't answering a question that someone else already answered earlier in the thread
The following penalties apply for most assignments (please note that weekends count as days):
- First day late, by 11:59pm: -10%
- Second day late, by 11:59pm: -20%
- Third day or later: no credit
Policies, CDS-101 course syllabus
Some (but not full) leniency will be given on Homework 1 submission times and pull request submissions
The above policy will be strictly enforced on subsequent assignments
Be sure you read and understand what each question asks for
Do not wait until the evening on the last day to start (you get close to 2 weeks for a reason!)
Follow along in RStudio
We've learned a lot up until this point!
We can take a dataset, reshape it, transform it in various ways, and create interesting figures
Remaining topics in the pipeline
Obtaining data
Importing data
Data cleanup
These remaining topics can be a little dry, and in some ways, they're better learned by practicing
[Exploratory data analysis is using] visualisation and transformation to explore your data in a systematic way… [by]:
[Generating] questions about your data
[Searching] for answers by visualising, transforming, and modelling your data
[Using] what you learn to refine your questions and/or generate new questions
Chapter 7.1, R for Data Science
Datasets are typically distributed using one of several standardized file formats
Flat files (readr package)
Proprietary software files
SPSS, Stata, and SAS files (haven package)
Excel files (readxl package)
Databases (interfaces available using the DBI package)
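As a rough illustration of the flat-file case, the sketch below writes a tiny CSV to a temporary file and reads it back with readr. The column names, values, and the file names in the trailing comments are invented for the example and are not from the course materials. The commented-out calls for haven, readxl, and DBI only show which function typically plays the same role for the other formats.

```r
library(readr)

# Write a tiny comma-separated file to a temporary location so the
# example is self-contained; the columns and values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,height_cm",
    "Ada,170",
    "Grace,165"),
  csv_path
)

# read_csv() guesses the column types and returns a tibble.
people <- read_csv(csv_path, show_col_types = FALSE)
print(people)

# Other formats follow the same pattern (hypothetical file names):
# haven::read_sav("survey.sav")       # SPSS
# haven::read_dta("survey.dta")       # Stata
# haven::read_sas("survey.sas7bdat")  # SAS
# readxl::read_excel("survey.xlsx")   # Excel
# con <- DBI::dbConnect(RSQLite::SQLite(), "survey.sqlite")  # also needs RSQLite
```

Following along in RStudio, the same call works on any CSV path on your machine; only the first argument changes.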