Nothing in science has any value to society if it is not communicated, and scientists are beginning to learn their social obligations. Anne Roe, The Making of a Scientist (1953)
If you cannot - in the long run - tell everyone what you have been doing, your doing has been worthless. Erwin Schrödinger (Nobel Prize winner in physics)
The greatest value of a picture is when it forces us to notice what we never expected to see. John Tukey (Mathematician, recipient of National Medal of Science)
Numbers have an important story to tell. They rely on you to give them a clear and convincing voice. Stephen Few (Founder of Perceptual Edge, author of Show Me the Numbers)
Visualizations act as a campfire around which we gather to tell stories. Al Shalloway (Founder and CEO of Net Objectives)
Source: Digital Image, AP photo used on Business Insider, Accessed September 10, 2017, http://www.businessinsider.com/the-first-iphone-2013-12
Source: The Fallen of World War II
Source: The Ebb and Flow of Movies - Box Office Receipts 1986–2008 - Interactive Graphic - NYTimes.com
The Challenger disaster, January 28th, 1986
The Space Shuttle Challenger broke apart 73 seconds into flight, killing all seven crew members
The rubber O-rings, which sealed the joints of the solid rocket boosters, failed because of the low temperatures (below 30°F)
Engineers at Morton Thiokol, who supplied solid rocket motors to NASA, warned about this on January 27th, 1986 in a conference call
NASA and the managers at Morton Thiokol, unpersuaded by the engineers, overruled their concerns
Source: Figure 2.18(a) in Modern Data Science with R by Benjamin Baumer, Daniel Kaplan, and Nicholas Horton
Source: Figure 2.17 in Modern Data Science with R by Benjamin Baumer, Daniel Kaplan, and Nicholas Horton
Source: Figure 2.18(b) in Modern Data Science with R by Benjamin Baumer, Daniel Kaplan, and Nicholas Horton
Book by Darrell Huff, published in 1954
Aside: The title is tongue-in-cheek and is usually misunderstood. The book is not about “fudging the numbers” with statistics.
Illustrates ways that visualizations can be manipulated to mislead while still technically showing accurate information
General method: Violate conventions and expectations
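One familiar instance of this method is cutting off the y-axis so that a small change looks large. The sketch below is only an illustration: the `sales` data frame and its numbers are made up for this example and do not come from the book or from any real dataset.

```r
library(ggplot2)

# Made-up numbers, for illustration only (hypothetical data)
sales <- data.frame(
  year  = 2019:2023,
  units = c(100, 101, 102, 103, 104)
)

base_plot <- ggplot(sales, aes(x = year, y = units)) +
  geom_line()

# Conventional view: the y-axis includes zero, so the rise looks modest
honest <- base_plot + expand_limits(y = 0)

# Same data, axis zoomed to the data range: the rise looks dramatic
exaggerated <- base_plot + coord_cartesian(ylim = c(100, 104))

honest
exaggerated
```

Both plots show the same accurate values; only the axis convention differs, and that is what changes the impression a reader takes away.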
Context: Florida passed a “Stand Your Ground” law in 2005
Advocates claimed it would reduce crime; opponents argued it would increase the use of lethal force
If you wanted to use data to answer this question, and you came across this graphic published by the news organization Reuters, what would you conclude?
Source: NASA Goddard Institute for Space Studies
From the 1800s onward, temperatures were recorded using thermometers at various locations around the globe, and by the 1880s thermometers had become precise. Systematic measurements began around the mid-1800s at various army posts, and in 1891 the Weather Bureau (now the National Weather Service) was formed to continue the effort.
Source: National Oceanic and Atmospheric Administration, “How do we observe today’s climate?”
Variable: A quantity, quality, or property that you can measure.
Value: The state of a variable when you measure it. The value of a variable may change from measurement to measurement.
Observation: A set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation contains several values, each associated with a different variable.
Tabular data: A set of values, each associated with a variable and an observation.
Numerical: Data that is a number, either an integer (whole number) or a float (real number). This kind of data is collected from device sensors, through counting and polling, from the outputs of computational simulations, etc.
Categorical: Groups observations into a set. Categories can be in text form (strings or characters), for example brand names for a certain kind of product, or numerical, for example labeling city districts by numbers.
Text: Plain text that is too varied to be treated as a category. Examples include full names, the text of a literary work, tweets, etc.
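The three kinds of data described above map roughly onto base R types: integer and double vectors for numbers, factors for categories, and character vectors for free text. The sketch below uses small made-up vectors (all names and values are hypothetical) to show one way each kind might be stored.

```r
# Numerical data: whole numbers (integer) and real numbers (double)
counts <- c(3L, 7L, 12L)          # e.g. results of counting
temps  <- c(21.5, 19.8, 23.1)     # e.g. sensor readings

# Categorical data: a factor groups observations into a fixed set of levels
brands    <- factor(c("audi", "ford", "audi"))  # categories in text form
districts <- factor(c(1, 2, 2, 3))              # categories labeled by numbers

# Free text: character strings too varied to treat as categories
notes <- c("Ran fine on the first try.", "Needs a second look.")

class(counts)
class(temps)
levels(brands)
levels(districts)
class(notes)
```

Note that `districts` holds numbers only as labels: wrapping them in `factor()` tells R not to treat them as quantities to add or average.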
Open up your github-class-demo-username repository. Create a new file named class4demo.Rmd. At the top, put:
---
title: mpg dataset demo
---
Commit and push!
mpg dataset
library(tidyverse)
mpg
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
…
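In the vocabulary introduced earlier, each row of `mpg` is an observation and each column is a variable. A minimal sketch of how one might check the table's shape and the type of each column follows; it loads only ggplot2 (which ships the `mpg` data and is also attached by `library(tidyverse)`), and the helper names `n_obs`, `n_vars`, and `col_types` are chosen here for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Rows are observations (one car configuration each); columns are variables
n_obs  <- nrow(mpg)
n_vars <- ncol(mpg)

# First class of each column: shows which variables are numbers and which are text
col_types <- vapply(mpg, function(col) class(col)[1], character(1))

n_obs
n_vars
col_types

# A single value: the highway mileage of the first observation
mpg$hwy[1]
```

Columns such as `manufacturer`, `trans`, and `class` are stored as character vectors even though they behave like categories, which matters when they are mapped to aesthetics in a plot.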
Plot each car’s highway fuel efficiency (hwy) as a function of the engine size (displ):
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
Add color = class inside the aes() call. What happens?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
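The calls above differ in whether an aesthetic is mapped to a variable (inside `aes()`) or set to a fixed value (outside it). As a rough sketch, the difference can be checked with ggplot2's `layer_data()`, which returns the data frame a layer draws; the object names `p_mapped`, `p_set`, `mapped_colors`, and `set_colors` are chosen here for illustration.

```r
library(ggplot2)  # also attached by library(tidyverse)

# Mapped: color = class sits inside aes(), so each class gets its own color
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: color = "blue" sits outside aes(), so every point is drawn in blue
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Distinct colors each layer ends up using
mapped_colors <- unique(layer_data(p_mapped)$colour)
set_colors    <- unique(layer_data(p_set)$colour)

length(mapped_colors)  # one color per car class
set_colors             # the single fixed color
```

Two caveats apply when trying the other aesthetics: ggplot2 warns that mapping `size` to a discrete variable such as `class` is not advised, and its default shape palette covers only six values, so one of the seven classes is left unplotted when `shape = class` is used.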
cyl and trans.
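As one possible way to explore `cyl` and `trans`, the sketch below maps each to color in the same scatterplot. It is an assumption about what the exercise intends, not the course's own solution, and the plot object names are hypothetical.

```r
library(ggplot2)  # also attached by library(tidyverse)

# cyl is stored as an integer, so ggplot2 treats it as continuous (color gradient)
p_cyl_continuous <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = cyl))

# Wrapping cyl in factor() treats it as categorical (one color per cylinder count)
p_cyl_discrete <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(cyl)))

# trans is text, so it is treated as categorical automatically
p_trans <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = trans))

p_cyl_continuous
p_cyl_discrete
p_trans
```

Comparing the first two plots shows how the stored type of a variable, and not only its meaning, decides which kind of legend and color scale ggplot2 picks.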