class: center, middle, title-slide

.upper-right[
![](cds-101-logo.png)<!-- -->
]

# Class 22: Foundations for inference II

.title-hline[
## November 13, 2017
]

### <p>Slides are licensed under the <a href='http://creativecommons.org/licenses/by-sa/3.0/us/'>CC BY-SA 3.0</a> license.</p>

---
class: middle, center, inverse

# Variability in estimates

---

# Pew Research Survey

<img src="figures/pew1.png" width="90%" style="display: block; margin: auto;" />
<img src="figures/pew2.png" width="90%" style="display: block; margin: auto;" />
<img src="figures/pew3.png" width="90%" style="display: block; margin: auto;" />

.footnote[
http://pewresearch.org/pubs/2191/young-adults-workers-labor-market-pay-careers-advancement-recession
]

---

# Margin of error

<img src="figures/pew4.png" width="90%" style="display: block; margin: auto;" />

* 41% `\(\pm\)` 2.9%: We are 95% confident that 38.1% to 43.9% of the public believe young adults, rather than middle-aged or older adults, are having the toughest time in today's economy.

* 49% `\(\pm\)` 4.4%: We are 95% confident that 44.6% to 53.4% of 18–34 year olds have taken a job they didn't want just to pay the bills.

---

# Parameter estimation

* We are often interested in **population parameters**.

* Since it is difficult (or impossible) to collect data on a complete population, we use **sample statistics** as **point estimates** for the unknown population parameters of interest.

* Sample statistics vary from sample to sample.

* Quantifying how sample statistics vary provides a way to estimate the **margin of error** associated with our point estimate.

* But before we get to quantifying the variability among samples, let's try to understand how and why point estimates vary from sample to sample.

.qa[
Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their heights to be the same, somewhat different, or very different?
]

--

.answer[Not the same, but only somewhat different.]
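---

# Sampling variability: a simulation sketch

The height question above can be explored with a small simulation in base R. This is only a sketch under stated assumptions: the population is hypothetical, heights are assumed to be normally distributed with a mean of 170 cm and a standard deviation of 10 cm, and every state is assumed to share that same distribution. None of these numbers come from real data.

```r
# Hypothetical population of adult heights in cm (illustrative values only).
set.seed(101)
population <- rnorm(n = 100000, mean = 170, sd = 10)

# Draw 50 independent samples of 1,000 adults (one per "state") and
# record the mean height of each sample.
sample_means <- replicate(50, mean(sample(population, size = 1000)))

# The sample means are not identical, but they fall in a narrow range.
range(sample_means)

# Their spread should be close to sd / sqrt(n) = 10 / sqrt(1000),
# which is roughly 0.32 cm.
sd(sample_means)
```

Rerunning the sketch with a smaller `size` makes the sample means spread out more, which is the behavior a margin of error is meant to capture.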
---

# MythBusters Statistics

An experiment conducted by *MythBusters*, a science entertainment TV program that aired on the Discovery Channel, tested whether a person can be subconsciously influenced into yawning when another person near them yawns.

* 50 people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment group) and 16 to a group where no one near them yawned (control group).

* The results of the experiment are in a file "yawn.csv" posted on the course website: <http://fall17.cds101.com/pages/datasets/>

---
count: no

# MythBusters Statistics

Open the dataset and, either in RStudio or on a piece of paper, fill out a contingency table like the one below:

|          | Treatment | Control | Total |
| :------- | :-------- | :------ | :---- |
| Yawn     |           |         |       |
| Not Yawn |           |         |       |
| Total    |           |         |       |

---

# MythBusters Statistics

1. What is the null hypothesis?

--

2. What is the alternative hypothesis?

--

3. What value of `\(\alpha\)` should we use?

--

4. Should we use a one-sided or two-sided hypothesis test?

--

5. What quantities do we need to subtract to find the observed difference between the yawning rates of the two groups?

---

# Hypothesis test with `infer`

Conduct a hypothesis test with `infer`. Is the *MythBusters* result statistically significant?

---

# Confidence interval

How do we compute the confidence interval for this experiment?
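---

# Sketch: test and interval with `infer`

One possible way to carry out both steps is sketched below, assuming a recent release of `infer`. The helper `yawn_inference()` is hypothetical and not part of the course materials. The column names (`group`, `yawn`) and their values (`"treatment"`/`"control"`, `"yes"`/`"no"`) are assumptions about how "yawn.csv" is laid out, so check the real file and adjust the code to match. The default `direction = "greater"` assumes a one-sided alternative; pass `"two-sided"` if that is the test you chose.

```r
library(dplyr)
library(infer)

# Hypothetical helper, not part of the course materials.
# Assumes `data` has one row per participant and two columns:
#   group: "treatment" or "control"
#   yawn:  "yes" or "no"
yawn_inference <- function(data, reps = 1000, direction = "greater") {
  data <- data %>%
    mutate(group = factor(group), yawn = factor(yawn))

  # Observed difference in yawning proportions: treatment minus control
  obs_diff <- data %>%
    specify(yawn ~ group, success = "yes") %>%
    calculate(stat = "diff in props", order = c("treatment", "control"))

  # Null distribution: permute the data as if yawning were
  # independent of group, recomputing the difference each time
  null_dist <- data %>%
    specify(yawn ~ group, success = "yes") %>%
    hypothesize(null = "independence") %>%
    generate(reps = reps, type = "permute") %>%
    calculate(stat = "diff in props", order = c("treatment", "control"))

  p_value <- null_dist %>%
    get_p_value(obs_stat = obs_diff, direction = direction)

  # Bootstrap distribution of the difference, then a 95% percentile interval
  boot_dist <- data %>%
    specify(yawn ~ group, success = "yes") %>%
    generate(reps = reps, type = "bootstrap") %>%
    calculate(stat = "diff in props", order = c("treatment", "control"))

  ci <- boot_dist %>%
    get_confidence_interval(level = 0.95, type = "percentile")

  list(obs_diff = obs_diff, p_value = p_value, ci = ci)
}
```

After downloading the file, a call such as `results <- yawn_inference(readr::read_csv("yawn.csv"))` returns the observed difference, the p-value, and the interval. Compare `results$p_value` with the chosen `\(\alpha\)`, and check whether `results$ci` contains zero.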