The dataset in coffee_income_distribution.rds contains the distribution of yearly incomes of 42 patrons at a college coffee shop. Use this dataset to complete the following exercises:
Create a histogram plot for the entire range of data in this distribution. Use a binwidth of 1000 for this plot.
Take your histogram code from part a and change it so that the viewing limits along the horizontal axis go from 60000 to 70000. Try a binwidth of 1000, and then adjust it for the plot if necessary.
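As a rough sketch of the two histogram steps, the code below uses simulated stand-in data rather than the real file. The data frame name incomes and the simulated values (41 incomes near 65000 plus one extreme value) are assumptions made only for illustration. The column name salary is taken from the mapping given in the boxplot exercise. With the real data you would load the file with readRDS() instead.

```r
library(ggplot2)

# Hypothetical stand-in data, NOT the real dataset.
# With the real file: incomes <- readRDS("coffee_income_distribution.rds")
set.seed(1)
incomes <- data.frame(salary = c(rnorm(41, mean = 65000, sd = 2000), 1e6))

# Histogram over the entire range of the data, binwidth of 1000
p_full <- ggplot(incomes) +
  geom_histogram(mapping = aes(x = salary), binwidth = 1000)

# Same histogram with the horizontal viewing limits set to 60000-70000.
# coord_cartesian() zooms the view without dropping observations.
p_zoom <- p_full + coord_cartesian(xlim = c(60000, 70000))

print(p_full)
print(p_zoom)
```

One design choice worth noting: coord_cartesian() only changes the viewing window, whereas setting limits with xlim() removes observations outside the range before the bins are counted. Either may be what your course materials intend, so check which one they use.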
Use geom_boxplot() to create a box-and-whiskers plot for the dataset. Create a boxplot for the full dataset and a second one for the viewing range from part b. Note that, based on how ggplot implements geom_boxplot, you will need to use mapping = aes(x = "salary", y = salary) as your mapping input.
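A minimal sketch of the boxplot step follows, again on simulated stand-in data (the data frame incomes and its values are assumptions, not the real dataset). Because the mapping places salary on the vertical axis, the 60000 to 70000 viewing range is applied to y rather than x in this sketch.

```r
library(ggplot2)

# Hypothetical stand-in data, NOT the real dataset.
set.seed(1)
incomes <- data.frame(salary = c(rnorm(41, mean = 65000, sd = 2000), 1e6))

# x = "salary" is a constant label, so all observations fall in one box
p_box <- ggplot(incomes) +
  geom_boxplot(mapping = aes(x = "salary", y = salary))

# Zoom the vertical axis to the viewing range used for the histogram
p_box_zoom <- p_box + coord_cartesian(ylim = c(60000, 70000))

print(p_box)
print(p_box_zoom)
```

Zooming with coord_cartesian() keeps the quartiles computed from all 42 values. Restricting the scale with ylim() would instead drop the out-of-range points and recompute the box.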
Compute the following summary statistics for this distribution:
Based on the above visualizations and summary statistics, describe the center, shape, and spread of this dataset.
Based on the above visualizations and summary statistics, would the mean or the median best represent what we might think of as a typical income for the 42 patrons at this coffee shop? What does this say about the robustness of the two measures?
Based on the above visualizations and summary statistics, would the standard deviation or the IQR best represent the amount of variability in the incomes of the 42 patrons at this coffee shop? What does this say about the robustness of the two measures?
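To make the idea of robustness concrete, the sketch below compares the mean, median, standard deviation, and IQR on a simulated sample with and without a single extreme value. The vectors typical and with_outlier and the helper summarize() are hypothetical names introduced for illustration, and the numbers are simulated, not taken from the coffee shop data.

```r
# Hypothetical simulated incomes, NOT the real dataset.
set.seed(1)
typical <- rnorm(41, mean = 65000, sd = 2000)   # 41 ordinary incomes
with_outlier <- c(typical, 1e6)                 # add one extreme income

# Helper (hypothetical) collecting the four statistics named in the exercises
summarize <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

stats <- rbind(without = summarize(typical),
               with    = summarize(with_outlier))
print(round(stats))
```

Comparing the two rows shows how much each statistic moves when one observation is added. A measure that barely changes is robust to that observation, and one that changes a lot is not.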
Using the cumulative distribution functions reading as a guide, construct the empirical CDF of this dataset and plot it.
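One possible way to build and plot an empirical CDF is sketched below, using base R's ecdf() and ggplot2's stat_ecdf(). The reading referred to above may construct the CDF differently (for example, by sorting and computing cumulative proportions by hand), so treat this as an assumption rather than the intended method. The data are simulated stand-ins.

```r
library(ggplot2)

# Hypothetical stand-in data, NOT the real dataset.
set.seed(1)
incomes <- data.frame(salary = c(rnorm(41, mean = 65000, sd = 2000), 1e6))

# ecdf() returns a step function: F_hat(x) is the fraction of values <= x
F_hat <- ecdf(incomes$salary)
print(F_hat(70000))   # proportion of simulated incomes at or below 70000

# Plot the empirical CDF as a step curve
p_ecdf <- ggplot(incomes) +
  stat_ecdf(mapping = aes(x = salary), geom = "step")

print(p_ecdf)
```

The step function jumps by 1/42 at each observed income, so the height of the curve at any value is the proportion of patrons earning that amount or less.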