Based on Chapter 8 of ModernDive. Code for Quiz 12.
Load the R packages we will use.
Replace all the instances of ???. These are the answers on your Moodle quiz.
Run all the individual code chunks to make sure the answers in this file correspond with your quiz answers.
After you have checked that all your code chunks run, you can knit the file. It won’t knit until the ??? are replaced.
Save a plot to be your preview plot
Look at the variable definitions in congress_age
What is the average age of members who have served in Congress?
Set the random seed to 123.
Take a sample of 100 from the dataset congress_age and assign it to congress_age_100.
set.seed(123)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)
congress_age is the population and congress_age_100 is the sample.
18,635 is the number of observations in the population and 100 is the number of observations in your sample.
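The population versus sample distinction can be sketched in base R. This is only an illustration under stated assumptions: the `population` data frame is simulated (it is not the real congress_age data), the mean of 53 and standard deviation of 10 are arbitrary choices, and `sample()` stands in for rep_sample_n, so the numbers will not match the quiz.

```r
# Simulated stand-in for congress_age (hypothetical data, NOT the real dataset)
set.seed(123)
population <- data.frame(age = rnorm(18635, mean = 53, sd = 10))

# Draw 100 rows without replacement, i.e. a simple random sample
sample_rows <- sample(nrow(population), size = 100)
sample_100 <- population[sample_rows, , drop = FALSE]

# Population parameter vs. point estimate from the sample
pop_mean_age <- mean(population$age)
obs_mean_age <- mean(sample_100$age)

nrow(population)  # 18635
nrow(sample_100)  # 100
```

The population mean is normally unknown; it is available here only because the whole population is in hand, which is what lets the quiz check the interval against it later.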
Construct the confidence interval
1. Use specify to indicate the variable from congress_age_100 that you are interested in
Response: age (numeric)
# A tibble: 100 × 1
age
<dbl>
1 53.1
2 54.9
3 65.3
4 60.1
5 43.8
6 57.9
7 55.3
8 46
9 42.1
10 37
# … with 90 more rows
2. Generate 1000 replicates of your sample of 100
Response: age (numeric)
# A tibble: 100,000 × 2
# Groups: replicate [1,000]
replicate age
<int> <dbl>
1 1 42.1
2 1 71.2
3 1 45.6
4 1 39.6
5 1 56.8
6 1 71.6
7 1 60.5
8 1 56.4
9 1 43.3
10 1 53.1
# … with 99,990 more rows
The output has 100,000 rows (1,000 replicates × 100 observations each).
3. Calculate the mean for each replicate
Assign to bootstrap_distribution_mean_age
Display bootstrap_distribution_mean_age
The bootstrap_distribution_mean_age has 1000 means.
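Steps 1 to 3 map onto the infer verbs specify, generate and calculate. The sketch below is one plausible form of that pipeline, not necessarily the quiz's exact chunks. It assumes the infer and dplyr packages are installed and uses a simulated `sample_100` data frame as a stand-in for congress_age_100; `bootstrap_means` is likewise an illustrative name.

```r
library(dplyr)  # for the %>% pipe
library(infer)

# Simulated stand-in for congress_age_100 (hypothetical data)
set.seed(123)
sample_100 <- data.frame(age = rnorm(100, mean = 53, sd = 10))

bootstrap_means <- sample_100 %>%
  specify(response = age) %>%                    # step 1: pick the variable
  generate(reps = 1000, type = "bootstrap") %>%  # step 2: resample with replacement
  calculate(stat = "mean")                       # step 3: one mean per replicate

nrow(bootstrap_means)  # 1000
```

Each bootstrap replicate has the same size as the original sample (100 rows), which is why the intermediate output after generate has 100,000 rows before calculate collapses it to 1000 means.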
4. Visualize the bootstrap distribution
visualize(bootstrap_distribution_mean_age)
Calculate the 95% confidence interval using the percentile method
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)
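With the percentile method, the interval endpoints are the 2.5th and 97.5th percentiles of the bootstrap means. A minimal base R sketch of that idea follows, using simulated ages rather than the real sample, so the endpoints are illustrative only:

```r
set.seed(123)
sample_ages <- rnorm(100, mean = 53, sd = 10)  # hypothetical sample of 100 ages

# 1000 bootstrap replicates: resample 100 ages with replacement, take the mean
boot_means <- replicate(1000, mean(sample(sample_ages, replace = TRUE)))

# Endpoints that cut off 2.5% of the bootstrap means in each tail
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

Changing `probs` to `c(0.05, 0.95)` would give a 90% interval by the same logic.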
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
  geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)
Save the previous plot to preview.png and add it to the YAML chunk at the top.
Is the population mean within the 95% confidence interval constructed using the bootstrap distribution? Yes
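The same check can be done in code rather than by eye. get_confidence_interval returns a one-row table with columns lower_ci and upper_ci. The helper `ci_contains` and the numeric values below are hypothetical placeholders, not results from the quiz data.

```r
# Hypothetical helper: TRUE if `value` lies inside the interval
ci_contains <- function(ci, value) {
  ci$lower_ci <= value & value <= ci$upper_ci
}

# Placeholder endpoints and means, for illustration only
example_ci <- data.frame(lower_ci = 50, upper_ci = 56)
ci_contains(example_ci, 53)  # TRUE
ci_contains(example_ci, 60)  # FALSE
```

With the quiz objects, the analogous call would be `ci_contains(congress_ci_percentile, pop_mean_age)`, assuming both have been created as in the chunks above.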
Change set.seed(123) to set.seed(4346). Rerun all the code.