2  Module 2: approximation of the sampling distribution via bootstrapping

Learning objectives

By the end of this module, you will be able to:

  • Compare and contrast quantitative and categorical variables.
  • Explain why the sampling distribution is not available to us in practice.
  • Describe the standard deviation and the variance, and write computer scripts to calculate estimates of these parameters.
  • Define standard error and explain its purpose.
  • Define bootstrapping.
  • Write a computer script to create a bootstrap distribution to approximate a sampling distribution.
  • Contrast a bootstrap distribution with a sampling distribution obtained using multiple samples.
  • Contrast sampling with and without replacement.

2.1 Proposed structure of this chapter

This is a work in progress! Please refer to the readings listed on Canvas.

2.1.1 Using bootstrapping to estimate the sampling distribution

  • We cannot resample from the population, so we use the sample as an approximation of the population
  • Box: Introduce categorical variables, and the concept of proportion
  • Sampling with and without replacement
  • Independent vs dependent samples
  • Why the sample size needs to be the same (Effect of sample size)
  • The infer package for bootstrapping
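
The ideas in this list can be sketched in a few lines of code. The example below is a minimal illustration in Python using only the standard library; it is not the infer workflow used in this course. The data values, the function name `bootstrap_means`, and the number of replicates are all made up for illustration.

```python
import random
import statistics

# Hypothetical sample of 10 measurements (made-up values, for illustration only).
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
n = len(sample)

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Sampling WITHOUT replacement, at the original sample size, only reshuffles
# the same n values, so the mean of the "resample" never changes.
without_replacement = rng.sample(sample, k=n)

# Sampling WITH replacement lets some values appear more than once and others
# not at all, so the mean varies from resample to resample.
with_replacement = rng.choices(sample, k=n)


def bootstrap_means(data, reps=1000, seed=0):
    """Return `reps` bootstrap means, each from a resample of size len(data)."""
    local_rng = random.Random(seed)
    size = len(data)  # same size as the original sample
    return [
        statistics.fmean(local_rng.choices(data, k=size))
        for _ in range(reps)
    ]


boot = bootstrap_means(sample)

print("sample mean:            ", statistics.fmean(sample))
print("mean, without replace:  ", statistics.fmean(without_replacement))
print("mean, with replace:     ", statistics.fmean(with_replacement))
print("number of bootstrap means:", len(boot))
```

Each resample has the same size as the original sample because the spread of the bootstrap distribution depends on the resample size: smaller resamples would overstate the variability of the estimate, and larger ones would understate it.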

2.1.2 Using the bootstrap to estimate the standard error

  • Comparison of bootstrap vs sampling distribution.
  • Highlighting that the center of the bootstrap distribution is the sample value (sample mean) and that the bootstrap does not provide further info on the point estimate itself.
  • It’s all about the variation (the width of the distribution)
  • Clarify notation for SE and estimate of SE. Box: standard deviation (why n-1 in standard deviation)
  • Show that the bootstrap SE estimate is a good estimate of the SE of the sampling distribution
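
The comparison described in this list can be tried out in a simulation, where, unlike in real life, we do have access to the whole population. The sketch below is a minimal Python illustration using only the standard library. The population (simulated normal values with mean 50 and standard deviation 10), the sample size of 50, and the 1000 replicates are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population (simulated; in practice we never observe this).
population = [rng.gauss(50, 10) for _ in range(10_000)]
n = 50      # sample size (assumed)
reps = 1000  # number of replicates (assumed)

# Sampling distribution: draw many samples from the population itself.
# This is only possible in a simulation.
sampling_means = [
    statistics.fmean(rng.sample(population, k=n)) for _ in range(reps)
]
sampling_se = statistics.stdev(sampling_means)

# In practice we have ONE sample, and we resample from it with replacement.
sample = rng.sample(population, k=n)
boot_means = [
    statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)
]
boot_se = statistics.stdev(boot_means)

# Formula-based estimate s / sqrt(n); statistics.stdev divides by n - 1.
formula_se = statistics.stdev(sample) / n ** 0.5

print("population mean:          ", statistics.fmean(population))
print("sample mean:              ", statistics.fmean(sample))
print("center of bootstrap dist.:", statistics.fmean(boot_means))
print("SE from sampling dist.:   ", sampling_se)
print("SE from bootstrap dist.:  ", boot_se)
print("SE from s / sqrt(n):      ", formula_se)
```

Two things are worth checking in the output. The bootstrap distribution is centered at the sample mean, not at the population mean, so it tells us nothing new about the point estimate. Its width, however, should be close to the width of the sampling distribution, which is what makes it useful for estimating the standard error.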

2.1.3 Box: real example

A concrete example, with an interpretation of the SE.

Required readings

MD: Chapter 8 up to and including Section 8.2

MD: Section 8.7.1