1 Module 1: Introduction to statistical inference and the sampling distribution of parameter estimates
Learning objectives
By the end of this module, you will be able to:
Describe real-world examples of questions that can be answered with the statistical inference methods presented in this course (e.g., estimation, hypothesis testing).
Name common population parameters (mean, median, proportion) often estimated using sample data and write computer scripts to calculate estimates of these parameters.
Define the following terms concerning statistical inference: population, sample, population parameters, estimate, sampling distribution, and sample distribution.
Write an R script to draw random samples from a finite population (e.g., census data).
Write an R script to estimate a sampling distribution for a given statistic and population.
Define random variables and explain how they relate to sampling.
Explain random and representative sampling and how this can influence estimation.
Note: this is partly a review of many concepts covered in DSCI 100.
1.1 Proposed structure of this chapter
This is a work in progress! Please refer to the readings listed on Canvas.
1.1.1 How to answer an inferential question
Different types of questions (see https://datasciencebook.ca/intro.html#asking-a-question): descriptive, exploratory, predictive, inferential, causal, mechanistic
Box: Measures of central tendency: median vs. mean (explain the difference). Discuss the mode and why it is a poor summary in many circumstances.
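As a minimal sketch of the mean versus median contrast, the R snippet below uses a small made-up vector of incomes (hypothetical values, not course data) that contains one extreme observation. It also tabulates the values to show a case where the mode is uninformative.

```r
# Hypothetical incomes (in thousands of dollars); the last value is an outlier.
incomes <- c(32, 35, 38, 40, 41, 45, 400)

mean_income <- mean(incomes)      # pulled upward by the single extreme value
median_income <- median(incomes)  # middle value, barely affected by the outlier

mean_income
median_income

# Frequency of each distinct value: every value occurs exactly once,
# so no single "most frequent" value (mode) stands out for these data.
value_counts <- table(incomes)
value_counts
```

With continuous or nearly continuous measurements like these, repeated values are rare, which is one reason the mode is often a poor summary of centre.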
General idea of taking a sample
- Population vs sample
- Parameter vs. sample statistic
Box: what is a random variable
Why take a random sample
- Bias
- Representative sample
- Generalization
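The ideas above (population vs. sample, parameter vs. statistic, and bias from non-random sampling) can be sketched in base R. The population here is simulated and purely hypothetical; in the course it would be a real finite population such as census data. The seed, the population size, and the sample size `n` are arbitrary choices made for illustration.

```r
set.seed(2024)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical finite population of 10,000 right-skewed values (simulated).
population <- rexp(10000, rate = 1 / 50)

# Population parameters: computable here only because we hold the whole population.
pop_mean <- mean(population)
pop_median <- median(population)

# Simple random sample of size n, drawn without replacement.
n <- 40
random_sample <- sample(population, size = n, replace = FALSE)

# Sample statistics: point estimates of the parameters above.
est_mean <- mean(random_sample)
est_median <- median(random_sample)

# A non-random sample for contrast: keeping only the n largest values
# is not representative, so its mean overestimates the population mean.
biased_sample <- sort(population, decreasing = TRUE)[1:n]
biased_mean <- mean(biased_sample)

c(parameter = pop_mean, random_estimate = est_mean, biased_estimate = biased_mean)
```

The random estimate varies from sample to sample but has no systematic tendency to miss in one direction, whereas the biased sample misses the parameter in the same direction every time.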
1.1.2 Sampling distribution
What is the sampling distribution (emphasize all possible random samples of size n)
Introduction to the infer package
Box: histograms, probability, reminder of ggplot
Where is it centered
Why we want the sampling distribution
How to approximate it computationally (re-emphasize the idea of all possible samples)
Why we don’t have access to it
Properties
- introduce the effect of increasing the sample size
- discuss the effect of increasing the number of repetitions
- SE (with known population parameters)
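A rough sketch of how the sampling distribution of the sample mean might be approximated computationally is given below. It uses base R rather than the infer package so that it runs without extra dependencies. The simulated population, the helper `sample_means()`, the number of repetitions, and the two sample sizes are all assumptions made for illustration.

```r
set.seed(2024)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical finite population (simulated); its parameters are known to us.
population <- rexp(10000, rate = 1 / 50)
pop_mean <- mean(population)
pop_sd <- sd(population)  # n - 1 denominator; the difference is negligible at this size

# Hypothetical helper: draw `reps` random samples of size `n` (without
# replacement) and return the sample mean of each one.
sample_means <- function(pop, n, reps) {
  replicate(reps, mean(sample(pop, size = n, replace = FALSE)))
}

# Many repetitions stand in for "all possible samples", which are far too
# numerous to enumerate.
reps <- 2000
means_n10 <- sample_means(population, n = 10, reps = reps)
means_n100 <- sample_means(population, n = 100, reps = reps)

# Centre and spread of each approximate sampling distribution. The theoretical
# SE uses sigma / sqrt(n) and ignores the finite population correction,
# which is close to 1 when n is small relative to the population size.
results <- data.frame(
  n = c(10, 100),
  centre = c(mean(means_n10), mean(means_n100)),
  empirical_se = c(sd(means_n10), sd(means_n100)),
  theoretical_se = pop_sd / sqrt(c(10, 100))
)
results
```

Both approximate distributions are centred near the population mean, and the one built from the larger sample size is narrower. Increasing `reps` does not shrink the standard error; it only makes the approximation of the sampling distribution smoother.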