1  Module 1: Introduction to statistical inference and the sampling distribution of parameter estimates

Learning objectives

By the end of this module, you will be able to:

  • Describe real-world examples of questions that can be answered with the statistical inference methods presented in this course (e.g., estimation, hypothesis testing).

  • Name common population parameters (mean, median, proportion) often estimated using sample data and write computer scripts to calculate estimates of these parameters.

  • Define the following terms concerning statistical inference: population, sample, population parameters, estimate, sampling distribution, and sample distribution.

  • Write an R script to draw random samples from a finite population (e.g., census data).

  • Write an R script to estimate a sampling distribution for a given statistic and population.

  • Define random variables and explain how they relate to sampling.

  • Explain random and representative sampling and how this can influence estimation.

Note that this module is partly a review of concepts covered in DSCI 100.

1.1 Proposed structure of this chapter

This is a work in progress! Please refer to the readings listed on Canvas.

1.1.1 How to answer an inferential question

  • Different types of questions (see https://datasciencebook.ca/intro.html#asking-a-question): descriptive, exploratory, predictive, inferential, causal, mechanistic

  • Box: Measures of centrality: median vs mean (explain the difference). Discuss the mode and why it is not a good choice in many circumstances.

  • General idea of taking a sample

    • Population vs sample
    • Parameter vs sample statistic
  • Box: what is a random variable

  • Why random sample

    • Bias
    • Representative sample
    • Generalization
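
As a rough illustration of the ideas above (population vs sample, parameter vs estimate, mean vs median), here is a minimal sketch in base R. The population is synthetic: the `income` values, the 50,000 cut-off, and the sample size are all assumptions made up for this example and do not come from the course materials.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical finite population: 10,000 right-skewed "income" values
# (synthetic data, standing in for something like census data).
population <- data.frame(
  income = rlnorm(10000, meanlog = 10, sdlog = 0.8)
)
population$high_income <- population$income > 50000  # assumed cut-off

# Population parameters (usually unknown in practice)
parameters <- c(
  mean = mean(population$income),
  median = median(population$income),
  proportion = mean(population$high_income)
)

# One simple random sample of size n, drawn without replacement
n <- 100
sampled_rows <- sample(nrow(population), size = n, replace = FALSE)
one_sample <- population[sampled_rows, ]

# Point estimates of the same quantities, computed from the sample only
estimates <- c(
  mean = mean(one_sample$income),
  median = median(one_sample$income),
  proportion = mean(one_sample$high_income)
)

print(rbind(parameters, estimates))
```

Because the synthetic population is right-skewed, its mean sits above its median, which is one reason the two measures of centrality can tell different stories. Rerunning the sampling step with a different seed gives different estimates while the parameters stay fixed.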

1.1.2 Sampling distribution

  • What is the sampling distribution (emphasize all possible random samples of size n)

  • Introduction to the infer package

  • Box: histograms, probability, reminder of ggplot

  • Where it is centered

  • Why we want the sampling distribution

  • How to approximate it computationally (re-emphasize the idea of all possible samples)

  • Why we don’t have access to it

  • Properties

    • Introduce the effect of increasing the sample size
    • Discuss the effect of increasing the number of repetitions
    • Standard error (SE), with known population parameters
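
The points in this outline can be sketched computationally. The example below is a minimal base R sketch, not the course's infer-based workflow: it approximates the sampling distribution of the sample mean by drawing many random samples from a synthetic population, then checks where the distribution is centered and how its spread changes with the sample size. The population, the helper `sample_means()`, the sample sizes, and the number of repetitions are all assumptions chosen for illustration.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical finite population (synthetic, right-skewed values)
population <- rlnorm(10000, meanlog = 10, sdlog = 0.8)
mu <- mean(population)                     # population mean (parameter)
sigma <- sqrt(mean((population - mu)^2))   # population SD (divides by N)

# Hypothetical helper: draw `reps` random samples of size n
# and return the mean of each one.
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, size = n, replace = FALSE)))
}

sizes <- c(10, 40, 160)
results <- data.frame(
  n = sizes,
  centre = NA_real_,
  empirical_se = NA_real_,
  theoretical_se = NA_real_
)

for (i in seq_along(sizes)) {
  means <- sample_means(sizes[i])
  results$centre[i] <- mean(means)        # centre of the approximation
  results$empirical_se[i] <- sd(means)    # spread of the approximation
  # sigma / sqrt(n); ignores the finite population correction,
  # which is negligible when n is small relative to the population
  results$theoretical_se[i] <- sigma / sqrt(sizes[i])
}

print(mu)
print(results)
```

With only 2,000 repetitions this is an approximation: the true sampling distribution is built from all possible samples of size n, which is far too many to enumerate. Increasing `reps` makes the approximation smoother but does not make it narrower; increasing n is what shrinks the standard error.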

Required readings

MD: Chapter 7 up to and including 7.4

Additional resources