Stat 406
Daniel J. McDonald
Last modified – 11 September 2023
https://ubc-stat.github.io/stat-406/
Hosted on GitHub.
Links to slides and all materials
Syllabus is there. Be sure to read it.
Link to join on Canvas. This is our discussion board.
Note that this data is hosted on servers outside of Canada. You may wish to use a pseudonym to protect your privacy.
Anything super important will be posted to Slack and Canvas.
Be sure you get Canvas email.
If I am sick, I will cancel class or arrange a substitute.
Linked from the website.
This is where you complete / submit assignments / projects / in-class-work
This is also hosted on servers outside Canada: https://github.com/stat-406-2023/
Yes, some data is hosted on servers in the US.
But in the real world, no one uses Canvas / Piazza, so why not learn things they do use?
Much easier to communicate, “mark” or comment on your work
Much more DS friendly
Note that MDS uses both of these, the Stat and CS departments use both, many faculty use them, Google / Amazon / Facebook use things like these, etc.
Much of this lecture is based on material from Colin Rundel and Karl Broman
Words of wisdom
Your closest collaborator is you six months ago, but you don’t reply to emails.
– Paul Wilson
This will hurt, but what doesn’t kill you makes you stronger.
git is a command line program that lives on your machine.
Running git init in a directory creates a hidden .git directory.
The .git directory contains a history of all changes made to “versioned” files.
GitHub renders .ipynb & .md files.
git/GitHub is broad and complicated. Here, just what you need.
Tip
First things first, RStudio and the Terminal
Command line is the “old” type of computing. You type commands at a prompt and the computer “does stuff”.
You may not have seen where this is. RStudio has one built in called “Terminal”
The Mac System version is also called “Terminal”. If you have a Linux machine, this should all be familiar.
Windows is not great at this.
To get the most out of Git, you have to use the command line.
Repeat 3–5 as needed. Once you’re satisfied
You decide what is “versioned”.
A file called .gitignore tells git which files or types to never track.
```{bash}
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# User-specific files
.Ruserdata
# Compiled junk
*.o
*.so
*.DS_Store
```
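To check that a pattern does what you expect, git check-ignore reports whether a path is ignored. A minimal sketch, assuming a throwaway repo in a temporary directory; the file names model.o and analysis.R are made up for illustration.

```{bash}
# Work in a throwaway repo so nothing real is touched
cd "$(mktemp -d)"
git init -q

# Two of the patterns from the list above
printf '%s\n' '.Rhistory' '*.o' > .gitignore

# One file that matches a pattern, one that does not (hypothetical names)
touch model.o analysis.R

# check-ignore prints the path and exits 0 only if the path is ignored
git check-ignore model.o
git check-ignore analysis.R || echo "analysis.R is not ignored"

# status shows analysis.R and .gitignore as untracked, but never model.o
git status --short
```

Note that .gitignore only affects untracked files. A file that is already tracked stays tracked even if a pattern matches it later.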
You each have your own repo
You make a branch
DO NOT rename files
Make enough commits (3 for labs, 5 for HW).
Push your changes (at any time) and make a PR against main when done.
TAs review your work.
On HW, if you want to revise, make changes in response to feedback and push to the same branch. Then “re-request review”.
master vs main
git main/develop/branch workflow
main is the protected, released version of software (maybe renamed to release).
develop contains things not yet on main, but thoroughly tested.
develop gets merged to main.
Work on a feature branch off develop to build your new feature.
Make a PR against develop. Supervisors review your contributions.
I use this workflow with my lab, as do many DS/CS/Stat faculty.
Typical for your PR to trigger tests to make sure you don’t break things
Typical for team members or supervisors to review your PR for compliance
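The main/develop/feature workflow described above can be sketched locally. This is only an illustration under stated assumptions: it runs in a throwaway repo, the file app.txt and the identity are made up, and plain git merge stands in for the PR and review that would happen on GitHub.

```{bash}
# Throwaway repo with a placeholder identity (assumptions for the demo)
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# main holds the released version
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial release"
git branch -M main

# develop branches off main; feature branches off develop
git checkout -q -b develop
git checkout -q -b feature
echo "new feature" >> app.txt
git commit -q -am "Build new feature"

# after review, feature is merged into develop (a PR on GitHub)
git checkout -q develop
git merge -q --no-ff -m "Merge feature into develop" feature

# at release time, develop is merged into main
git checkout -q main
git merge -q --no-ff -m "Release develop to main" develop
git log --oneline --graph
```

The --no-ff flag forces a merge commit each time, so the history shows where each branch came back in.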
The .github directory contains interactions with GitHub.
In this course, I protect main so that you can’t push there.
Important
Read the PR template!!
Initializing
```{bash}
git config --global user.name "Daniel J. McDonald"
git config --global user.email "daniel@stat.ubc.ca"
git config --global core.editor nano
# or emacs or ... (default is vim)
```
Staging
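A minimal sketch of staging with standard git add commands, assuming a throwaway repo in a temporary directory; README.md and lab.R are made-up file names.

```{bash}
# Throwaway repo so nothing real is touched
cd "$(mktemp -d)"
git init -q
echo "# Lab 1" > README.md
echo "x <- 1" > lab.R

# stage a single file
git add README.md
# stage everything under the current directory (respects .gitignore)
git add .
# "A" in the first column marks files staged for the next commit
git status --short
```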
Committing
```{bash}
# stage all modified tracked files and commit in one step
git commit -am "message"
# open editor to write long commit message
git commit
```
Pushing
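A minimal sketch of pushing, under loud assumptions: a local bare repo stands in for GitHub so the example is self-contained, and the branch name mybranch, the file notes.md, and the identity are made up.

```{bash}
work="$(mktemp -d)"
# a bare repo plays the role of GitHub here (assumption for the demo)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Student"
git config user.email "student@example.com"
git remote add origin "$work/remote.git"

echo "hello" > notes.md
git add notes.md
git commit -q -m "Add notes"

# first push of a new branch: -u records origin/mybranch as its upstream
git checkout -q -b mybranch
git push -q -u origin mybranch

# after that, a plain push sends new commits to the same place
echo "more" >> notes.md
git commit -q -am "Extend notes"
git push -q
```

With a real course repo, origin would already point at GitHub after cloning, so only the two push commands apply.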
Branching
```{bash}
# switch to branchname, error if uncommitted changes
git checkout branchname
# switch to a previous commit
git checkout aec356
# create a new branch
git branch newbranchname
# create a new branch and check it out
git checkout -b newbranchname
# merge changes in branch2 onto branch1
git checkout branch1
git merge branch2
# grab a file from branch2 and put it on current
git checkout branch2 -- name/of/file
git branch -v # list all branches
```
Check the status
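A minimal sketch of commands for checking where things stand, assuming a throwaway repo with one modified file and one untracked file; the file names and identity are made up.

```{bash}
# Throwaway repo with a placeholder identity (assumptions for the demo)
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "a" > tracked.txt
git add tracked.txt
git commit -q -m "First commit"

echo "b" >> tracked.txt   # modified, not staged
echo "c" > new.txt        # untracked

git status            # full report, with suggested next commands
git status --short    # compact: " M" = modified, "??" = untracked
git log --oneline     # history, one commit per line
git diff              # line-by-line unstaged changes
```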
Sometimes you merge things and “conflicts” happen.
Meaning that changes on one branch would overwrite changes on a different branch.
Here are lines that are either unchanged from
the common ancestor, or cleanly resolved
because only one side changed.
But below we have some troubles
<<<<<<< yours:sample.txt
Conflict resolution is hard;
let's go shopping.
=======
Git makes conflict resolution easy.
>>>>>>> theirs:sample.txt
And here is another line that is cleanly
resolved or unmodified.
You get to decide: do you want to keep your changes (above ======) or their changes (below ======)?
But always delete the <<<<<, ======, and >>>>> lines.
Once you’re satisfied, committing resolves the conflict.
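The conflict above can be reproduced and resolved in a throwaway repo. A minimal sketch, assuming a made-up branch named theirs and a placeholder identity; the resolution shown (keeping their line) is just one of the possible choices.

```{bash}
# Throwaway repo with a placeholder identity (assumptions for the demo)
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "original line" > sample.txt
git add sample.txt
git commit -q -m "Add sample"
git branch -M main

# change the same line on two branches
git checkout -q -b theirs
echo "Git makes conflict resolution easy." > sample.txt
git commit -q -am "Their change"
git checkout -q main
echo "Conflict resolution is hard;" > sample.txt
git commit -q -am "My change"

# the merge stops, exits non-zero, and writes markers into sample.txt
git merge theirs || echo "merge reported a conflict"
cat sample.txt

# resolve by editing the file so no marker lines remain, then stage and commit
echo "Git makes conflict resolution easy." > sample.txt
git add sample.txt
git commit -q -m "Resolve conflict in sample.txt"
```

In practice you would edit sample.txt by hand in your editor rather than overwrite it with echo.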
32b252c854c45d2f8dfda1076078eae8d5d7c81f
32b25
In the git docs, it’s reversed: they point to the thing on which they depend.
https://training.github.com/downloads/github-git-cheat-sheet.pdf
For a file such as README.md, git status will give some of these as suggestions.
1. Saved but not staged
2. Staged but not committed
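A minimal sketch of undoing in these two scenarios, assuming Git 2.23 or newer (where git restore exists) and a throwaway repo; the identity is made up, and the older equivalents are noted in comments.

```{bash}
# Throwaway repo with a placeholder identity (assumptions for the demo)
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "good" > README.md
git add README.md
git commit -q -m "Add README"

# 1. Saved but not staged: discard the edit in the working directory
echo "oops" >> README.md
git restore README.md            # older git: git checkout -- README.md

# 2. Staged but not committed: unstage first, then discard
echo "oops" >> README.md
git add README.md
git restore --staged README.md   # older git: git reset HEAD README.md
git restore README.md

cat README.md
```

Both forms of discarding throw the edit away for good, so check git diff first if you are unsure.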
```{bash}
# make a new branch with everything, but stay on main
git branch newbranch
# find out where to go to
git log
# undo everything after ace2193
git reset --hard ace2193
git checkout newbranch
```
Anything more complicated, either post to Slack or LMGTFY
In the Lab next week, you’ll practice
UBC Stat 406 - 2023