Frequently asked questions

How do I succeed in this class?

Complete readings before the material is covered in class, and then review again afterwards.
Participate actively in class. If you don’t understand something, I can guarantee no one else does either. I have a Ph.D., and I’ve been doing this for more than 10 years. It’s hard for me to remember what it’s like to be you and what you don’t know. Say something! I want you to learn this stuff, and I love to explain more carefully.
Come to office hours. Again, I like explaining things.
Try the Labs again without the help of your classmates.
Read the examples at the end of the [ISLR] chapters. Try the exercises.
Do not procrastinate — don’t let a module go by with unanswered questions as it will just make the following module’s material even more difficult to follow.
Do the Worksheets.

Git and GitHub

Homework/Labs workflow

RStudio version (uses the Git tab, usually near Environment/History in the upper right)

Make sure you are on main. Pull in remote changes by clicking the Pull button (the downward arrow).
Create a new branch by clicking the New Branch button (the small icon next to the branch name).
Work on your documents and save frequently.
Stage your changes by clicking the check boxes.
Commit your changes by clicking Commit.
Repeat 3-5 as necessary.
Push to GitHub.
When done, go to GitHub and open a PR.
Use the dropdown menu to go back to main and avoid future headaches.

Command line version

(Optional, but useful. Pull in any remote changes.) git pull
Create a new branch: git checkout -b <name-of-branch>
Work on your documents and save frequently.
Stage your changes: git add <name-of-document1>. Repeat for each changed document, or use git add . to stage all changed documents.
Commit your changes: git commit -m "some message that is meaningful"
Repeat 3-5 as necessary.
Push to GitHub: git push. If Git suggests a longer form of this command, use it.
When done, go to GitHub and open a PR.
Switch back to main to avoid future headaches: git checkout main

Asking for a HW regrade.

To be eligible

You must have received >3 points of deductions to be eligible.
And they must have been for “content”, not penalties.
If you fix the errors, you can raise your grade to 7/10.
You must make revisions and re-request review within 1 week of your initial review.

Go to your local branch for this HW. If you don’t remember the right name, you can check the PRs in your repo on GitHub by clicking the “Pull Requests” tab. The PR might be closed.
Make any changes you need to make to the files, commit and push. Make sure to rerender the .pdf if needed.
Go to GitHub.com and find the original PR for this assignment. There should now be additional commits since the previous Review.
Add a comment to the TA describing the changes you’ve made. Be concise and clear.
Under “Reviewers” on the upper right of the screen, you should see a 🔁 button. Once you click that, the TA will be notified to review your changes.

Fixing common problems

`master/main`

“master” has some pretty painful connotations. So as part of an effort to remove racist names from code, the default branch is now “main” on new versions of GitHub. But old versions (like the UBC version) still have “master”. Below, I’ll use “main”, but if you see “master” on what you’re doing, that’s the one to use.

Start from main

Branches should be created from the main branch, not the one you used for the last assignment.

git checkout main

This switches to main. Then pull and start the new assignment following the workflow above. (In RStudio, use the dropdown menu.)

You forgot to work on a new branch

Ugh, you did some labs before realizing you forgot to create a new branch. Don’t stress. There are some things below to try. But if you’re confused ASK. We’ve had practice with this, and soon you will too!

(1) If you started from main and haven’t made any commits (but you SAVED!!):

git checkout -b <new-branch-name>

This keeps everything you have and puts you on a new branch. No problem. Commit and proceed as usual.

(2) If you are on main and made some commits:

git branch <new-branch-name>
git log

The first line makes a new branch with all the stuff you’ve done. Then we look at the log. Locate the most recent commit before you started working. It’s a long string like ac2a8365ce0fa220c11e658c98212020fa2ba7d1. Then,

git reset ac2a8 --hard

This rolls main back to that commit. You don’t need the whole string, just the first few characters. Finally

git checkout <new-branch-name>

and continue working.

(3) If you started work on <some-old-branch> for work you already submitted:
This one is harder, and I would suggest getting in touch with the TAs. Here’s the procedure.

git commit -am "uhoh, I need to be on a different branch"
git branch <new-branch-name>

Commit your work with a dumb message, then create a new branch. It’s got all your stuff.

git log

Locate the most recent commit before you started working. It’s a long string like ac2a8365ce0fa220c11e658c98212020fa2ba7d1. Then,

git rebase --onto main ac2a8 <new-branch-name>
git checkout <new-branch-name>

This makes the new branch look like main but without the differences from main that are on ac2a8 and WITH all the work you did after ac2a8. It’s pretty cool. And should work. Finally, we switch to our new branch.

How can I get better at R?

I get this question a lot. The answer is almost never “go read the book How to learn R fast” or “watch the video on FreeRadvice.com”. To learn programming, the only thing to do is to program. Do your tutorials. Redo your tutorials. Run through the code in the textbook. Ask yourself why we used one function instead of another. Ask questions. Play little coding games. If you find yourself wondering how some bit of code works, run through it step by step. Print out the results and see what it’s doing. If you take on these kinds of tasks regularly, you will improve rapidly.

Coding is an active activity just like learning Spanish. You have to practice constantly. For the same reasons that it is difficult/impossible to learn Spanish just from reading a textbook, it is difficult/impossible to learn R just from reading/watching.

When I took German in 7th grade, I remember my teacher saying “to learn a language, you have to constantly tell lies”. What he meant was, you don’t just say “yesterday I went to the gym”. You say “yesterday I went to the market”, “yesterday I went to the movies”, “today she’s going to the gym”, etc. The point is to internalize conjugation, vocabulary, and the inner workings of the language. The same is true when coding. Do things different ways. Try automating regular tasks.
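
As one small illustration of “doing things different ways”, the sketch below computes the average chick weight per diet three times using the built-in ChickWeight data: with an explicit loop, with split() and sapply(), and with tapply(). The variable names are mine, chosen only for this example. Printing the intermediate pieces is a handy way to see what each step is doing.

```r
data("ChickWeight")

# Way 1: an explicit loop over the diet levels
diets <- levels(ChickWeight$Diet)
mean_by_loop <- numeric(length(diets))
names(mean_by_loop) <- diets
for (d in diets) {
  mean_by_loop[d] <- mean(ChickWeight$weight[ChickWeight$Diet == d])
}

# Way 2: split the weights into a list, then apply mean() to each piece
weights_by_diet <- split(ChickWeight$weight, ChickWeight$Diet)
str(weights_by_diet)  # look at the intermediate result
mean_by_sapply <- sapply(weights_by_diet, mean)

# Way 3: tapply() does the split-and-apply in one call
mean_by_tapply <- tapply(ChickWeight$weight, ChickWeight$Diet, mean)

print(mean_by_loop)
print(mean_by_sapply)
print(mean_by_tapply)
```

All three should print the same four numbers. If they don’t, figuring out why is exactly the kind of practice that helps.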

Recommended resources

Data Science: A first introduction This is the course textbook for UBC’s DSCI 100
R4DS written by Hadley Wickham and Garrett Grolemund
DSCI 310 Coursenotes by Tiffany A. Timbers, Joel Ostblom, Florencia D’Andrea, and Rodolfo Lourenzutti
Happy Git with R by Jenny Bryan
Modern Dive: Statistical Inference via Data Science
Stat545
Google

My code doesn’t run. What do I do?

This is a constant issue with code, and it happens to everyone. The following is a general workflow for debugging stuck code.

If the code is running, but not doing what you want, see below.
Read the Error message. It will give you some important hints. Sometimes these are hard to parse, but that’s ok.

set.seed(12345)
y <- rnorm(10)
x <- matrix(rnorm(20), 2)
linmod <- lm(y ~ x)
## Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE): variable lengths differ (found for 'x')

This one is a little difficult. The first stuff before the colon is telling me where the error happened, but I didn’t use a function called model.frame.default. Nonetheless, after the colon it says variable lengths differ. Well y is length 10 and x has 10 rows right? Oh wait, how many rows does x have?

Read the documentation for the functions involved. For the above, I should try ?matrix.
Google!! If the first few steps didn’t help, copy the error message into Google. This almost always helps. Best to remove any overly specific information first.
Ask your classmates on Slack. To ask most effectively, give them some idea of how the error happened. See the section on MWEs for how to do this.
See me or the TA. Note that it is highly likely that I will ask if you did the above steps first. And I will want to see your minimal working example (MWE).
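
Returning to the lm() error above, here is roughly how the first two steps might play out. This is only a sketch of one way to investigate; the name x_fixed is mine.

```r
set.seed(12345)
y <- rnorm(10)
x <- matrix(rnorm(20), 2)

# The message says "variable lengths differ", so compare the sizes
length(y)  # number of observations in y
dim(x)     # rows and columns of x

# ?matrix shows that the second argument is nrow, so x has 2 rows, not 10.
# Asking for 10 rows (and therefore 2 columns) makes the sizes agree.
x_fixed <- matrix(rnorm(20), nrow = 10)
linmod <- lm(y ~ x_fixed)
coef(linmod)
```

Naming the argument (nrow = 10) instead of relying on position makes this kind of mistake easier to spot.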

Warning

If you meet with me, be prepared to show me your code! Or message me your MWE. Or both. But not neither.

If the error cannot be reproduced in my presence, it is very unlikely that I can fix it.

Minimal working examples

An MWE is a small bit of code which will work on anyone’s machine and reproduce the error that you are getting. This is a key component of getting help debugging. When you do your homework, there’s lots of stuff going on that will differ from most other students. To allow them (or me, or the TA) to help you, you need to be able to get their machine to reproduce your error (and only your error) without much hassle.

I find that, in the process of preparing an MWE, I can often answer my own question. So it is a useful exercise even if you aren’t ready to call in the experts yet. The process of stripping your problem down to its bare essence often reveals where the root issue lies. My above code is an MWE: I set a seed, so we both can use exactly the same data, and it’s only a few lines long without calling any custom code that you don’t have.

For a good discussion of how to do this, see the R Lecture or stackexchange.
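
When the error depends on your own data, one common way to keep an MWE self-contained is to paste a tiny piece of the data into the example with dput(). The sketch below uses the built-in mtcars data as a stand-in for “your data”; the object names are hypothetical.

```r
# Stand-in for your own data: keep only the rows and columns
# needed to trigger the problem
small <- head(mtcars[, c("mpg", "wt")], 3)

# dput() prints R code that recreates the object;
# paste that output into your MWE
dput(small)

# Anyone can rebuild the same object from the printed text
rebuilt <- eval(parse(text = capture.output(dput(small))))
all.equal(rebuilt, small)
```

Whoever is helping you can then run your example without needing access to your files.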

How to write good code

This is covered in much greater detail in the lectures, so see there. Here is my basic advice.

Write script files (which you save) and source them. Don’t do everything in the console. R (and Python and MATLAB and SAS) is much better as a scripting language than as a calculator.
Don’t write anything more than once. This has three corollaries:
1. If you are tempted to copy/paste, don’t.
2. Don’t use magic numbers. Define all constants at the top of the script.
3. Write functions.
The third is very important. Functions are easy to test. You give different inputs and check whether the output is as expected. This helps catch mistakes.
There are two kinds of errors: syntax and function.
- The first R can find (missing close parenthesis, wrong arguments, etc.)
- The second you can only catch by thorough testing
Use meaningful names. Don’t do this:

data("ChickWeight")
out <- lm(weight ~ Time + Chick + Diet, data = ChickWeight)

Comment things that aren’t clear from the (meaningful) names.
Comment long formulas that don’t immediately make sense:

garbage <- with(
  ChickWeight, 
  by(weight, Chick, function(x) (x^2 + 23) / length(x))
) ## WTF???
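
To make the advice above concrete, here is a minimal sketch that defines its constant at the top, wraps the repeated computation in a function, tests that function on inputs where the answer is known, and gives the fitted model a meaningful name. The function, the constant, and the unit conversion are hypothetical and only for illustration (ChickWeight records weight in grams).

```r
# Constants defined once, at the top, instead of magic numbers in the code
GRAMS_PER_OUNCE <- 28.3495

# Convert a vector of weights from grams to ounces
grams_to_ounces <- function(grams) {
  stopifnot(is.numeric(grams))
  grams / GRAMS_PER_OUNCE
}

# Test with inputs whose output is known in advance
stopifnot(
  grams_to_ounces(0) == 0,
  isTRUE(all.equal(grams_to_ounces(GRAMS_PER_OUNCE), 1)),
  isTRUE(all.equal(grams_to_ounces(c(28.3495, 56.699)), c(1, 2)))
)

# A meaningful name says what the object is
data("ChickWeight")
weight_by_time_and_diet <- lm(
  grams_to_ounces(weight) ~ Time + Diet,
  data = ChickWeight
)
coef(weight_by_time_and_diet)
```

If the conversion factor ever needs to change, it changes in exactly one place, and the tests tell you right away whether the function still behaves.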