Frequently asked questions

How do I succeed in this class?

How can I get better at R/Python?

I get this question a lot. The answer is almost never “go read the book How to learn R fast” or “watch the video on FreeRadvice.com”. To learn programming, the only thing to do is to program. Do your tutorials. Redo your tutorials. Run through the code in the textbook. Ask yourself why we used one function instead of another. Ask questions. Play little coding games. If you find yourself wondering how some bit of code works, run through it step by step. Print out the results and see what it’s doing. If you take on these kinds of tasks regularly, you will improve rapidly.
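As a small illustration of running through code step by step, here is a sketch in R. The variable names and the one-liner are made up for this example; the only point is that each intermediate result gets printed and inspected.

```r
set.seed(12345)
scores = rnorm(5)

# A nested one-liner: hard to see what each piece contributes
round(mean(abs(scores)), 2)

# The same computation, one step at a time, printing as we go
abs_scores = abs(scores)
print(abs_scores)

avg_abs = mean(abs_scores)
print(avg_abs)

rounded = round(avg_abs, 2)
print(rounded)
```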

Coding is a skill you build by doing, just like learning Spanish. You have to practice constantly. For the same reasons that it is difficult, if not impossible, to learn Spanish just from reading a textbook, it is difficult, if not impossible, to learn R just from reading or watching.

When I took German in 7th grade, I remember my teacher saying “to learn a language, you have to constantly tell lies”. What he meant was, you don’t just say “yesterday I went to the gym”. You say “yesterday I went to the market”, “yesterday I went to the movies”, “today she’s going to the gym”, etc. The point is to internalize conjugation, vocabulary, and the inner workings of the language. The same is true when coding. Do things different ways. Try automating regular tasks.

If you are still looking for more reading, there are some links on my website, and many more are a Google search away. A supremely useful textbook is R for Data Science (R4DS).

My code doesn’t run. What do I do?

This is a constant issue with code, and it happens to everyone. The following is a general workflow for debugging stuck code.

  1. If the code is running, but not doing what you want, see below.

  2. Read the error message. It will give you some important hints. Sometimes these are hard to parse, but that’s ok.

set.seed(12345)
y = rnorm(10)
x = matrix(rnorm(20),2)
linmod = lm(y~x)
## Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE): variable lengths differ (found for 'x')

This one is a little difficult. The part before the colon tells me where the error happened, but I didn’t call a function named model.frame.default myself (lm calls it internally). Nonetheless, after the colon it says variable lengths differ. Well, y is length 10 and x has 10 rows, right? Oh wait, how many rows does x have?

  3. Read the documentation for the functions involved in the error. For the above, I should try ?matrix.

  4. Google!! If the first few steps didn’t help, copy the error message into Google. This almost always helps. It is best to remove any overly specific information (such as your own variable names or file paths) first.

  5. Ask your classmates. To ask most effectively, give them some idea of how the error happened. See the section on minimal working examples (MWEs) for how to do this.

  6. See me or the TA. Note that it is highly likely that I will ask if you did the above steps first. And I will want to see your MWE.

If you meet with me, be prepared to show me your code! Or message me your MWE. Or both. But not neither. If the error cannot be reproduced in my presence, it is very unlikely that I can fix it.
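Applied to the lm example above, reading the error message and then checking the objects might look like the sketch below. The fix assumes the intent was one row of x per observation of y; that intent is a guess, not something the error message tells you.

```r
set.seed(12345)
y = rnorm(10)
x = matrix(rnorm(20), 2)

length(y)  # number of observations in y
dim(x)     # rows, then columns: the second argument of matrix() is nrow

# Assumed fix: ask for 10 rows explicitly, giving one row per observation
x = matrix(rnorm(20), nrow = 10)
linmod = lm(y ~ x)
```

Naming the argument (nrow = 10) rather than relying on position makes this kind of mix-up easier to spot.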

Minimal working examples

An MWE is a small bit of code which will work on anyone’s machine and reproduce the error that you are getting. This is a key component of getting help debugging. When you do your homework, there’s lots of stuff going on that will differ from what most other students have. To allow them (or me, or the TA) to help you, you need to be able to get their machine to reproduce your error (and only your error) without much hassle.

I find that, in the process of preparing an MWE, I can often answer my own question. So it is a useful exercise even if you aren’t ready to call in the experts yet. The process of stripping your problem down to its bare essence often reveals where the root issue lies. My above code is an MWE: I set a seed, so we both can use exactly the same data, and it’s only a few lines long without calling any custom code that you don’t have.

For a good discussion of how to do this, see Stack Exchange.
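One practical hurdle is sharing data. A common approach, sketched here under the assumption that a few rows are enough to trigger your problem, is to print a small slice with dput() so that someone else can paste it straight back into R. The slice of mtcars is only a stand-in for your own data.

```r
# Stand-in for your own data: a few rows and columns of a built-in dataset
small = head(mtcars[, c("mpg", "wt")], 5)

# dput() prints R code that recreates the object; paste that into your message
dput(small)

# Check that the printed code really rebuilds the same data
rebuilt = eval(parse(text = capture.output(dput(small))))
all.equal(small, rebuilt)
```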

How to write good code

This is covered in much greater detail in the lectures, so see there. Here is my basic advice.

  1. Write script files (which you save) and source them. Don’t do everything in the console. R (like Python, MATLAB, and SAS) is much better as a scripting language than as a calculator.
  2. Don’t write anything more than once. This has three corollaries:
    a. If you are tempted to copy/paste, don’t.
    b. Don’t use magic numbers. Define all constants at the top of the script.
    c. Write functions.
  3. The third corollary (write functions) is very important. Functions are easy to test. You give different inputs and check whether the output is as expected. This helps catch mistakes.
  4. There are two kinds of errors: syntax errors and functional (logic) errors.
    • R can find the first kind (missing close parenthesis, wrong arguments, etc.)
    • You can only catch the second kind by thorough testing
  5. Don’t use magic numbers.
  6. Use meaningful names. Don’t do this:
data("ChickWeight")
out = lm(weight~Time+Chick+Diet, data=ChickWeight)
  7. Comment things that aren’t clear from the (meaningful) names.
  8. Comment long formulas that don’t immediately make sense:
garbage = with(ChickWeight, 
               by(weight, Chick, 
                  function(x) (x^2+23)/length(x))) ## WTF???
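To make a few of these points concrete, here is a minimal sketch. The helper mean_daily_gain, the constant min_measurements, and the other names are hypothetical, invented for this illustration; they are not from the lectures.

```r
data("ChickWeight")

# Constant defined once at the top, not typed as a bare number later on.
# Assumed cutoff: at least two weighings are needed to compute a gain.
min_measurements = 2

# Hypothetical helper: average weight gain per unit of time,
# from the first measurement to the last
mean_daily_gain = function(weight, time) {
  stopifnot(length(weight) == length(time),
            length(weight) >= min_measurements)
  n = length(weight)
  (weight[n] - weight[1]) / (time[n] - time[1])
}

# Test with inputs where the answer is easy to work out by hand:
# a gain of 30 over 3 time units is 10 per unit
stopifnot(mean_daily_gain(c(10, 20, 40), c(0, 1, 3)) == 10)

# A name that says what the object holds, computed once per chick
daily_gain_by_chick = with(ChickWeight,
  sapply(split(seq_along(weight), Chick),
         function(rows) mean_daily_gain(weight[rows], Time[rows])))
```

Because the calculation lives in one function, a mistake only has to be fixed in one place, and the hand-checked test will flag it if a later change breaks the logic.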

Resources for learning to code better