R Style Guide

A Ho edited this page Jun 5, 2018 · 1 revision

Introduction

This style guide is designed to help collaborators to write R code in a consistent and clear way. It is not an exhaustive guide, but should give an idea of principles to adopt.

Future developments will be aimed at using the lintr package to apply coding style rules in an automated way.

This style guide is a work in progress.


Principles

The style guide below is based on the following principles for writing good R code, which should be treated roughly in order of importance:

  1. Your code should work
  2. You should know why your code works
  3. You should be able to easily explain why your code works
  4. Someone else, reading your code without you, should be able to see why your code works

Note the emphasis on the why of the code. The what is usually fairly clear from context, but the way you write your code should be indicative of the overall purpose you're trying to achieve.


Naming Objects

The following are some guidelines for naming objects. There are other conventions available, but we suggest:

Case

Use snake_case rather than camelCase for defining objects. There is no need to use a different convention for functions, as they are clear from their definitions.

Names

Functions should be given imperative verb names which describe what they do:

  • a function called list_files should return a list of files
  • a function called filter_ofsted should filter a table of Ofsted grades

Objects (like data frames) should be given noun names which describe what they are:

  • an object called provider_details should be a table (or vector) of details of providers
  • an object called static_map should be a map which doesn't change

For objects of unusual classes (such as SQL Connections, SpatialPointsDataFrames and other objects associated with particular packages), it may be helpful to prefix the name with a useful indicator of what it is. For example:

spdf_la_boundaries   # a SpatialPointsDataFrame of Local Authority boundaries (sp package)
conn_rat_data        # an RODBC connection to the RATData database

This is not necessary for common objects, such as data.frames and function()s, which should be obvious from the context of the code. It is essential to use good comments to describe objects when they are created (see below).

If possible, try to avoid using very generic terms in your object names. Try to avoid data, func and so on, and focus on making the name more descriptive.

In general, use full names rather than abbreviations. provider_profile is clearer than pp or prov_prof. You can make use of RStudio's autocomplete functionality to speed up typing.
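As a minimal sketch of these naming guidelines, the example below uses made-up data. The names provider_details and filter_ofsted come from the examples above; the column names, school names and the function body are hypothetical, chosen only for illustration.

```r
# Noun name for an object: a table of details of providers (made-up data)
provider_details <- data.frame(provider_name = c("Oak School", "Elm College", "Ash Academy"),
                               ofsted_grade  = c("Outstanding", "Good", "Outstanding"),
                               stringsAsFactors = FALSE
)

# Imperative verb name for a function: it filters a table of Ofsted grades
filter_ofsted <- function(provider_table, grade) {
  provider_table[provider_table$ofsted_grade == grade, ]
}

# The result is another table of providers, so it also gets a descriptive noun name
outstanding_providers <- filter_ofsted(provider_table = provider_details,
                                       grade = "Outstanding")
```

A name such as outstanding_providers tells the reader what the object holds without needing to trace back through the code, which a generic name like data2 would not.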

Function Arguments

R is flexible about how function arguments are matched. They can be fully named, partially named (abbreviated), or not named at all, provided they're in the right order. When using a function, we recommend fully naming the arguments rather than relying on having them in the right order or using abbreviations. For example:

sqlQuery(channel = conn_amp, 
         query = ofsted_query)

rather than sqlQuery(conn_amp, ofsted_query) or sqlQuery(ch = conn_amp, qu = ofsted_query).
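To see why full names are the safer choice, here is a small sketch using a hypothetical helper, describe_school (not part of any package), which shows the three ways R can match arguments and what happens when positional arguments are given in the wrong order.

```r
# Hypothetical helper, defined only to illustrate argument matching
describe_school <- function(school_name, grade) {
  paste(school_name, "is rated", grade)
}

# Fully named: clear to the reader, and the order no longer matters
named_call <- describe_school(grade = "Good", school_name = "Oak School")

# Positional: works, but only because the order happens to be right
positional_call <- describe_school("Oak School", "Good")

# Partially named: R expands `sch` and `gr`, but the reader has to guess
partial_call <- describe_school(sch = "Oak School", gr = "Good")

# Positional with the order swapped: no error, just a wrong answer
swapped_call <- describe_school("Good", "Oak School")
```

The first three calls all return "Oak School is rated Good". The swapped call quietly returns "Good is rated Oak School", which is the kind of mistake that full names prevent.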

Exceptions to this are:

  1. When using the pipe (%>%), you don't need to name the argument which is being piped. So:

     c("Apple", "Banana", "Cherry") %>% str_detect(pattern = "a")

     is better than:

     c("Apple", "Banana", "Cherry") %>% str_detect(string = ., pattern = "a")

  2. dplyr functions which don't take named arguments. For example, the logical filters to dplyr::filter are not named, nor are the columns used when using dplyr::select. E.g.

ofsted_grades %>% 
  select(URN, Grade, EndDate) %>%     # These arguments to `select()` and `filter()` are not named.
  filter(Grade == "Outstanding")

  3. Some base R functions take their main input in an argument called x or X. Examples are sapply, lapply (and the rest of this family), where X comes first; and grep, grepl, sub, gsub etc., where x follows pattern. You don't need to name this argument unless it adds clarity.

  4. Functions with only one argument are generally fine to use without naming the argument. Again, consider whether it adds clarity to your code to add the name or not.
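The last two exceptions might look like this in practice. This is only a sketch using base R and a made-up vector, school_names; where to draw the line between named and unnamed arguments remains a judgement call.

```r
# Made-up data for illustration
school_names <- c("Oak School", "Elm College", "Ash Academy")

# `X` is left unnamed in sapply(); the remaining arguments are named
name_lengths <- sapply(school_names, FUN = nchar, USE.NAMES = FALSE)

# `pattern` is named in grepl(); the vector being searched (`x`) is left unnamed
is_academy <- grepl(pattern = "Academy", school_names)

# Functions with a single argument read clearly without the name
school_count    <- length(school_names)
upper_case_names <- toupper(school_names)
```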


Comments

Use a single # to write a comment, and leave a space after the hash to make it easier to read.

good: # this is a comment
bad:  #this is a comment

In RStudio, use sections to break up your code by pressing Ctrl+Shift+R. This will insert a section with the name you specify, followed by dashes up to 75 characters, visually separating the code and adding a section to the navigation pane at the right-hand edge of the editor, like below:

# Data Import -------------------------------------------------------------

When leaving a comment on code, put the comment on the line above and describe what the following lines are doing (including the why!). For example:

# Reactive Objects ---------------------------------------------------------

# Reactive spatial dataset for visualising catchment area on the map
  lsoas_to_map <- eventReactive(
    
    # only react to change in school, nothing else
    eventExpr = input$school,
    
    valueExpr = {
      
      # get list of laestabs required (possibly more than one)
      laestabs <- schools$LAESTAB[schools$id == input$school]
      
      # filter lsoa_pupils to only those lsoas relevant to the school
      lsoas_needed <- lsoa_pupils$LSOA[lsoa_pupils$LAESTAB %in% laestabs]
      
      # remove NAs
      lsoas_needed <- lsoas_needed[complete.cases(lsoas_needed)] 
      
      # filter the shapefile and merge the result with the pupil census data
      school_shapes <- lsoa_shapes_11[lsoa_shapes_11@data$lsoa11nm %in% lsoas_needed, ] %>% 
        sp::merge(., lsoa_pupils[lsoa_pupils$LAESTAB %in% laestabs, ],
                  by.x = "lsoa11nm", by.y = "LSOA")
      
      # return this filtered SpatialPointsDataFrame
      school_shapes
    }
  )

.....

# Outputs ---------------------------------------------------------------------

Try not to let your comments get too long. If you need really detailed explanations of methods (which you seldom will!), you should try putting that into some supporting documentation.

You will find that commenting your code like this helps you catch errors, and writing the comments helps you fully understand what your code is doing and why (principles 1, 2 and 3). Others coming to read your code in the future will be grateful for concise, informative comments (principle 4).


Spacing

You are not limited by space! Use blank lines to visually separate out pieces of code which do separate things. This (combined with good commenting) makes it much easier to read what is going on (principle 4).

See the example in the Comments section above. It is easy to see the different stages involved in that process. The group of lines under the # filter the shapefile... comment are linked by a pipe (%>%), and so the spatial grouping of code reflects the logical grouping of the actions.

If in doubt, err towards having more blank lines rather than fewer to separate out your code.
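For a script outside Shiny, the same ideas might look like the sketch below, which combines section headers, short "why" comments and blank lines. The data is made up and defined inline purely for illustration; in real code it would typically be read from a file or database.

```r
# Data Import -------------------------------------------------------------

# made-up data standing in for an imported table of Ofsted grades
ofsted_grades <- data.frame(urn   = c(100001, 100002, 100003, 100004),
                            grade = c("Outstanding", "Good", NA, "Outstanding"),
                            stringsAsFactors = FALSE
)


# Data Cleaning -----------------------------------------------------------

# drop schools with no recorded grade, as they cannot be counted below
graded_schools <- ofsted_grades[!is.na(ofsted_grades$grade), ]


# Summary -----------------------------------------------------------------

# count schools per grade to show the spread of outcomes
grade_counts <- table(graded_schools$grade)
```

Each stage is separated by blank lines and labelled with its own section, so a reader can find the import, cleaning and summary steps at a glance.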


Indenting

Indenting is another useful visual tool for making your code more readable, by aligning lines of code with the same precedence. This is usually a factor when nesting functions (which happens all the time when using R Shiny, for example).

The rules are:

  • When a pipeline (using %>%) runs over multiple lines, indent the lower ones by 2 spaces.
  • When a list of function arguments runs over multiple lines, indent to the function's opening bracket.

For example:

# First Rule

# retrieve school urns with outstanding ofsted grades
outstanding_urns <- ofsted_data %>%
  filter(grade == "outstanding") %>%
  pull(URN)

.....

# Second Rule

# define a tibble of ofsted grades as a lookup table
ofsted_lookup <- tibble(grade_numeric   = c(1, 2, 3, 4),
                        grade_character = c("outstanding", "good",
                                            "requires improvement", "inadequate")
)

Note that it's quite helpful to put the final closing bracket (in this case, that of the tibble() function) on its own line, so it lines up with the start of the definition. This makes it easier to see when you've missed a bracket off.

It's often good to add in extra blank spaces (e.g. after grade_numeric) if it can help to align other things, such as the = signs.


Further Guidance

As mentioned, this is not by any means an exhaustive list of rules! Other useful style guides include:

  • The Google style guide
    • Note that it recommends avoiding snake_case - please ignore this and use snake_case.
    • Certainly don't use dot.case, as this can be confusing when working with S3 methods.
  • Hadley Wickham's style guide, which is a modification of Google's, and is better in my opinion!
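To illustrate why dot.case can be confusing with S3 methods, here is a minimal sketch. The class name "provider" and both function names are hypothetical. A helper named in dot.case can be picked up by S3 dispatch as a method, whereas a snake_case name cannot be mistaken for one.

```r
# Intended as an ordinary helper, but named in dot.case
summary.provider <- function(object, ...) {
  "dot.case helper was called"
}

# An object which happens to have the class "provider"
oak_school <- structure(list(name = "Oak School"), class = "provider")

# summary() now dispatches to the helper as if it were an S3 method
dispatched_result <- summary(oak_school)

# A snake_case name is never mistaken for a method and must be called explicitly
summarise_provider <- function(provider) {
  paste("Provider:", provider$name)
}

explicit_result <- summarise_provider(provider = oak_school)
```

Here dispatched_result is "dot.case helper was called", even though the helper was never called by name.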


References

  • DfE R Style Guide