Ceci n’est pas une pipe

Erin

2017-04-04

What is the Pipe?

Magritte’s The Treachery of Images

Introduction to R’s Not-a-Pipe: %>%

I was once told me that %>% is called not-a-pipe since pipe was already taken for another symbol (|).

What is the pipe?

The pipe leads the OUTPUT of one command into the INPUT of the next one.
Functions in the Tidyverse are written to work well with the pipe

For example, to filter a data frame and add a new column you would…

x <- filter(mpg, hwy >= 31)
newx <- mutate(x, hwy.kpl = hwy * 0.4251)

But you could also pipe in the input to the next command.

x <- mpg %>% 
  filter(hwy >= 31) %>%
  mutate(hwy.kpl = hwy * 0.4251)

Data First

Notice how the the data frame is always the first argument to these functions.

When writing your OWN functions, if you want to use them in a pipe sequence remember that the dataframe must be the first argument
A surprsing number of other functions in R are NOT desgined with data frist, so you have to do some work to use them in a pipe:
lm(y ~ x, data=data) does not have the data as the first argument

Examples

The pipe makes combining output from various different functions much easier to write and read.

#Option A:
mutate( 
      filter(iris, Sepal.Width > 4), 
    total_width = Sepal.Width + Petal.Width
  )

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_width
## 1          5.7         4.4          1.5         0.4  setosa         4.8
## 2          5.2         4.1          1.5         0.1  setosa         4.2
## 3          5.5         4.2          1.4         0.2  setosa         4.4

This code computes the same output as before, but is much easier to read.

Remember, code should be written to be understood by future you. Make future you happy!

#Option B:
iris %>% 
  filter(Sepal.Width > 4) %>%
  mutate(total_width = Sepal.Width + Petal.Width)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_width
## 1          5.7         4.4          1.5         0.4  setosa         4.8
## 2          5.2         4.1          1.5         0.1  setosa         4.2
## 3          5.5         4.2          1.4         0.2  setosa         4.4

Story Time

The story

Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head

Story Time

The story

Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head

Could look like this in code. Where each step Foo Foo takes is a new function.

foo_foo <- little_bunny()
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)

Story Time

bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ), 
  on = head
)

is much more confusing than…

Story Time

bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ), 
  on = head
)

is much more confusing than…

foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)
)

Your Turn

What quick story can you tell using the pipe?

Your Turn

What quick story can you tell using the pipe?

humpty_dumpty %>%
  sat(on = wall) %>%
  fall(type = great) %>%
  left_join(kings_horses) %>%
  left_join(kings_men) %>%
  put_together(FALSE)

Difference bettween piping and not piping.

You tell me, what is happening in this code?!

hourly_delay <- filter(
  summarise(
    group_by(
      filter(flights, !is.na(dep_delay)), 
      date, hour), 
    delay = mean(dep_delay), 
    n = n()), 
  n > 10
)

Is this easier to understand?

hourly_delay <- flights %>% 
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>% 
  filter(n > 10)

Rules for Piping

as borrowed from Bob Rudis

Rules for Piping

The pipe chain should be larger than 1
A pipe group should be designed to complete a specific task
It is okay to change objects along the pipe (i.e pass in a data frame and output a plot)
Be data source aware
Pipe operations should be “atomic”
Keep your pipes “short”

Pipe Chain should be larger than 1

Do not do this

df %>% str()

when

str(df)

is so much more compact and easier to read.

A pipe group should be designed to complete a specific task

Keeping a pipe group to one task will help with further rules (specifically length).

It is okay to change objects along the pipe

Most commonly, you might go from a data frame to a plot, or a list to a data frame.

iris %>% 
  count(Species, sort=TRUE) %>%
  mutate(Species=factor(Species, levels=Species)) %>%
  ggplot(aes(Species, n, fill=Species)) + geom_col()

Pipe operators should be “atomic”

This means that your pipe can have as many steps as want, but in the end should be doing ONE thing. This is to prevent wasting time if something goes wrong at the end of the pipe, making you start again.
The other gotcha is getting a nice plot but losing the data behind it. It’s a similar problem, but it’s an easy trap to fall into. (I do this all the time! Bad Erin.)

Keep your pipes “short”

All this to say, keep your pipes long enough to not add more confusion, and short enough to avoid breaking things. You’ll find that you’ll quickly discover what the right “length” for you is.

Discussion Questions

When should you use the pipe vs not?
Beside ease of writing/reading what is useful about using the pipe structure?
What should you never do in a pipe group?