I was once told that %>% is called not-a-pipe, since pipe was already taken for another symbol (|).
For example, to filter a data frame and add a new column you would…
x <- filter(mpg, hwy >= 31)
newx <- mutate(x, hwy.kpl = hwy * 0.4251)
But you could also pipe in the input to the next command.
x <- mpg %>%
  filter(hwy >= 31) %>%
  mutate(hwy.kpl = hwy * 0.4251)
Notice how the data frame is always the first argument to these functions. Not every function works this way; for example,
lm(y ~ x, data=data)
does not have the data as the first argument.
The pipe makes combining the output of different functions much easier to write and read.
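A minimal sketch of how a function like lm() can still sit in a pipe. It assumes the magrittr package (which provides %>%) and the built-in mtcars data set, neither of which is named in the text above. By default the left-hand side becomes the first argument; magrittr's `.` placeholder sends it to a different argument instead.

```r
library(magrittr)

# By default, x %>% f(y) is evaluated as f(x, y).
piped_sum  <- c(1, 4, 9) %>% sqrt() %>% sum()
nested_sum <- sum(sqrt(c(1, 4, 9)))

# lm() takes the formula first, so the `.` placeholder is used
# to pass the piped data frame to the `data` argument instead.
fit_piped <- mtcars %>% lm(mpg ~ wt, data = .)
fit_plain <- lm(mpg ~ wt, data = mtcars)

# Both fits should have the same coefficients.
all.equal(coef(fit_piped), coef(fit_plain))
```

The placeholder is only needed when the data is not the first argument; for functions like filter() and mutate() the plain pipe is enough.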
#Option A:
mutate(
  filter(iris, Sepal.Width > 4),
  total_width = Sepal.Width + Petal.Width
)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_width
## 1 5.7 4.4 1.5 0.4 setosa 4.8
## 2 5.2 4.1 1.5 0.1 setosa 4.2
## 3 5.5 4.2 1.4 0.2 setosa 4.4
The piped version in Option B computes the same output as Option A, but is much easier to read.
Remember, code should be written to be understood by future you. Make future you happy!
#Option B:
iris %>%
  filter(Sepal.Width > 4) %>%
  mutate(total_width = Sepal.Width + Petal.Width)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_width
## 1 5.7 4.4 1.5 0.4 setosa 4.8
## 2 5.2 4.1 1.5 0.1 setosa 4.2
## 3 5.5 4.2 1.4 0.2 setosa 4.4
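One way to check that Option A and Option B really agree is to store both results and compare them. This is a small sketch assuming the dplyr package is installed; the names `nested` and `piped` are made up for the illustration.

```r
library(dplyr)

# Option A: nested calls, read from the inside out.
nested <- mutate(
  filter(iris, Sepal.Width > 4),
  total_width = Sepal.Width + Petal.Width
)

# Option B: piped calls, read from top to bottom.
piped <- iris %>%
  filter(Sepal.Width > 4) %>%
  mutate(total_width = Sepal.Width + Petal.Width)

# The pipe only changes how the code is written, not what it computes.
identical(nested, piped)
```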
The story
Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head
This could look like the following in code, where each step Foo Foo takes is a new function.
foo_foo <- little_bunny()
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ),
  on = head
)
is much more confusing than…
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
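The Foo Foo functions are not real, so the code above will not run as written. Here is a runnable sketch with hypothetical stand-ins: each function simply appends a phrase to the story so far, and `forest`, `field_mice` and `head` are replaced by character strings. It assumes the magrittr package for %>%.

```r
library(magrittr)

# Hypothetical stand-ins: each step appends a phrase to the story so far.
little_bunny <- function() "Little bunny Foo Foo"
hop   <- function(x, through) paste(x, "hopped through the", through)
scoop <- function(x, up)      paste(x, "scooped up the", up)
bop   <- function(x, on)      paste(x, "bopped them on the", on)

foo_foo <- little_bunny()

# Nested version: the first step is buried in the middle.
nested <- bop(
  scoop(
    hop(foo_foo, through = "forest"),
    up = "field mice"
  ),
  on = "head"
)

# Piped version: the steps appear in the order they happen.
piped <- foo_foo %>%
  hop(through = "forest") %>%
  scoop(up = "field mice") %>%
  bop(on = "head")

identical(nested, piped)
```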
What quick story can you tell using the pipe?
humpty_dumpty %>%
  sat(on = wall) %>%
  fall(type = great) %>%
  left_join(kings_horses) %>%
  left_join(kings_men) %>%
  put_together(FALSE)
You tell me, what is happening in this code?!
hourly_delay <- filter(
  summarise(
    group_by(
      filter(flights, !is.na(dep_delay)),
      date, hour),
    delay = mean(dep_delay),
    n = n()),
  n > 10
)
Is this easier to understand?
hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>%
  filter(n > 10)
Do not do this
df %>% str()
when
str(df)
is so much more compact and easier to read.
Keeping a pipe group to one task will help with further rules (specifically length).
Most commonly, you might go from a data frame to a plot, or a list to a data frame.
iris %>%
  count(Species, sort=TRUE) %>%
  mutate(Species=factor(Species, levels=Species)) %>%
  ggplot(aes(Species, n, fill=Species)) + geom_col()
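If a pipe like this grows longer, one possible way to keep each group to a single task is to give the intermediate result a name. The sketch below assumes dplyr and ggplot2 are installed; `species_counts` and `species_plot` are names invented for the illustration, not part of the original example.

```r
library(dplyr)
library(ggplot2)

# Task 1: summarise the data (data frame in, data frame out).
species_counts <- iris %>%
  count(Species, sort = TRUE) %>%
  mutate(Species = factor(Species, levels = Species))

# Task 2: draw the plot (data frame in, plot out).
species_plot <- ggplot(species_counts, aes(Species, n, fill = Species)) +
  geom_col()

species_plot
```

Naming the summary also makes it easy to inspect `species_counts` on its own before plotting.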
All this to say, keep your pipes long enough to not add more confusion, and short enough to avoid breaking things. You’ll quickly discover what the right “length” is for you.