How many consecutive elements in a row?

Whenever I discover a nice trick in R, I tweet about it. More often than not, future me will face the very same coding problem that past Juli already tweeted about. However, future Juli has a foggy memory: She knows past Juli solved this, but she doesn’t know how. Future Juli will then search Twitter for her own tweet containing the solution, and we all know that Twitter’s search function isn’t exactly ideal. To make the search a little easier for future Juli, I will from now on post these things on my blog as well. Maybe others will benefit, too.

The things you will find here are neither especially clever nor especially original, but they came in handy when I needed them. I found some of these gems on Stack Overflow, and whenever my source memory doesn’t let me down and I’m able to recover the corresponding Stack Overflow post, I will link it.

How many consecutive elements?

Today’s piece of code is about finding out how many times an element is repeated consecutively in a vector. I tweeted about this here. (I can’t find the original Stack Overflow post anymore, but here’s a similar one.) Here’s some code:

(fruit <- c(rep("apple", 2), rep("orange", 2), "apple", "lemon", rep("orange", 4), "lemon"))
##  [1] "apple"  "apple"  "orange" "orange" "apple"  "lemon"  "orange" "orange"
##  [9] "orange" "orange" "lemon"

If we want to count how many times a fruit occurs in a row, we can simply run:

rle(fruit)$lengths
## [1] 2 2 1 1 4 1
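
The lengths are only half of what rle() returns; the matching values are stored right next to them. Here is a minimal sketch of putting both side by side (the names runs and run_table are just ones I picked for illustration):

```r
fruit <- c(rep("apple", 2), rep("orange", 2), "apple", "lemon",
           rep("orange", 4), "lemon")

# rle() returns a list with two components: lengths and values
runs <- rle(fruit)

# Pair each run's value with its length (run_table is an illustrative name)
run_table <- data.frame(fruit = runs$values, n_in_a_row = runs$lengths)
print(run_table)

# inverse.rle() rebuilds the original vector from the runs
identical(inverse.rle(runs), fruit)
## [1] TRUE
```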

There are 2 apples, then 2 oranges, then 1 apple … We can then use this to number the elements within each run like this:

sequence(rle(fruit)$lengths)
##  [1] 1 2 1 2 1 1 1 2 3 4 1
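
In case it helps to see what sequence() is doing here: for each run length n it produces 1, 2, …, n and glues the results together. A rough equivalent spelled out by hand might look like this (by_hand and run_lengths are illustrative names, not anything from the original tweet):

```r
# The run lengths that rle(fruit)$lengths returned above
run_lengths <- c(2L, 2L, 1L, 1L, 4L, 1L)

# seq_len(n) gives 1..n; do that for every run and flatten the result
by_hand <- unlist(lapply(run_lengths, seq_len))

by_hand
##  [1] 1 2 1 2 1 1 1 2 3 4 1

# Same numbers as sequence() produces
all(by_hand == sequence(run_lengths))
## [1] TRUE
```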

How is this useful? Let’s suppose I want to add a trial number to a data frame in which every participant has a different number of trials. In the table below, all rows belonging to one participant sit next to each other:

participant  score
1            -2.2450117
1            -1.2684645
1            -0.0285545
1            -1.8421782
1            -0.8529983
2            -0.3123817
2             0.2148599
2            -0.0320687
2            -1.8577839
3             2.0818096
3            -0.4396770
3            -0.7284104
4             0.4365058
5             0.4630120
5            -1.7608192
5            -0.0187533
5             2.1912900
6             0.0630289
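
If you want to follow along, something like this should rebuild the table above. It’s only a sketch typed up from the printed values, not how the data was originally generated, and it assumes a plain data frame called data:

```r
# Rebuild the example table from the printed values
# (assumption: a plain data.frame named `data`; the scores are copied from the table)
data <- data.frame(
  participant = rep(1:6, times = c(5, 4, 3, 1, 4, 1)),
  score = c(-2.2450117, -1.2684645, -0.0285545, -1.8421782, -0.8529983,
            -0.3123817,  0.2148599, -0.0320687, -1.8577839,
             2.0818096, -0.4396770, -0.7284104,
             0.4365058,
             0.4630120, -1.7608192, -0.0187533,  2.1912900,
             0.0630289)
)

# Number of rows (trials) per participant
table(data$participant)
```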

Then I can number the data points for each participant like this:

data$trial_no <- sequence(rle(data$participant)$lengths)

participant  score       trial_no
1            -2.2450117  1
1            -1.2684645  2
1            -0.0285545  3
1            -1.8421782  4
1            -0.8529983  5
2            -0.3123817  1
2             0.2148599  2
2            -0.0320687  3
2            -1.8577839  4
3             2.0818096  1
3            -0.4396770  2
3            -0.7284104  3
4             0.4365058  1
5             0.4630120  1
5            -1.7608192  2
5            -0.0187533  3
5             2.1912900  4
6             0.0630289  1
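
One thing to keep in mind: rle() only looks at runs, so this relies on each participant’s rows sitting next to each other. If the same participant shows up again further down, the count starts over. Here’s a small sketch of that caveat with a made-up vector (participant, by_run and by_group are hypothetical names), plus one possible base R alternative using ave() that counts within each participant regardless of row order:

```r
# Made-up example: participant 1 appears in two separate blocks
participant <- c(1, 1, 2, 2, 1)

# Run-based numbering restarts when participant 1 comes back
by_run <- sequence(rle(participant)$lengths)
by_run
## [1] 1 2 1 2 1

# ave() applies seq_along within each participant, wherever the rows are
by_group <- ave(seq_along(participant), participant, FUN = seq_along)
by_group
## [1] 1 2 1 2 3
```

If the data is already sorted by participant, both approaches give the same numbers, so which one to use is mostly a matter of taste.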
