I'm trying to find a way of using pipes to group data by part of a character vector using a function. The data is in the format `ampXXi` or `ampXXXi`, where `XX` or `XXX` are the unique site codes and the `i` denotes sub-sites within each site. Is there a way of grouping the data by each `ampXXi` or `ampXXXi`? I tried to sort this with a function using `grepl()`, but that didn't work. Thanks for any advice.

Dennis Kozevnikoff

user14014863
You need to provide a [small reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Onyambu Aug 07 '20 at 15:29
1 Answer
substr() to get part of string variable for grouping
You could use `substr()` to extract the unique site ids, and then use that variable to group your data.
Example dataframe:
df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)
df
#         x y
# 1  amp22i 1
# 2 amp333i 2
# 3  amp11i 3
# 4  amp22i 4
# 5  amp11i 5
# 6 amp333i 6
Use `substr()` to make a group id variable from a portion of the string:
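library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Drop the leading "amp" (characters 1-3) and keep the rest as the id;
# %<>% pipes df into mutate() and assigns the result back to df
df %<>%
  mutate(id = substr(x, 4, nchar(x)))
df
#         x y   id
# 1  amp22i 1  22i
# 2 amp333i 2 333i
# 3  amp11i 3  11i
# 4  amp22i 4  22i
# 5  amp11i 5  11i
# 6 amp333i 6 333i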
Group with pipes and `group_by()`, then get the group means:
df %>%
  group_by(id) %>%
  summarize(mean = mean(y))
# # A tibble: 3 x 2
#   id     mean
#   <chr> <dbl>
# 1 11i     4
# 2 22i     2.5
# 3 333i    4
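If the grouping should be by site code alone, ignoring the sub-site suffix, a regular expression is one way to pull out just the digits. This is only a sketch: it assumes every value is `amp`, then digits, then a sub-site suffix, and the data frame `sites_df`, its sub-site letters `a`/`b`, and the column name `site` are made up for illustration.

```r
library(dplyr)

# Hypothetical data: same site codes as above, but with distinct
# sub-site suffixes ("a", "b") so the effect of dropping them is visible
sites_df <- data.frame(
  x = c("amp22a", "amp22b", "amp333a", "amp11a", "amp11b", "amp333b"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Assumption: values look like "amp" + digits + suffix.
# sub() replaces the whole string with the captured run of digits.
site_means <- sites_df %>%
  mutate(site = sub("^amp([0-9]+).*$", "\\1", x)) %>%
  group_by(site) %>%
  summarize(mean = mean(y))

site_means
```

Here both sub-sites of a site fall into the same group, because only the numeric part is kept. Values that do not match the assumed pattern are left unchanged by `sub()`, so they would show up as their own groups.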
There are tidyverse alternatives for the above, e.g. `str_sub()` and `str_length()` within `mutate()`.
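As a rough sketch of that alternative, assuming the stringr package is installed, the same id column can be built with `str_sub()` and `str_length()` in place of `substr()` and `nchar()`. The name `df_ids` is just an illustrative choice.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Keep everything from the 4th character to the end of each string
df_ids <- df %>%
  mutate(id = str_sub(x, 4, str_length(x)))

# Group means by the extracted id, as in the base-R version
id_means <- df_ids %>%
  group_by(id) %>%
  summarize(mean = mean(y))

id_means
```

Since the end position of `str_sub()` defaults to the last character, `str_sub(x, 4)` should give the same result here.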

Tfsnuff