
I'm trying to find a way to use pipes to group data by part of a character vector. The values are in the format ampXXi or ampXXXi, where XX or XXX are the unique site codes and i denotes sub-sites within each site. Is there a way to group the data by each ampXXi or ampXXXi? I tried to do this with a function using grepl(), but that didn't work. Thanks for any advice.

Dennis Kozevnikoff
You need to provide a [small reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Onyambu Aug 07 '20 at 15:29

1 Answer


Using substr() to extract part of a string variable for grouping

You could use substr() to extract the unique site IDs into a new variable, and then group your data by that variable.
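The reason a single substr() call covers both formats: the "amp" prefix always has 3 characters, so starting at position 4 and stopping at nchar(x) works whether the site code has two or three digits. A quick check on a small vector:

```r
x <- c("amp22i", "amp333i")

# Start after the 3-character "amp" prefix and stop at the last character,
# so the same call handles both 2- and 3-digit site codes.
substr(x, 4, nchar(x))
# returns c("22i", "333i")
```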

Example dataframe:

df <- data.frame(
          x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
          y = c(1:6), 
          stringsAsFactors = FALSE)

df

#         x y
# 1  amp22i 1
# 2 amp333i 2
# 3  amp11i 3
# 4  amp22i 4
# 5  amp11i 5
# 6 amp333i 6

Use substr() to make a group ID variable from a portion of the string:

library(dplyr)
library(magrittr)

df %<>% 
  # drop the 3-character "amp" prefix, keeping characters 4 through the end
  mutate(id = substr(x, 4, nchar(x)))

df

#          x y   id
#  1  amp22i 1  22i
#  2 amp333i 2 333i
#  3  amp11i 3  11i
#  4  amp22i 4  22i
#  5  amp11i 5  11i
#  6 amp333i 6 333i

Group with pipes and group_by(), then compute the group means:

df %>% 
  group_by(id) %>% 
  summarize(mean = mean(y))

# # A tibble: 3 x 2
#   id     mean
#   <chr> <dbl>
# 1 11i     4  
# 2 22i     2.5
# 3 333i    4 
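On the grepl() attempt: grepl() only returns TRUE/FALSE for whether a pattern matches, so it can't pull the code out of the string. Base R's sub() with a capture group can. The sketch below assumes every value is "amp", then the digit site code, then the sub-site suffix, and it groups by the site code alone, dropping the sub-site. It also passes the expression straight to group_by(), which creates the grouping column without a separate mutate() step. The names `site` and `site_means` are only illustrative.

```r
library(dplyr)

df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Assumption: values look like "amp" + digits (site) + trailing sub-site text.
# sub() keeps only the captured digits, e.g. "amp333i" -> "333".
site_means <- df %>%
  group_by(site = sub("^amp([0-9]+).*$", "\\1", x)) %>%
  summarize(mean = mean(y))

site_means
```

With this example data every site has only the "i" suffix, so the groups match the id groups above. They would differ once a site has more than one sub-site.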

There are tidyverse alternatives for the above, e.g. str_sub() and str_length() within mutate().
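Here is a sketch of the stringr version of the mutate() step, using the same example data. It mirrors the substr() call. Because str_sub()'s `end` argument defaults to -1 (the last character), `str_sub(x, 4)` alone would give the same result.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Keep characters 4 through the end, dropping the "amp" prefix.
df <- df %>%
  mutate(id = str_sub(x, 4, str_length(x)))

df
```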

Tfsnuff