
I am working with a dataset (df) laid out as gene x cell line identifier. The gene names, which are the column names, carry an extra character string that I want to remove. For example, SP1 is annotated as SP1..6667, and I want to remove the ..6667 so that the column name is just SP1.

The following code worked to do this:

colnames(df) <- gsub("\\..*","",colnames(df)) # remove character string after gene name

The problem is that a few genes have a single . in their names, which I do not want to remove. For example, HLA.A is labeled HLA.A..3105. I want to remove the ..3105 to give HLA.A, but my current code removes .A..3105 and gives HLA.
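A minimal reproduction of the problem, using a made-up vector of column names (cols is a stand-in for colnames(df), not my real data):

```r
# Hypothetical column names in the format described above
cols <- c("SP1..6667", "HLA.A..3105")

# "\\..*" matches from the FIRST period onward, so the ".A" in HLA.A is removed too
trimmed <- gsub("\\..*", "", cols)
print(trimmed)
```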

How can I modify my gsub call to match .. instead of any single . ?

1 Answer


All you need to do is change the regex as shown below:

colnames(df) <- gsub("\\.{2}.*","",colnames(df))

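This tells it to start the match at the first place where two periods occur in a row, and to remove those two periods along with everything after them. A single period, as in HLA.A, is not enough to trigger the match, so it is left alone.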
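As a quick check, here is a small sketch on a made-up vector (cols stands in for colnames(df)). The second pattern is a stricter variant I am adding for illustration only; it assumes the suffix is always two periods followed by digits at the end of the name, which matches the two examples in the question but may not hold for every column.

```r
# Hypothetical column names in the format from the question
cols <- c("SP1..6667", "HLA.A..3105")

# Remove the first run of two periods and everything after it
fixed <- gsub("\\.{2}.*", "", cols)

# Stricter variant (assumption: suffix is ".." plus digits at the end of the name)
fixed_strict <- sub("\\.\\.[0-9]+$", "", cols)

print(fixed)
print(fixed_strict)
```

Both calls leave the single period in HLA.A untouched. The stricter version would also leave a name unchanged if it happened to contain .. somewhere other than before a trailing number.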

Todd Burus