How do I remove .. and all characters that follow in column names?

Question

I am working with a dataset (df) that is gene x cell line identifier. The gene names are annotated with an additional character string that I want to remove. For example, SP1 is annotated as SP1..6667. I want to remove the ..6667 so that the column name is just SP1.

The following code worked to do this:

colnames(df) <- gsub("\\..*","",colnames(df)) # remove character string after gene name

The problem is that a few genes have a single . in their names, which I do not want to remove. For example, HLA.A is labeled HLA.A..3105. I want to remove the ..3105 to give HLA.A, but my current code removes .A..3105 to give HLA.

How can I modify my gsub function to specify .. instead of any . ?

Do you mean `"\\.{2}.*"`? – Wiktor Stribiżew, Jul 08 '20 at 13:49

score 2 · Answer 1 · answered Jul 08 '20 at 13:52

All you need to do is alter the regex call like below:

colnames(df) <- gsub("\\.{2}.*","",colnames(df))

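This tells it to start the substitution at the first occurrence of two consecutive periods and to remove them along with everything that follows, so a single period inside a gene name is left alone.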
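As a quick check, here is a minimal sketch comparing the two patterns. The vector `cols` is a hypothetical stand-in for `colnames(df)`, built from the examples in the question plus one name with no suffix. The last pattern is an optional stricter variant that assumes the suffix is always numeric and sits at the end of the name, which the question does not confirm.

```r
# Hypothetical column names modeled on the examples in the question
cols <- c("SP1..6667", "HLA.A..3105", "TP53")

# Original pattern: cuts at the first period, so HLA.A is truncated
old <- gsub("\\..*", "", cols)
print(old)      # "SP1" "HLA" "TP53"

# Revised pattern: cuts only where two consecutive periods appear
cleaned <- gsub("\\.{2}.*", "", cols)
print(cleaned)  # "SP1" "HLA.A" "TP53"

# Stricter variant (assumption: the suffix is ".." followed by digits only)
strict <- sub("\\.\\.[0-9]+$", "", cols)
print(strict)   # "SP1" "HLA.A" "TP53"
```

Because each name contains at most one match, `sub()` and `gsub()` give the same result here.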

Todd Burus
