I have the following line:

>XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAGGTAGACTTATAATTAG
AGAAAACAAC

I would like to convert the first line as follows:

>INITWORD/XXX-220_5004_COVID-A6/FINALWORD
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT...

So far I have managed to add the first word as follows:

sed 's/>/>INITWORD\//I'

That returns:

>INITWORD/XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT

How can I add FINALWORD at the end of the first line?

tripleee
david

1 Answer

Just substitute more. sed conveniently allows you to recall the text you matched with a back reference, so just embed that between the things you want to add.

sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%I' file.fasta

I also added a ^ beginning-of-line anchor, and switched to % delimiters so the slashes don't need to be escaped.
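As a quick check, here is a minimal sketch that feeds a shortened copy of the record from the question to the command on standard input instead of reading file.fasta. The non-standard I flag is left out, on the assumption that it is not needed here, and the variable name result is only for illustration.

```shell
# Feed a shortened copy of the question's record to sed on standard input.
result=$(printf '%s\n' \
  '>XXX-220_5004_COVID-A6' \
  'TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA' \
  'AGAAAACAAC' |
  sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%')

# Show the transformed record.
printf '%s\n' "$result"
```

Only the line that starts with > is rewritten; the sequence lines do not match the regex, so sed prints them unchanged.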

In some more detail, the s command's syntax is s/regex/replacement/flags where regex is a regular expression to match the text you want to replace, and replacement is the text to replace it with. In the regex, you can use grouping parentheses \(...\) to extract some of the matched text into the replacement; so \1 refers to whatever matched the first set of grouping parentheses, \2 to the second, etc. The /flags are optional single-character specifiers which modify the behavior of the command; so for example, a /g flag says to replace every match on a line, instead of just the first one (but we only expect one match per line so it's not necessary or useful here).
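To make the back references and the g flag concrete, here is a small sketch on made-up input strings. The strings and the variable names swapped, first and all are hypothetical and not part of the question.

```shell
# Two groups: \1 is the text before the first underscore, \2 is the rest.
# The replacement swaps them.
swapped=$(printf '%s\n' 'XXX-220_5004' | sed 's/\([^_]*\)_\(.*\)/\2_\1/')
printf '%s\n' "$swapped"

# Without g, only the first match on the line is replaced.
first=$(printf '%s\n' 'A-B-C' | sed 's/-/_/')
# With g, every match on the line is replaced.
all=$(printf '%s\n' 'A-B-C' | sed 's/-/_/g')
printf '%s\n%s\n' "$first" "$all"
```

This should print 5004_XXX-220, then A_B-C, then A_B_C.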

The I flag is non-standard but since you are using that, I assume it does something useful for you.

tripleee
  • I would suggest using the beginning of line `^` instead of `<` – Ivan Jul 27 '21 at 08:56
  • Great! I thought the `I` was there to add the word at the beginning of the file. – david Jul 27 '21 at 09:06
  • @ivan Huh? I use `^` but the `>` is also necessary because that's what marks a line as a FASTA header line. – tripleee Jul 27 '21 at 09:12
  • I can't tell you what the `I` flag does because it's non-standard; I'm guessing maybe it makes the match case-insensitive (which doesn't do anything in this particular case, because you have no alphabetics in the regex). Probably review your local `sed` manual page. – tripleee Jul 27 '21 at 09:14
  • @tripleee my bad! I thought OP wanted `INITWORD>` instead of `>INITWORD` – Ivan Jul 27 '21 at 09:32