Here is the code (I took it from the discussion "Translation DNA to Protein", but here I'm using an RNA file instead of a DNA file):
from itertools import takewhile

def translate_rna(sequence, d, stop_codons=('UAA', 'UGA', 'UAG')):
    # d is the codon table: a dict mapping each codon to its amino acid
    start = sequence.find('AUG')
    # Take sequence from the first start codon
    trimmed_sequence = sequence[start:]
    # Split it into triplets
    codons = [trimmed_sequence[i:i + 3] for i in range(0, len(trimmed_sequence), 3)]
    # Take all codons until first stop codon
    coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3, codons)
    # Translate and join into string
    protein_sequence = ''.join([d[codon] for codon in coding_sequence])
    # This line assumes there is always a stop codon in the sequence
    return "{0}".format(protein_sequence)
Calling the translate_rna function:
sequence = ''
for line in open("to_rna", "r"):
    sequence += line.strip()
translate_rna(sequence, d)
My to_rna file looks like:
CCGCCCCUCUGCCCCAGUCACUGAGCCGCCGCCGAGGAUUCAGCAGCCUCCCCCUUGAGCCCCCUCGCUU
CCCGACGUUCCGUUCCCCCCUGCCCGCCUUCUCCCGCCACCGCCGCCGCCGCCUUCCGCAGGCCGUUUCC
ACCGAGGAAAAGGAAUCGUAUCGUAUGUCCGCUAUCCAG.........
The function translates only the first protein (from the first AUG to the first stop codon).
I think the problem is in this line:
# Take all codons until first stop codon
coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3 , codons)
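To make the suspected cause concrete, here is a minimal sketch of how takewhile behaves on a short, made-up codon list (the list is an illustration only, not taken from the to_rna file). It stops for good the first time the predicate fails, so anything after the first stop codon is never visited:

```python
from itertools import takewhile

# Hypothetical codon list with two start/stop pairs, for illustration only
codons = ['AUG', 'CGC', 'UAA', 'AUG', 'GUU', 'UAG']
stop_codons = ('UAA', 'UGA', 'UAG')

# takewhile yields items until the predicate first fails, then ends
first_run = list(takewhile(lambda c: c not in stop_codons, codons))
print(first_run)  # ['AUG', 'CGC']
```

The second AUG is never reached, which would explain why only the first protein comes back.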
My question is: how can I tell Python, after it finds the first AUG and stores that coding sequence in a list, to search for the next AUG in the RNA file and store its coding sequence in the next position?
As a result, I want to have a list like this:
['here_is_the_1st_coding_sequence', 'here_is_the_2nd_coding_sequence', ...]
PS: This is homework, so I can't use Biopython.
EDIT:
A simple way to describe the problem:
From this code:
from itertools import takewhile
lst = ['N', 'A', 'B', 'Z', 'C', 'A', 'V', 'V', 'Z', 'X']
ch = ''.join(lst)
stop = 'Z'
start = ch.find('A')
seq = takewhile(lambda x: x not in stop, ch)
I want to get this:
['AB', 'AVV']
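One possible way to get that result, sketched with plain str.find calls in a loop instead of takewhile. The function name split_runs is made up for this example, and it assumes that a run with no closing stop marker should be kept up to the end of the string:

```python
def split_runs(text, start='A', stop='Z'):
    """Collect each run from a start marker up to (not including) the next stop marker."""
    runs = []
    pos = text.find(start)
    while pos != -1:
        end = text.find(stop, pos)
        if end == -1:
            # Assumption: no stop marker left, so keep the rest of the string
            runs.append(text[pos:])
            break
        runs.append(text[pos:end])
        # Resume the search for a start marker just past the stop marker
        pos = text.find(start, end + 1)
    return runs

print(split_runs('NABZCAVVZX'))  # ['AB', 'AVV']
```

The key difference from the takewhile version is that the search position is carried forward after each stop marker rather than ending at the first one.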
EDIT 2:
For instance, from this string:
UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA
I should get this result:
['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
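A rough sketch of the same idea for RNA, reading in triplets from each AUG. The name find_coding_sequences is hypothetical, and the sketch makes a few assumptions: the search for the next AUG resumes right after the previous stop codon (so overlapping reading frames are ignored), a sequence with no stop codon is kept up to the last complete triplet, and translation through the codon table is left out:

```python
def find_coding_sequences(rna, stop_codons=('UAA', 'UGA', 'UAG')):
    """Return every stretch from an AUG up to (not including) its in-frame stop codon."""
    sequences = []
    pos = rna.find('AUG')
    while pos != -1:
        i = pos
        # Walk forward one codon at a time, staying in the frame set by this AUG
        while i + 3 <= len(rna) and rna[i:i + 3] not in stop_codons:
            i += 3
        sequences.append(rna[pos:i])
        # Assumption: look for the next AUG only after the stop codon just read
        pos = rna.find('AUG', i + 3)
    return sequences

rna = 'UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA'
print(find_coding_sequences(rna))  # ['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
```

Each element of the returned list could then be split into triplets and looked up in the codon table to get one protein per coding sequence.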