I am trying to translate from English to Arabic using Fairseq, but the interactive.py script only translates text fragments on the fly. I need it to read an input text file and write the translations to an output text file. I referred to this GitHub issue (https://github.com/pytorch/fairseq/issues/858), but it doesn't clearly explain how to do this in general. Any suggestions?
2 Answers
fairseq-interactive can read lines from a file with the --input parameter, and it outputs translations to standard output.
So let's say I have this input text file source.txt (where every sentence to translate is on a separate line):
Hello world!
My name is John
You can run:
fairseq-interactive --input=source.txt [all-your-fairseq-parameters] > target.txt
Where > target.txt means "put in the target.txt file all (standard) output generated by fairseq-interactive". The file will be created if it doesn't exist yet.
With an English to French model it would generate a file target.txt that looks something like this (actual output may vary depending on your model, configuration and Fairseq version):
S-0 Hello world!
W-0 0.080 seconds
H-0 -0.43813419342041016 Bonj@@ our le monde !
D-0 -0.43813419342041016 Bonjour le monde !
P-0 -0.1532 -1.7157 -0.0805 -0.0838 -0.1575
S-1 My name is John
W-1 0.080 seconds
H-1 -0.3272092938423157 Je m' appelle John .
D-1 -0.3272092938423157 Je m'appelle John.
P-1 -0.3580 -0.2207 -0.0398 -0.1649 -1.0216 -0.1583
To keep only the translations (lines starting with D-), you would have to filter the content of this file. You could use this command for example:
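grep -P "^D-[0-9]+" target.txt | cut -f3 > only_translations.txt
You can also merge all the commands into one line (the ^ anchors the match to the start of the line, so a source sentence that happens to contain "D-1" is not picked up):
fairseq-interactive --input=source.txt [all-your-fairseq-parameters] | grep -P "^D-[0-9]+" | cut -f3 > target.txt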
(The actual command will depend on the actual structure of target.txt.)
Finally, note that you can use --input=- to read input from standard input.
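If you would rather do the filtering in Python than with grep and cut, here is a minimal sketch. It assumes the fields on each output line are separated by tabs (which is what cut -f3 relies on) and that the D- lines carry the final translations, as in the sample above. The function names extract_translations and filter_file are mine, not part of Fairseq.

```python
import re

# Matches detokenized hypothesis lines: "D-<index><TAB><score><TAB><translation>".
D_LINE = re.compile(r"^D-(\d+)\t[^\t]*\t(.*)$")


def extract_translations(lines):
    """Return the text of the D-* lines, ordered by sentence index."""
    found = {}
    for line in lines:
        match = D_LINE.match(line.rstrip("\n"))
        if match:
            found[int(match.group(1))] = match.group(2)
    return [found[index] for index in sorted(found)]


def filter_file(raw_path, out_path):
    """Read raw fairseq-interactive output and write one translation per line."""
    with open(raw_path, encoding="utf-8") as raw:
        translations = extract_translations(raw)
    with open(out_path, "w", encoding="utf-8") as out:
        for text in translations:
            out.write(text + "\n")
    return len(translations)
```

For example, filter_file("target.txt", "only_translations.txt") would do the same job as the grep command above. If your Fairseq version prints no D- lines, you would need to adapt the pattern to the lines it does print.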

- Can you give an example of the Fairseq parameters ([all-your-fairseq-parameters]) in the line below, so I can just run the command? – Alex Dolton Jan 21 '21 at 19:33
- cat source.txt | fairseq-interactive [all-your-fairseq-parameters] > target.txt – Alex Dolton Jan 21 '21 at 19:35
- @user1599262 the list of parameters really depends on the model that was trained and how it was trained. The [Fairseq documentation](https://fairseq.readthedocs.io/en/latest/getting_started.html) has a simple example use of `fairseq-interactive`. – Xavier Feb 05 '21 at 22:28
- This answer might be obsolete by now, but for future reference: there is also a flag (`--input`) that allows you to pass an input file directly without piping. – Dieuwke May 17 '21 at 12:31
I found that fairseq-interactive is a bit slow. If you just want input and output files with a pretrained Fairseq model, I think there is another possible solution (though I'm not sure it will be faster).
Basically, you can load the model in Python and use model.translate:
from fairseq.models.transformer import TransformerModel
trans = TransformerModel.from_pretrained(
    'models/',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='bin/',
    is_gpu=True
).cuda()
inputs = "Di-mairt Clodh-bhualadh a cheud leabhair,"
print(trans.translate(inputs))
Following this idea, you can read the file line by line and translate it easily. There may be a better way to translate the file directly, though.
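The file-reading part could look roughly like the sketch below. It is not Fairseq code: translate_file is a hypothetical helper of mine that takes any function mapping a list of sentences to a list of translations, so it can be tried without a model. I am assuming the translate method of the loaded model accepts a list of strings and returns a list of strings, which I believe is the case for the hub interface in recent Fairseq versions, but check yours. The batch_size default is arbitrary.

```python
from typing import Callable, List


def translate_file(
    translate: Callable[[List[str]], List[str]],
    src_path: str,
    tgt_path: str,
    batch_size: int = 32,
) -> int:
    """Translate src_path (one sentence per line) and write the results to tgt_path."""
    with open(src_path, encoding="utf-8") as src:
        sentences = [line.rstrip("\n") for line in src]

    with open(tgt_path, "w", encoding="utf-8") as tgt:
        # Send the sentences to the model in fixed-size batches.
        for start in range(0, len(sentences), batch_size):
            batch = sentences[start:start + batch_size]
            for translated in translate(batch):
                tgt.write(translated + "\n")
    return len(sentences)
```

With the trans object loaded above, the call would be something like translate_file(trans.translate, "source.txt", "target.txt"). Passing the function in as an argument also means you can swap in a wrapper that sets the beam size or other options.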
