
I am trying to translate from English to Arabic using Fairseq. The interactive.py script translates text fragments on the fly, but I need it to read an input text file and write the translations to an output text file. I looked at this GitHub issue (https://github.com/pytorch/fairseq/issues/858), but it doesn't clearly explain how to do this in general. Any suggestions?

2 Answers


fairseq-interactive can read lines from a file with the --input parameter, and it outputs translations to standard output.

So let's say I have this input text file source.txt (where every sentence to translate is on a separate line):

Hello world!
My name is John

You can run:

fairseq-interactive --input=source.txt [all-your-fairseq-parameters] > target.txt

Here, > target.txt means "write all (standard) output generated by fairseq-interactive to the file target.txt". The file is created if it doesn't exist yet, and overwritten if it does.

With an English to French model it would generate a file target.txt that looks something like this (actual output may vary depending on your model, configuration and Fairseq version):

S-0     Hello world!
W-0     0.080   seconds
H-0     -0.43813419342041016    Bonj@@ our le monde !
D-0     -0.43813419342041016    Bonjour le monde !
P-0     -0.1532 -1.7157 -0.0805 -0.0838 -0.1575
S-1     My name is John
W-1     0.080   seconds
H-1     -0.3272092938423157     Je m' appelle John .
D-1     -0.3272092938423157     Je m'appelle John.
P-1     -0.3580 -0.2207 -0.0398 -0.1649 -1.0216 -0.1583

To keep only the translations (lines starting with D-), you would have to filter the content of this file. You could use this command for example:

grep -P "^D-[0-9]+" target.txt | cut -f3 > only_translations.txt

You can also merge everything into a single pipeline:

fairseq-interactive --input=source.txt [all-your-fairseq-parameters] | grep -P "^D-[0-9]+" | cut -f3 > target.txt

(The exact command depends on the actual structure of the output that your Fairseq version produces.)
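
If grep and cut are not available, the same filtering can be sketched in Python. This is only an illustration: it assumes the output fields are tab-separated as in the sample above (label, score, text), and the names D_LINE and extract_translations are made up for this example, not part of Fairseq.

```python
import re

# Assumed layout of a detokenized hypothesis line: "D-<index>\t<score>\t<text>".
D_LINE = re.compile(r"^D-(\d+)\t([^\t]*)\t(.*)$")


def extract_translations(lines):
    """Return the text of every D- line, ordered by sentence index."""
    found = {}
    for line in lines:
        match = D_LINE.match(line.rstrip("\n"))
        if match:
            found[int(match.group(1))] = match.group(3)
    return [found[index] for index in sorted(found)]


# Sample modelled on the output shown above (tabs between fields are an assumption).
sample = [
    "S-0\tHello world!\n",
    "H-0\t-0.43813419342041016\tBonj@@ our le monde !\n",
    "D-0\t-0.43813419342041016\tBonjour le monde !\n",
    "S-1\tMy name is John\n",
    "H-1\t-0.3272092938423157\tJe m' appelle John .\n",
    "D-1\t-0.3272092938423157\tJe m'appelle John.\n",
]

for translation in extract_translations(sample):
    print(translation)
```

To apply it to a real file, pass an open file object (for example open("target.txt", encoding="utf-8")) instead of the sample list. Sorting by index keeps the translations aligned with the input lines even if the output is not in order.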

Finally, note that you can use --input=- to read input from standard input.

Xavier
  • Can you add an example of the Fairseq parameters ([all-your-fairseq-parameters]) on the line below, so I can just run this command? – Alex Dolton Jan 21 '21 at 19:33
  • cat source.txt | fairseq-interactive [all-your-fairseq-parameters] > target.txt – Alex Dolton Jan 21 '21 at 19:35
  • @user1599262 the list of parameters really depends on the model that was trained and how it was trained. The [Fairseq documentation](https://fairseq.readthedocs.io/en/latest/getting_started.html) has a simple example of using `fairseq-interactive`. – Xavier Feb 05 '21 at 22:28
  • This answer might be obsolete by now, but for future reference: there is also a flag (`--input`) that allows you to pass an input file directly without piping. – Dieuwke May 17 '21 at 12:31

I found fairseq-interactive to be a bit slow. If you just want to go from an input file to an output file with a pretrained Fairseq model, there is another possible solution (although I am not sure it will be faster).

Basically, you can load the model in Python and call its translate method:

from fairseq.models.transformer import TransformerModel
trans = TransformerModel.from_pretrained(
  'models/',
  checkpoint_file='checkpoint_best.pt',
  data_name_or_path='bin/',
  is_gpu=True
).cuda()
inputs = "Di-mairt Clodh-bhualadh a cheud leabhair,"
print(trans.translate(inputs))

Following this idea, you can read the file line by line and translate it easily. There may be a better way to translate the file directly, though.
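
A minimal sketch of that file-to-file idea is below. The helper translate_file and its parameters are hypothetical names for this example. It takes any function that maps a list of sentences to a list of translations, so it can be tried without a model; whether trans.translate accepts a list depends on your Fairseq version, so treat that part as an assumption.

```python
from typing import Callable, List


def translate_file(
    translate_fn: Callable[[List[str]], List[str]],
    src_path: str,
    tgt_path: str,
    batch_size: int = 32,
) -> int:
    """Translate src_path line by line into tgt_path; return the number of lines written."""
    with open(src_path, encoding="utf-8") as src:
        sentences = [line.rstrip("\n") for line in src]

    written = 0
    with open(tgt_path, "w", encoding="utf-8") as tgt:
        # Send the sentences in fixed-size batches so memory use stays bounded.
        for start in range(0, len(sentences), batch_size):
            batch = sentences[start:start + batch_size]
            for translation in translate_fn(batch):
                tgt.write(translation + "\n")
                written += 1
    return written


# Hypothetical usage with the model loaded above
# (assumes trans.translate accepts a list of sentences):
# translate_file(trans.translate, "source.txt", "target.txt")
```

If your version of translate only accepts a single string, you could wrap it, for example lambda batch: [trans.translate(s) for s in batch]. Empty input lines are passed through unchanged to the model, which may or may not handle them gracefully.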

xihajun