I'm learning how to use fairseq to implement a simple Transformer-based translation model. I would like to train on 2 GeForce RTX 3090 GPUs on my lab server. Which option should I select for `--ddp-backend` in `fairseq-train`?
Furthermore, could you explain the meaning of each of the following choices for `--ddp-backend`, and when to use each one?
From the fairseq documentation (Command-line Tools => fairseq-train => distributed_training):

> `--ddp-backend`
> Possible choices: c10d, fully_sharded, legacy_ddp, no_c10d, pytorch_ddp, slowmo
> DistributedDataParallel backend
> Default: "pytorch_ddp"
I'm new to the Stack Exchange community, so I apologize in advance if I've done anything inappropriate.