
I'm learning how to use fairseq to implement a simple Transformer-based translation model.

I would like to use 2 GeForce RTX 3090 GPUs on my lab server. Which option should I select for --ddp-backend of fairseq-train?

Furthermore, could you explain the meaning of each of the following options for --ddp-backend and when to use each of them?

From fairseq Documentation: Command-line Tools => fairseq-train => distributed_training


--ddp-backend:

Possible choices: c10d, fully_sharded, legacy_ddp, no_c10d, pytorch_ddp, slowmo

DistributedDataParallel backend

Default: “pytorch_ddp”

I'm new to the Stack Exchange community, so I apologize if anything here is inappropriate.

2 Answers


I am not entirely sure, but I found this in the fairseq source on GitHub:

DDP_BACKEND_CHOICES = ChoiceEnum(
    [
        "c10d",  # alias for pytorch_ddp
        "fully_sharded",  # FullyShardedDataParallel from fairscale
        "legacy_ddp",
        "no_c10d",  # alias for legacy_ddp
        "pytorch_ddp",
        "slowmo",
    ]
)

This might be helpful, but I am also struggling with this myself.
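
That said, from the aliases above, pytorch_ddp (the default) looks like the usual choice for a plain multi-GPU run on a single machine. A minimal sketch of a two-GPU launch, where the data path, architecture, and hyperparameters are only placeholders:

# placeholder dataset path and settings; adjust for your own data and model
CUDA_VISIBLE_DEVICES=0,1 fairseq-train data-bin/my_dataset \
    --arch transformer \
    --ddp-backend pytorch_ddp \
    --distributed-world-size 2 \
    --optimizer adam --lr 0.0005 \
    --max-tokens 4096 \
    --criterion label_smoothed_cross_entropy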

xihajun

You can find this in the options.py file; I hope it's helpful. However, they only describe the difference between "c10d" and "no_c10d", so we need to keep digging for more.

This is the link.
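
If you want to see exactly which choices and help text your installed fairseq version exposes, one quick check (just a sketch; the grep context width is arbitrary) is to dump the built-in help locally:

# print the --ddp-backend entry from the installed fairseq-train help
fairseq-train --help 2>&1 | grep -A 4 -e "--ddp-backend"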

Yuan Gao