
I am using fairseq (version 1.0.0a0+14c5bd0) to fine-tune a model as per this link. However, many of the parameters it uses appear neither in the docs nor in the output of fairseq-train --help. Examples include:

--warmup-updates
--encoder-normalize-before
--label-smoothing

Have they been replaced by other parameters?

M.A.G

1 Answer


When you train your models, you can pass general training parameters (documented in the CLI help) as well as component-specific parameters. For the latter, you often need to use the search bar at the top left of the documentation site.
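
For example, a training command mixes both kinds of parameters. A minimal sketch (the data directory and hyperparameter values are placeholders; the flags themselves are real):

    fairseq-train data-bin/example \
        --arch transformer \
        --encoder-normalize-before \
        --optimizer adam --lr 5e-4 \
        --lr-scheduler inverse_sqrt \
        --warmup-updates 4000 --warmup-init-lr 1e-7 \
        --criterion label_smoothed_cross_entropy \
        --label-smoothing 0.1 \
        --max-tokens 4096

Here --encoder-normalize-before is only valid because --arch transformer registers it; likewise --warmup-updates and --warmup-init-lr come from the inverse_sqrt scheduler, and --label-smoothing from the label_smoothed_cross_entropy criterion.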

Concerning the specific ones you highlighted, some are documented with their components in the documentation:

  • --warmup-updates is an attribute of the learning rate scheduler (doc)
  • --encoder-normalize-before is a Transformer model parameter (doc)

And some are documented only in the code (if at all):

  • --label-smoothing is a parameter of the label-smoothed cross-entropy loss (code)
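
To illustrate that last point, the flag is defined next to the criterion itself. A trimmed paraphrase of fairseq/criterions/label_smoothed_cross_entropy.py in the 1.0.0a0 line, which uses dataclass-based configs (exact fields may differ between commits):

    from dataclasses import dataclass, field

    from fairseq.dataclass import FairseqDataclass

    @dataclass
    class LabelSmoothedCrossEntropyCriterionConfig(FairseqDataclass):
        # surfaces on the command line as --label-smoothing
        label_smoothing: float = field(
            default=0.0,
            metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"},
        )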
Clef.
  • But I am confused about how to set --warmup-updates: is it based on the number of epochs or on the number of training steps? For example, if I set --warmup-updates to 4000, will warmup-init-lr be used for 4000 training steps or for 4000 epochs? – M.A.G Nov 23 '21 at 20:27
  • In the code, it updates "the learning rate at the beginning of the given epoch" (with step_begin_epoch). That is a different question from the one you originally asked, though. – Clef. Nov 24 '21 at 10:20
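
For reference, in the inverse_sqrt scheduler the warmup is counted in updates (training steps), not epochs. A minimal sketch of its schedule, paraphrased from the fairseq source (function and argument names are assumptions based on the flags above):

    # Warmup runs for `warmup_updates` optimizer updates, then the learning
    # rate decays proportionally to the inverse square root of the update count.
    def inverse_sqrt_lr(num_updates, lr=5e-4, warmup_updates=4000, warmup_init_lr=1e-7):
        if num_updates < warmup_updates:
            # linear warmup from warmup_init_lr to lr over warmup_updates steps
            return warmup_init_lr + num_updates * (lr - warmup_init_lr) / warmup_updates
        return lr * warmup_updates ** 0.5 * num_updates ** -0.5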