You might think that Doc2Vec (aka the 'Paragraph Vector' algorithm of Mikolov/Le) requires word-vectors as a first step. That's a common belief, and perhaps somewhat intuitive, by analogy to how humans learn a new language: understand the smaller units before the larger, then compose the meaning of the larger from the smaller. But it's a misconception: Doc2Vec doesn't do that.
One mode, pure PV-DBOW (dm=0 in gensim), doesn't use conventional per-word input vectors at all, and it is often one of the fastest-training and best-performing options.
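For example, here's a minimal sketch of training in that mode with gensim; the tiny corpus and parameter values are just illustrative assumptions, not recommendations:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# A toy corpus: each document is a list of tokens plus a unique tag.
corpus = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=[0]),
    TaggedDocument(words=["deep", "learning", "uses", "neural", "networks"], tags=[1]),
    TaggedDocument(words=["gensim", "implements", "doc2vec"], tags=[2]),
]

# Pure PV-DBOW: dm=0 and (by default) dbow_words=0, so no conventional
# word-vectors are trained; only the per-document vectors are.
model = Doc2Vec(corpus, dm=0, vector_size=50, min_count=1, epochs=40, workers=1)

print(model.dv[0])                                    # trained doc-vector for tag 0
print(model.infer_vector(["new", "unseen", "text"]))  # vector inferred for a new document
```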
The other mode, PV-DM (dm=1 in gensim, the default), does make use of neighboring word-vectors, in combination with doc-vectors, in a manner analogous to word2vec's CBOW mode. But any word-vectors it needs are trained up simultaneously with the doc-vectors. They are not trained first in a separate step, so there's no easy splice-in point where you could provide word-vectors from elsewhere.
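A quick sketch of that mode, again with purely illustrative toy data and parameters, shows the word-vectors appearing alongside the doc-vectors after a single training run:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=[0]),
    TaggedDocument(words=["deep", "learning", "uses", "neural", "networks"], tags=[1]),
]

# PV-DM (the default): doc-vectors and word-vectors are trained together,
# in the same passes over the corpus.
model = Doc2Vec(corpus, dm=1, vector_size=50, window=3, min_count=1, epochs=40, workers=1)

print(model.dv[0])            # doc-vector, trained
print(model.wv["learning"])   # word-vector, trained in the same run, not in a prior step
```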
(You can mix skip-gram word-training into PV-DBOW mode, with dbow_words=1 in gensim, but that will train word-vectors from scratch in an interleaved, shared-model process.)
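That combination is just one extra parameter on the same constructor; the sketch below (with the same sort of made-up toy corpus) trains doc-vectors DBOW-style while interleaving skip-gram word-vector training:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=[0]),
    TaggedDocument(words=["deep", "learning", "uses", "neural", "networks"], tags=[1]),
]

# PV-DBOW doc-vector training, with skip-gram word-vector training interleaved.
model = Doc2Vec(corpus, dm=0, dbow_words=1, vector_size=50, window=3,
                min_count=1, epochs=40, workers=1)

print(model.dv[0])            # doc-vector
print(model.wv["learning"])   # word-vector, trained from scratch alongside the doc-vectors
```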
To the extent you could pre-seed a model with word-vectors from elsewhere, it wouldn't necessarily improve results: it could just as easily leave their quality flat, or make it worse. It might, in some lucky, well-managed cases, speed model convergence, or be a way to enforce vector-space compatibility with an earlier vector-set, but not without extra gotchas and caveats that aren't part of the original algorithms or well-described practices.
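If you did want to experiment with such pre-seeding anyway, one rough approach (not an official gensim feature, and not something the original algorithm describes) is to build the vocabulary first, then overwrite any overlapping words' vectors before training. The file name and corpus below are illustrative assumptions, and note that the seeded vectors will still be freely updated during training:

```python
from gensim.models import KeyedVectors
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical pre-trained word-vectors, in word2vec text format.
pretrained = KeyedVectors.load_word2vec_format("external_vectors.txt", binary=False)

corpus = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=[0]),
    TaggedDocument(words=["deep", "learning", "uses", "neural", "networks"], tags=[1]),
]

# Dimensions must match the external vectors for the copy to make sense.
model = Doc2Vec(dm=1, vector_size=pretrained.vector_size, window=3, min_count=1, epochs=40)
model.build_vocab(corpus)

# Overwrite the randomly-initialized word-vectors for any overlapping words.
for word, idx in model.wv.key_to_index.items():
    if word in pretrained:
        model.wv.vectors[idx] = pretrained[word]

model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
```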