How to load a pre-trained model in gensim and train doc2vec with it?

Question

I have a ready-to-go word2vec model that I already trained. I have serialized it as a CSV file:

word,  v0,     v1,     ..., vN
house, 0.1234, 0.4567, ..., 0.3461
car,   0.456,  0.677,  ..., 0.3461

What I'd like to know is how I can load that word vector model in gensim and use that to train a paragraph or doc2vec model.

This Doc2Vec tutorial says I can load a model in the form of a "# C text format", but I have no idea what that actually means. What is "C text format" in the first place? But more importantly:

How can I load my word2vec model and use it for doc2vec training?

How do I build the vocabulary from my word2vec model?

Someone asked a similar question here: https://stackoverflow.com/questions/27470670/how-to-use-gensim-doc2vec-with-pre-trained-word-vectors?rq=1 — Anushka--x, Jul 09 '18 at 15:52

score 1 · Accepted Answer · answered Jul 29 '16 at 02:38

Doc2Vec does not need word-vectors as an input: it will create any word-vectors that are needed during its own training. (And some modes, like pure DBOW – dm=0, dbow_words=0 – don't use or train word-vectors at all.)

Seeding a Doc2Vec model with word-vectors might help or hurt; there's not much theory or published results to offer guidance. There's an experimental method on Word2Vec, intersect_word2vec_format(), that can merge word2vec-c-format vectors into a model with an existing vocabulary, but you'd need to review the source to really understand its assumptions:

https://github.com/RaRe-Technologies/gensim/blob/51753b95415bbc344ea6af671818277464905ea2/gensim/models/word2vec.py#L1140
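
The CSV in the question is not yet in the word2vec C text format that intersect_word2vec_format() expects. That format is plain text: a header line with the vocabulary size and the vector dimensionality, then one line per word with the word and its values separated by single spaces. Below is a minimal sketch of the conversion using only the standard library. It assumes the CSV has a header row as shown in the question and that no word contains whitespace; the helper name csv_to_word2vec_text and the file paths are hypothetical, not part of gensim.

```python
import csv


def csv_to_word2vec_text(csv_path, out_path):
    """Convert a 'word, v0, ..., vN' CSV into word2vec C text format.

    Hypothetical helper, not part of gensim. Assumes the first CSV row is
    a header and that no word contains whitespace.
    """
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # skipinitialspace tolerates the padding after each comma.
        reader = csv.reader(f, skipinitialspace=True)
        next(reader, None)  # skip the header row: word, v0, v1, ...
        for row in reader:
            if not row:
                continue  # ignore blank lines
            word = row[0].strip()
            values = [float(v) for v in row[1:]]
            rows.append((word, values))

    if not rows:
        raise ValueError("no vectors found in " + csv_path)
    dim = len(rows[0][1])
    if any(len(values) != dim for _, values in rows):
        raise ValueError("all vectors must have the same dimensionality")

    with open(out_path, "w", encoding="utf-8") as out:
        # Header line: vocabulary size and vector dimensionality.
        out.write(f"{len(rows)} {dim}\n")
        for word, values in rows:
            out.write(word + " " + " ".join(repr(v) for v in values) + "\n")
    return len(rows), dim
```

The resulting file should be readable with KeyedVectors.load_word2vec_format(out_path, binary=False). For seeding a Doc2Vec model, it would be passed to intersect_word2vec_format() after the model's vocabulary has been built from the training documents; where that method lives differs between gensim versions, so check the one you have installed.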

I cannot prove this statement, but I think the document vectors work better if one provides pre-trained word vectors. I only tested this by commenting out the intersect part and comparing the results. But thanks for providing an answer :) — Stefan Falk, Jul 29 '16 at 09:52
Work better on what task, with how much data, with which pre-trained vectors? — gojomo, Jul 29 '16 at 16:41
