77

I have automated my build to convert Markdown files to DOCX files using Pandoc. I have even used a reference document for the final document's styling. The command I use is:

pandoc -f markdown -t docx --data-dir=docs/rendering/ mydoc.md -o mydoc.docx

The reference.docx is picked up by Pandoc from docs/rendering and Pandoc renders mydoc.docx with the same styles as the reference doc.

However, reference.docx contains more than just styles. It contains corporate logos, preamble, etc.

How can I automate the merging of the Markdown content with both the styles and content of reference.docx? My solution needs to work on Linux.

Palec
Synesso
  • You could try MergeDocx (our commercial product; Java) – JasonPlutext Jul 28 '13 at 22:15
  • I have once tried to do something like this, but I found it was easier to just render a webpage with special print styles and let Chrome make a PDF from it. Or print the page to a PDF printer. I had to recreate the whole DOCX but that seemed easier than merging in the whole formatting (and risking mistakes). – ayke Nov 18 '13 at 16:40

4 Answers

34

Update

Use the piped version suggested by user Christian Long:

pandoc -t latex mydoc.md | pandoc -f latex --data-dir=docs/rendering/ -o mydoc.docx

I know this is late in coming, but I assume people are still searching for solutions to this three years after the original question -- I know I was.

My solution was to use LaTeX as an intermediary between markdown and docx (actually, I was converting from org-mode, but same difference). So in your case, I believe a one-liner solution would be:

pandoc -f markdown -t latex -o mydoc.tex mydoc.md && \
pandoc -f latex -t docx --data-dir=docs/rendering/ -o mydoc.docx mydoc.tex

This might get you closer to your goal. Of course, Pandoc has about a hundred options it can handle, and there are probably ways to make this prettier. It has also received quite a few updates since you first posted your question.

François Leblanc
  • 1
    This solution works. To simplify a bit, you can skip the intermediate `.tex` file, and pipe the LaTeX-formatted data from one pandoc to another. `pandoc -t latex mydoc.md | pandoc -f latex --data-dir=docs/rendering/ -o mydoc.docx` – Christian Long Nov 16 '18 at 05:30
14

Ideally you could use a custom docx template, but pandoc doesn't support that yet. A reference.docx file only allows custom styles to be embedded in newly created docx files.

Fortunately you can approximate this using odt instead of docx. You can fairly easily modify the default OpenDocument template to include your custom logos, preamble, and other stuff. Use the custom template in conjunction with a reference.odt file to get all the styles and custom content.

Once you have the file in odt format, you can use any number of command line tools to convert from odt to docx. For example, on Linux you can run

libreoffice --invisible --convert-to docx test.odt

Or on OS X:

/Applications/LibreOffice.app/Contents/MacOS/soffice.bin --invisible --convert-to docx test.odt
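The two steps above can be chained in one script. This is only a sketch under assumptions that are not part of the answer: the file names custom.opendocument and reference.odt are placeholders, `--reference-doc` is the option name in pandoc 2.x and later (older releases used `--reference-odt`), `--headless` is used in place of `--invisible`, and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience for printing the commands without executing them.

```shell
#!/bin/sh
# Sketch: Markdown -> ODT (custom template + reference styles) -> DOCX.
# Assumed, not from the answer: custom.opendocument, reference.odt,
# pandoc 2.x or later, and a `libreoffice` binary on PATH.

# One-time setup, done by hand: dump pandoc's default ODT template,
# then edit the copy to add logos, preamble, and so on.
#   pandoc -D opendocument > custom.opendocument

# run CMD ARGS...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# md_to_docx_via_odt INPUT.md: writes INPUT.odt, then INPUT.docx beside it.
md_to_docx_via_odt() {
  src=$1
  base=${src%.md}
  run pandoc -f markdown -t odt \
      --template=custom.opendocument \
      --reference-doc=reference.odt \
      -o "$base.odt" "$src" &&
  run libreoffice --headless --convert-to docx \
      --outdir "$(dirname "$base.odt")" "$base.odt"
}
```

Calling `DRY_RUN=1 md_to_docx_via_odt mydoc.md` prints the two commands so they can be checked before a real run. The `--outdir` option is there because LibreOffice otherwise writes the converted file to the current directory.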
Andrew
  • 1
    Why not automatically convert the docx files/templates to odt? And then use the odt in the pandoc conversion? (I'm a pandoc and document-template newbie, so please pardon my ignorant question.) – Johnny Utahh Jun 13 '15 at 13:11
  • 2
    Pandoc needs special variables in the odt template to work correctly (see https://github.com/andrewheiss/Global-Pandoc-files/blob/master/templates/odt.template#L34, for example). You can't create those in native docx—they have to be added by hand through a text editor. – Andrew Jun 13 '15 at 18:21
11

Ideally, Pandoc will grow this feature, but that doesn't look likely any time soon.

I don't know of any tools that will do the job directly, but you could probably fall back to merging reference.docx and your Pandoc-produced mydoc.docx in code.

The .docx format is a ZIP archive of (mostly) XML files. The most important is word/document.xml. If you use an XML tool to take (most of) the document.xml from one file and insert it into the other, you'll have something closer to what you need.

I could hack together an example in, say, Ruby if an illustration would help.

RJHunter
9

UPDATE: this feature is incomplete

I used it on some complex templates, and found it mapped the fonts, company logos, etc. very well. But going .docx -> .docx, I had to manually apply Heading styles to the chapter / section breaks. The font was correct, but the sectioning wasn't. I'll try .md -> .docx next.


This feature is now available in Pandoc, as described here:

Markdown to docx, including complex template

From the link above:

pandoc input --reference-docx=my-reference.docx -o out.docx

where my-reference.docx (n.b. not a .dotx) can be in:

  • the current folder OR
  • a folder which is defined by --data-dir OR
  • the system default folder for data-dir which is
    • $HOME/.pandoc on UNIX-like systems
    • C:\Documents And Settings\USERNAME\Application Data\pandoc on Windows XP (which you should no longer use)
    • C:\Users\USERNAME\AppData\Roaming\pandoc on Windows Vista or later.
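Note that the option name depends on the pandoc release: as far as I know, `--reference-docx` was replaced by `--reference-doc` in pandoc 2.0. A build script that has to run on both old and new installs could pick the name from the version string. The sketch below is only an illustration; the function names are hypothetical, and the 2.0 cutoff is my recollection of the changelog, so check it against your installed manual.

```shell
#!/bin/sh
# Sketch: choose the reference-document option for the installed pandoc.
# Assumption: the option was renamed at major version 2.

# reference_flag VERSION: print the option name for that pandoc version.
reference_flag() {
  major=${1%%.*}
  if [ "$major" -ge 2 ]; then
    printf '%s\n' "--reference-doc"
  else
    printf '%s\n' "--reference-docx"
  fi
}

# pandoc_version: the first line of `pandoc --version` reads "pandoc X.Y.Z";
# print its second field.
pandoc_version() {
  pandoc --version | awk 'NR == 1 { print $2 }'
}

# Usage (needs pandoc installed, so it is left as a comment):
#   flag=$(reference_flag "$(pandoc_version)")
#   pandoc mydoc.md "$flag=my-reference.docx" -o out.docx
```

If you need a starting point for my-reference.docx, recent pandoc versions can write out their built-in one with `pandoc -o my-reference.docx --print-default-data-file reference.docx`, which you can then restyle in Word or LibreOffice.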
Guillaume Jacquenot
Jason
  • This line of code seems to be incomplete and doesn't answer the question above. – AdamO Dec 30 '19 at 16:23
  • 1
    The tag has been changed to `--reference-doc=my-reference.docx` – MyICQ Feb 10 '23 at 14:10
  • I'm assuming `input` must mean `from-file.md`; I see zero mentions of `input` being used on the command line in the [Pandoc User Guide](https://pandoc.org/MANUAL.html). – Todd Partridge May 14 '23 at 19:58