When is the content flushed to a PDF File by itextsharp?

Question

I assumed that Document.Add() flushed content to the PDF file (the file stream) immediately, but it looks like that's not the case.

If you merely add a paragraph to the document, you add it to some page. Only when the page is finished and the document starts the next page is there reason to expect the page to be (at least partially) written to disk. But you cannot really be sure before closing the document. — mkl, Feb 18 '14 at 13:09
I'm curious as to why you would want to know this and what you'd do with this information. Not that it's a secret, I'm just curious. — Chris Haas, Feb 18 '14 at 14:11
@Chris, I was writing 100K-plus records to a PDF and this was taking time, so I thought that flushing content to the PDF file only in batches of, say, 1K would reduce the creation time of the PDF file. — Sameer, Feb 18 '14 at 18:38
You say "100K plus records", does that mean you are writing to a table? If so, see this post: http://stackoverflow.com/a/15483598/231316 — Chris Haas, Feb 18 '14 at 19:45
You are most welcome. Btw I am not writing the data to a table but to a paragraph. — Sameer, Feb 19 '14 at 04:26
Obviously you shouldn't expect any content of a paragraph to be written to the output stream as long as the paragraph isn't added to the document. Such a paragraph doesn't "know" about the existence of the output stream. But you already knew that since you mentioned the `Add()` method. — Bruno Lowagie, Feb 19 '14 at 06:36

score 0 · Answer 1 · answered Feb 18 '14 at 13:49

PDF is a Page Description Language. Every page is an autonomous set of objects. The content is stored in one or more streams. There is no such thing as a paragraph or a table etc in a PDF. It's just a sequence of lines, shapes and glyphs drawn on a page.

When you add content to a document using the Add() method, this content is converted into PDF syntax that is appended to the content stream of a page. As soon as the page is full, this content stream and the corresponding page dictionary are written to the output stream and flushed.

Not sooner!
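To make the timing concrete, here is a minimal toy model in Python. It is not iTextSharp code: the class `ToyPdfWriter`, its `lines_per_page` capacity, and the plain-text output are all hypothetical stand-ins, chosen only to show that adding content fills an in-memory page buffer and that the output stream is touched when a page is completed or the document is closed.

```python
import io


class ToyPdfWriter:
    """Toy model of page-level flushing; not iTextSharp's actual code."""

    def __init__(self, out, lines_per_page=3):
        self.out = out                        # stands in for the file stream
        self.lines_per_page = lines_per_page  # hypothetical page capacity
        self.page_buffer = []                 # content stream of the current page

    def add(self, text):
        # Adding content only appends to the in-memory page buffer.
        self.page_buffer.append(text)
        if len(self.page_buffer) == self.lines_per_page:
            self._flush_page()                # page is full: write it out

    def _flush_page(self):
        self.out.write("\n".join(self.page_buffer) + "\n")
        self.out.flush()
        self.page_buffer = []

    def close(self):
        # The last, partially filled page is only written on close.
        if self.page_buffer:
            self._flush_page()


out = io.StringIO()
writer = ToyPdfWriter(out, lines_per_page=3)

writer.add("record 1")
writer.add("record 2")
written_before_page_full = out.getvalue()     # nothing written yet

writer.add("record 3")                        # third line fills the page
written_after_page_full = out.getvalue()

writer.add("record 4")
writer.close()
written_after_close = out.getvalue()
```

In this sketch, the first two `add` calls leave the output empty, the third one writes the whole page, and the fourth record only reaches the output when `close` is called.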

Several objects, such as fonts, the cross-reference table, and Form XObjects, are kept in memory because they can change during the document creation process.

In some cases you can release these objects early. For instance, there is a "release template" method that writes a Form XObject to the output stream immediately. Image XObjects are always written immediately.
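The difference between objects that are written at once and objects that are held back can be sketched with another toy model in Python. Again, this is not iTextSharp code: `ToyObjectWriter` and all of its methods are hypothetical names. In iTextSharp 5 the "release template" call is presumably `PdfWriter.ReleaseTemplate`, but treat that name as an assumption and check the API of your version.

```python
import io


class ToyObjectWriter:
    """Toy model of deferred objects; not iTextSharp's actual code."""

    def __init__(self, out):
        self.out = out
        self.pending = {}   # objects that may still change, kept in memory

    def add_image(self, name):
        # Images cannot change after creation, so they are written at once.
        self.out.write(f"image:{name}\n")

    def add_template(self, name, content):
        # A template (Form XObject) may still be drawn on, so it is held back.
        self.pending[name] = [content]

    def draw_on_template(self, name, content):
        self.pending[name].append(content)

    def release_template(self, name):
        # Early release: write now; the template can no longer be changed.
        content = self.pending.pop(name)
        self.out.write(f"template:{name}:{'+'.join(content)}\n")

    def close(self):
        # Everything still pending is written when the document is closed.
        for name in list(self.pending):
            self.release_template(name)


out = io.StringIO()
w = ToyObjectWriter(out)
w.add_image("logo")
w.add_template("header", "title")
w.add_template("footer", "page x")
after_adds = out.getvalue()          # only the image has been written

w.draw_on_template("footer", "of y") # still possible: footer is in memory
w.release_template("header")
after_release = out.getvalue()       # header written early

w.close()
after_close = out.getvalue()         # footer written at close
```

The trade-off the sketch illustrates: releasing early frees memory, but a released object can no longer be modified, which is why objects such as font subsets stay in memory until the document is closed.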

Your question isn't really a question. It's a wrong assumption. Please clarify if you have a real question.

I was interested in knowing when the contents of the PDF objects are flushed to the file, and whether one can control it in iTextSharp (say, only flush the content to the file after n pages instead of flushing after each page is full). Doesn't this make sense? I am new to the PDF language. — Sameer, Feb 18 '14 at 19:08
There's no point in waiting to flush a page content stream once it has been completed, so it isn't possible to put that "on hold". Some objects however, have to be kept in memory. For instance: when we embed a subset of a font, we keep that font in memory because a "future" page may need to add a glyph to that subset that hasn't been used on a previous page. Your question makes sense, but the phrasing was awkward (because it made an allegation that wasn't proven). — Bruno Lowagie, Feb 19 '14 at 06:34
