6

Using iTextSharp, how can I merge multiple PDFs into one PDF without losing the Form Fields and their properties in each individual PDF?

(I would prefer an example using streams from a database but file system is ok as well)

I found this code that works but it flattens out my PDFs so I can't use it.

UPDATE

@Mark Storer - This is the code I am using now based on your feedback (see below) but it gives me a corrupt document after the save. I tested each of the code parts separately and it seems to be failing in the MergePdfForms function shown below. I obviously don't want to use the renameFields part of your example because I need the field names to remain "as is".

Public Sub MergePdfForms(ByVal pdfFiles As ArrayList, ByVal outputPath As String)
    Dim ms As New IO.MemoryStream()
    Dim copier As New PdfCopyFields(ms)
    For Each pfile As String In pdfFiles
        Dim reader As New PdfReader(pfile)
        copier.AddDocument(reader)
    Next
    SaveMemoryStream(ms, outputPath)
    copier.Close()
End Sub

Public Sub SaveMemoryStream(ms As IO.MemoryStream, FileName As String)
    Dim outStream As IO.FileStream = IO.File.OpenWrite(FileName)
    ms.WriteTo(outStream)
    outStream.Flush()
    outStream.Close()
End Sub
RichC
  • If you don't rename the fields, then all the fields that share a name are **the same field**. That's how AcroForms work. If you want editable fields with the same names and different values, PDF can't help you. You could probably pull it off with JS hackery in the "Page Open" event, where you're 'manually' setting all the fields to the appropriate value based on some internal array. Printing probably wouldn't work, and you'd need a custom submit script (you'd want to make N separate submits, each with its own set of values, which means building your own submission each time). Nontrivial. – Mark Storer Nov 20 '19 at 18:55

2 Answers

11

Fields in PDFs have an Unusual Property: All fields with the same name are the same field. They share a value. This is handy when the form refers to the same person and you have a nice naming scheme across forms. It's Not Handy when you want to put 20 instances of a single form into a single PDF.
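To make the "same name, same field" behaviour concrete, here is a toy model in plain Java with no iText involved. It treats a form as a map from field name to value; `SharedFieldModel` and `merge` are made-up names for illustration, and the rule that the first value wins is only an assumption of this sketch, not a statement about what any PDF library or viewer does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedFieldModel {
    // Toy model: a form is a map from field name to value.
    // Assumption of this sketch: when a name already exists, the first value is kept.
    static Map<String, String> merge(Map<String, String> first, Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> entry : second.entrySet()) {
            // A name that is already present is the same field, so it cannot hold a second value.
            merged.putIfAbsent(entry.getKey(), entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> formA = new LinkedHashMap<>();
        formA.put("name", "Alice");

        Map<String, String> formB = new LinkedHashMap<>();
        formB.put("name", "Bob");
        formB.put("city", "Oslo");

        // "name" appears once in the result: the two forms now share it.
        System.out.println(merge(formA, formB)); // prints {name=Alice, city=Oslo}
    }
}
```

Two copies of the same form therefore end up with one set of values between them, which is fine when the sharing is intentional and a problem when each copy needs its own data.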

This makes merging multiple forms challenging, to say the least. The most common option (thanks to iText) is to flatten the forms prior to merging them, at which point you're no longer merging forms, and the problem Goes Away.

The other option is to rename your fields prior to merging them. This can make data extraction difficult later, can break scripts, and is generally a PITA. That's why flattening is so much more popular.

There's a class in iText called PdfCopyFields, and it will correctly copy fields from one document to another... it will also merge fields with the same name correctly, such that they really share a single value and Acrobat/Reader doesn't have to do a bunch of extra work on the file to get it that way before displaying it to a user.

However, PdfCopyFields will not rename fields for you. To do that, you need to get the AcroFields object from the PdfReader in question, and call renameField(String, String) on Each And Every Field prior to merging the documents with PdfCopyFields.

All this is for "AcroForm"-based PDF forms. If you're dealing with XFA forms (forms from LiveCycle Designer), all bets are off. You have to muck with the XML, A Lot.

And heaven help you if you have to combine forms from both.

So ass-u-me-ing that you're working with AcroForm fields, the code might look something like this (forgive my Java):

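import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// iText 5 package names; iText 2.x uses com.lowagie.text instead
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfCopyFields;
import com.itextpdf.text.pdf.PdfReader;

public void mergeForms(String outpath, String inPaths[]) throws IOException, DocumentException {
  PdfCopyFields copier = new PdfCopyFields(new FileOutputStream(outpath));
  for (String curInPath : inPaths) {
    PdfReader reader = new PdfReader(curInPath);
    // Give this document's fields unique names before they are merged
    renameFields(reader.getAcroFields());

    copier.addDocument(reader);
  }
  // close() is where the merged PDF actually gets written out
  copier.close();
}

private static int counter = 0;
private void renameFields(AcroFields fields) {
  // Copy the names first: renaming changes the underlying field map
  List<String> fieldNames = new ArrayList<String>(fields.getFields().keySet());
  String prepend = String.format("_%d.", counter++);

  for (String fieldName : fieldNames) {
    // renameField returns a boolean saying whether the rename was accepted
    fields.renameField(fieldName, prepend + fieldName);
  }
}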

Ideally, renameFields would also create a generic field object named prepend's-value and make all the other fields in the document its children. This would make Acrobat/Reader's life easier and avoid an apparently unnecessary "save changes?" request when closing the resulting PDF from Acrobat.

Yes, that's why Acrobat will sometimes ask you to save changes when You Didn't Do Anything! Acrobat did something behind the scenes.
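The naming scheme used by renameFields above can be tried out on its own, without iText. The sketch below only does the string work: it takes a list of field names and a document index and returns the old-name to new-name pairs. `FieldRenamePlan` and `plan` are hypothetical names, and whether a given library accepts a particular new name is a separate question this sketch does not address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldRenamePlan {
    // Builds old-name -> new-name pairs for one document, using "_<index>." as the prefix.
    static Map<String, String> plan(List<String> fieldNames, int docIndex) {
        String prefix = String.format("_%d.", docIndex);
        Map<String, String> renames = new LinkedHashMap<>();
        // Work on a copy of the names so the caller's collection can change while the plan is applied.
        for (String name : new ArrayList<>(fieldNames)) {
            renames.put(name, prefix + name);
        }
        return renames;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("name", "address.city");
        // The same form used twice gets two disjoint sets of field names.
        System.out.println(plan(names, 0)); // prints {name=_0.name, address.city=_0.address.city}
        System.out.println(plan(names, 1)); // prints {name=_1.name, address.city=_1.address.city}
    }
}
```

Keeping the plan as a map also makes it easy to log or undo the renames later, for example when extracting data from the merged file.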

Mark Storer
  • Great post!! Hopefully, I can clear some of this up. Yes, it would only involve AcroForms and the fields that would be named the same, would be named the same on purpose because they are supposed to share values and be of the same field type. Some of the same-named fields are just placeholders for X,Y coordinates that other objects will be stamped at as well. So, PdfCopyFields sounds like it will fit my scenario if I'm understanding your answer correctly, yes? – RichC Jun 13 '11 at 22:00
  • Doesn't seem to be working. I updated my post with the VB.NET code I'm using based on your Java example. – RichC Jun 14 '11 at 02:06
  • Ah! You need to call `close` before you save the memory stream. That's where most of the work is done (as far as actually writing to the output stream). – Mark Storer Jun 14 '11 at 16:55
  • If your fields look like this "topmostSubform[0].Page1[0].CheckBox[2]", then prepending doesn't work as you can see [here](https://stackoverflow.com/a/43957708/79339). Instead of prepending to the beginning, "suffix" to the end, i.e `fields.rename(fieldName, fieldName + prepend);`. This should work in all cases. – Chaitanya Jun 30 '17 at 04:54
  • @chaitanya Unless iText's XFA support has changed radically (and it's been 5+ years since I last checked, so it's certainly possible), simply renaming the field is still going to leave all manner of things broken in the XML. – Mark Storer Jul 05 '17 at 16:04
  • Thanks @MarkStorer . To be honest, I was only looking at one of my problems where I had to combine multiple copies of the form that had fields like I mentioned in previous comment. "Flattening" had the correct data in the fields, but the forms were no longer editable. Replacing "prepend" with the "suffix" version worked for me. Probably because they are "AcroForm" based pdfs. – Chaitanya Jul 05 '17 at 22:43
  • FWIW, I just ran into this recently. I was using a pdf form as a template and generating a batch of documents using that template by merging with the iText7 library. The interesting behavior I saw was that, the form looks normal in Chrome and IE, but when viewing the merged document in Acrobat, only the first page had form values. The rest were blank. – Adriang Nov 20 '19 at 15:02
  • Did you remember to rename the fields? – Mark Storer Nov 20 '19 at 18:47
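The "call close before you save" point from the comments can be illustrated with a plain-Java analogy. This is not iText: it uses the standard library's BufferedOutputStream as a stand-in for any writer that only pushes its data to the target when it is closed, and `CloseBeforeSave` and `sizesAroundClose` are made-up names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CloseBeforeSave {
    // Returns {bytes in the target before close, bytes in the target after close}.
    static int[] sizesAroundClose(String payload) {
        try {
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            BufferedOutputStream writer = new BufferedOutputStream(target);
            writer.write(payload.getBytes(StandardCharsets.US_ASCII));
            int before = target.size(); // the writer is still holding the data in its own buffer
            writer.close();             // closing flushes everything into the target
            int after = target.size();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundClose("%PDF-1.4");
        // Saving the target before close would have captured nothing.
        System.out.println(sizes[0] + " bytes before close, " + sizes[1] + " after"); // prints 0 bytes before close, 8 after
    }
}
```

The same ordering applies to the memory stream in the question: copy it to disk only after the copier has been closed.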
2

You can also use this code. It merges all the PDF files without losing the field values.

    using System;
    using System.IO;
    using iTextSharp.text.pdf;

    // Assumes fileList (the PDF file names) and sourcefolder (their folder) are defined elsewhere.
    string destinationfile = @"d:\outputfile.pdf";
    PdfCopyFields copier = new PdfCopyFields(new FileStream(destinationfile, FileMode.Create));

    // Loop over each file that has been listed
    foreach (string filename in fileList)
    {
        try
        {
            // The current file path
            string filePath = sourcefolder + filename;

            PdfReader reader = new PdfReader(filePath);
            copier.AddDocument(reader);
        }
        catch (Exception)
        {
            // A file that cannot be read or added is skipped
        }
    }
    // Close() writes the merged document to the output stream
    copier.Close();
manu