
I'm generating a report with a lot of content, and the PDF is getting far too big to build in the server's memory.

Right now I'm running a test with over 5 million records (a real production case). The data itself is almost 4.5GB; the PDF writer is currently around record 3.5 million, and the application's memory use is at 35GB.

It's running very slowly now because the server's memory is at 95% usage.

My code looks like this:

Console.WriteLine("Gathering Data");
DataTable dt = functionthatgathersdata();

Document document = new Document(PageSize.A4.Rotate(), 0f, 0f, 140f, 70f);

using (FileStream ms = new FileStream(@"E:\rel.pdf", FileMode.Create))
{
    PdfWriter writer = PdfWriter.GetInstance(document, ms);
    writer.PageEvent = new PageTemplateReport()
    {
        attr = "report header"
    };

    document.Open();
    iTextSharp.text.Font font5 = iTextSharp.text.FontFactory.GetFont(FontFactory.HELVETICA, 8);

    PdfPTable table = new PdfPTable(dt.Columns.Count - 1);
    float[] widths = new float[] { 50f, 100f, 40f, 50f, 50f, 50f, 130f, 50f, 50f, 28f, 35f };
    table.SetWidths(widths);
    table.WidthPercentage = 95f;
    table.HeaderRows = 1;

    PdfPCell cell = new PdfPCell(new Phrase("Report"));

    cell.Colspan = dt.Columns.Count - 1;

    table.AddCell(new Phrase("col1", font5));
    table.AddCell(new Phrase("col2", font5));
    table.AddCell(new Phrase("col3", font5));
    table.AddCell(new Phrase("col4", font5));
    table.AddCell(new Phrase("col5", font5));
    table.AddCell(new Phrase("col6", font5));
    table.AddCell(new Phrase("col7", font5));
    table.AddCell(new Phrase("col8", font5));
    table.AddCell(new Phrase("col9", font5));
    table.AddCell(new Phrase("col10", font5));
    table.AddCell(new Phrase("col11", font5));
    var i = 0;

    foreach (DataRow r in dt.Rows)
    {
        i++;
        Console.WriteLine(i + " of " + dt.Rows.Count + ", size: " + ms.Length / 1024 + "KB");

        table.AddCell(new Phrase(r[0].ToString(), font5));
        table.AddCell(new Phrase(r[1].ToString(), font5));
        table.AddCell(new Phrase(r[2].ToString(), font5));
        table.AddCell(new Phrase(r[3].ToString(), font5));
        table.AddCell(new Phrase(r[4].ToString(), font5));
        table.AddCell(new Phrase(r[5].ToString(), font5));
        table.AddCell(new Phrase(r[7].ToString(), font5));
        table.AddCell(new Phrase(r[8].ToString(), font5));
        table.AddCell(new Phrase("", font5));
        table.AddCell(new Phrase(r[10].ToString(), font5));
        table.AddCell(new Phrase((r[11].ToString() == string.Empty) ? "" : Convert.ToDecimal(r[11].ToString()).ToString("N2") + " KB", font5));
    }
    document.Add(table);
    document.Close(); // finalizes the PDF; without this the file is left incomplete
}

For the data part, I can solve the problem by gathering the data in chunks, writing each chunk, and then disposing of it before fetching the next one.
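As a sketch of that chunked approach (assuming the data comes from SQL Server; the connection string, query, and column layout here are placeholders), a forward-only SqlDataReader keeps only the current record in memory instead of materializing the whole DataTable:

```csharp
// Hypothetical sketch: stream rows instead of loading a 4.5GB DataTable.
// connectionString and the query are placeholders, not the real ones.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT col1, col2 /* ... */ FROM AuditLog", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Build one PDF table row directly from the reader;
            // nothing but the current record is held in managed memory.
            table.AddCell(new Phrase(reader[0].ToString(), font5));
            table.AddCell(new Phrase(reader[1].ToString(), font5));
            // ... remaining columns ...
        }
    }
}
```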

For the PDF part, I want to know if there is a way to write the data directly to disk, instead of manipulating it all in memory.
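For what it's worth, iTextSharp already streams finished page content to the FileStream; what blows up memory here is that the entire PdfPTable is held until the single document.Add(table) at the end. A pattern iTextSharp 5 supports is flushing the table in batches: add the partially filled table to the document every N rows, then call DeleteBodyRows() so the rows already written to disk are released. A sketch (the batch size of 1000 is an arbitrary choice):

```csharp
int i = 0;
foreach (DataRow r in dt.Rows)
{
    i++;
    // ... AddCell calls for this row, as in the code above ...

    if (i % 1000 == 0)                // arbitrary batch size
    {
        document.Add(table);          // flush the rows gathered so far to the FileStream
        table.DeleteBodyRows();       // drop them from memory, keeping the header definition
        table.SkipFirstHeader = true; // don't repeat the header mid-page on the next Add
    }
}
document.Add(table); // flush whatever remains
```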

  • Serious question: what use is a PDF with 5 million records in it? – DavidG Aug 03 '16 at 15:37
  • Another serious question: how many PDF editors are there, or is there an editor that would be able to open the file? – Irdis Aug 03 '16 at 15:40
  • @DavidG If you have a lot of accounting information that needs to be extracted from a software for summary reasons you could need it. – Broco Aug 03 '16 at 15:49
  • @Broco Then output to CSV. A PDF this large is no use to anyone. – DavidG Aug 03 '16 at 15:50
  • @DavidG The question wasn't if its useful and he didn't say what he needs it for. There are a lot of special cases in businesses and sometimes you have to adapt to bad standards. Otherwise there wouldn't be one office with Windows left. – Broco Aug 03 '16 at 15:55
  • @DavidG I've seen many such PDFs. There are banks who take a snapshot of all accounts in PDF every day. Those PDFs have hundreds of thousands of pages. No one ever reads them, until something goes wrong; in that case, a human being goes over the hundreds of thousands of pages. Strange but true. This being said: you can add the rows of a table gradually, instead of building the complete table first. See the answer to the duplicate question. – Bruno Lowagie Aug 04 '16 at 07:54
  • @BrunoLowagie I get that there are niche places that do it, but it's mainly for some historical/political reason. I've seen (and created, ugh) similar things myself, like "reports" in Excel that are hundreds of Mb in size. There's no good reason these days to not have the data in a more sensible format (e.g. CSV in this case) – DavidG Aug 04 '16 at 11:04
  • @DavidG Well, I'm old enough to remember the days that these snap shots were printed on paper. From that perspective, I am happier with a solution that uses PDF. But I hear you: there are better solutions than PDF for this kind of requirement. – Bruno Lowagie Aug 04 '16 at 11:06
  • @BrunoLowagie Haha indeed! I think PDF used to be the format of choice as it was considered fixed and un-editable. I've had to educate people about this on more than one occasion. – DavidG Aug 04 '16 at 11:09
  • Answering the questions: about the PDF size, it's a huge log of what people do in my system. Basically, every action a user takes in the system creates a row in this report. About changing the report type, sadly that is not in my power; I really have to adapt to a bad standard (c'mon guys, look, I'm working with .NET). – Luiz Eduardo Simões Aug 04 '16 at 11:43
  • Just to close this case, @BrunoLowagie's solution worked for me! The PDF came out as a 700MB file, but I haven't managed to open it yet; Firefox's PDF.js couldn't, so I am transferring it to my machine to find something that can. Thanks for the help! – Luiz Eduardo Simões Aug 04 '16 at 13:45
