I have a book in PDF format, and I can't change anything in it with sed. I can't really use LibreOffice to edit it because it's a 300-page document with lots of images, and my PC crashes while opening it. All I need to do is change one or two characters in a single spot (for example, changing "+2" to "+3", where "+2" occurs only once in the entire book), so the solution should be simple.

I uncompressed it and then ran sed on it. sed didn't change anything, although echo $? returned 0.

pdftk file.pdf output uncompressed_file.pdf uncompress
sed -i 's/foo/bar/g' uncompressed_file.pdf
pdftk uncompressed_file.pdf output corrected_file.pdf compress

This exact code worked with another file, so I suspect that some PDF files prevent editing, and I'm looking for a way to bypass that.

Pippin
  • Possible duplicate of [How to find and replace text in a existing PDF file with PDFTK (or other command line application)](https://stackoverflow.com/questions/9871585/how-to-find-and-replace-text-in-a-existing-pdf-file-with-pdftk-or-other-command) – nullPointer Feb 18 '19 at 14:37
  • The exit code from `sed` merely reflects whether the script was able to execute successfully; whether or not `sed` actually found any text to substitute will not be reflected in the status. – tripleee Feb 18 '19 at 14:53
  • PDF is pesky; there is no guarantee that `foo` is present literally. Can you find the text with `grep -a foo uncompressed_file.pdf`? – tripleee Feb 18 '19 at 14:54
  • Yes I did find it with `pdfgrep` first. I made sure that the occurrence happened only once before trying to modify it with `sed`. – Pippin Feb 18 '19 at 15:01
  • But pdfgrep will presumably decode the PDF file before or while grepping it. sed will not do this. A simple grep will confirm whether sed has a chance. – Gem Taylor Feb 18 '19 at 15:11
  • I see my mistake now. `grep` alone can't find `foo`. What should I do now? – Pippin Feb 18 '19 at 15:14
  • I'm afraid I'm voting to close this question. It's not a programming question per se, and the premise that you can safely do what you're asking in sed is incorrect. So the only way to answer this question reasonably is to make it a request for a tool recommendation, which makes it [solidly off-topic](https://stackoverflow.com/help/on-topic) for StackOverflow. I suggest you check https://SuperUser.com/. If you get a recommendation for a safe method to achieve this which requires programming, this'll be the place to ask for help. :) – ghoti Feb 18 '19 at 23:47
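
The comments above come down to two checks: confirm with `grep -a` that the literal bytes are present in the uncompressed file, and do not rely on `sed`'s exit status to tell whether anything changed. A minimal sketch of both checks, assuming GNU `sed` (for `-i`) and search/replacement strings without regex metacharacters or slashes. The function name `replace_checked` is hypothetical and not from the thread.

```shell
#!/bin/sh
# replace_checked FILE OLD NEW   (hypothetical helper, not from the thread)
# Replaces OLD with NEW in FILE only if OLD occurs literally in the raw
# bytes, and reports whether the file actually changed.
replace_checked() {
    file=$1
    old=$2
    new=$3

    # -a treats binary data as text, -F matches a fixed string, -q is quiet.
    if ! grep -aqF -- "$old" "$file"; then
        echo "pattern not found literally in $file" >&2
        return 1
    fi

    # Keep a copy so the result can be compared afterwards.
    cp "$file" "$file.bak"

    # Assumption: OLD and NEW contain no sed metacharacters or slashes.
    sed -i "s/$old/$new/g" "$file"

    # sed exits 0 even when nothing matched, so compare the bytes instead.
    if cmp -s "$file" "$file.bak"; then
        echo "sed ran but changed nothing" >&2
        return 2
    fi
}
```

Called as `replace_checked uncompressed_file.pdf foo bar`, this returns a nonzero status when the text is not stored literally, which is the situation the `pdfgrep` versus `grep` difference points to.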

1 Answer

I have used a shell command to do this.

qpdf --stream-data=uncompress $1 uncompressed.pdf
sed -i "s/("$2")/("$3")/g" uncompressed.pdf
qpdf --stream-data=compress uncompressed.pdf $1

So if this is myShell.sh, then a command line such as,

myShell.sh yourFile.pdf +2 +3

should do it.
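
For reference, here is a more defensive sketch of the same idea, not the exact script from this answer. It quotes every parameter, works on a temporary file, and only runs when called with three arguments. It assumes GNU `sed`, that `qpdf` is installed, and that the text is stored as a plain `(…)` string in the content stream, which not every PDF does. The function names `replace_literal` and `main` are hypothetical.

```shell
#!/bin/sh
# Usage: myShell.sh yourFile.pdf OLD NEW

# Replace the PDF literal string (OLD) with (NEW) in an uncompressed PDF.
# In a basic regular expression, ( ) and + are literal characters.
# Assumption: OLD and NEW contain no other sed metacharacters or slashes.
replace_literal() {
    sed -i "s/($2)/($3)/g" "$1"
}

main() {
    tmp=$(mktemp) || return 1
    # Any nonzero exit status from qpdf (including its warning status)
    # stops the chain, so the original file is left untouched.
    qpdf --stream-data=uncompress "$1" "$tmp" &&
        replace_literal "$tmp" "$2" "$3" &&
        qpdf --stream-data=compress "$tmp" "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Only run when invoked with exactly three arguments.
if [ "$#" -eq 3 ]; then
    main "$@"
fi
```

Because the parameters are quoted, a search string containing spaces is passed to `sed` as a single argument.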

VectorVortec
  • You might want to leave out the inner quotes in your `sed` pattern, as in `"s/($2)/($3)/g"`. Otherwise the shell reads those quotation marks as the end of the string, which will probably cause trouble if `$2` or `$3` contains a space character. – creativecoding Jan 19 '21 at 21:10
  • I was searching for a solution to uncompress a PDF without PDFTK. I had `qpdf` installed, so this answer helped me a lot! – creativecoding Jan 19 '21 at 21:12