
I am trying to compare two PDF files by reading them as UTF-8 strings, but I get an "Invalid encoding" error when I run the code below:

 encoding = 'utf-8'
 base_path = set_up
 tear_down do
   f1 = File.read("#{TMP_DIR}/#{current_file_name}", encoding: encoding)
   f2 = File.read("#{base_path}/#{expected_file_name}", encoding: encoding)
   expect(f1).to eql f2
 end

I tried forcing the encoding:

f1.force_encoding("UTF-8")
f2.force_encoding("UTF-8")

I tried this too:

f1.force_encoding("BINARY")

but then I get a different error:

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
  • Personally I'd do this using a command-line tool, like `diff`. See https://stackoverflow.com/a/12118413/128421. It's easy, fast, and sidesteps the encoding problem. – the Tin Man Feb 25 '20 at 21:28
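For illustration, here is a minimal sketch of why treating a PDF as UTF-8 text is fragile and what reading it in binary mode does instead. It uses only the Ruby standard library. The sample bytes and file names are hypothetical stand-ins for real PDFs: PDF files typically include a comment line of non-ASCII bytes near the top (such as `%\xE2\xE3\xCF\xD3`), and those bytes are not valid UTF-8.

```ruby
require "tmpdir"

# Hypothetical stand-in for real PDF data: a header line followed by a
# comment line of non-ASCII bytes. These bytes are not valid UTF-8.
pdf_bytes = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\n".b

# Relabeling the bytes as UTF-8 does not make them valid UTF-8.
utf8_view = pdf_bytes.dup.force_encoding(Encoding::UTF_8)
puts utf8_view.valid_encoding? # prints false

same_bytes = Dir.mktmpdir do |dir|
  current  = File.join(dir, "current.pdf")  # hypothetical file names
  expected = File.join(dir, "expected.pdf")
  File.binwrite(current, pdf_bytes)
  File.binwrite(expected, pdf_bytes)

  # File.binread opens the file in binary mode, so both strings are tagged
  # BINARY (ASCII-8BIT) and == compares the raw bytes.
  f1 = File.binread(current)
  f2 = File.binread(expected)
  f1.encoding == Encoding::BINARY && f1 == f2
end
puts same_bytes # prints true
```

In the question's `tear_down`, the corresponding change would be to read both files with `File.binread` instead of `File.read(..., encoding: encoding)`, so that both strings carry the same binary encoding.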

2 Answers


Instead of comparing the files as strings, I would just compare the files' MD5 hashes:

require 'digest'

tear_down do
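  # Compare the hex strings: Digest objects define == but not eql?, so
  # `eql` on the Digest objects themselves would check object identity.
  md5_1 = Digest::MD5.file("#{TMP_DIR}/#{current_file_name}").hexdigest
  md5_2 = Digest::MD5.file("#{base_path}/#{expected_file_name}").hexdigest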

  expect(md5_1).to eql md5_2
end
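A self-contained sketch of the same idea, assuming only the Ruby standard library. `md5_hex` is a hypothetical helper name, and the file contents are made up. It also shows `FileUtils.compare_file`, a swapped-in alternative (not part of the answer above) that compares the two files' bytes directly without computing hashes.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper: MD5 of a file as a hex string. Digest::MD5.file reads
# raw bytes, so no text encoding is involved.
def md5_hex(path)
  Digest::MD5.file(path).hexdigest
end

sample = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\n".b # hypothetical PDF bytes

results = Dir.mktmpdir do |dir|
  current  = File.join(dir, "current.pdf")  # hypothetical file names
  expected = File.join(dir, "expected.pdf")
  File.binwrite(current, sample)
  File.binwrite(expected, sample)

  {
    md5_equal: md5_hex(current) == md5_hex(expected),
    # Compares the file contents byte for byte; no hashing.
    bytes_equal: FileUtils.compare_file(current, expected)
  }
end

p results
```

Both checks require the files to be bit-for-bit identical, so any embedded value that changes between runs will make them fail.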
  • Thank you spickermann. But I get this failure with your code: `RSpec::Expectations::ExpectationNotMetError: expected: # got: # ` Does this method compare both the text and the formatting? – MTO71 Feb 26 '20 at 09:00
  • This method compares at the binary level: every single bit needs to be the same. That is basically what your method did, except that yours converted the binary data to a text string before comparing every single character (the text and all the formatting in the file). – spickermann Feb 26 '20 at 09:12
  • Thanks, spickermann. So what is the solution? If I understand correctly, should I treat the files as binary before comparing them? – MTO71 Feb 26 '20 at 09:34
  • It depends on what you actually want to compare. Do the files need to be identical, or do they just need to contain the same text? – spickermann Feb 26 '20 at 09:42
  • I have to check that the content of the two PDF files is the same. Does this code do the same as yours? ``` base_path = set_up tear_down do md5_1 = `cat "#{TMP_DIR}/#{current_file_name}" | grep -a -v "/CreationDate" | md5sum`.strip md5_2 = `cat "#{base_path}/#{expected_file_name}" | grep -a -v "/CreationDate" | md5sum`.strip expect(md5_1).to eql md5_2 ``` – MTO71 Feb 26 '20 at 10:19

Thank you @spickermann for your help. This works for me once I exclude the "/CreationDate" line, which differs between the two PDF files:

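base_path = set_up
tear_down do
  # Hash each PDF after dropping the line containing /CreationDate, which changes
  # every time the file is generated; -a makes grep treat the binary file as text.
  md5_1 = `grep -a -v "/CreationDate" "#{TMP_DIR}/#{current_file_name}" | md5sum`.strip
  md5_2 = `grep -a -v "/CreationDate" "#{base_path}/#{expected_file_name}" | md5sum`.strip
  expect(md5_1).to eql md5_2
end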
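The same filtering can be sketched in pure Ruby, which avoids spawning a shell and depending on `md5sum` (a GNU coreutils tool that may not be installed everywhere). `md5_without_creation_date` is a hypothetical helper name and the file contents are made up. The resulting hash is not guaranteed to equal `md5sum`'s output byte for byte, but the two files are filtered identically, which is what the comparison needs. Depending on the PDF generator, other values such as `/ModDate` or the trailer `/ID` may also change between runs and would then need to be filtered the same way.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: MD5 of a file's bytes, skipping every line that
# contains "/CreationDate" (the same filter as grep -a -v "/CreationDate").
def md5_without_creation_date(path)
  digest = Digest::MD5.new
  File.open(path, "rb") do |file|
    file.each_line do |line|
      digest.update(line) unless line.include?("/CreationDate")
    end
  end
  digest.hexdigest
end

filtered_equal, raw_equal = Dir.mktmpdir do |dir|
  current  = File.join(dir, "current.pdf")  # hypothetical file names
  expected = File.join(dir, "expected.pdf")
  # Hypothetical files that differ only in their /CreationDate line.
  File.binwrite(current,  "%PDF-1.4\n/CreationDate (D:20200226100000)\n/Title (x)\n")
  File.binwrite(expected, "%PDF-1.4\n/CreationDate (D:20200101000000)\n/Title (x)\n")

  [
    md5_without_creation_date(current) == md5_without_creation_date(expected),
    Digest::MD5.file(current).hexdigest == Digest::MD5.file(expected).hexdigest
  ]
end

puts filtered_equal # prints true
puts raw_equal      # prints false
```

Inside the question's `tear_down`, this would read as `expect(md5_without_creation_date(path_a)).to eql md5_without_creation_date(path_b)`. Note that dropping a whole line also drops anything else on that line.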