1

I have a table with a UK Postcode field and I'm going to be testing for bad formats using regular expressions only even rudimentary tests are failing to work.

I have a value of CM8 2AX.

If I run

select post_code from sor.party_person where 'CM8 2AX' ~ '.*[0-9][A-Z][A-Z]$'

and post_code = 'CM8 2AX';

I get

CM8 2AX

(1 row)

Which seems to indicate my very simple regular expression is correct for the value.

Yet if I now test the same value directly from the database - seemingly the same query in effect, having proven with the and condition that this is the value in the field:

select post_code from sor.party_person where post_code ~ '.*[0-9][A-Z][A-Z]$' and post_code = 'CM8 2AX';

(0 rows)

it fails to match.

What am I missing?

If I remove the $ at the end, it does work, implying the db is seeing something different as the end of the field character and not recognising the last text character as being the last character but this makes no sense to me.

1 Answers1

1

I think that the data type of the column is causing space characters to be added after the post code to the size of the column this is ignored when doing = but apparently not when doing a regex match.

select post_code from sor.party_person where post_code ~ '.*[0-9][A-Z][A-Z] *$' and post_code = 'CM8 2AX';

Should work proving that this is correct.

Edit

I was correct on effect but apparently not cause of = working. The SQL standard says that when comparing two strings of unequal length the shorter one is padded. So post_code = 'CM8 2AX' is actually padding 'CM8 2AX' to the length of the post_code column before doing the comparison.

JGNI
  • 3,933
  • 11
  • 21
  • Thanks - you hit the nail on the head. It's a char column. I never use char fields as they're wasteful and varchars are almost always a more suitable option but another team created that table and I had 'assumed' they'd followed my normal pattern and hadn't thought to check. Always a mistake! Thanks again. – 41jules1967 Mar 07 '19 at 17:08
  • Just another thought - Redshift/postgres seems to be inconsistent around char fields. Length and matching appear to implicitly trim whereas it seems pattern matching, it seems, doesn't. I'd personally view this as a bug as it should treat fields in a consistent manner. – 41jules1967 Mar 07 '19 at 17:31
  • @41jules1967 I think it has something to do with the SQL standard there is a SO question and answers here https://stackoverflow.com/questions/543580/equals-vs-like – JGNI Mar 08 '19 at 09:00