
I want to check whether a URL string matches a keyword. For example, if the keyword is google.com, then google.com and https://google.com should return true, and so should google.com/search or something similar. But google.com.id should return false, because it's a different site. I tried the expression below, but it doesn't work. How should I write the regular expression? Thanks.

regexp.MatchString(`^(?:https?://)?([-a-z0-9]+)(?:\.`+keyword+`)*$`, urlstr)

By the way, as far as I understand, regular expressions can cause performance issues. Can anyone suggest other solutions?

Frank

1 Answer


You can use

regexp.MatchString(`^(?:https?://)?(?:[^/.\s]+\.)*` + regexp.QuoteMeta(keyword) + `(?:/[^/\s]+)*/?$`, urlstr)

See the regex demo.
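A minimal runnable sketch of the pattern above, assuming it is compiled once with regexp.MustCompile and reused. regexp.MatchString compiles the pattern on every call, so reusing a compiled *regexp.Regexp avoids that cost when many URLs are checked against the same keyword. The helper name newKeywordMatcher is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// newKeywordMatcher (hypothetical helper) builds the answer's pattern for one
// keyword and compiles it a single time so it can be reused for many URLs.
func newKeywordMatcher(keyword string) *regexp.Regexp {
	return regexp.MustCompile(`^(?:https?://)?(?:[^/.\s]+\.)*` +
		regexp.QuoteMeta(keyword) + `(?:/[^/\s]+)*/?$`)
}

func main() {
	re := newKeywordMatcher("google.com")
	for _, u := range []string{
		"google.com",                 // true
		"https://google.com",         // true
		"google.com/search",          // true
		"www.google.com/search?q=go", // true (subdomain plus path)
		"google.com.id",              // false (different domain)
	} {
		fmt.Printf("%-28s %v\n", u, re.MatchString(u))
	}
}
```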

Details:

  • ^ - start of string
  • (?:https?://)? - an optional http:// or https://
  • (?:[^/.\s]+\.)* - zero or more repetitions of
    • [^/.\s]+ - one or more chars other than /, . and whitespace
    • \. - a dot
  • google\.com - the escaped keyword (google.com in this example)
  • (?:/[^/\s]+)* - zero or more repetitions of a / and then one or more chars other than / and whitespace chars
  • /? - an optional /
  • $ - end of string

Note that you need regexp.QuoteMeta to escape any special characters in the keyword, such as ., which otherwise matches any character except a newline.

Wiktor Stribiżew
  • Thank you, it seems to work. Will the regular expression cause any performance issues? Are there other solutions? – Frank Apr 19 '21 at 10:14
  • @Frank I think this regex is efficient enough not to cause any performance problems. Note I assumed there will be no spaces in the URL. If you need to support whitespace, remove all `\s`s. – Wiktor Stribiżew Apr 19 '21 at 10:19
  • Understood, thank you. Let me benchmark it. – Frank Apr 19 '21 at 11:07
  • Hi, I found a bug: if the keyword is `https://google.com` (which contains `https`) but urlstr is `google.com` (which doesn't), the result is false, but it should be true. How should I change the expression above? – Frank Apr 20 '21 at 03:33
  • @Frank This is a problem with your data/task definition. If you need to handle URLs with protocol or without in keywords, make sure you only check the parts without protocol, simply remove it with `strings.Replace(strings.Replace(urlstr, "https://", "", 1), "http://", "", 1)` before checking. – Wiktor Stribiżew Apr 20 '21 at 08:30
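A minimal regex-free sketch that combines the protocol-stripping idea from the last comment with a plain string check, in case the performance concern from the question matters. It assumes the keyword is a bare host (optionally with a scheme) and strips the scheme from both the keyword and the URL. It uses strings.TrimPrefix instead of strings.Replace so that only a leading scheme is removed, not one appearing later in a query string. The names stripScheme and matchesKeyword are hypothetical. The check only approximates the regex: it does not reject whitespace, empty path segments, or empty subdomain labels. Whether it is actually faster than a compiled regex is something to confirm with a benchmark.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme (hypothetical) removes a leading "https://" or "http://" only
// when it is a prefix; later occurrences in the string are left untouched.
func stripScheme(s string) string {
	s = strings.TrimPrefix(s, "https://")
	return strings.TrimPrefix(s, "http://")
}

// matchesKeyword (hypothetical) treats everything before the first "/" as the
// host and accepts it if it equals the keyword or is a subdomain of it.
func matchesKeyword(keyword, rawURL string) bool {
	kw := stripScheme(keyword)
	s := stripScheme(rawURL)

	host := s
	if i := strings.IndexByte(s, '/'); i >= 0 {
		host = s[:i]
	}
	return host == kw || strings.HasSuffix(host, "."+kw)
}

func main() {
	fmt.Println(matchesKeyword("https://google.com", "google.com"))            // true
	fmt.Println(matchesKeyword("google.com", "https://www.google.com/search")) // true
	fmt.Println(matchesKeyword("google.com", "google.com.id"))                 // false
	fmt.Println(matchesKeyword("google.com", "notgoogle.com"))                 // false
}
```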