
I'm practicing some regex and trying to get only `Regular`, `Expressions,` and `abbreviated` from the data below:

Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.

With `(\w+\S?)`, I get every word, plus one trailing non-whitespace character if present.

How would I get just `Regular`, `Expressions,` and `abbreviated`?

Edit:

To clarify, I'm looking for `Regular`, `Expressions,` and `abbreviated` as separate matches, without spaces,

not `Regular Expressions, abbreviated` as a single match (spaces included here).

  • I've viewed https://stackoverflow.com/questions/21345973/regex-to-extract-first-3-words-from-a-string, but `(\w+\S?){3}` does not work. – jakechowder Nov 04 '22 at 20:56
  • `^(?:\S+\s+){2}\S+` for the whole match, or 3 capture groups for separated parts `^(\S+)\s+(\S+)\s+(\S+)` https://regex101.com/r/5GlJGP/1 – The fourth bird Nov 04 '22 at 21:00
  • @MikeM. I'm not trying to match anything `\w\s`. That question mentions adding `{3}`. Adding that did not help. – jakechowder Nov 04 '22 at 21:45
  • If you mean [`(?<=^(?:\S+\s+){0,2})\S+`](https://regex101.com/r/tDIJ2p/2) this requires lookbehind support of variable length. (e.g. .NET/C#, JS depending on browser) – bobble bubble Nov 05 '22 at 14:20
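
For illustration, the lookbehind suggestion in the last comment could be tried in JavaScript roughly as follows. This is only a sketch: it assumes an engine with variable-length lookbehind support (for example a recent V8-based runtime such as Node.js), and the variable names are made up for the example.

```javascript
// Sample text from the question.
const text = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.";

// The lookbehind only succeeds when at most two "token + whitespace" runs
// sit between the start of the string and the current position, so only
// the first three whitespace-delimited tokens can match.
const firstThreeTokens = /(?<=^(?:\S+\s+){0,2})\S+/g;

// With the g flag, match() returns every match as a separate array element
// (or null when nothing matches, hence the fallback to an empty array).
const tokens = text.match(firstThreeTokens) || [];
console.log(tokens); // [ 'Regular', 'Expressions,', 'abbreviated' ]
```

Each array element is a separate match, so the comma stays attached to `Expressions,` but no spaces are included.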

2 Answers

2

Regex can't "select". It can only match and capture.

This captures the first 3 words (including optional trailing comma) as groups 1, 2 and 3:

^(\w+,?)\s+(\w+,?)\s+(\w+,?)

See live demo.
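
As a rough sketch of how the pattern above might be used from application code, here it is in JavaScript. The language choice and the variable names are assumptions for the example, not part of the original answer.

```javascript
// Sample text from the question.
const text = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.";

// Anchored at the start of the string: three words, each with an optional
// trailing comma, separated by whitespace.
const firstThree = /^(\w+,?)\s+(\w+,?)\s+(\w+,?)/;

const m = text.match(firstThree);

// m[0] is the whole match; m[1] to m[3] are the three captured words.
// match() returns null when the pattern does not match.
const words = m ? m.slice(1, 4) : [];
console.log(words); // [ 'Regular', 'Expressions,', 'abbreviated' ]
```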

Bohemian
  • it seems like he is saying the specific words: regular, expression, and abbreviated separated by spaces rather than three words separated by a comma and space – Golden Lion Nov 04 '22 at 21:08
  • That's correct. Looking to match 'regular' 'expression,' and 'abbreviated' without spaces – jakechowder Nov 04 '22 at 21:26
  • @jakechowder impossible with regex alone. You need a tool or app code to make the replacement. – Bohemian Nov 05 '22 at 00:03
2

As @Bohemian has pointed out, a regex cannot "select"; it can only match and capture. If the regex implementation you use supports it, the captured groups are returned along with the match. In JavaScript, for example, they come back as separate elements of the result.

A capturing group is created by wrapping in parentheses the part of the match that you want to extract.

To match those three specific words, the regex would be the following:

/(Regular) (Expressions), (abbreviated)/

Note that the words you care about are inside the parentheses, while the parts of the string you don't want (like spaces and commas) are outside them.

You would use it like this (JavaScript):

const string = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules." 
const regex = /(Regular) (Expressions), (abbreviated)/; 
string.match(regex); // returns [ "Regular Expressions, abbreviated", "Regular", "Expressions", "abbreviated" ]

Note that the first element of the result is the whole match, and the 2nd, 3rd and 4th elements are your capture groups, which you can use as if you had selected them from the string.
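
One way to pull the groups out of that result, sketched here with array destructuring (the variable names are only illustrative):

```javascript
const text = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.";

const result = text.match(/(Regular) (Expressions), (abbreviated)/);

// match() returns null when nothing matches, so fall back to an empty
// array; the destructured names are then simply undefined.
const [whole, first, second, third] = result || [];

console.log(first);  // Regular
console.log(second); // Expressions
console.log(third);  // abbreviated
```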

To match any three words separated by a space or a comma, you could use

/(\w+),?\s?(\w+),?\s?(\w+),?\s?/

`\w` matches a word character (a letter, digit, or underscore), `\s` matches a whitespace character, and `?` means zero or one occurrence of the preceding element. As in the example above, the parentheses capture each word and leave everything else out.

You would use it like this (JavaScript):

const string = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules." 
const regex = /(\w+),?\s?(\w+),?\s?(\w+),?\s?/; 
string.match(regex); // returns [ "Regular Expressions, abbreviated ", "Regular", "Expressions", "abbreviated" ] (the whole match ends with the space consumed by the final \s?)
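
One caveat worth checking: because every separator in that pattern is optional, it can also split a single word into three groups. The sketch below shows this, together with one possible tightening that requires at least one comma or whitespace character between words. The stricter pattern is an assumption added for illustration, not part of the original answer.

```javascript
const loose = /(\w+),?\s?(\w+),?\s?(\w+),?\s?/;

// No separators are required, so backtracking splits one word into three.
const looseGroups = "abc".match(loose).slice(1);
console.log(looseGroups); // [ 'a', 'b', 'c' ]

// Hypothetical stricter variant: at least one comma or whitespace
// character must appear between consecutive words.
const strict = /(\w+)[,\s]+(\w+)[,\s]+(\w+)/;

const strictOnWord = "abc".match(strict);
console.log(strictOnWord); // null

const text = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.";
const strictGroups = text.match(strict).slice(1);
console.log(strictGroups); // [ 'Regular', 'Expressions', 'abbreviated' ]
```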
Daniel Cruz