I need to split a string into single words, but there are some cases that should not be split.

An example for type I string
An example for degree II string

So every combination of type or degree followed by I, II, III, IV or V should be kept as a single string.

The results for the example strings should be:

['An', 'example', 'for', 'type I', 'string']
['An', 'example', 'for', 'degree II', 'string']

In my regex I have to search for type or degree, followed by a space, followed by a string of the characters I or V with a maximum length of 3. Those matches should not be split.

const regex = '/(type|degree)\s(I{1,3}|V{1})/' // <-- regex is wrong, as it is not working
const result = string.split(' ')

I'm not quite sure how to combine the regex with splitting so that all matches are exempt from splitting on the space character.

user3142695
  • `(I{1,3}|V{1})` means *“either (from one to three `I`) or (exactly one `V`)”* – spectras Sep 20 '17 at 08:19
  • You might want to support all Roman numbers - [regex demo](https://regex101.com/r/VFlHbD/2) (I shortened [this Roman number regex](https://stackoverflow.com/questions/267399/how-do-you-match-only-valid-roman-numerals-with-a-regular-expression) a bit). – Wiktor Stribiżew Sep 20 '17 at 08:21
  • @WiktorStribiżew That regex is great. Could you provide an answer how to use this with `split()`? – user3142695 Sep 20 '17 at 08:24
  • Wrap with `(...)`. See https://jsfiddle.net/fu63Lyz5/. I think you will also need to match that as whole words, hence, I added `\b` in the demo. – Wiktor Stribiżew Sep 20 '17 at 08:24
  • Thanks. But please have a look at the example shown in the post. I need all words split - the regex should match the exceptions, which should not be split themselves. Your JsFiddle gives an array with three items... – user3142695 Sep 20 '17 at 08:28
  • Then why split? Match! https://jsfiddle.net/fu63Lyz5/1/ / https://regex101.com/r/VFlHbD/4 – Wiktor Stribiżew Sep 20 '17 at 08:29
  • Oh... Äh... yes, that was what I wanted to say :-) Please post it as an answer, then I can accept it. – user3142695 Sep 20 '17 at 08:31

1 Answer

You can match the words type and degree followed by any Roman number, or otherwise any run of 1 or more non-whitespace chars, with

var s = "An example for degree II string";
var rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;
console.log(s.match(rx));
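
As a minimal sketch, the same pattern can be wrapped in a small helper and run against both example strings from the question. The name `splitKeepingNumerals` is hypothetical, and the `|| []` fallback is my addition: `match` returns `null` when nothing matches, for example on a whitespace-only string.

```javascript
// Pattern copied unchanged from the snippet above.
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical helper: returns the tokens, or an empty array if there are none.
function splitKeepingNumerals(text) {
  // String.prototype.match returns null when the pattern matches nothing.
  return text.match(rx) || [];
}

console.log(splitKeepingNumerals("An example for type I string"));
// [ 'An', 'example', 'for', 'type I', 'string' ]
console.log(splitKeepingNumerals("An example for degree II string"));
// [ 'An', 'example', 'for', 'degree II', 'string' ]
console.log(splitKeepingNumerals("   "));
// []
```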

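I borrowed and shortened the Roman number regex from [this Roman number regex](https://stackoverflow.com/questions/267399/how-do-you-match-only-valid-roman-numerals-with-a-regular-expression). The pattern matches: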

  • \b - a word boundary
  • (?:type|degree) - a non-capturing group matching either type or degree substrings
  • \s+ - 1 or more whitespaces
  • M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}) - the Roman number regex
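  • \b - a trailing word boundary (the Roman number must end at the edge of a word; since every part of the Roman number pattern is optional, this does not by itself guarantee that a Roman number is present)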
  • | - or
  • \S+ - 1 or more non-whitespace chars.
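
One edge case may be worth checking. When type or degree is followed by an ordinary word, the Roman number part can match as an empty string, and the first branch then captures the keyword together with its trailing whitespace. The sketch below compares the pattern as posted with a guarded variant. The lookahead `(?=[MDCLXVI])` and the `(?!\w)` in place of the trailing `\b` are my assumptions, not part of the borrowed regex.

```javascript
// Pattern from the answer, unchanged.
const original = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Assumed variant: (?=[MDCLXVI]) requires a numeral letter right after the
// whitespace, and (?!\w) replaces the trailing \b so that an empty numeral
// cannot succeed at the start of the next word.
const guarded = /\b(?:type|degree)\s+(?=[MDCLXVI])M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})(?!\w)|\S+/g;

console.log("type string".match(original)); // [ 'type ', 'string' ]
console.log("type string".match(guarded));  // [ 'type', 'string' ]
console.log("An example for degree II string".match(guarded));
// [ 'An', 'example', 'for', 'degree II', 'string' ]
```

If the input never contains type or degree without a numeral, the original pattern is probably sufficient as it stands.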

Note that if any symbol or punctuation char directly precedes the word degree or type (as in "(type I"), the keyword is consumed together with that char by the \S+ branch and the Roman number ends up as a separate item, so you need to handle those cases before applying this regex.
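
One possible way to handle such input is to split the fallback branch in two, `\w+` for words and `[^\w\s]+` for runs of punctuation, so that a leading bracket or quote becomes its own item. This is only a sketch under a few assumptions: punctuation as separate items is acceptable, only I to V are accepted (as in the question) to keep the pattern short, and `tokenize` is a hypothetical name.

```javascript
// Assumed pattern: keyword + numeral I..V first, then plain words, then
// runs of punctuation. Note that \w only covers ASCII letters, digits and
// underscore, so non-ASCII letters fall into the punctuation branch.
const tokenRx = /\b(?:type|degree)\s+(?:I{1,3}|IV|V)\b|\w+|[^\w\s]+/g;

// Hypothetical helper: returns all tokens, or an empty array if none match.
function tokenize(text) {
  return text.match(tokenRx) || [];
}

console.log(tokenize("See (type I) and degree IV."));
// [ 'See', '(', 'type I', ')', 'and', 'degree IV', '.' ]
```

If the punctuation items are not wanted, they can be filtered out afterwards, for example with `tokens.filter(t => /\w/.test(t))`.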

Wiktor Stribiżew