2

I'm trying to write a one-line regex `split` that splits on hyphens and underscores and also splits on (and keeps) groups of one or more digits. I wrote a regex that achieves this, but it also inserts both empty strings and undefined into the resulting array: https://jsfiddle.net/vdmom1qL/

var x = '232as-df98_rew_98_9fg9-dd988fff.jpg';

console.log(x.split(/(?:[-._])|(\d+)/));

//the output
//["", "232", "as", undefined, "df", "98", "", undefined, "rew", undefined, "", "98", "", undefined, "", "9", "fg", "9", "", undefined, "dd", "988", "fff", undefined, "jpg"]

What exactly am I doing wrong here? The regex seems logically correct (it makes sense when read), but the empty strings and undefined values are very strange.

UPDATE: this is more about what is wrong with the regex than about just removing empty entries from an array. The main point I'm trying to drive home is: what is wrong with this regex that makes it create those empty strings and undefined values?

zero
  • 1
    You can remove them along with those empty strings like this: `var res = x.split(/(?:[-._])|(\d+)/).filter(e => e);`. – ibrahim mahrir Jul 12 '17 at 21:51
  • that is cool, but what exactly is happening with the regex? – zero Jul 12 '17 at 21:53
  • because it seems like the regex is inefficient – zero Jul 12 '17 at 21:55
  • 2
    @wiktorStribizew that is not what the user is asking us about. The question is related to `split` not to empty strings in an array. – ibrahim mahrir Jul 12 '17 at 21:59
  • @ibrahimmahrir: `split` produces an array. – Wiktor Stribiżew Jul 12 '17 at 22:01
  • 1
    @WiktorStribiżew but the question is **why?** not **how?**. My comment is just a quick fix until someone posts an explanation (which will be the correct answer). – ibrahim mahrir Jul 12 '17 at 22:02
  • 2
    @WiktorStribiżew yes, but this question is about fixing what is wrong with the regex's semantics so that it won't produce empty strings nor undefined's – zero Jul 12 '17 at 22:03
  • It has all been explained millions of times. `split` splits text on the matched text. If there are capturing groups, the captured text is added to the result, and when two matches are adjacent (or a match sits at the start or end of the string) the chunk between them is empty, hence the empty strings. undefined is added when the capturing group does not participate in the current match. So, use a matching regex: `s.match(/\d+|[^\d_.-]+/g)`. I wrote the same regex as anubhava, but the question is a dupe, so I did not post it. – Wiktor Stribiżew Jul 12 '17 at 22:06
  • 1
    @WiktorStribiżew _It has all been explained millions of times._ If so, then close it as a duplicate of one of those questions, not something that's only tangentially related. Closing it as a duplicate is not just un-useful, but harmful if the original question only addresses the side-effects of the current question, rather than the root of the problem. – Patrick Roberts Jul 12 '17 at 22:13
  • 1
    @PatrickRoberts I added 2, I can keep on... – Wiktor Stribiżew Jul 12 '17 at 22:14
  • @WiktorStribiżew just saw that. That's better. – Patrick Roberts Jul 12 '17 at 22:16

2 Answers

5

Instead of split, you can use match by reversing the regex a bit:

var x = '232as-df98_rew_98_9fg9-dd988fff.jpg'
console.log(x.match(/\d+|[^\d_.-]+/g))
//=> ["232", "as", "df", "98", "rew", "98", "9", "fg", "9", "dd", "988", "fff", "jpg"]
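One caveat with the match approach: with the `g` flag, `match` returns `null` rather than an empty array when nothing matches, for example on a string made only of separators. A minimal sketch of a guard, assuming a hypothetical helper named `tokenize` (the name is not from the original answer):

```javascript
// Hypothetical helper; `tokenize` is an illustrative name only.
function tokenize(name) {
  // Either a run of digits, or a run of characters that are
  // neither digits nor separators (-, _, .).
  var tokens = name.match(/\d+|[^\d_.-]+/g);
  // With the g flag, match returns null (not []) when nothing matches.
  return tokens || [];
}

console.log(tokenize('232as-df98_rew_98_9fg9-dd988fff.jpg'));
//=> ["232", "as", "df", "98", "rew", "98", "9", "fg", "9", "dd", "988", "fff", "jpg"]
console.log(tokenize('-_.'));
//=> []
```

Falling back to an empty array keeps callers from having to null-check before iterating.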

Otherwise, you can call .filter(Boolean) on your resulting array to filter out both the empty strings and the undefined values:

console.log(x.split(/(?:[-._])|(\d+)/).filter(Boolean))
//=> ["232", "as", "df", "98", "rew", "98", "9", "fg", "9", "dd", "988", "fff", "jpg"]
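If `split` is preferred and the goal is to avoid the placeholders rather than filter them afterwards, one possible alternative is to drop the capturing group and split on zero-width boundaries instead. This is only a sketch, not part of the original answer: it assumes an engine with lookbehind support (ES2018 or later), and the variable name `boundary` is illustrative.

```javascript
var x = '232as-df98_rew_98_9fg9-dd988fff.jpg';

// Split on a run of separators, or on the zero-width boundary between a
// digit and a non-digit, non-separator character (in either order).
// There is no capturing group, so split has nothing extra to insert.
var boundary = /[-._]+|(?<=\d)(?=[^\d._-])|(?<=[^\d._-])(?=\d)/;

console.log(x.split(boundary));
//=> ["232", "as", "df", "98", "rew", "98", "9", "fg", "9", "dd", "988", "fff", "jpg"]
```

A separator at the very start or end of the string would still leave an empty string at that edge, so the match approach remains the simpler choice for arbitrary input.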
anubhava
2

The problem is that your capturing group (\d+) sits in only one branch of the alternation, after the |. split inserts the contents of every capturing group into the array for each match. When a match comes from the other branch (a separator), the group never participated in that match, so undefined is inserted in its place.

To confirm this, notice that every other value in the array (with an odd index) is either a string of digits, or undefined.

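The empty strings occur when two matches are adjacent, or when a match sits at the very beginning or end of the string: here, wherever a hyphen, underscore, or string boundary is next to a run of digits. The text between the two split points is zero-length, so split emits an empty string for it.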
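The behaviour described above can be checked in isolation on small inputs. This is a minimal sketch using the question's regex; the short test strings are made up for illustration.

```javascript
var re = /(?:[-._])|(\d+)/;

// Separator branch matches: the group (\d+) did not participate,
// so split inserts undefined for it.
console.log('as-df'.split(re));
//=> ["as", undefined, "df"]

// Digit branch matches: the captured digits are inserted.
console.log('df98rew'.split(re));
//=> ["df", "98", "rew"]

// Two matches back to back ("98", then "_"): the zero-length text between
// them becomes an empty string, followed by undefined for the separator.
console.log('df98_rew'.split(re));
//=> ["df", "98", "", undefined, "rew"]

// A match at the very start leaves an empty string before it.
console.log('232as'.split(re));
//=> ["", "232", "as"]
```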

Patrick Roberts