I'm still getting used to regex.
I have a file with the following structure:
word1 word2 word3 word4 word5 "word6" "word7"
word1 word2 word3 word4 word5 "word6" "word7"
word1 word2 word3 word4 word5 "word6" "word7"
...
which I want to capture into:
arr[0] = word1
arr[1] = word2
arr[2] = word3
arr[3] = word4
arr[4] = word5
arr[5] = word6
arr[6] = word7
My regex is: (?m)(.* )(.* )(.* )(.* )(.* )(".*") (".*")
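To see what that regex actually captures, here is a minimal sketch. It assumes Python's re module, which is my choice for illustration only, since the post doesn't name a language. The point it shows is that each of the first five groups keeps its trailing space and the last two keep their quotes, which is not quite the arr[] layout wanted above.

```python
import re

line = 'word1 word2 word3 word4 word5 "word6" "word7"'

# The regex from the post, unchanged.
original = re.compile(r'(?m)(.* )(.* )(.* )(.* )(.* )(".*") (".*")')

arr = list(original.match(line).groups())
for i, value in enumerate(arr):
    # repr() makes trailing spaces and quote characters visible.
    print(f"arr[{i}] = {value!r}")
```

Moving the spaces and quotes outside the parentheses is what keeps them out of the captured text.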
Now I'm sure there is a more elegant way to write this where I don't have to repeat the same sequence multiple times.
My understanding is that something like this should work:
(?:(.* )*|(".*")*)
I believe (?:(.* )|(".*")) means match EITHER .* or ".*", and the * at the end of (.* ) and (".*"), forming (.* )* and (".*")*, means match 0 or more times. This should do the same thing as my working regex, no?
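One way to test that assumption is sketched below, again assuming Python's re module (not stated in the post). It compares how many capture slots each pattern defines, and uses a simpler, hypothetical pattern, (\S+ )*, to show what happens to a capturing group that sits under a quantifier.

```python
import re

line = 'word1 word2 word3 word4 word5 "word6" "word7"'

explicit = re.compile(r'(.* )(.* )(.* )(.* )(.* )(".*") (".*")')
shortcut = re.compile(r'(?:(.* )*|(".*")*)')

# The number of capture slots is fixed by the pattern text,
# not by how many times a group ends up repeating on the input.
print(explicit.groups)  # 7
print(shortcut.groups)  # 2

# Illustrative pattern (not from the post): a quantified group is
# overwritten on every iteration, so only the last iteration survives.
repeated = re.match(r'(\S+ )*', line)
print(repr(repeated.group(1)))  # '"word6" '
```

So the two patterns are not equivalent: the shortcut can never hand back more than two captured values per match, however many words the line contains.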
Thoughts?
EDIT: After reading everything, I see that I was simply trying to shorten my regex by capturing based on (.*) or \"(.*)\" without specifying how many times the capture occurs, which is not possible. Thank you!
The correct regex: (?m)(.*) (.*) (.*) (.*) (.*) \"(.*)\" \"(.*)\"
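For anyone who wants to try it, here is a small sketch of that regex applied to a two-line sample, assuming Python's re module (my assumption; written as a raw string, so the \" escapes become plain quotes). The second half uses a different technique that is not the regex above: a single alternation scanned repeatedly with findall, which is one way to avoid spelling out the number of fields. The names fixed, token, rows and tokens are made up for the example.

```python
import re

text = (
    'word1 word2 word3 word4 word5 "word6" "word7"\n'
    'word1 word2 word3 word4 word5 "word6" "word7"\n'
)

# The corrected regex: spaces and quotes sit outside the groups.
fixed = re.compile(r'(?m)(.*) (.*) (.*) (.*) (.*) "(.*)" "(.*)"')
rows = [list(m.groups()) for m in fixed.finditer(text)]
print(rows[0])

# Alternative technique: match ONE token at a time (quoted or bare)
# and let findall repeat the scan, instead of repeating groups.
token = re.compile(r'"([^"]*)"|(\S+)')
tokens = [[q or w for q, w in token.findall(ln)] for ln in text.splitlines()]
print(tokens[0])
```

Both approaches give the same seven values per line on this sample; the alternation version would also accept lines with a different number of words, which may or may not be what you want.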