I have this string:

s='''
D. JUAN:
¡Cálmate, pues, vida mía!
Reposa aquí; y un momento
olvida de tu convento
la triste cárcel sombría.
¡Ah! ¿No es cierto,
ángel de amor,
que en esta apartada orilla
más pura la luna brilla
y se respira mejor?
'''

If I want all the words starting with a vowel:

import re
print(re.findall(r'\b[aeiouAEIOU]\w*\b', s))

and the output is:

['aquí', 'un', 'olvida', 'Ah', 'es', 'amor', 'en', 'esta', 'apartada', 'orilla']

Now, I try to list all words that do not start with a vowel:

print(re.findall(r'\b[^aeiouAEIOU]\w*\b', s))

and my output is:

['D', 'JUAN', 'Cálmate', 'pues', 'vida', ' mía', 'Reposa', ' aquí', 'y', ' un', ' momento', '\nolvida', ' de', ' tu', ' convento', '\nla', ' triste', ' cárcel', ' sombría', 'No', ' es', ' cierto', 'ángel', ' de', ' amor', 'que', ' en', ' esta', ' apartada', ' orilla', '\nmás', ' pura', ' la', ' luna', ' brilla', '\ny', ' se', ' respira', ' mejor']
Ahmed Sbai

1 Answer

The [^aeiouAEIOU] negated character class matches any character other than a, e, i, o, u, A, E, I, O and U, not just consonants. A space, a linefeed char, or a § is therefore matched too whenever it is preceded by a word character (a letter, digit or underscore in most cases), because the \b in front of the class is satisfied at the boundary between that word character and the non-word character.
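To see this on a small input, here is a minimal sketch using a shortened sample taken from the question's text (the name `sample` is only for illustration). The space and the linefeed each follow a word character, so \b succeeds there and the negated class consumes them as the "first character" of the match:

```python
import re

# Shortened sample from the question's text; `sample` is an illustrative name.
sample = "vida mía\nolvida"

# \b holds between 'a' and ' ' (and between 'a' and '\n'), so the negated
# class happily consumes the whitespace, and \w* then takes the next word.
matches = re.findall(r'\b[^aeiouAEIOU]\w*\b', sample)
print(matches)  # ['vida', ' mía', '\nolvida']
```

This is why 'olvida' shows up in the question's second output as '\nolvida' even though it starts with a vowel.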

So, you need to use

re.findall(r'\b(?![aeiouAEIOU])\w+', s)

where the (?![aeiouAEIOU]) negative lookahead makes sure that \w+ only matches one or more word chars whose first char is not one of the letters inside the character class.
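A minimal sketch of the lookahead version on a shortened sample from the question (`sample` and `non_vowel_words` are illustrative names). Because the lookahead consumes nothing, the first character of every match has to be matched by \w itself, so spaces and linefeeds can no longer sneak in:

```python
import re

# Shortened sample from the question's text; names are illustrative only.
sample = "Reposa aquí; y un momento\nángel de amor"

# (?![aeiouAEIOU]) only checks the next char; \w+ does all the consuming.
non_vowel_words = re.findall(r'\b(?![aeiouAEIOU])\w+', sample)
print(non_vowel_words)  # ['Reposa', 'y', 'momento', 'ángel', 'de']
```

Note that 'ángel' is kept, since accented vowels are not listed in the character class. If they should count as vowels, adding them to the class (for example áéíóúÁÉÍÓÚ) would be one way to do it.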

See the regex demo (note that you must select the right engine in the regex101 options).

Note you do not need any \b at the end after \w+, since the word boundary is implied at that position.
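As a quick check of that remark, the sketch below compares the pattern with and without the trailing \b on a shortened sample from the question (variable names are illustrative). Since \w+ is greedy, it stops only where the next character is not a word char (or at the end of the string), which is exactly where \b holds:

```python
import re

# Shortened sample from the question's text; names are illustrative only.
sample = "¡Ah! ¿No es cierto,\nángel de amor,"

without_trailing = re.findall(r'\b(?![aeiouAEIOU])\w+', sample)
with_trailing = re.findall(r'\b(?![aeiouAEIOU])\w+\b', sample)

# Both patterns return the same list of words.
print(without_trailing == with_trailing)  # True
```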

Wiktor Stribiżew
  • Wiktor I really appreciate your help. I am trying to understand what I did wrong, then I will skip to solve it. My steps were: 1) find a word boundary with \b 2) next the non-vowels with [^aeiouAEIOU] 3) find all \w and finally 4) another word boundary with \b. Of course I did something wrong, but can not see it yet... – Francisco Hernández Mar 06 '23 at 16:19
  • @FranciscoHernández *The [^aeiouAEIOU] negated character class matches any character other than a, e, i, o, u, A, E, I, O and U, so a linefeed char, or a § will also be matched if they are preceded with a word character (a letter, digit or underscore in most cases) as the negated character class is preceded with a \b construct.* What is not clear? – Wiktor Stribiżew Mar 06 '23 at 19:34
  • @FranciscoHernández Did you get the gist about the "any character"? `[^aeiouAEIOU]` matches `§`, `—`, etc. Not just "letter that is not a vowel". Did it finally work for you? – Wiktor Stribiżew Mar 15 '23 at 20:36