1

I'm trying to write a Python function that counts the occurrences of a specific word in a string.

My regex pattern doesn't work when the word I want to count is repeated multiple times in a row. The pattern seems to work well otherwise.

Here is my function

import re

def word_count(word, text):
    return len(re.findall('(^|\s|\b)'+re.escape(word)+'(\,|\s|\b|\.|$)', text, re.IGNORECASE))

When I test it with an arbitrary string, it works:

>>> word_count('Linux', "Linux, Word, Linux")
2

But when the word I want to count is adjacent to itself, it undercounts:

>>> word_count('Linux', "Linux Linux")
1
wjandrea
asl

2 Answers

2

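The problem is in your regex. The pattern is not a raw string, so Python turns '\b' into a backspace character before the regex engine ever sees it, and the \s alternative consumes the space between the two words, which leaves nothing in front of the second "Linux" for the first group to match. Your regex also uses 2 capture groups, and re.findall returns the capture groups instead of the whole match when any are present, so those should be changed to non-capturing groups using (?:...).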

Besides, there is no reason to use (^|\s|\b): \b (a word boundary) is sufficient because it covers all of those cases, and it is zero-width, so it consumes no characters.

In the same way, (\,|\s|\b|\.|$) can be changed to \b.
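To see the difference concretely, here is a small sketch. The `as_parsed` pattern is my assumption of what the question's pattern amounts to once Python has processed the string escapes (with `\x08` written out for the backspace); the names `as_parsed`, `buggy` and `fixed` are only for illustration.

```python
import re

# In an ordinary string literal, '\b' is the backspace character (0x08);
# only the raw string r'\b' reaches the regex engine as a word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

text = "Linux Linux"

# Assumed equivalent of the question's pattern after escape processing:
# the "boundary" alternatives are really a literal backspace.
as_parsed = r'(^|\s|\x08)Linux(\,|\s|\x08|\.|$)'
buggy = re.findall(as_parsed, text, re.IGNORECASE)

# The same search with a genuine, zero-width word boundary.
fixed = re.findall(r'\bLinux\b', text, re.IGNORECASE)

print(buggy)  # [('', ' ')]
print(fixed)  # ['Linux', 'Linux']
```

The first result has a single entry (a tuple of the two captured groups) because the first match swallowed the space, while the second result contains both words.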

So you can just use:

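import re

def word_count(word, text):
    # Raw strings keep \b as a regex word boundary; re.escape guards special characters in word.
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text, re.I))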

This will give:

>>> word_count('Linux', "Linux, Word, Linux")
2
>>> word_count('Linux', "Linux Linux")
2
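As a quick sanity check, here is the same function exercised on a few more inputs. The extra test strings are my own examples, not from the question.

```python
import re

def word_count(word, text):
    # Whole-word, case-insensitive matches, as in the function above.
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text, re.I))

print(word_count('Linux', 'Linux Linux Linux'))  # 3
print(word_count('linux', 'Linux, LINUX.'))      # 2
print(word_count('race', 'racer'))               # 0
```

The last call returns 0 because there is no word boundary between "race" and the trailing "r".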
anubhava
  • Thank you for your response! I just edited my question to be more precise. I want to count a word that may be repeated *multiple* times in a row. So word_count('Linux', 'Linux Linux Linux') would return 3. – asl Mar 20 '20 at 20:13
  • I had a typo which is fixed. This will return `3` for `word_count('Linux', 'Linux Linux Linux')` – anubhava Mar 20 '20 at 20:16
0

I am not sure this is 100% right, because I don't understand why the function is passed the word to search for when you are just looking for words that repeat in a string. So maybe consider...

import re

# A word followed by one or more space-separated repeats of itself.
pattern = re.compile(r'\b(\w+)(?: \1\b)+', re.IGNORECASE)

def word_count(text):
    # Add up the number of words in each run of consecutive repeats.
    return sum(len(match.group(0).split(' ')) for match in pattern.finditer(text))

word_count('Linux Linux Linux Linux')

Output:

4

Maybe it helps.

UPDATE: Based on comment below...

def word_count(word, text):
    count = text.count(word)
    return count

word_count('Linux', "Linux, Word, Linux")

Output:

2
MDR
  • OP is looking to "count a specific word in a string". For example `"Linux"` occurs in `"Linux, Word, Linux"` twice, so the function should return 2. – wjandrea Mar 20 '20 at 21:05
  • Updated the answer. Maybe that's useful? – MDR Mar 20 '20 at 22:13
  • That's counting substrings, not words. E.g. `word_count('race', 'racer')` is 1, but should be 0. – wjandrea Mar 20 '20 at 22:15
  • If you really wanted to use a `.count` method, you could split the string into a list, e.g. `re.split(r'\W+', text)`, but it makes it harder to do a case-insensitive search. – wjandrea Mar 20 '20 at 23:09
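
A minimal sketch of the split-then-count idea from the last comment, in case it is useful. Using `casefold()` on both the text and the word is my assumption for getting a case-insensitive comparison without `re.IGNORECASE`; it only makes sense when `word` consists entirely of word characters.

```python
import re

def word_count(word, text):
    # Split on runs of non-word characters, then compare whole tokens.
    # casefold() on both sides stands in for a case-insensitive regex flag.
    tokens = re.split(r'\W+', text.casefold())
    return tokens.count(word.casefold())

print(word_count('Linux', "Linux, Word, Linux"))  # 2
print(word_count('race', 'racer'))                # 0
```

Because whole tokens are compared, "racer" no longer counts as an occurrence of "race".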