Finding every third word in a sentence and replacing only its letters with the # symbol

Question

This is my code:

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)

If the input is sentence = "You cannot drink the word 'water'.", the output should be "You cannot ##### the word '#####'."

The output I get is just a list of the words in my input.

Thanks @MattDMo but that only helps make the output into a sentence instead of a list. The # symbols are still not there. — Thabo Thandekiso, Aug 22 '22 at 16:57
@ThaboThandekiso: You forgot to reassign and replace the actual `word`. `word.replace(i, '#')` constructs a replaced string, but leaves the string in `word` unchanged. And even if you changed `word`, it wouldn't change the `str` in `redacted_sentence` at that index. — ShadowRanger, Aug 22 '22 at 17:11
@ThaboThandekiso: I posted [an answer](https://stackoverflow.com/a/73449027/364696) with a couple options (one hewing as close as possible to your code, the other optimizing it significantly). — ShadowRanger, Aug 22 '22 at 17:45

score 0 · Answer 1 · answered Aug 22 '22 at 17:38

You've got two issues that add up to the same problem: you create mutated copies of the string and then discard them, leaving the original untouched. The result of str.replace must be assigned somewhere to be useful (here, by reassigning word). To update the original list, you must also reassign that index in the list: word is just another name for the object stored in the list, so reassigning word only rebinds the name and breaks the aliasing; it doesn't change the contents of the list. So the solution is:

Keep the results from each replace operation
Put the final result back into the list at the same location
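
To see both points in isolation, here is a minimal sketch (the three-word list is just an illustration, not your full input). It shows that str.replace returns a new string, and that rebinding word leaves the list alone until the index is assigned:

```python
words = ["You", "cannot", "drink"]

word = words[2]
word.replace("d", "#")         # result is discarded; word still refers to "drink"
print(word)                    # drink

word = word.replace("d", "#")  # keeps the result, but only rebinds the name word
print(word)                    # #rink
print(words)                   # ['You', 'cannot', 'drink']

words[2] = word                # assigning to the index is what updates the list
print(words)                   # ['You', 'cannot', '#rink']
```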

The minimalist modification to your code that achieves this result while still following the same basic design is:

from itertools import count  # So we can track the index to perform replacement at

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
                             # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)

A faster solution would replace the inner loop with a single-pass regex substitution or (if only ASCII need be handled) str.translate call, replacing O(n²) work per word with O(n) work, e.g.:

import re
from itertools import count  # So we can track the index to perform replacement at

# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it
replace_alpha = re.compile(r'[^\W\d_]').sub

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
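
For the str.translate option mentioned above, a rough sketch might look like the following. The names ASCII_LETTERS_TO_HASH and redact_words_translate are made up for this illustration, and the table covers ASCII letters only, so a character such as 'é' would pass through unredacted:

```python
import string
from itertools import count  # Same index-tracking trick as above

# Translation table mapping every ASCII letter to '#'.
# ASCII-only by construction: non-ASCII letters are left unchanged.
ASCII_LETTERS_TO_HASH = str.maketrans(
    string.ascii_letters, "#" * len(string.ascii_letters)
)

def redact_words_translate(sentence):  # hypothetical name, for illustration only
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):
        # One pass over the word; punctuation and digits are not in the table
        redacted_sentence[i] = word.translate(ASCII_LETTERS_TO_HASH)
    return " ".join(redacted_sentence)

print(redact_words_translate("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.
```

If non-ASCII letters need redacting too, the regex version is the safer choice, since `[^\W\d_]` matches Unicode letters on str patterns by default.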

I have never used itertools before. This just opened up my thinking of looping. Thank you — Thabo Thandekiso, Aug 22 '22 at 17:47
@ThaboThandekiso: Yeah, the `itertools` module has a *ton* of useful features in it. In this case, it's basically just using `itertools.count`+`zip` to handle the fact that `enumerate` doesn't allow you to use a custom `step` along with the custom `start`, so it's not that impressive, but `itertools` offers a *lot* more than that. If you learn it well, then along with `enumerate` and `zip` (and using list comprehensions over loops in some cases) you basically never need to write an unPythonic `for i in range(len(someseq)):` style loop. — ShadowRanger, Aug 22 '22 at 17:49

Shmack · Answer 2 · 2022-08-23T17:44:56.567

-1

Here is a way you can do it.

def redact_words(sentence, red_acted_words=None):
    if red_acted_words is None:
        red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        nword = ""
        for c in rword:
            if c.isalpha():
                nword = nword + c
        j = "#" * len(nword)
        sentence = sentence.split(nword)
        sentence = j.join(sentence)
    return sentence

redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"])
redact_words("You cannot drink the word 'water'.")

def redact_words(sentence):
    red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        nword = ""
        for c in rword:
            if c.isalpha():
                nword = nword + c
        j = "#" * len(nword)
        sentence = sentence.split(nword)
        sentence = j.join(sentence)
    return sentence

redact_words("You cannot drink the word 'water'.")
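
One caveat worth checking before relying on this: because the function splits the whole sentence on the letters of each selected word, every occurrence of those letters gets replaced, not just the one at the third position. The sketch below repeats the function above unchanged and runs it on a made-up sentence with a repeated word:

```python
def redact_words(sentence):
    red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        nword = ""
        for c in rword:
            if c.isalpha():
                nword = nword + c
        j = "#" * len(nword)
        # Splitting on nword hits every occurrence in the sentence,
        # including ones that are not at a third-word position
        sentence = sentence.split(nword)
        sentence = j.join(sentence)
    return sentence

print(redact_words("the cat the dog the end"))
# ### cat ### dog ### ###
# Only the 3rd and 6th words were meant to change: the cat ### dog the ###
```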

edited Aug 23 '22 at 17:44

answered Aug 22 '22 at 17:09

The redacted words are supposed to be detected automatically - every third word. – MattDMo Aug 22 '22 at 17:15
@MattDMo okay - try the new function. – Shmack Aug 22 '22 at 17:34
It doesn't work when there are repeated words in the sentence. Try `"This is a a sentence sentence with repeated repeated words in the sentence."` This is because you're using `list.index(word)`, which only returns the *first occurrence* of an entry in a list. – MattDMo Aug 22 '22 at 17:38
@MattDMo fair point... but you could easily just "detect" them like in my first function. Your first comment literally doesn't make any sense, given passing sentence.split()[2::3] as a kwarg would handle it the exact same as it would after I changed my code. – Shmack Aug 22 '22 at 17:54

MattDMo · Answer 3 · 2022-08-22T17:46:50.757

We can accomplish this with a little Python magic:

def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)

First, we create a list to hold the words of the new sentence. Next, after splitting the sentence into a list, we use enumerate to generate the position of each word in the sentence along with the word itself. By starting at 1, we can use the modulus operator to check whether the position is evenly divisible by 3. If so, we use a generator expression inside "".join() to replace every alphabetic character in word with #, leaving the other characters alone, and reassign the result back to word. Finally, we append the word to the redacted_sentence list, whether or not it has been changed, and return a string with all the words joined together by a single space.
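
As a quick sanity check, here is the same logic in a self-contained form (it splits inline rather than reassigning sentence, which is only a cosmetic difference), run on the question's example and on a sentence with repeated words:

```python
def redact_words(sentence):
    redacted_sentence = []
    for pos, word in enumerate(sentence.split(), start=1):
        if pos % 3 == 0:  # positions 3, 6, 9, ...
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)

print(redact_words("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.

print(redact_words("This is a a sentence sentence with repeated repeated words in the sentence."))
# This is # a sentence ######## with repeated ######## words in ### sentence.
```

Because the decision is made by position rather than by looking the word up, repeated words are handled independently.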

Thank you. This makes complete sense to my novice coding brain — Thabo Thandekiso, Aug 22 '22 at 17:47
