I am writing a method, show_spelling_errors(), that loops over custom objects stored in a list, self.tokens, and uses properties of each object to change the color of some of the text in a Qt QTextEdit widget. show_spelling_errors() is called by another method that is connected to the widget's textChanged signal, so it runs every time the user types into the widget. The method definition is shown below:

def show_spelling_errors(self):
    try:
        cursor = self.textEdit.textCursor()
        incorrect_format = cursor.charFormat()
        incorrect_format.setForeground(QtCore.Qt.GlobalColor.red)
        for token in self.tokens:
            if not token.is_spelled_correctly:
                print("is spelled correctly: " + str(token.is_spelled_correctly))
                cursor.movePosition(cursor.MoveOperation.Start, cursor.MoveMode.MoveAnchor)
                cursor.movePosition(cursor.MoveOperation.NextWord, cursor.MoveMode.MoveAnchor, token.word_index)
                cursor.movePosition(cursor.MoveOperation.EndOfWord, cursor.MoveMode.KeepAnchor)
                print("selection start " + str(cursor.selectionStart()))
                print("selection end  " + str(cursor.selectionEnd()))
                # cursor.setCharFormat(incorrect_format)
                cursor.clearSelection()

If I run the code exactly as above, things work as expected. However, if I uncomment the penultimate line, cursor.setCharFormat(incorrect_format), which is what should actually change the font color, the loop no longer terminates. Instead it loops endlessly over the first member of self.tokens and eventually causes a stack overflow. I'm very confused that including this statement can change the behavior of the loop itself, which I thought should be unrelated.

Edit: Below is the code needed to reproduce this behavior

from PyQt6.QtWidgets import QApplication, QWidget, QTextEdit, QVBoxLayout
from PyQt6 import QtCore
import sys
import re
import traceback
from spellchecker import SpellChecker


class Token:
    def __init__(self, chars, spell, start=0, word_index=0):
        self.word_index = word_index
        self.content = chars
        self.token_length = len(chars)
        self.start_pos = start
        self.end_pos = start + self.token_length
        self.is_spelled_correctly = len(spell.unknown([chars])) < 1

class TextEditDemo(QWidget):
    def __init__(self, parent=None):
        super().__init__(parent)

        self.setWindowTitle("QTextEdit")
        self.resize(300, 270)
        self.spell = SpellChecker()
        self.textEdit = QTextEdit()

        layout = QVBoxLayout()
        layout.addWidget(self.textEdit)
        self.setLayout(layout)

        self.tokens = None
        self.textEdit.textChanged.connect(self.handle_text)

    def handle_text(self):
        self.tokenize_text()
        self.show_spelling_errors()

    def show_spelling_errors(self):
        try:
            cursor = self.textEdit.textCursor()
            incorrect_format = cursor.charFormat()
            incorrect_format.setForeground(QtCore.Qt.GlobalColor.red)
            for token in self.tokens:
                if not token.is_spelled_correctly:
                    print("is spelled correctly: " + str(token.is_spelled_correctly))
                    cursor.movePosition(cursor.MoveOperation.Start, cursor.MoveMode.MoveAnchor)
                    cursor.movePosition(cursor.MoveOperation.NextWord, cursor.MoveMode.MoveAnchor, token.word_index)
                    cursor.movePosition(cursor.MoveOperation.EndOfWord, cursor.MoveMode.KeepAnchor)
                    print("selection start " + str(cursor.selectionStart()))
                    print("selection end  " + str(cursor.selectionEnd()))
                    cursor.setCharFormat(incorrect_format)
                    cursor.clearSelection()

        except Exception:
            traceback.print_exc()

    def tokenize_text(self):
        try:
            print("tokenizing...")
            text = self.textEdit.toPlainText()
            text_seps = re.findall(' .', text)
            current_pos = 0
            start_positions = [current_pos]
            for sep in text_seps:
                current_pos = text.find(sep, current_pos) + 1
                start_positions.append(current_pos)

            self.tokens = [
                Token(string, self.spell, start, word_ind) for
                word_ind, (start, string) in
                enumerate(zip(start_positions, text.split()))
            ]
        except Exception:
            traceback.print_exc()

app = QApplication([])
win = TextEditDemo()
win.show()
sys.exit(app.exec())
eyllanesc
  • @eyllanesc I have added code which can be used to reproduce this behavior. Please let me know if I can add anything else. Thank you. – mpotapenko Jun 07 '21 at 00:34

1 Answer

Explanation:

If you check the docs for the textChanged signal of QTextEdit:

void QTextEdit::textChanged() This signal is emitted whenever the document's content changes; for example, when text is inserted or deleted, or when formatting is applied.

Note: Notifier signal for property html. Notifier signal for property markdown.

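(emphasis mine, on "or when formatting is applied")

So the signal is also emitted when the format changes. Calling setCharFormat inside show_spelling_errors emits textChanged again, which calls handle_text again, which re-tokenizes and reaches setCharFormat for the first misspelled token again, and so on. That recursion is the "infinite loop", and it ends only when the stack overflows.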
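To make the re-entrancy concrete, here is a small Qt-free sketch. ToyDocument, emit_changed and apply_format are hypothetical stand-ins invented for illustration (they are not Qt APIs); the only thing they model is that a formatting change notifies the same listener that triggered it.

```python
class ToyDocument:
    """Hypothetical stand-in for a document that notifies listeners on change."""

    def __init__(self):
        self.listeners = []
        self.blocked = False
        self.emit_count = 0

    def emit_changed(self):
        # Plays the role of the textChanged signal.
        if self.blocked:
            return
        self.emit_count += 1
        for listener in list(self.listeners):
            listener()

    def apply_format(self):
        # Plays the role of setCharFormat: formatting counts as a change.
        self.emit_changed()


def run_unguarded():
    doc = ToyDocument()
    # The slot reformats the document, which re-emits the signal.
    doc.listeners.append(doc.apply_format)
    try:
        doc.emit_changed()  # simulate a single keystroke
    except RecursionError:
        return "recursion"
    return "finished"


def run_guarded():
    doc = ToyDocument()

    def slot():
        # Suppress notifications while the slot itself edits the document.
        doc.blocked = True
        try:
            doc.apply_format()
        finally:
            doc.blocked = False

    doc.listeners.append(slot)
    doc.emit_changed()  # simulate a single keystroke
    return doc.emit_count


if __name__ == "__main__":
    print(run_unguarded())
    print(run_guarded())
```

In the unguarded version a single simulated keystroke never returns normally, because every call to the slot triggers another one. In the guarded version the slot runs once per keystroke.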

Solution:

One possible solution is to block the signals using blockSignals:

def handle_text(self):
    self.textEdit.blockSignals(True)
    self.tokenize_text()
    self.show_spelling_errors()
    self.textEdit.blockSignals(False)
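One caveat with the snippet above: if tokenize_text or show_spelling_errors raised an exception, blockSignals(False) would never be reached and the widget would stay silent. A minimal sketch of a safer variant follows, assuming only that the object exposes Qt's blockSignals() (which returns the previous blocked state) and signalsBlocked(). The helper name signals_blocked and the FakeWidget class are hypothetical; FakeWidget exists only so the example runs without Qt.

```python
from contextlib import contextmanager


@contextmanager
def signals_blocked(obj):
    """Block obj's signals inside the with-block, then restore the old state."""
    previous = obj.blockSignals(True)  # QObject.blockSignals returns the old value
    try:
        yield obj
    finally:
        # Restore rather than force False, so nested use behaves sensibly.
        obj.blockSignals(previous)


class FakeWidget:
    """Hypothetical stand-in that mimics the two QObject methods used above."""

    def __init__(self):
        self._blocked = False

    def blockSignals(self, block):
        previous = self._blocked
        self._blocked = block
        return previous

    def signalsBlocked(self):
        return self._blocked


def demo():
    widget = FakeWidget()
    states = []
    try:
        with signals_blocked(widget):
            states.append(widget.signalsBlocked())
            raise ValueError("simulated failure inside the handler")
    except ValueError:
        pass
    states.append(widget.signalsBlocked())
    return states
```

In handle_text this would read: with signals_blocked(self.textEdit): followed by the two calls. Qt also provides a QSignalBlocker class for the same purpose, which may be preferable if it is available in your binding.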

But a more elegant solution is to implement the highlighting with QSyntaxHighlighter:

import re
import sys
from functools import cached_property

from PyQt6.QtCore import Qt
from PyQt6.QtGui import QSyntaxHighlighter, QTextCharFormat
from PyQt6.QtWidgets import QApplication, QWidget, QTextEdit, QVBoxLayout

from spellchecker import SpellChecker


class SpellSyntaxHighlighter(QSyntaxHighlighter):
    WORD_REGEX = re.compile(
        r"\b[^\d\W]+\b"
    )  # https://stackoverflow.com/a/29375664/6622587

    @cached_property
    def text_format(self):
        fmt = QTextCharFormat()
        fmt.setUnderlineColor(Qt.GlobalColor.red)
        fmt.setUnderlineStyle(QTextCharFormat.UnderlineStyle.SpellCheckUnderline)
        return fmt

    @cached_property
    def spellchecker(self):
        return SpellChecker()

    def highlightBlock(self, text):
        misspelled_words = set()
        for match in self.WORD_REGEX.finditer(text):
            word = text[match.start() : match.end()]
            if len(word) > 1 and self.spellchecker.unknown([word]):
                misspelled_words.add(word)
        for misspelled_word in misspelled_words:
            for m in re.finditer(fr"\b{misspelled_word}\b", text):
                self.setFormat(m.start(), m.end() - m.start(), self.text_format)


class TextEditDemo(QWidget):
    def __init__(self, parent=None):
        super().__init__(parent)

        self.setWindowTitle("QTextEdit")
        self.resize(300, 270)
        self.textEdit = QTextEdit()

        layout = QVBoxLayout(self)
        layout.addWidget(self.textEdit)

        self.highlighter = SpellSyntaxHighlighter(self.textEdit.document())


def main():

    app = QApplication([])
    win = TextEditDemo()
    win.show()
    sys.exit(app.exec())


if __name__ == "__main__":
    main()
eyllanesc
  • That's a silly mistake I made. I'm new to Qt and didn't see syntax highlighter when perusing the docs. This is a much better solution. Thanks for explaining the issue and for pointing to a better solution! – mpotapenko Jun 07 '21 at 01:55
  • @mpotapenko Your code never works: with my correction that avoids the infinite loop (the blockSignals function), everything is marked in red. – eyllanesc Jun 07 '21 at 01:56
  • Yes, that is something I am aware of. My plan was to implement something that turns the words that are spelled correctly back to black. As it is, all words are red because they are misspelled from the beginning and nothing un-highlights them. I didn't get to doing this yet because I ran into the infinite loop thing. With syntax highlighting, there is no reason to worry about it anyway. That said, I plan to calculate some metrics on the words in the document so I will still have to tokenize them and store the results of spellcheck so that they are associated with the words. – mpotapenko Jun 07 '21 at 02:02
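
For the tokenizing-for-metrics plan in the last comment, one possible sketch is shown below. It reuses the word regex from the answer and takes start and end positions directly from re.finditer instead of searching for separators. The tokenize function and the is_known callable are assumptions made for illustration; with pyspellchecker, is_known could be something like lambda w: not spell.unknown([w]), mirroring the check in the question.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Same pattern as the answer's highlighter: runs of letters, no digits.
WORD_REGEX = re.compile(r"\b[^\d\W]+\b")


@dataclass
class Token:
    content: str
    start_pos: int
    end_pos: int
    word_index: int
    is_spelled_correctly: bool


def tokenize(text: str, is_known: Callable[[str], bool]) -> List[Token]:
    """Split text into word tokens, keeping positions and spellcheck results."""
    return [
        Token(
            content=match.group(),
            start_pos=match.start(),
            end_pos=match.end(),
            word_index=index,
            is_spelled_correctly=is_known(match.group()),
        )
        for index, match in enumerate(WORD_REGEX.finditer(text))
    ]


if __name__ == "__main__":
    # Toy dictionary standing in for a real spellchecker.
    for token in tokenize("helo world 42", lambda word: word == "world"):
        print(token)
```

Because this only reads the plain text and never touches the document's formatting, it can run from a textChanged handler without re-triggering the signal.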