
Can someone help me with the Python Vocabulary module?



  • This is not specific to Qt, but I thought I would ask here. At some point during the run, the values returned are all empty, and I do not know why.
    This is my code:

    import nltk
    from nltk.tokenize import PunktSentenceTokenizer
    from vocabulary.vocabulary import Vocabulary as vb

    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')

    # Raw string so the backslashes in the Windows path are taken literally
    with open(r'C:\Users\damon\Documents\text_generator\textbase.txt') as f:
        textbas = f.read().lower()

    # Train the Punkt sentence tokenizer on the text itself, then split it into sentences
    custom_tokenizer = PunktSentenceTokenizer(textbas)
    mytokens = custom_tokenizer.tokenize(textbas)
    #sys.stdout = open('C:\\Users\\damon\\Documents\\wordclass.txt', 'w')

    keyw = '"text": "'  # JSON key whose string value gets copied out (9 characters)
    for sentence in mytokens:
        words = nltk.word_tokenize(sentence)
        tagged = nltk.pos_tag(words)
        for part in tagged:
            print(part[0] + ":" + part[1])
            mtags = []
            if part[0] not in (".", ","):
                # JSON string from the lookup, or a falsy value if nothing was found
                mean = vb.part_of_speech(part[0])
                # Reset the scanner state for every word so nothing carries over
                ctag = ''
                compw = ''
                mcopy = False
                if mean:
                    for ch in mean:
                        if mcopy:
                            if ch == '"':
                                # Closing quote of the value: record the tag once
                                if ctag not in mtags:
                                    mtags.append(ctag)
                                mcopy = False
                            else:
                                ctag += ch
                        # Sliding window over the last 9 characters seen
                        if len(compw) < 9:
                            compw += ch
                        else:
                            compw = compw[1:9] + ch
                        if compw == keyw:
                            mcopy = True
                            ctag = ''
                print(mtags)
    #sys.stdout.close()
    

  • Lifetime Qt Champion

    Hi,

    Without more details, the only explanations that come to mind are that you have exhausted your input or that the tokenizers no longer find anything.

    However, that is more a question for the NLTK folks.
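A minimal sketch, not part of the original thread, of how the loop in the question could be instrumented to tell the two explanations in the reply apart. It assumes that part_of_speech returns either a JSON array of objects carrying a "text" field (the shape implied by the "text" key the original loop scans for) or a falsy value such as False when nothing is found. The helper names extract_pos_tags and diagnose are hypothetical. The sketch also swaps the character-by-character scanner for json.loads, which copes with escaped quotes inside the values.

```python
import json
from typing import Callable, Iterable, List, Union

Payload = Union[str, bool, None]


def extract_pos_tags(payload: Payload) -> List[str]:
    """Return the unique "text" values from a part_of_speech-style payload.

    ASSUMPTION: payload is a JSON array such as
    [{"seq": 0, "text": "noun", "example": "..."}], or a falsy value
    (False, "", None) when the lookup found nothing.
    """
    if not payload:
        return []
    tags: List[str] = []
    for entry in json.loads(payload):
        tag = entry.get("text")
        # Skip null tags; keep first-seen order without duplicates
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def diagnose(sentences: Iterable[List[str]],
             lookup: Callable[[str], Payload]) -> dict:
    """Count how often each stage comes back empty.

    sentences: one list of word tokens per sentence (tokenizer output).
    lookup: any callable with the contract assumed above (hypothetical
    stand-in for vb.part_of_speech, so it can be stubbed in tests).
    """
    stats = {"sentences": 0, "empty_sentences": 0,
             "lookups": 0, "empty_lookups": 0, "first_empty_word": None}
    for words in sentences:
        stats["sentences"] += 1
        if not words:
            # The tokenizer produced nothing for this sentence
            stats["empty_sentences"] += 1
            continue
        for word in words:
            if word in (".", ","):
                continue  # same punctuation filter as the original loop
            stats["lookups"] += 1
            if not extract_pos_tags(lookup(word)):
                # The lookup itself produced no tags for this word
                stats["empty_lookups"] += 1
                if stats["first_empty_word"] is None:
                    stats["first_empty_word"] = word
    return stats
```

In the question's script, the call would be diagnose([nltk.word_tokenize(s) for s in mytokens], vb.part_of_speech). If empty_lookups starts climbing partway through the run while the sentences are still non-empty, the emptiness is coming from the lookup rather than from the tokenizers or the input.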

