Can someone help me with the Python Vocabulary module?

  • This is not specific to Qt, but I thought I would ask here. At some point, the values returned are all empty, and I do not know why.
    This is my code:

    import nltk
    from nltk.tokenize import PunktSentenceTokenizer
    from vocabulary.vocabulary import Vocabulary as vb
    with open('C:\\Users\\damon\\Documents\\text_generator\\textbase.txt') as f:
        textbas = f.read().lower()
    custom_tokenizer = PunktSentenceTokenizer(textbas)
    mytokens = custom_tokenizer.tokenize(textbas)
    #sys.stdout = open('C:\\Users\\damon\\Documents\\wordclass.txt', 'w')
    word = ''
    mpos = ''
    keyw = chr(34) + "text" + chr(34) + ": " + chr(34)
    compw = ''
    mcopy = False
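    for sentence in mytokens:
        words = nltk.word_tokenize(sentence)
        tagged = nltk.pos_tag(words)
        for part in tagged:
            print(part[0] + ":" + part[1])
            mtags = []
            if part[0] != "." and part[0] != ",":
                mean = vb.part_of_speech(part[0])
                ctag = ''
                # mean is scanned below as a string; a falsy result is skipped.
                if mean:
                    for ch in mean:
                        if mcopy:
                            if ch == chr(34):
                                # Closing quote ends the value. The body of this
                                # branch was lost in the post; storing each tag
                                # once is the presumed intent.
                                if ctag not in mtags:
                                    mtags.append(ctag)
                                mcopy = False
                            else:
                                ctag += ch
                        # Keep the last 9 characters seen (the length of keyw).
                        if len(compw) < 9:
                            compw += ch
                        else:
                            compw = compw[1:9] + ch
                        if compw == keyw:
                            mcopy = True
                            ctag = ''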

  • Lifetime Qt Champion


    Without more details, the only explanations that come to mind are that you exhausted your input or that the tokenizers no longer find anything.

    However that's rather a question for the nltk folks.
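
One way to narrow this down is to separate "the lookup returned nothing" from "the parsing missed it". The sketch below is only an illustration under assumptions: it assumes the lookup returns either a JSON string holding a list of objects with a "text" field (which is what the `keyw` scan in the question implies) or a falsy value. The names `extract_tags`, `summarize`, and `fake_lookup` are hypothetical, and `fake_lookup` is a stand-in with made-up data so the example runs offline.

```python
import json
from collections import Counter


def extract_tags(raw):
    """Return the unique "text" values from a JSON lookup result, in order."""
    if not raw:  # False, None, or '' all mean "no result"
        return []
    try:
        entries = json.loads(raw)
    except (TypeError, ValueError):
        return []  # not a JSON string at all
    if not isinstance(entries, list):
        return []
    tags = []
    for entry in entries:
        tag = entry.get("text") if isinstance(entry, dict) else None
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def summarize(words, lookup):
    """Count how many words produced tags and how many came back empty."""
    counts = Counter()
    for word in words:
        counts["tagged" if extract_tags(lookup(word)) else "empty"] += 1
    return counts


# Stand-in for the real lookup (assumed format, hypothetical data).
fake_results = {
    "hello": '[{"seq": 0, "text": "interjection"}, {"seq": 1, "text": "noun"}]',
}


def fake_lookup(word):
    return fake_results.get(word, False)


print(extract_tags(fake_lookup("hello")))
print(summarize(["hello", "qwzx"], fake_lookup))
```

If the "empty" count starts climbing at some point while the tokenizer is still producing words, the problem is more likely in the lookup than in the tokenizing. Passing the real lookup function in place of `fake_lookup` would show which case applies.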
