Unsolved Can someone help me with the Python Vocabulary module?
-
This is not specific to Qt, but I thought I would ask here. At some point, the values returned are all empty, and I do not know why.
This is my code:

import nltk
from nltk.tokenize import PunktSentenceTokenizer
from vocabulary.vocabulary import Vocabulary as vb

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

with open('C:\\Users\\damon\\Documents\\text_generator\\textbase.txt') as f:
    textbas = f.read().lower()

custom_tokenizer = PunktSentenceTokenizer(textbas)
mytokens = custom_tokenizer.tokenize(textbas)
#sys.stdout = open('C:\\Users\\damon\\Documents\\wordclass.txt', 'w')
word = ''
mpos = ''
keyw = chr(34) + "text" + chr(34) + ": " + chr(34)
compw = ''
mcopy = False
for i in mytokens:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    for part in tagged:
        print(part[0] + ":" + part[1])
        mtags = []
        if part[0] != "." and part[0] != ",":
            mean = vb.part_of_speech(part[0])
            ctag = ''
            if mean:
                for i in mean:
                    if mcopy:
                        if i == chr(34):
                            if ctag in mtags:
                                pass
                            else:
                                mtags.append(ctag)
                            mcopy = False
                        else:
                            ctag += i
                    if len(compw) < 9:
                        compw += i
                    else:
                        compw = compw[1:9] + i
                    if compw == keyw:
                        mcopy = True
                        ctag = ''
            print(mtags)
#sys.stdout.close()
-
Hi,
Without more details, the only answers that come to mind are that you exhausted your input or that the tokenizers no longer find anything.
However, that's rather a question for the NLTK folks.
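One way to narrow it down is to record exactly which words come back empty, so you can tell whether the input ran out or the lookup itself stopped returning data. Below is a minimal sketch, not a definitive fix. It assumes that vb.part_of_speech returns either a falsy value or a JSON string holding a list of objects with a "text" key, which is what the character-by-character search for "text": " in the posted code suggests; please verify that against what the call actually returns on your machine. The names extract_tags, find_empty_lookups and fake_lookup are hypothetical helpers made up for this illustration, and fake_lookup only stands in for the real call so the sketch runs offline.

```python
import json


def extract_tags(raw):
    """Return the unique "text" values from one lookup result, in order.

    Assumption: `raw` is either falsy (nothing found) or a JSON string
    encoding a list of objects that carry a "text" key.
    """
    if not raw:
        return []
    try:
        entries = json.loads(raw)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: treat it as "no result".
        return []
    if not isinstance(entries, list):
        return []
    tags = []
    for entry in entries:
        tag = entry.get("text") if isinstance(entry, dict) else None
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def find_empty_lookups(words, lookup):
    """Split words into those that yielded tags and those that did not."""
    tagged = {}
    empty = []
    for word in words:
        tags = extract_tags(lookup(word))
        if tags:
            tagged[word] = tags
        else:
            empty.append(word)
    return tagged, empty


def fake_lookup(word):
    """Hypothetical stand-in for vb.part_of_speech with canned data."""
    canned = {
        "hello": '[{"seq": 0, "text": "interjection"}, {"seq": 1, "text": "noun"}]',
    }
    return canned.get(word, False)


tagged, empty = find_empty_lookups(["hello", "xyzzy"], fake_lookup)
print(tagged)
print(empty)
```

If you pass vb.part_of_speech in place of fake_lookup and the empty list starts filling up with ordinary words after a certain point, the lookup is the part that stopped answering, not the tokenizer. Parsing with json.loads also avoids carrying state such as compw and mcopy over from one word to the next.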