LDA model IndexError: index 4963 is out of bounds for axis 1 with size 4963
Today I’m running an LDA model to find the dominant topic in each sentence.
for i, row in enumerate(lda_model[corpus_in_address]):
    for j, (topic_num, prop_topic) in enumerate(row):
        if j == 0:
            wp = lda_model.show_topic(topic_num)
            topic_keywords = ",".join([word for word, prop in wp])
            print(topic_num)
            print(prop_topic)
            print(topic_keywords)
            sent_topics_df = sent_topics_df.append(pd.Series([int(topic_num), round(prop_topic, 4), topic_keywords]), ignore_index=True)
        else:
            break
However, I got this error:
`IndexError: index 4963 is out of bounds for axis 1 with size 4963`
I was really confused, and for a long time I couldn’t find out why. Then I read some discussions about pyLDAvis on Stack Overflow, and I realized the cause might be in this part:
dictionary = gensim.corpora.Dictionary(processed_docs_in_address)
dictionary.filter_extremes(no_below=15, no_above=0.5, keep_n=100000)
bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs_in_address]
lda_model = gensim.models.LdaMulticore(bow_corpus, num_topics=34, id2word=dictionary, passes=2, workers=2)
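I adjusted the dictionary with filter_extremes and created the corpus afterwards, so the corpus and the dictionary can end up out of sync: some word ids in the corpus have no matching index in the dictionary the model uses. An id of 4963 against a size of 4963 is exactly one past the last valid index.
It turns out that really was the reason :)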
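A quick way to confirm this kind of mismatch before calling the model is to scan the bag-of-words corpus for ids that fall outside the vocabulary. The sketch below is only an illustration in plain Python: the helper name `find_out_of_range_ids` and the toy corpora are made up for this post, and I am assuming that with gensim you would pass `len(dictionary)` as `vocab_size`.

```python
def find_out_of_range_ids(bow_corpus, vocab_size):
    """Return the sorted token ids in bow_corpus that the vocabulary cannot index.

    bow_corpus is a list of documents, each a list of (token_id, count) pairs,
    which is the shape doc2bow produces. vocab_size is the number of terms the
    model knows about (hypothetically len(dictionary)).
    """
    bad_ids = set()
    for doc in bow_corpus:
        for token_id, _count in doc:
            # Valid ids run from 0 to vocab_size - 1 inclusive.
            if token_id < 0 or token_id >= vocab_size:
                bad_ids.add(token_id)
    return sorted(bad_ids)


# Toy data (not from my real corpus): the vocabulary has 4963 terms.
vocab_size = 4963
good_corpus = [[(0, 1), (4962, 2)], [(17, 1)]]
stale_corpus = [[(0, 1), (4963, 1)], [(5001, 3)]]

print(find_out_of_range_ids(good_corpus, vocab_size))
print(find_out_of_range_ids(stale_corpus, vocab_size))
```

If the second call reports any ids, the corpus was built against a different state of the dictionary, and rebuilding it with doc2bow from the same dictionary that was given to the model should bring the two back in sync.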