Faster Word Co-Occurrence Calculation In Large Document Corpus

Topic Modeling

NPMI

  1. How many documents wi occurs in;
  2. How many documents wj occurs in;
  3. How many documents both wi and wj occur in.
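
To make the role of these three counts concrete, here is a minimal sketch of NPMI computed from document counts. It assumes the standard definition, NPMI = log(P(wi, wj) / (P(wi) P(wj))) / -log P(wi, wj), with probabilities estimated as document fractions. The function name `npmi`, its parameters, and the extra `n_docs` argument (total number of documents) are illustrative assumptions, not names from the original code.

```python
import math


def npmi(n_i, n_j, n_ij, n_docs):
    """NPMI of two words from document counts (hypothetical helper).

    n_i    -- number of documents containing wi
    n_j    -- number of documents containing wj
    n_ij   -- number of documents containing both wi and wj
    n_docs -- total number of documents (assumed to be known)
    """
    if n_ij == 0:
        # Common convention: words that never co-occur get the minimum score.
        return -1.0
    p_i = n_i / n_docs
    p_j = n_j / n_docs
    p_ij = n_ij / n_docs
    if p_ij == 1.0:
        # Both words appear in every document; -log(1) = 0 would divide by
        # zero, so return the maximum score instead (an assumed convention).
        return 1.0
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)
```

For example, `npmi(10, 10, 10, 100)` is approximately 1 (the words always appear together), while `npmi(50, 50, 25, 100)` is approximately 0 (the words co-occur exactly as often as independence would predict).
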
Notice that vanilla NPMI has constant space complexity, while memoization uses approximately 80 KB of space for 10 words.
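
The trade-off can be illustrated with a minimal sketch. It assumes that memoization here means caching the document counts for every word and every word pair in a single pass over the corpus, so that later NPMI lookups do not rescan the documents. The name `build_count_cache` and the representation of each document as a set of tokens are assumptions made for illustration, not the original implementation.

```python
from itertools import combinations


def build_count_cache(docs, words):
    """Cache document counts for each word and each word pair.

    docs  -- iterable of documents, each given as a set of tokens (assumed)
    words -- the words whose counts should be cached
    """
    words = sorted(set(words))
    single = {w: 0 for w in words}
    pair = {p: 0 for p in combinations(words, 2)}
    for doc in docs:
        # `words` is sorted, so `present` is sorted too and every pair it
        # yields matches a key created above.
        present = [w for w in words if w in doc]
        for w in present:
            single[w] += 1
        for p in combinations(present, 2):
            pair[p] += 1
    return single, pair
```

The cache holds one entry per word and one per word pair, so its size grows quadratically with the number of words, whereas recomputing the counts for each pair on demand needs no extra storage but rescans the corpus every time.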

Introducing Matrices

Miroslav Tushev

CS PhD @ LSU. Passionate about statistics, ML, and NLP.