Palmetto is a tool for measuring the quality of topics. The demo works as follows: choose one of the coherence measures described below, enter the top words of the topic you want to test into the input field (space-separated, at most 10 words), and let the system calculate the coherence value of the word set.

If you want to know more about Palmetto, please take a look at the project page.

Coherence description

CV is based on a sliding window, a one-set segmentation of the top words, and an indirect confirmation measure that uses normalized pointwise mutual information (NPMI) and the cosine similarity.

This coherence measure retrieves cooccurrence counts for the given words using a sliding window of size 110. The counts are used to calculate the NPMI of every top word with every other top word, resulting in a set of vectors, one for every top word. The one-set segmentation of the top words leads to the calculation of the similarity between every top word vector and the sum of all top word vectors, with the cosine as the similarity measure. The coherence is the arithmetic mean of these similarities. (Note that this was the best coherence measure in our evaluation.)

Proposed in
M. Röder, A. Both, and A. Hinneburg: Exploring the Space of Topic Coherence Measures. In Proceedings of the eighth International Conference on Web Search and Data Mining, 2015.
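The following is a minimal Python sketch of CV for illustration only, not Palmetto's implementation. It estimates probabilities from a small in-memory corpus of tokenized documents instead of Palmetto's reference corpus. The helper names (windows, prob, npmi, cosine, c_v), the smoothing constant EPS = 1e-12, and the rule that a document shorter than the window counts as a single window are all assumptions made for the example.

```python
import math

EPS = 1e-12  # assumed smoothing constant that keeps log() finite


def windows(corpus, size):
    """Token sets of all sliding windows of `size` tokens over each document."""
    out = []
    for doc in corpus:
        if len(doc) <= size:
            out.append(set(doc))  # assumption: a short document is one window
        else:
            out.extend(set(doc[i:i + size]) for i in range(len(doc) - size + 1))
    return out


def prob(wins, *words):
    """Fraction of windows that contain all of the given words."""
    return sum(all(w in win for w in words) for win in wins) / len(wins)


def npmi(wins, a, b):
    """Normalized PMI of two words, estimated from the windows."""
    if a == b:
        return 1.0  # NPMI(w, w) = 1 whenever 0 < P(w) < 1
    p_ab = prob(wins, a, b)
    pmi = math.log((p_ab + EPS) / (prob(wins, a) * prob(wins, b)))
    return pmi / -math.log(p_ab + EPS)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def c_v(corpus, top_words, window=110):
    wins = windows(corpus, window)
    # Indirect confirmation: one NPMI vector per top word over all top words.
    vectors = [[npmi(wins, a, b) for b in top_words] for a in top_words]
    # One-set segmentation: compare each vector with the sum of all vectors.
    total = [sum(column) for column in zip(*vectors)]
    sims = [cosine(v, total) for v in vectors]
    return sum(sims) / len(sims)  # arithmetic mean


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_v(corpus, ["cat", "dog", "pet"]))              # close to 1.0
print(c_v(corpus, ["cat", "dog", "stock", "market"]))  # much lower
```

In the toy corpus, the words of the first set always occur together, so their NPMI vectors are nearly identical and point in the same direction as their sum. Mixing two unrelated word groups produces vectors that largely cancel in the sum, which lowers the mean cosine.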

CP is based on a sliding window, a one-preceding segmentation of the top words, and the confirmation measure of Fitelson's coherence.

Word cooccurrence counts for the given top words are derived using a sliding window of size 70. For every top word, its confirmation by each of its preceding top words is calculated using Fitelson's confirmation measure. The coherence is the arithmetic mean of these confirmation values.

Proposed in
M. Röder, A. Both, and A. Hinneburg: Exploring the Space of Topic Coherence Measures. In Proceedings of the eighth International Conference on Web Search and Data Mining, 2015.
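A minimal Python sketch of CP follows, under the same assumptions as a toy example (an in-memory tokenized corpus, short documents treated as one window, EPS = 1e-12 as a guard against division by zero). Fitelson's measure is written here as (P(W'|W*) - P(W'|not W*)) / (P(W'|W*) + P(W'|not W*)), and every word is paired with each word ranked above it. The function names are hypothetical.

```python
EPS = 1e-12  # assumed guard against division by zero


def windows(corpus, size):
    """Token sets of all sliding windows of `size` tokens over each document."""
    out = []
    for doc in corpus:
        if len(doc) <= size:
            out.append(set(doc))  # assumption: a short document is one window
        else:
            out.extend(set(doc[i:i + size]) for i in range(len(doc) - size + 1))
    return out


def prob(wins, *words):
    """Fraction of windows that contain all of the given words."""
    return sum(all(w in win for w in words) for win in wins) / len(wins)


def fitelson(wins, w_prime, w_star):
    """Fitelson's confirmation of W' = {w_prime} by the condition W* = {w_star}."""
    p_star = prob(wins, w_star)
    p_joint = prob(wins, w_prime, w_star)
    given = p_joint / (p_star + EPS)                                  # P(W'|W*)
    given_not = (prob(wins, w_prime) - p_joint) / (1 - p_star + EPS)  # P(W'|not W*)
    return (given - given_not) / (given + given_not + EPS)


def c_p(corpus, top_words, window=70):
    wins = windows(corpus, window)
    # One-preceding segmentation: each word is confirmed by every word ranked above it.
    scores = [fitelson(wins, top_words[i], top_words[j])
              for i in range(1, len(top_words)) for j in range(i)]
    return sum(scores) / len(scores)  # arithmetic mean


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_p(corpus, ["cat", "dog", "pet"]))  # close to 1.0
print(c_p(corpus, ["cat", "stock"]))       # close to -1.0
```

The measure is bounded by -1 and 1: it approaches 1 when a word occurs only together with its condition and -1 when it occurs only without it.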

CUCI is a coherence that is based on a sliding window and the pointwise mutual information (PMI) of all word pairs of the given top words.

The word cooccurrence counts are derived using a sliding window of size 10. For every word pair, the PMI is calculated. The result of this coherence is the arithmetic mean of the PMI values. (Note that the original publication calculates only the sum of these values.)

Proposed in
D. Newman, J. H. Lau, K. Grieser, and T. Baldwin: Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 100-108. Association for Computational Linguistics, 2010.
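As a rough illustration, CUCI can be sketched in Python as the mean PMI over all unordered pairs of top words, with probabilities taken from sliding windows of size 10. The toy corpus, the helper names, the smoothing constant EPS = 1e-12 in the numerator, and the one-window rule for short documents are assumptions of this sketch.

```python
import math
from itertools import combinations

EPS = 1e-12  # assumed smoothing constant that keeps log() finite


def windows(corpus, size):
    """Token sets of all sliding windows of `size` tokens over each document."""
    out = []
    for doc in corpus:
        if len(doc) <= size:
            out.append(set(doc))  # assumption: a short document is one window
        else:
            out.extend(set(doc[i:i + size]) for i in range(len(doc) - size + 1))
    return out


def prob(wins, *words):
    """Fraction of windows that contain all of the given words."""
    return sum(all(w in win for w in words) for win in wins) / len(wins)


def c_uci(corpus, top_words, window=10):
    wins = windows(corpus, window)
    # PMI of every unordered word pair; assumes each word occurs in some window.
    pmis = [math.log((prob(wins, a, b) + EPS) / (prob(wins, a) * prob(wins, b)))
            for a, b in combinations(top_words, 2)]
    return sum(pmis) / len(pmis)  # mean; Newman et al. use the sum


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_uci(corpus, ["cat", "dog"]))    # about log(2), roughly 0.693
print(c_uci(corpus, ["cat", "stock"]))  # large negative value
```

Because plain PMI is unbounded below, a single pair that never cooccurs can dominate the mean; this is the weakness that CNPMI addresses.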

CUMass is based on document cooccurrence counts, a one-preceding segmentation, and a logarithmic conditional probability as the confirmation measure.

The main idea of this coherence is that the occurrence of every top word should be supported by every preceding top word. Thus, the probability of a top word occurring should be higher if a document already contains a higher-ranked top word of the same topic. Therefore, for every top word, the logarithm of its conditional probability is calculated with each top word ranked above it as the condition. The probabilities are derived from document cooccurrence counts. The individual conditional probabilities are aggregated using the arithmetic mean. (Note that the original publication calculates only the sum of these values.)

Proposed in
D. Mimno, H. M. Wallach, E. Talley, M. Leenders, and A. McCallum: Optimizing semantic coherence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 262-272. Association for Computational Linguistics, 2011.
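The Python sketch below illustrates CUMass on a toy corpus, counting whole documents rather than windows. Following the formula of Mimno et al., it adds 1 to the joint document count before dividing by the document count of the conditioning word; exposing this as a smoothing parameter is an assumption of the sketch, as are the function names, and Palmetto may smooth differently.

```python
import math


def doc_freq(docs, *words):
    """Number of documents that contain all of the given words."""
    return sum(all(w in doc for w in words) for doc in docs)


def c_umass(corpus, top_words, smoothing=1.0):
    docs = [set(doc) for doc in corpus]  # document cooccurrence, no windows
    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):  # condition on every top word ranked above word i
            joint = doc_freq(docs, top_words[i], top_words[j])
            # Assumes every top word occurs in at least one document.
            scores.append(math.log((joint + smoothing) / doc_freq(docs, top_words[j])))
    return sum(scores) / len(scores)  # mean; Mimno et al. use the sum


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_umass(corpus, ["cat", "dog"]))    # log(3 / 2)
print(c_umass(corpus, ["cat", "stock"]))  # log(1 / 2)
```

With the +1 smoothing the value is not a strict probability, which is why a fully supported pair can score slightly above zero.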

CNPMI is an enhanced version of the CUCI coherence using the normalized pointwise mutual information (NPMI) instead of the pointwise mutual information (PMI).

Proposed in
N. Aletras and M. Stevenson: Evaluating topic coherence using distributional semantics. In Proceedings of the 10th International Conference on Computational Semantics (IWCS'13) Long Papers, pages 13-22, 2013.
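A minimal Python sketch of CNPMI follows. It reuses the CUCI setup (an assumed sliding window of size 10 over a toy corpus) and only swaps PMI for NPMI, dividing each PMI by -log P(a, b). The helper names, EPS = 1e-12, and the one-window rule for short documents are assumptions.

```python
import math
from itertools import combinations

EPS = 1e-12  # assumed smoothing constant that keeps log() finite


def windows(corpus, size):
    """Token sets of all sliding windows of `size` tokens over each document."""
    out = []
    for doc in corpus:
        if len(doc) <= size:
            out.append(set(doc))  # assumption: a short document is one window
        else:
            out.extend(set(doc[i:i + size]) for i in range(len(doc) - size + 1))
    return out


def prob(wins, *words):
    """Fraction of windows that contain all of the given words."""
    return sum(all(w in win for w in words) for win in wins) / len(wins)


def c_npmi(corpus, top_words, window=10):
    wins = windows(corpus, window)
    scores = []
    for a, b in combinations(top_words, 2):
        p_ab = prob(wins, a, b)
        pmi = math.log((p_ab + EPS) / (prob(wins, a) * prob(wins, b)))
        scores.append(pmi / -math.log(p_ab + EPS))  # rescale PMI to roughly [-1, 1]
    return sum(scores) / len(scores)  # arithmetic mean


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_npmi(corpus, ["cat", "dog"]))    # close to 1.0
print(c_npmi(corpus, ["cat", "stock"]))  # close to -1
```

Unlike the unbounded PMI values of CUCI, each pair now contributes a value in a fixed range, so one never-cooccurring pair cannot dominate the mean.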

CA is based on a context window, a pairwise comparison of the top words, and an indirect confirmation measure that uses normalized pointwise mutual information (NPMI) and the cosine similarity.

This coherence measure retrieves cooccurrence counts for the given words using a context window of size 5. The counts are used to calculate the NPMI of every top word with every other top word, resulting in a single vector for every top word. Then the cosine similarity between the vectors of every word pair is calculated. The coherence is the arithmetic mean of these similarities. (Note that the original publication describes several other coherence measures. We have chosen this one because it performed best among them in our evaluation.)

Proposed in
N. Aletras and M. Stevenson: Evaluating topic coherence using distributional semantics. In Proceedings of the 10th International Conference on Computational Semantics (IWCS'13) Long Papers, pages 13-22, 2013.
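The Python sketch below illustrates CA on a toy corpus. It reads "context window of size 5" as five tokens on each side of every token position, which is an assumption rather than a description of Palmetto's counting. Each top word gets an NPMI vector over all top words, and the coherence is the mean cosine similarity over all word pairs. The function names and EPS = 1e-12 are hypothetical.

```python
import math
from itertools import combinations

EPS = 1e-12  # assumed smoothing constant that keeps log() finite


def context_windows(corpus, size):
    """Assumption: one window of +/- `size` tokens centred on every token."""
    return [set(doc[max(0, i - size):i + size + 1])
            for doc in corpus for i in range(len(doc))]


def prob(wins, *words):
    """Fraction of windows that contain all of the given words."""
    return sum(all(w in win for w in words) for win in wins) / len(wins)


def npmi(wins, a, b):
    if a == b:
        return 1.0  # NPMI(w, w) = 1 whenever 0 < P(w) < 1
    p_ab = prob(wins, a, b)
    pmi = math.log((p_ab + EPS) / (prob(wins, a) * prob(wins, b)))
    return pmi / -math.log(p_ab + EPS)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def c_a(corpus, top_words, size=5):
    wins = context_windows(corpus, size)
    vectors = [[npmi(wins, a, b) for b in top_words] for a in top_words]
    # Pairwise comparison: cosine similarity between every pair of word vectors.
    sims = [cosine(u, v) for u, v in combinations(vectors, 2)]
    return sum(sims) / len(sims)  # arithmetic mean


corpus = [["cat", "dog", "pet"], ["cat", "dog", "pet"],
          ["stock", "market", "trade"], ["stock", "market", "trade"]]
print(c_a(corpus, ["cat", "dog", "pet"]))              # close to 1.0
print(c_a(corpus, ["cat", "dog", "stock", "market"]))  # negative
```

Compared with CV, only the segmentation differs: CA compares word vectors with each other instead of with the sum of all vectors.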
