Customizing Topic Extraction¶

We use an LDA (Latent Dirichlet Allocation) model for topic extraction.

First, you need an LDA model trained with gensim. You can find a trained demo model in our GitHub repository, or you can train one yourself.

The extraction process consists of two parts:

Tokenizing the text (potentially including noun phrase extraction)
Predicting topics with the trained LDA model
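
The two steps can be sketched roughly as follows. This is a minimal illustration, not FMR's actual code: simple_tokenize and extract_topics are hypothetical names, and the regex tokenizer is only a stand-in for what tokenize() really does. The sketch assumes the model objects are a gensim Dictionary and a gensim LdaModel, and uses only their doc2bow() and get_document_topics() methods.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def extract_topics(text, dictionary, lda_model):
    """Tokenize raw text, then ask a trained LDA model for its topics.

    `dictionary` is assumed to be a gensim Dictionary and `lda_model` a
    gensim LdaModel; only doc2bow() and get_document_topics() are called.
    """
    # Step 1: tokenizing.
    tokens = simple_tokenize(text)
    # Convert tokens to the (token_id, count) pairs the model expects.
    bow = dictionary.doc2bow(tokens)
    # Step 2: predicting topics as (topic_id, probability) pairs.
    topics = lda_model.get_document_topics(bow)
    # Most probable topic first.
    return sorted(topics, key=lambda pair: pair[1], reverse=True)
```

Passing the dictionary and model in as arguments keeps the sketch independent of how they are loaded.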

Customizing Tokenization¶

The tokenizer is implemented in core.lda_engine.LdaModelWrapper.tokenize(). It is a method of the LDA model wrapper because different models may need to tokenize differently (for example, some need noun phrase extraction in addition to tokenizing).

See core.lda_engine.LdaModelWrapper.tokenize() for more details.
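
As an illustration of what a customized tokenizer might look like, here is a minimal sketch. The function name, its phrases parameter, and the stop-word list are all hypothetical and not taken from FMR. The substring replacement is a crude placeholder for real noun phrase extraction. Whatever tokenizer you use should produce tokens that match the vocabulary your model was trained on.

```python
import re

# Illustrative stop-word list; a real model would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text, phrases=()):
    """Hypothetical tokenizer: lowercases the text, keeps known phrases
    as single tokens, and drops stop words.

    `phrases` is a collection of multi-word terms (for example, noun
    phrases found by a separate extractor) that should stay together.
    """
    text = text.lower()
    # Join each known phrase with underscores so it survives splitting.
    # Longer phrases go first so they are not broken up by shorter ones.
    for phrase in sorted(phrases, key=len, reverse=True):
        lowered = phrase.lower()
        text = text.replace(lowered, lowered.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    return [token for token in tokens if token not in STOP_WORDS]
```

For example, calling tokenize() with phrases=["machine learning"] keeps that term as the single token machine_learning instead of two separate words.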

Loading Your Own LDA Model¶

If you train your LDA model with gensim, you can load the trained model into FMR with a few lines of configuration. See Installation for details.

Make sure you also have the following components:

The gensim .dictionary file with which you trained the LDA model.
A .json file that stores the profiles of your pool of scholars.
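
For orientation, loading these components by hand might look like the sketch below. It is not FMR's loading code, which is driven by the configuration described in Installation. The function names and all three paths are hypothetical, and the sketch assumes the model and dictionary were written with gensim's own save() methods.

```python
import json


def load_author_profiles(json_path):
    """Read the scholar profiles from a JSON file into a Python object."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def load_lda_components(model_path, dictionary_path, authors_path):
    """Load the model, its dictionary, and the scholar profiles.

    Assumes the model and dictionary were saved with gensim's save()
    methods; all three paths are placeholders.
    """
    # Imported here so the JSON helper above works without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    lda_model = LdaModel.load(model_path)
    dictionary = Dictionary.load(dictionary_path)
    authors = load_author_profiles(authors_path)
    return lda_model, dictionary, authors
```

The dictionary must be the same one used during training, since the model's token IDs only make sense against it.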

Implementing LDA Model For Other Libraries¶

The core.lda_engine.LdaModelWrapper class serves as an abstraction layer between the rest of the application and the actual LDA model.

If your LDA model is implemented with another library, or even in a completely different language, you can rewrite core.lda_engine.LdaModelWrapper to fit your needs.

A minimal working core.lda_engine.LdaModelWrapper should provide at least the following members:

core.lda_engine.LdaModelWrapper.predict() : takes a raw text string and returns a NumPy array of topic IDs and their confidence levels.

core.lda_engine.LdaModelWrapper.get_author_top_topics()

core.lda_engine.LdaModelWrapper.get_topic_in_string()

core.lda_engine.LdaModelWrapper.authors_lib : a dictionary that contains the profiles of the pool of scholars. It must work in tandem with the matching algorithm. It is loaded automatically if the model is configured correctly. See LDA Models for details.
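
The required members can be outlined as an abstract base class. This is a hedged sketch only: the class name LdaModelWrapperBase, the constructor, and the parameters author_id, topic_id, and top are assumptions made for illustration. Check the real signatures in core.lda_engine before relying on them.

```python
from abc import ABC, abstractmethod


class LdaModelWrapperBase(ABC):
    """Hypothetical outline of the members a replacement wrapper needs."""

    def __init__(self, authors_lib):
        # Dictionary holding the profiles of the pool of scholars.
        self.authors_lib = authors_lib

    @abstractmethod
    def predict(self, text):
        """Take a raw text string and return topic IDs with their
        confidence levels (a NumPy array in the original wrapper)."""

    @abstractmethod
    def get_author_top_topics(self, author_id, top=10):
        """Return the strongest topics for one scholar (signature assumed)."""

    @abstractmethod
    def get_topic_in_string(self, topic_id, top=5):
        """Return a readable description of a topic (signature assumed)."""
```

A wrapper for another library would subclass this outline and implement the three methods by calling that library. Python refuses to instantiate the class until all of them are defined.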
