.. author: Alan Chen

Customizing Topic Extraction
============================

We use an LDA model for topic extraction. First, you need an LDA model trained with ``gensim``. You can find a demo trained model in our GitHub repository, or you can train one yourself.

The extraction process consists of two parts:

1. Tokenizing (which may include noun phrase extraction)
2. Predicting topics with the trained LDA model

Customizing Tokenization
------------------------

The tokenizer is implemented in :meth:`core.lda_engine.LdaModelWrapper.tokenize`. It is a method of the LDA model wrapper because different models may need to tokenize differently (for example, some need noun phrase extraction in addition to tokenizing). See :meth:`core.lda_engine.LdaModelWrapper.tokenize` for more details.

Loading Your Own LDA Model
--------------------------

If you train your LDA model with ``gensim``, you can load your trained model into FMR with a few lines of configuration. See Installation for details.

Make sure you also have the following components:

1. The Gensim ``.dictionary`` file with which you trained the LDA model.
2. The ``.json`` file that stores the profiles of your pool of scholars.

Implementing LDA Model For Other Libraries
------------------------------------------

The :class:`core.lda_engine.LdaModelWrapper` class serves as an abstraction layer between the rest of the application and the actual LDA model. If you have an LDA model implemented with another library, or even in a completely different language, you can rewrite :class:`core.lda_engine.LdaModelWrapper` to fit your needs.

A minimal working :class:`core.lda_engine.LdaModelWrapper` should consist of at least the following members:

- :meth:`core.lda_engine.LdaModelWrapper.predict`: takes a raw text string and returns a NumPy array of topic IDs and their confidence levels.
- :meth:`core.lda_engine.LdaModelWrapper.get_author_top_topics`
- :meth:`core.lda_engine.LdaModelWrapper.get_topic_in_string`
- :attr:`core.lda_engine.LdaModelWrapper.authors_lib`: a dictionary that contains the profiles of the pool of scholars. It must work in tandem with the matching algorithm. It is loaded automatically if the model is configured correctly. See :ref:`LDA Models` for details.
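
The members listed above could be sketched roughly as follows. This is an illustration only, not the project's implementation: the class name ``ToyLdaWrapper``, the method signatures, the ``top_n`` parameters, the hand-written topic-word table (standing in for a model from another library), and the ``authors_lib`` schema are all assumptions. Check :class:`core.lda_engine.LdaModelWrapper` for the signatures the rest of the application actually expects.

.. code-block:: python

    import re

    import numpy as np


    class ToyLdaWrapper:
        """Hypothetical stand-in for core.lda_engine.LdaModelWrapper."""

        def __init__(self, topic_words, authors_lib):
            # topic_words: one {word: weight} dict per topic (assumed format).
            self.topic_words = topic_words
            # authors_lib: {author name: {"text": ...}} (assumed schema).
            self.authors_lib = authors_lib

        def tokenize(self, text):
            # Lowercase and keep alphabetic runs; a real model may add
            # noun phrase extraction here.
            return re.findall(r"[a-z]+", text.lower())

        def predict(self, text):
            # Score each topic by summing the weights of matching tokens,
            # then normalize so the confidences sum to 1.
            tokens = self.tokenize(text)
            scores = np.array(
                [sum(w.get(t, 0.0) for t in tokens) for w in self.topic_words],
                dtype=float,
            )
            total = scores.sum()
            if total > 0:
                scores = scores / total
            # Rows of (topic ID, confidence), highest confidence first.
            order = np.argsort(-scores, kind="stable")
            return np.column_stack((order, scores[order]))

        def get_author_top_topics(self, author, top_n=1):
            # Predict on the author's stored profile text.
            return self.predict(self.authors_lib[author]["text"])[:top_n]

        def get_topic_in_string(self, topic_id, top_n=3):
            # Describe a topic by its highest-weighted words.
            words = self.topic_words[int(topic_id)]
            best = sorted(words, key=words.get, reverse=True)[:top_n]
            return ", ".join(best)


    model = ToyLdaWrapper(
        topic_words=[
            {"neural": 0.5, "network": 0.3, "learning": 0.2},
            {"protein": 0.6, "cell": 0.4},
        ],
        authors_lib={"A. Scholar": {"text": "Protein folding in the cell"}},
    )
    result = model.predict("Deep learning with a neural network")

Here ``result`` is a two-column array with one row per topic, so a caller can read the best topic ID from ``result[0, 0]`` and pass it to ``get_topic_in_string``. Keeping ``tokenize`` on the wrapper means a replacement model can change its preprocessing without touching the rest of the application.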