LDA Model

class core.lda_engine.LdaModelWrapper(filename, force_load=False, np=True, keep_state=True)
get_author_top_topics(author_id, top=10)

Returns the top N most relevant topics for an author in the database.

Parameters:
  • author_id – The author’s ID in the database.
  • top – The number of topics to return.

Returns: A NumPy array of topic IDs and their confidence levels.
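The shape of this return value can be pictured with a minimal sketch. It assumes, hypothetically, that an author's topic distribution is available as a dense NumPy vector indexed by topic ID. `top_topics` is an illustrative helper, not a method of `LdaModelWrapper`.

```python
import numpy as np

def top_topics(topic_dist, top=10):
    """Return a (k, 2) array of [topic_id, confidence] rows, highest confidence first.

    Hypothetical helper: assumes `topic_dist` is a dense 1-D array whose
    index is the topic ID and whose value is that topic's confidence level.
    """
    topic_dist = np.asarray(topic_dist, dtype=float)
    # argsort sorts ascending; reverse it for descending order, then keep `top` IDs
    ids = np.argsort(topic_dist)[::-1][:top]
    # Pair each topic ID with its confidence level, one row per topic
    return np.column_stack((ids, topic_dist[ids]))

dist = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
print(top_topics(dist, top=3))
```

Because the IDs and confidences share one array, the IDs come back as floats. Cast the first column with `.astype(int)` if integer IDs are needed.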


Given a topic ID in the model, generates a list of terms.

Parameters: topic_id – The topic’s ID in the model.
Returns: A list of terms.
get_topic_in_string(topic_id, top=5)

Given a topic ID in the model, generates a string representation of that topic.

Parameters:
  • topic_id – The topic’s ID in the model.
  • top – The number of most relevant terms to include.

Returns: A string representation of the topic.
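The following is one way a topic could be turned into a string. It is a sketch under stated assumptions, not the wrapper's actual output format. It assumes the topic is available as a list of (term, weight) pairs. The helper `topic_to_string` and the ", " separator are hypothetical choices.

```python
def topic_to_string(terms, top=5):
    """Join the `top` highest-weighted terms of a topic into one string.

    Hypothetical helper: assumes `terms` is a list of (term, weight) pairs.
    The ", " separator is an illustrative choice, not the library's format.
    """
    # Rank terms by weight, most relevant first
    ranked = sorted(terms, key=lambda pair: pair[1], reverse=True)
    return ", ".join(term for term, _ in ranked[:top])

topic = [("network", 0.08), ("neural", 0.12), ("training", 0.05), ("layer", 0.07)]
print(topic_to_string(topic, top=3))  # neural, network, layer
```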

get_topics_in_string(topics, confidence=False)

Converts a list of topics (with or without confidence levels) into a list of dictionaries containing their string representations.

Parameters:
  • topics – The list of topics to convert.
  • confidence – Set this to True if the input topics contain confidence levels.

Returns: A list of dictionaries containing the topics’ string representations (and their confidence levels, if present).
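The `confidence` flag changes how each input item is read. The minimal sketch below makes this concrete under two assumptions. Without confidence, `topics` holds bare topic IDs. With it, `topics` holds (topic_id, confidence) pairs. The function name, the injected `topic_to_string` callable and the dictionary keys "topic" and "confidence" are all hypothetical.

```python
def topics_to_dicts(topics, topic_to_string, confidence=False):
    """Convert topics into a list of dicts holding their string representations.

    Hypothetical sketch: with confidence=False, `topics` is a list of topic IDs;
    with confidence=True, it is a list of (topic_id, confidence) pairs.
    The keys "topic" and "confidence" are illustrative assumptions.
    """
    result = []
    for item in topics:
        if confidence:
            # Each item carries a confidence level alongside the topic ID
            topic_id, conf = item
            result.append({"topic": topic_to_string(topic_id), "confidence": float(conf)})
        else:
            # Each item is a bare topic ID
            result.append({"topic": topic_to_string(item)})
    return result

labels = {0: "neural, network", 1: "protein, cell"}
print(topics_to_dicts([(1, 0.7), (0, 0.2)], labels.get, confidence=True))
```

This shows why the flag matters. If confidence=False were passed for (topic_id, confidence) pairs, each whole pair would be treated as a topic ID.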


Predicts topics from a raw text string.

Parameters: text – Raw text string.
Returns: A NumPy array of topic IDs and their confidence levels.

Converts raw text into a bag of words using the dictionary of the trained LDA model.

Parameters: text – Raw text string.
Returns: A bag of words.
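The underlying model may be a gensim LDA model, though this is an assumption the source does not confirm. If so, this step typically corresponds to gensim's `Dictionary.doc2bow`, which yields (token_id, token_count) pairs. The plain-Python sketch below reproduces that output shape so the idea is visible without gensim. Both the `text_to_bow` helper and its tokenization rule are hypothetical.

```python
import re
from collections import Counter

def text_to_bow(text, token2id):
    """Convert raw text into a sparse bag of words: sorted (token_id, count) pairs.

    Sketch only: lowercases, keeps runs of ASCII letters as tokens, and drops
    tokens missing from `token2id` (a plain dict standing in for the trained
    model's dictionary). The real preprocessing may differ.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count only in-vocabulary tokens, keyed by their dictionary ID
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

vocab = {"topic": 0, "model": 1, "text": 2}
print(text_to_bow("Topic model for text: a topic model.", vocab))  # [(0, 2), (1, 2), (2, 1)]
```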

Matching Algorithm

core.matching.lda.detailed_results(results, model_name)

Retrieves details of the matched authors (i.e., reviewers).

Parameters:
  • results – A dictionary generated by match_by_lda with detailed=True.
  • model_name – The name of the current model.

Returns: A dictionary of author details. Running the function on the demo model shows an example of the output.

core.matching.lda.match_by_lda(text, model_name, top=50, detailed=True, scoring_impl='default', base=0)

Returns the best-matching authors for a string of raw text.

Parameters:
  • text – The text to be matched.
  • model_name – The name of the LDA model to use.
  • top – The maximum number of results to return.
  • detailed – Whether to return detailed results. Keep this True unless the function is called outside the web app.
  • scoring_impl – The name of the scoring implementation to use.
  • base – The initial value of the vector.

Returns: The matched results as a dictionary if detailed=True; otherwise, a matrix of author IDs and their scores.
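The non-detailed return value (a matrix of author IDs and scores) can be sketched as a ranking step. This is an illustration under assumptions, not the module's implementation. It assumes that author topic vectors are available in a dict keyed by author ID. Cosine similarity stands in for whatever `scoring_impl` actually selects. The helper `rank_authors` is hypothetical.

```python
import numpy as np

def rank_authors(paper_vec, author_vecs, top=50):
    """Rank authors by cosine similarity to a paper's topic vector.

    Hypothetical sketch: `author_vecs` maps author ID -> dense topic vector,
    and cosine similarity stands in for the configured scoring implementation.
    Returns a (k, 2) array of [author_id, score] rows, best match first.
    """
    paper_vec = np.asarray(paper_vec, dtype=float)
    rows = []
    for author_id, vec in author_vecs.items():
        vec = np.asarray(vec, dtype=float)
        denom = np.linalg.norm(paper_vec) * np.linalg.norm(vec)
        # Guard against zero vectors, which have no defined cosine similarity
        score = float(paper_vec @ vec / denom) if denom else 0.0
        rows.append((author_id, score))
    rows.sort(key=lambda row: row[1], reverse=True)
    # reshape keeps a (0, 2) shape when there are no authors
    return np.array(rows[:top], dtype=float).reshape(-1, 2)

authors = {7: [0.9, 0.1], 3: [0.1, 0.9], 5: [0.5, 0.5]}
print(rank_authors([1.0, 0.0], authors, top=2))
```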

core.matching.lda.score(paper_vec, author_vec, method)

Scores a paper-author match.

Parameters:
  • paper_vec – The vector to be matched.
  • author_vec – The vector to score against (usually an author’s vector).
  • method – The name of the scoring implementation.

Returns: A scalar score for the match.
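Selecting a scoring implementation by name is commonly done with a lookup table. The minimal sketch below shows that pattern. The method names "cosine" and "hellinger" and the function `score_match` are hypothetical; the source does not list the real implementations behind names such as 'default'. The Hellinger variant assumes both vectors are probability distributions, as LDA topic vectors usually are.

```python
import numpy as np

def _cosine(a, b):
    # Cosine similarity; 0.0 when either vector is all zeros
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def _hellinger_similarity(a, b):
    # 1 - Hellinger distance, where H = sqrt(0.5 * sum((sqrt(a) - sqrt(b))**2));
    # assumes a and b are non-negative and each sums to 1
    return float(1.0 - np.sqrt(0.5 * np.sum((np.sqrt(a) - np.sqrt(b)) ** 2)))

# Hypothetical registry mapping method names to scoring functions
SCORERS = {"cosine": _cosine, "hellinger": _hellinger_similarity}

def score_match(paper_vec, author_vec, method):
    """Look up a scoring function by name and apply it to the two vectors."""
    try:
        scorer = SCORERS[method]
    except KeyError:
        raise ValueError(f"unknown scoring method: {method!r}") from None
    return scorer(np.asarray(paper_vec, dtype=float), np.asarray(author_vec, dtype=float))

print(score_match([0.5, 0.5], [0.5, 0.5], "hellinger"))
```

A registry keeps adding a new scoring method to a one-line change and turns an unknown name into a clear error rather than a silent fallback.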