Implementations

LDA Model

class core.lda_engine.LdaModelWrapper(filename, force_load=False, np=True, keep_state=True)

get_author_top_topics(author_id, top=10)

Returns the top N most relevant topics for an author in our database.

Parameters:	author_id – The author’s ID in our database. top – The number of topics to return.
Returns:	A NumPy array of topic IDs and their confidence levels.
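
The top-N selection described above can be illustrated with a minimal sketch. It assumes an author's topics are available as a plain list of weights indexed by topic ID; the function name top_topics and the sample data are hypothetical and not part of core.lda_engine, and the sketch returns a list of tuples rather than a NumPy array.

```python
def top_topics(topic_weights, top=10):
    """Return up to `top` (topic_id, confidence) pairs, highest confidence first."""
    # Pair each weight with its index, which stands in for the topic ID here.
    pairs = list(enumerate(topic_weights))
    # Sort by confidence in descending order.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top]


# Hypothetical topic distribution for one author.
author_weights = [0.05, 0.40, 0.10, 0.30, 0.15]
print(top_topics(author_weights, top=3))
```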

get_topic_in_list(topic_id)

Given a topic ID in the model, generates a list of terms.

Parameters:	topic_id – The topic’s ID in the model.
Returns:	A list of terms.

get_topic_in_string(topic_id, top=5)

Given a topic ID in the model, generates a string representation of that topic.

Parameters:	topic_id – The topic’s ID in the model. top – The number of most relevant terms to include.
Returns:	A string representation of the topic.

get_topics_in_string(topics, confidence=False)

Converts a list of topics (with or without confidence levels) to a list of dictionaries holding their string representations.

Parameters:	topics – The list of topics to be converted. confidence – Set this to True if the input topics contain confidence levels.
Returns:	A list of dictionaries containing the string representations, with confidence levels when confidence=True.
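
As a rough illustration of this conversion, the sketch below handles both input shapes: bare topic IDs, and (topic ID, confidence) pairs. The function name, the terms_by_topic lookup, and the dictionary keys "topic" and "confidence" are assumptions made for the example; the actual output keys of get_topics_in_string are not specified here.

```python
def topics_to_strings(topics, terms_by_topic, confidence=False, top=5):
    """Convert topics to a list of dicts holding a string form of each topic."""
    results = []
    for item in topics:
        if confidence:
            # Each item is a (topic_id, confidence_level) pair.
            topic_id, level = item
        else:
            # Each item is a bare topic ID.
            topic_id, level = item, None
        # Join the first `top` terms into one readable string.
        entry = {"topic": ", ".join(terms_by_topic[topic_id][:top])}
        if confidence:
            entry["confidence"] = level
        results.append(entry)
    return results


# Hypothetical term lists, keyed by topic ID.
terms = {0: ["graph", "node", "edge"], 1: ["protein", "gene", "cell"]}
print(topics_to_strings([(1, 0.7), (0, 0.2)], terms, confidence=True, top=2))
```

Passing pairs while leaving confidence=False would make the lookup fail in this sketch, which is why the flag has to match the input shape.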

predict(text)

Predicts topics from a raw text string.

Parameters:	text – Raw text string.
Returns:	A NumPy array of topic IDs and their confidence levels.

tokenize(text)

Converts raw text to a bag of words using the dictionary of a trained LDA model.

Parameters:	text – Raw text string.
Returns:	A bag of words.
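
To make the bag-of-words idea concrete, here is a minimal sketch. It assumes the model's dictionary can be viewed as a plain mapping from token to integer ID, and that tokens outside the dictionary are dropped; the name to_bag_of_words, the lowercase alphanumeric tokenization, and the sample vocabulary are hypothetical, and the wrapper's actual preprocessing may differ.

```python
import re
from collections import Counter


def to_bag_of_words(text, token_to_id):
    """Return sorted (token_id, count) pairs for tokens found in the dictionary."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Count only tokens the dictionary knows; unknown tokens are ignored.
    counts = Counter(token_to_id[t] for t in tokens if t in token_to_id)
    return sorted(counts.items())


# Hypothetical vocabulary standing in for a trained model's dictionary.
vocab = {"topic": 0, "model": 1, "author": 2}
print(to_bag_of_words("Topic model, topic MODEL topic; unknown", vocab))
```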

core.matching.lda.detailed_results(results, model_name)

Retrieves details of the matched authors (i.e., reviewers).

Parameters:	results – A dictionary generated by match_by_lda with detailed=True. model_name – The name of the current model.
Returns:	A dictionary of author details. Running it on the demo model shows an example output.

core.matching.lda.match_by_lda(text, model_name, top=50, detailed=True, scoring_impl='default', base=0)

Returns the best-matching results for a string of raw text.

Parameters:	text – The text to be matched. model_name – The name of the LDA model to be used. top – The maximum number of results to return. detailed – Whether to return a detailed result. Leave this as True unless the function is used outside the web app. scoring_impl – The scoring implementation to be used. base – The initial value of the vector.
Returns:	The matched result in dictionary form if detailed=True; otherwise, a matrix of author IDs and scores.

core.matching.lda.score(paper_vec, author_vec, method)

Scores a paper-author match.

Parameters:	paper_vec – The vector to be matched. author_vec – The vector to be scored against (usually the vector of an author). method – The name of the scoring implementation.
Returns:	A scalar measuring the score of the match.
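
The idea of selecting a scoring implementation by name can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: the method names "cosine" and "dot", the SCORERS table, and the function score_match are all hypothetical, and the implementation behind scoring_impl='default' is not documented here.

```python
import math


def cosine_score(paper_vec, author_vec):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(p * a for p, a in zip(paper_vec, author_vec))
    norm = math.sqrt(sum(p * p for p in paper_vec)) * math.sqrt(
        sum(a * a for a in author_vec)
    )
    return dot / norm if norm else 0.0


def dot_score(paper_vec, author_vec):
    """Plain dot product of two equal-length vectors."""
    return sum(p * a for p, a in zip(paper_vec, author_vec))


# Hypothetical registry mapping method names to scoring functions.
SCORERS = {"cosine": cosine_score, "dot": dot_score}


def score_match(paper_vec, author_vec, method="cosine"):
    """Score a paper-author match with the scoring function named by `method`."""
    try:
        scorer = SCORERS[method]
    except KeyError:
        raise ValueError(f"unknown scoring method: {method!r}") from None
    return scorer(paper_vec, author_vec)


# Hypothetical topic vectors for a paper and an author.
print(score_match([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], method="cosine"))
```

A lookup table keeps each scoring function independent, so a new method can be added without changing the dispatch code.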