Abstract:
A system and method for providing orientation into digital information is provided. A plurality of evergreen indexes for subject areas are maintained. The evergreen indexes include digital information, and each is organized by topics that include a topic model matched to the digital information. A user interest within the digital information is determined. The topic models for the evergreen indexes are evaluated against the user interest, and the topic models that best match the user interest are identified. Access to the digital information is provided via at least one of the topic models in at least one of the evergreen indexes.
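The matching step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each topic model is a dictionary of term weights and that the user interest is a short free-text query. The names `TopicModel`, `score`, and `best_matches` are hypothetical and do not come from the abstract. Each model is scored by summing the weights of its terms that appear in the query, and the highest-scoring topics across all evergreen indexes are returned.

```python
from dataclasses import dataclass, field


@dataclass
class TopicModel:
    index_name: str  # evergreen index that owns this topic
    topic: str       # topic label
    # Hypothetical representation: term -> weight.
    weights: dict = field(default_factory=dict)


def score(model: TopicModel, interest_terms: set) -> float:
    """Sum the weights of the model's terms that occur in the user interest."""
    return sum(w for term, w in model.weights.items() if term in interest_terms)


def best_matches(models, interest: str, k: int = 2):
    """Return (index, topic) pairs for the k best-scoring topic models.

    Models that share no terms with the interest are dropped.
    """
    terms = set(interest.lower().split())
    scored = [(score(m, terms), m) for m in models]
    scored = [(s, m) for s, m in scored if s > 0]
    # Sort on the score only; list.sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(m.index_name, m.topic) for _, m in scored[:k]]
```

Because the models from every evergreen index are pooled before ranking, the best match may come from any index, which mirrors the abstract's "at least one of the evergreen indexes".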
Abstract:
A system and method for performing discovery of digital information in a subject area is provided. Topics in a subject area, training material for the topics, and a corpus comprising digital information are each designated. A topic model is built for each of the topics. The topic models are evaluated against the training material. The digital information from the corpus is organized by topic into an evergreen index using the topic models.
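A minimal Python sketch of the build, evaluate, and organize steps follows. It assumes, hypothetically, that a topic model is a conjunction of one or two keywords, that the training material for each topic is a list of positive example articles, and that the positives of the other topics serve as negative examples. Candidate models are scored by positive matches minus negative matches, and the best model per topic is then used to file corpus articles into the index. The function names are illustrative only, and the exhaustive search over keyword pairs suits small examples rather than real corpora.

```python
from itertools import combinations


def words(article: str) -> set:
    return set(article.lower().split())


def matches(model: frozenset, article: str) -> bool:
    # Hypothetical model form: a conjunction of terms that must all appear.
    return model <= words(article)


def evaluate(model, positives, negatives) -> int:
    # Hypothetical metric: positive examples matched minus negatives matched.
    return (sum(matches(model, a) for a in positives)
            - sum(matches(model, a) for a in negatives))


def build_topic_model(positives, negatives, max_terms=2) -> frozenset:
    vocab = sorted(set().union(*(words(a) for a in positives)))
    candidates = [frozenset(c)
                  for n in range(1, max_terms + 1)
                  for c in combinations(vocab, n)]
    # max() keeps the first best candidate, so single terms win ties.
    return max(candidates, key=lambda m: evaluate(m, positives, negatives))


def build_evergreen_index(training: dict, corpus: list):
    """training maps each topic to its positive example articles."""
    models = {}
    for topic, positives in training.items():
        negatives = [a for other, arts in training.items() if other != topic
                     for a in arts]
        models[topic] = build_topic_model(positives, negatives)
    # Organize the corpus: file each article under every topic whose model matches.
    index = {topic: [a for a in corpus if matches(m, a)]
             for topic, m in models.items()}
    return models, index
```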
Abstract:
A computer-implemented method for providing robust topic identification in social indexes is described. Electronically-stored articles and one or more indexes are maintained. Each index includes topics that each relate to one or more of the articles. A random sampling and a selective sampling of the articles are both selected. For each topic, characteristic words included in the articles in each of the random sampling and the selective sampling are identified. Frequencies of occurrence of the characteristic words in each of the random sampling and the selective sampling are determined. A ratio of the frequencies of occurrence for the characteristic words included in the random sampling and the selective sampling is identified. Finally, for each topic, a coarse-grained topic model is built, which includes the characteristic words included in the articles relating to the topic and scores assigned to those characteristic words.
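The frequency-ratio scoring can be sketched in Python as follows. The sketch assumes that the selective sampling holds articles about one topic, that words are whitespace-separated tokens, that a word missing from the random sampling is counted as occurring once there (a hypothetical smoothing choice), and that `min_ratio` is an illustrative cutoff rather than a value given in the abstract. Each retained characteristic word is scored by the ratio of its relative frequency in the selective sampling to its relative frequency in the random sampling.

```python
from collections import Counter


def token_counts(articles):
    return Counter(w for a in articles for w in a.lower().split())


def coarse_topic_model(selective, random_sample, min_ratio=2.0):
    """Score words by their frequency ratio between the two samplings.

    selective: articles sampled for one topic.
    random_sample: articles sampled at random from the collection.
    Returns {word: ratio} for words whose ratio is at least min_ratio.
    """
    sel = token_counts(selective)
    rnd = token_counts(random_sample)
    sel_total = sum(sel.values())
    rnd_total = sum(rnd.values())
    model = {}
    for word, count in sel.items():
        f_sel = count / sel_total
        # Hypothetical smoothing: a word absent from the random sampling is
        # treated as if it occurred there once, avoiding division by zero.
        f_rnd = max(rnd[word], 1) / rnd_total
        ratio = f_sel / f_rnd
        if ratio >= min_ratio:
            model[word] = ratio
    return model
```

Common words that appear about as often in both samplings get a ratio near 1 and fall below the cutoff, which is what keeps the model coarse-grained but topic-specific.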
Abstract:
A system and method for providing default hierarchical training for social indexing is provided. Articles of digital information for social indexing are maintained. A hierarchically-structured tree of topics is specified. Each topic includes a label that includes one or more words. Constraints inherent in the literal structure of the topic tree are identified. For each topic in the topic tree, a topic model that includes at least one term derived from the words in at least one of the labels is created. The topic models for the topic tree are evaluated against the constraints. Those topic models that best satisfy the constraints are identified.
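One way to picture default hierarchical training is the following Python sketch. It assumes, for illustration only, that the constraints drawn from the tree are that a topic's model should match the words on its own label path (its label plus its ancestors' labels) and should not match the label path of any sibling. Candidate models are one- or two-word conjunctions drawn from those label words, and the candidate satisfying the most constraints is kept; ties go to the earliest candidate, so single words are preferred over pairs. The constraint set and function names are hypothetical.

```python
from itertools import combinations


def label_path_words(topic, labels, parent):
    """Words in the topic's label and in every ancestor's label."""
    found = set()
    node = topic
    while node is not None:
        found |= set(labels[node].lower().split())
        node = parent.get(node)
    return found


def constraints_satisfied(model, topic, labels, parent):
    # Hypothetical constraints read from the tree structure:
    # the model must match its own label path and must not match a sibling's.
    siblings = [t for t in labels
                if t != topic and parent.get(t) == parent.get(topic)]
    satisfied = int(model <= label_path_words(topic, labels, parent))
    satisfied += sum(not model <= label_path_words(s, labels, parent)
                     for s in siblings)
    return satisfied


def default_training(labels, parent, max_terms=2):
    """Pick, for each topic, the label-derived model that best fits the tree."""
    models = {}
    for topic in labels:
        vocab = sorted(label_path_words(topic, labels, parent))
        candidates = [frozenset(c)
                      for n in range(1, max_terms + 1)
                      for c in combinations(vocab, n)]
        # max() keeps the first best candidate, so single words win ties.
        models[topic] = max(
            candidates,
            key=lambda m: constraints_satisfied(m, topic, labels, parent))
    return models
```

For sibling topics labeled "Solar Power" and "Wind Power", the shared word "power" fails the sibling constraint, so each topic ends up keyed on its distinguishing word.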
Abstract:
A system and method for prospecting digital information is provided. A home evergreen index for a home subject area within a corpus of digital information is maintained and includes topic models matched to the corpus. A frontier evergreen index is identified for a frontier subject area within the corpus that is topically distinct from the home subject area. Quality assessments are obtained for frontier articles from the corpus identified by the topic models of the frontier evergreen index. The frontier articles with positive quality assessments are reclassified against the topic models in the home evergreen index. The frontier articles are provided in a display with home articles previously classified against the topic models in the home evergreen index.
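The reclassification and display steps can be sketched in Python. The sketch assumes that quality assessments are numeric scores where a value above zero counts as positive, that each home topic model is a set of keywords matching any article containing at least one of them, and that the display is a per-topic list tagging each article as `home` or `frontier`. These representations and the function names are hypothetical.

```python
def reclassify(frontier_articles, quality, home_models):
    """Place positively assessed frontier articles under home topics.

    quality: article -> assessment score (positive means accepted).
    home_models: topic -> set of keywords; a model matches an article that
    contains any of its keywords (a hypothetical, deliberately simple model).
    """
    placed = {topic: [] for topic in home_models}
    for article in frontier_articles:
        if quality.get(article, 0) <= 0:
            continue  # skip unassessed or negatively assessed articles
        words = set(article.lower().split())
        for topic, keywords in home_models.items():
            if keywords & words:
                placed[topic].append(article)
    return placed


def display(home_index, placed):
    """Combine home and reclassified frontier articles per home topic."""
    return {
        topic: [(a, "home") for a in articles]
        + [(a, "frontier") for a in placed.get(topic, [])]
        for topic, articles in home_index.items()
    }
```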
Abstract:
A system and method for managing user attention by detecting hot topics in social indexes is provided. Articles of digital information and at least one social index are maintained. The social index includes topics that each relate to one or more of the articles. Topic models matched to the digital information are retrieved for each topic. The articles are classified under the topics using the topic models. Each of the topics in the social index is evaluated for hotness. A plurality of time periods projected from the present is defined. Counts of the articles appearing in each time period are evaluated. The topics whose article counts follow a rising curve, increasing with recency across the time periods, are chosen. The quality of the articles within the chosen topics is analyzed. The topics whose articles have acceptable quality are presented.
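The hotness test can be sketched in Python. The sketch assumes equal-length time windows counted back from the present, treats a curve as rising only if the counts strictly increase from the oldest window to the newest, and measures acceptable quality as a mean per-article quality score at or above an illustrative threshold. The data layout (each topic mapped to `(timestamp, quality)` pairs) and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def period_counts(timestamps, now: datetime, period: timedelta, n_periods: int):
    """Count articles per window; index 0 is the oldest, -1 the newest."""
    counts = [0] * n_periods
    for ts in timestamps:
        age = now - ts
        if age < timedelta(0):
            continue  # ignore timestamps in the future
        idx = int(age / period)  # 0 means within the most recent period
        if idx < n_periods:
            counts[n_periods - 1 - idx] += 1
    return counts


def is_rising(counts):
    # Hypothetical test: counts must strictly increase with recency.
    return all(a < b for a, b in zip(counts, counts[1:]))


def hot_topics(index, now: datetime, period=timedelta(days=1), n_periods=3,
               min_quality=0.5):
    """index maps topic -> list of (timestamp, quality score) pairs."""
    window = period * n_periods
    hot = []
    for topic, articles in index.items():
        counts = period_counts([ts for ts, _ in articles], now, period, n_periods)
        if not is_rising(counts):
            continue
        # Quality is judged only over articles inside the counted windows.
        recent = [q for ts, q in articles if timedelta(0) <= now - ts < window]
        if recent and sum(recent) / len(recent) >= min_quality:
            hot.append(topic)
    return hot
```

Filtering on quality after the rising-curve check means a burst of low-quality articles can produce a rising curve without the topic being presented as hot.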