Abstract:
A system for automatically generating user interest profiles and delivering information to users learns a user's interests by monitoring the user's outbound communication streams, i.e., the information that the user produces either by typing (e.g., while composing an e-mail message or editing a word processor document) or by speaking (e.g., while a user is engaged in a phone conversation or listening to a lecture). The system uses the monitored text to build (and possibly update) a user interest profile. The profile is constructed from current text generated by the user, so that the retrieved information reflects present user interests. The profile may also retain past user interests, so that it reflects a combination of past and present interests. The system then automatically queries diverse databases for information relevant to the interest profile. The databases may include internet web pages, files stored on the user's local network, and other local or remote data repositories. The queries may use a combination of internet search engines, the specific selection of which may depend upon the nature and/or content of the queries. The information retrieved in response to the queries is then presented to the user. The retrieved information may contain, for example, answers to questions that the user might ask and/or data related to the user's current and continuing interests. Because a user's current speech or typed text is highly correlated with the user's current interests, the retrieved information will be relevant to the user's actual interests. The communication stream monitoring, interest profile building, database querying, and presentation of retrieved information are all performed automatically, in real time, and in the background of current user activities.
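The profile-building and querying steps described above can be sketched as follows. This is a minimal illustration and not the method claimed in the abstract: the term-count weighting, the `retain` factor for past interests, the stopword list, and the function names `tokenize`, `update_profile`, and `build_query` are all assumptions introduced for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the monitored text and keep alphabetic, non-stopword terms."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def update_profile(profile, new_text, retain=0.5):
    """Blend past interests (scaled by `retain`) with terms from current text."""
    updated = {term: weight * retain for term, weight in profile.items()}
    for term, count in Counter(tokenize(new_text)).items():
        updated[term] = updated.get(term, 0.0) + count
    return updated


def build_query(profile, k=3):
    """Form a query string from the k highest-weighted terms (ties: alphabetical)."""
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:k])


profile = update_profile({}, "Booking flights to Lisbon and hotels in Lisbon")
profile = update_profile(profile, "Lisbon weather in May")
query = build_query(profile)  # string that could be sent to a search engine
```

Setting `retain` to 0 would make the profile reflect only present interests, while values closer to 1 keep more weight on past ones.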
Abstract:
A system generates user interest profiles by monitoring and analyzing a user's access to a variety of hierarchical levels within a set of structured documents, e.g., documents available at a web site. Each information document has parts associated with it, and the documents are classified into categories using a known taxonomy. The user interest profiles are automatically generated based on the type of content viewed by the user. The type of content is determined by the text within the parts of the documents viewed and the classifications of the documents viewed. The profiles are also generated based on other factors, including the frequency and currency of visits to documents having a given classification and/or the hierarchical depth of the levels or parts of the documents viewed. Each user profile includes an interest category code and an interest score that indicates the level of interest in a particular category. The profiles are updated automatically to accurately reflect the current interests of an individual, as well as past interests. A time-dependent decay factor is applied to the past interests. The system presents to the user documents or references to documents that match the current profile.
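One way to picture an interest score with a time-dependent decay factor is sketched below. The abstract does not specify the decay function or how depth contributes to the score, so the exponential half-life, the use of hierarchical depth as the per-visit increment, and the `InterestProfile` class itself are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Maps an interest category code to a decaying interest score."""
    half_life_days: float = 30.0                      # assumed decay rate
    scores: dict = field(default_factory=dict)        # category code -> score
    last_update: dict = field(default_factory=dict)   # category code -> day

    def score(self, category, now):
        """Return the score for `category`, decayed to day `now`."""
        if category not in self.scores:
            return 0.0
        elapsed = now - self.last_update[category]
        return self.scores[category] * 0.5 ** (elapsed / self.half_life_days)

    def record_visit(self, category, depth, now):
        """Decay the past score, then add the depth of the part just viewed."""
        self.scores[category] = self.score(category, now) + depth
        self.last_update[category] = now


profile = InterestProfile(half_life_days=10)
profile.record_visit("SPORTS", depth=2, now=0)   # deep page viewed on day 0
profile.record_visit("SPORTS", depth=1, now=10)  # shallow revisit on day 10
```

Under these assumptions, frequent and recent visits raise a category's score, deeper views count for more, and a category that is no longer visited fades toward zero.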
Abstract:
A method and apparatus efficiently match a large collection of user profiles against a large volume of data in a webcasting system. In one embodiment, the invention generally includes four steps to parallelize profile matching. First, an initial profile set is partitioned into several subsets, also referred to as sub-partitions, using various heuristic methods. Second, each sub-partition is mapped onto one or more independent processing units. The processing units are not required to have equal processing performance. For best performance, however, in one embodiment the subset with the highest cost is mapped to the fastest processor, the subset with the next highest cost is mapped to the next fastest processor, and so on. Where appropriate, the invention evaluates the relative subset processing speed of each processor and adjusts future subset mapping based upon these evaluations. For each information item I that needs to be matched with a profile predicate, a third and a fourth step are executed. The third step broadcasts I to all processing units, and the fourth step performs a sequential profile match on I.
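The four steps can be sketched in a single process as shown below. Several choices here are assumptions made for illustration and are not stated in the abstract: round-robin partitioning stands in for the unspecified heuristics, a subset's cost is taken to be its total keyword count, a profile predicate is satisfied when any keyword appears in the item, and each processing unit receives exactly one subset. The function names are hypothetical, and the processing units are simulated by a loop.

```python
def partition(profiles, n):
    """Step 1: split the profile set into n subsets (round-robin stand-in)."""
    return [profiles[i::n] for i in range(n)]


def subset_cost(subset):
    """Assumed cost model: total number of keywords to test in the subset."""
    return sum(len(keywords) for _, keywords in subset)


def map_to_processors(subsets, speeds):
    """Step 2: give the costliest subset to the fastest unit, and so on."""
    by_cost = sorted(range(len(subsets)), key=lambda i: -subset_cost(subsets[i]))
    by_speed = sorted(speeds, key=lambda name: -speeds[name])
    return {proc: subsets[i] for proc, i in zip(by_speed, by_cost)}


def match_item(item_words, assignment):
    """Steps 3 and 4: send the item to every unit, then match sequentially."""
    matched = []
    for subset in assignment.values():      # broadcast: each unit sees the item
        for user, keywords in subset:       # sequential match within the unit
            if keywords & item_words:
                matched.append(user)
    return sorted(matched)


profiles = [("ann", {"jazz", "vinyl", "blues"}), ("bob", {"golf"}),
            ("cy", {"jazz"}), ("dee", {"chess", "go"})]
subsets = partition(profiles, 2)
assignment = map_to_processors(subsets, {"slow": 1.0, "fast": 2.0})
hits = match_item({"jazz", "festival"}, assignment)
```

In a real deployment the outer loop in `match_item` would run concurrently on separate units, and the `speeds` table could be refreshed from measured matching times to adjust later mappings.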