Abstract:
Embodiments are directed towards a Modified Sequitur algorithm (MSA) that uses pipelining and indexed arrays to identify trending topics within a plurality of documents having user generated content (UGC). The documents are parallelized and distributed across a plurality of network devices, each of which places at least some of the received documents into a buffer. The MSA may then be applied to the buffered documents to identify n-grams or phrases within the documents' contents. The identified phrases are further analyzed to remove extraneous co-occurrences of phrases and/or words, based on a part-of-speech analysis. A weighting of the remaining phrases is used to identify trending topic phrases. Links to content in the plurality of UGC documents that is associated with the trending topic phrases may then be displayed to a client device.
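As a rough illustration of the phrase-identification and weighting steps, the sketch below counts repeated word n-grams across a buffer of documents and ranks them. It uses plain n-gram counting, not the Modified Sequitur algorithm, and it omits pipelining, indexed arrays, and the part-of-speech filtering. The function names and the weighting (occurrence count times phrase length) are assumptions made for illustration only.

```python
from collections import Counter


def repeated_ngrams(docs, max_n=3, min_count=2):
    """Count word n-grams (2 <= n <= max_n) seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def top_phrases(docs, k=3):
    """Rank repeated phrases by a hypothetical weight: count * phrase length."""
    grams = repeated_ngrams(docs)
    ranked = sorted(grams.items(), key=lambda kv: (-kv[1] * len(kv[0]), kv[0]))
    return [" ".join(gram) for gram, _ in ranked[:k]]


buffer_docs = [
    "world cup final tonight",
    "watching the world cup final",
    "world cup fever",
]
trending = top_phrases(buffer_docs)
```

Ties in weight are broken by comparing the word tuples, so the ordering is deterministic for a given buffer.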
Abstract:
Techniques are provided for selecting a diverse mix of content items that may be displayed to a user. Content items such as user-generated events are received from a variety of sources. One or more content items are added to a set of content items based on a diversity of characteristics. The diversity of characteristics for the one or more content items may be calculated by measuring a diversity of characteristics of the set as if the one or more content items were added to the set. Content items that produce a greater diversity are selected for addition to the set. The set is displayed to the user, who is provided with a more meaningful mix of content due to the greater diversity in content.
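The selection step described above can be sketched as a greedy loop: each candidate is scored by the diversity of the set as if that candidate had been added, and the candidate giving the greatest diversity is chosen. The diversity measure used here (the number of distinct characteristic values in the set) and the field names `source` and `type` are assumptions; the abstract does not specify them.

```python
def diversity(items):
    """Hypothetical measure: distinct (characteristic, value) pairs in the set."""
    return len({(k, v) for item in items for k, v in item.items() if k != "id"})


def select_diverse(candidates, size):
    """Greedily build a set, always adding the item that maximizes diversity."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < size:
        # Measure the set's diversity as if each candidate were already added.
        best = max(remaining, key=lambda c: diversity(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [
    {"id": 1, "source": "blog", "type": "post"},
    {"id": 2, "source": "blog", "type": "post"},
    {"id": 3, "source": "photos", "type": "upload"},
    {"id": 4, "source": "blog", "type": "comment"},
]
chosen_ids = [item["id"] for item in select_diverse(candidates, 3)]
```

In this example the near-duplicate item 2 is passed over in favor of items that add a new source or type to the set.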
Abstract:
The present invention is directed towards systems and methods for organization of bookmarks. The method according to one embodiment comprises retrieving one or more bookmarks associated with one or more content items, a given bookmark generated by a user of a client device, and identifying one or more tags associated with one or more uniform resource locators corresponding to the one or more bookmarks. A bookmark folder hierarchy is created through use of a clustering algorithm on the basis of the one or more tags associated with the one or more uniform resource locators corresponding to the one or more bookmarks.
Abstract:
A method and a computer-readable medium are provided which perform screen scraping via grammar induction. The computer-readable medium stores instructions of the method, the instructions directing a computer processor to intercept display information transmitted to a computer-implemented display device representing information stored in a data source; induce a grammar via statistical analysis of the intercepted display information; provide the grammar to a parser-generator to generate a parser corresponding to the induced grammar; and perform screen scraping using the generated parser.
Abstract:
Techniques are described for identifying items that have recently undergone an interest burst. Items that have recently undergone an interest burst are identified by comparing how many interest-actions have been performed on the items during a current time window against how many interest-actions have been performed on the items historically. Various tests are performed to rule out candidates that are not likely to be of interest to other users. In addition, various spam detection techniques are described for reducing the possibility that the items that are listed as interest burst items are listed because of spam.
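The core comparison, current-window activity against historical activity, might look like the following minimal sketch. The thresholds (`min_actions`, `ratio`) and the use of a per-window historical average as the baseline are assumptions, since the abstract does not give the actual tests. Spam detection is not modeled.

```python
def burst_items(current, history, min_actions=5, ratio=3.0):
    """Return items whose current-window count far exceeds their history.

    current: item -> interest-action count in the current time window
    history: item -> list of counts from earlier windows
    """
    bursts = []
    for item, now in current.items():
        past = history.get(item, [])
        baseline = sum(past) / len(past) if past else 0.0
        # Floor the baseline at 1.0 so items with little or no history
        # still need a meaningful number of actions to qualify.
        if now >= min_actions and now >= ratio * max(baseline, 1.0):
            bursts.append(item)
    return sorted(bursts)


current = {"a": 30, "b": 12, "c": 4, "d": 9}
history = {"a": [5, 6, 4], "b": [10, 11, 12], "c": [0, 0, 1]}
bursting = burst_items(current, history)
```

Here `b` is steadily popular but not bursting, and `c` is ruled out by the minimum-activity test, which stands in for the candidate-filtering tests the abstract mentions.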
Abstract:
A set of general criteria has been defined to improve the efficacy of a tagging system, and has been applied to present collaborative tag suggestions to a user. The collaborative tag suggestions are based on a goodness measure for tags derived from collective user authorities to combat spam. The goodness measure is iteratively adjusted by a reward-penalty algorithm during tag selection. The collaborative tag suggestions can also incorporate other sources of tags, e.g., content-based auto-generated tags.