-
Publication No.: US08280877B2
Publication date: 2012-10-02
Application No.: US11859461
Filing date: 2007-09-21
Applicant: Benyu Zhang , Jilin Chen , Zheng Chen , HuaJun Zeng , Jian Wang
Inventor: Benyu Zhang , Jilin Chen , Zheng Chen , HuaJun Zeng , Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/2745 , G06F17/278
Abstract: Systems and methods for implementing diverse topic phrase extraction are disclosed. According to one implementation, multiple word candidate phrases are extracted from a corpus and weighed. One or more documents are re-weighed to identify less obvious candidate topics using latent semantic analysis (LSA). Phrase diversification is then used to remove redundancy and select informative and distinct topic phrases.
-
Publication No.: US20080208840A1
Publication date: 2008-08-28
Application No.: US11859461
Filing date: 2007-09-21
Applicant: Benyu Zhang , Jilin Chen , Zheng Chen , HuaJun Zeng , Jian Wang
Inventor: Benyu Zhang , Jilin Chen , Zheng Chen , HuaJun Zeng , Jian Wang
IPC classification: G06F7/10
CPC classification: G06F17/2745 , G06F17/278
Abstract: Systems and methods for implementing diverse topic phrase extraction are disclosed. According to one implementation, multiple word candidate phrases are extracted from a corpus and weighed. One or more documents are re-weighed to identify less obvious candidate topics using latent semantic analysis (LSA). Phrase diversification is then used to remove redundancy and select informative and distinct topic phrases.
-
Publication No.: US20080281834A1
Publication date: 2008-11-13
Application No.: US11801404
Filing date: 2007-05-09
Applicant: Min Wu , Chenxi Lin , Benyu Zhang , Huajun Zeng , Zheng Chen , Jian Wang
Inventor: Min Wu , Chenxi Lin , Benyu Zhang , Huajun Zeng , Zheng Chen , Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/30861
Abstract: Described is a technology by which blocks of web pages may be selected, such as for building a user-personalized web page containing selected blocks. A selection mechanism, such as a browser toolbar add-on, provides a user interface for selecting blocks, and records information about selected blocks. A block tracking mechanism (e.g., a daemon program) uses the information to locate selected blocks of the web pages, including when the web page containing the block is updated with respect to content and/or layout. The block tracking mechanism may update a local gadget that when invoked, such as by browsing to a particular web page, which shows updated versions of the block on a personalized web page. Blocks may be efficiently located by processing trees representing web pages into reduced trees, and then by performing a minimum distance mapping algorithm on the reduced trees.
-
Publication No.: US07925644B2
Publication date: 2011-04-12
Application No.: US12038652
Filing date: 2008-02-27
Applicant: Chenxi Lin , Lei Ji , HuaJun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
Inventor: Chenxi Lin , Lei Ji , HuaJun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
CPC classification: G06F17/30675 , G06Q10/10
Abstract: A method and system for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term. When a plurality of terms are received, optionally as a query, the system ranks, using an inverse document frequency algorithm, the plurality of terms for importance based on the document sets for the plurality of terms. Then a number of ranked terms are selected based on importance and a union set is formed based on the document sets associated with the selected number of ranked terms.
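The term-ranking and union step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the smoothed IDF formula, and the use of each term's document-set size as its document frequency are all assumptions made for illustration.

```python
import math


def rank_terms_by_idf(terms, doc_sets, total_docs):
    """Order terms from most to least important by a smoothed IDF.

    Assumption: the size of a term's document set stands in for its
    document frequency; the abstract does not give the exact formula.
    """
    def idf(term):
        doc_freq = len(doc_sets.get(term, ()))
        return math.log((total_docs + 1) / (doc_freq + 1))

    return sorted(terms, key=idf, reverse=True)


def union_of_top_terms(terms, doc_sets, total_docs, num_selected=2):
    """Select the highest-ranked terms and merge their document sets."""
    selected = rank_terms_by_idf(terms, doc_sets, total_docs)[:num_selected]
    union = set()
    for term in selected:
        union |= set(doc_sets.get(term, ()))
    return selected, union


# Hypothetical per-term document sets (document identification numbers).
example_doc_sets = {
    "cheap": {1, 2, 3, 4},
    "flights": {2, 3, 5},
    "zurich": {5},
}
selected_terms, union_set = union_of_top_terms(
    ["cheap", "flights", "zurich"], example_doc_sets, total_docs=10
)
```

In this toy data the rarest terms rank highest, so only their document sets contribute to the union; documents for the remaining term would be checked against that union afterwards.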
-
Publication No.: US07822752B2
Publication date: 2010-10-26
Application No.: US11804627
Filing date: 2007-05-18
Applicant: Chenxi Lin , Lei Ji , Huajun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
Inventor: Chenxi Lin , Lei Ji , Huajun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
CPC classification: G06F17/30675
Abstract: Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
-
Publication No.: US20080215997A1
Publication date: 2008-09-04
Application No.: US12038687
Filing date: 2008-02-27
Applicant: Min Wu , Chenxi Lin , Benyu Zhang , HuaJun Zeng , Zheng Chen , Jian Wang
Inventor: Min Wu , Chenxi Lin , Benyu Zhang , HuaJun Zeng , Zheng Chen , Jian Wang
IPC classification: G06F3/048
CPC classification: G06F3/0481
Abstract: An exemplary web browser system includes a selection module for selecting a webpage block and recording information about a selected webpage block; a tracking module for tracking changes to a selected webpage block based at least in part on the recorded information for that webpage block; and a display module for displaying a selected webpage block wherein the tracking module updates the display module as to changes to the selected webpage block. Various other exemplary systems, methods, devices are also disclosed.
-
Publication No.: US20080215574A1
Publication date: 2008-09-04
Application No.: US12038652
Filing date: 2008-02-27
Applicant: Chenxi Lin , Lei Ji , HuaJun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
Inventor: Chenxi Lin , Lei Ji , HuaJun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/30675 , G06Q10/10
Abstract: An exemplary method for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term; receiving a plurality of terms, optionally as a query; ranking the plurality of terms for importance based at least in part on the document sets for the plurality of terms where the ranking comprises using an inverse document frequency algorithm; selecting a number of ranked terms based on importance where each selected, ranked term comprises its corresponding document set wherein each document in a respective document set comprises a document identification number; forming a union set based on the document sets associated with the selected number of ranked terms; and, for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number. Various other exemplary systems, methods, devices, etc. are also disclosed.
-
Publication No.: US07818330B2
Publication date: 2010-10-19
Application No.: US11801404
Filing date: 2007-05-09
Applicant: Min Wu , Chenxi Lin , Benyu Zhang , Huajun Zeng , Zheng Chen , Jian Wang
Inventor: Min Wu , Chenxi Lin , Benyu Zhang , Huajun Zeng , Zheng Chen , Jian Wang
IPC classification: G06F7/00
CPC classification: G06F17/30861
Abstract: Described is a technology by which blocks of web pages may be selected, such as for building a user-personalized web page containing selected blocks. A selection mechanism, such as a browser toolbar add-on, provides a user interface for selecting blocks, and records information about selected blocks. A block tracking mechanism (e.g., a daemon program) uses the information to locate selected blocks of the web pages, including when the web page containing the block is updated with respect to content and/or layout. The block tracking mechanism may update a local gadget that when invoked, such as by browsing to a particular web page, which shows updated versions of the block on a personalized web page. Blocks may be efficiently located by processing trees representing web pages into reduced trees, and then by performing a minimum distance mapping algorithm on the reduced trees.
-
Publication No.: US20080288348A1
Publication date: 2008-11-20
Application No.: US11803461
Filing date: 2007-05-15
Applicant: Huajun Zeng , Chenxi Lin , Dingyi Han , Benyu Zhang , Zheng Chen , Jian Wang
Inventor: Huajun Zeng , Chenxi Lin , Dingyi Han , Benyu Zhang , Zheng Chen , Jian Wang
IPC classification: G06Q30/00
CPC classification: G06Q30/02 , G06Q30/0254 , G06Q30/0256 , G06Q30/0263 , G06Q30/0277
Abstract: A method for ranking online advertisements using retailer reputation and product reputation. In one implementation, a query may be received. Advertisements may be selected by determining a level of relevance between the query and each advertisement and selecting the advertisements with a level of relevance above a pre-determined level of relevance. A predicted reputation for a retailer and a predicted reputation for a product may be retrieved for each of the selected advertisements. The selected advertisements may then be ranked based on the predicted reputation for the retailer and the predicted reputation of the product. The ranking of the selected advertisements may be accomplished by calculating a ranking score for each selected advertisement based on the retailer predicted reputation and the product predicted reputation. The selected advertisements may then be displayed according to the ranking.
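The select-then-rank flow in this abstract can be sketched as follows. The abstract does not state how the two reputations are combined, so the weighted sum, the `Ad` record, and every parameter name here are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    relevance: float            # relevance of the ad to the query
    retailer_reputation: float  # predicted reputation of the retailer
    product_reputation: float   # predicted reputation of the product


def rank_ads(ads, min_relevance=0.5, retailer_weight=0.5):
    """Keep ads above the relevance threshold, then sort by reputation score.

    Assumption: the ranking score is a weighted sum of the two predicted
    reputations; the source only says the score is based on both.
    """
    selected = [ad for ad in ads if ad.relevance > min_relevance]

    def score(ad):
        return (retailer_weight * ad.retailer_reputation
                + (1.0 - retailer_weight) * ad.product_reputation)

    return sorted(selected, key=score, reverse=True)


# Hypothetical candidates: "c" is filtered out by the relevance threshold.
candidates = [
    Ad("a", relevance=0.9, retailer_reputation=0.2, product_reputation=0.4),
    Ad("b", relevance=0.8, retailer_reputation=0.9, product_reputation=0.7),
    Ad("c", relevance=0.1, retailer_reputation=1.0, product_reputation=1.0),
]
ranked_ids = [ad.ad_id for ad in rank_ads(candidates)]
```

Relevance acts only as a filter in this sketch; the order among the surviving ads depends solely on the two reputations, which mirrors the two-stage wording of the abstract.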
-
Publication No.: US20080288483A1
Publication date: 2008-11-20
Application No.: US11804627
Filing date: 2007-05-18
Applicant: Chenxi Lin , Lei Ji , Huajun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
Inventor: Chenxi Lin , Lei Ji , Huajun Zeng , Benyu Zhang , Zheng Chen , Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/30675
Abstract: Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.