-
Publication No.: US07555480B2
Publication date: 2009-06-30
Application No.: US11456753
Application date: 2006-07-11
Applicant: Benyu Zhang, Chenxi Lin, Hua-Jun Zeng, Jian Wang, Ke Tang, Zheng Chen
Inventor: Benyu Zhang, Chenxi Lin, Hua-Jun Zeng, Jian Wang, Ke Tang, Zheng Chen
CPC classification: G06F17/30864, Y10S707/99935
Abstract: The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
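The abstract above does not say which tree matching algorithm is used. As a minimal sketch, assuming the classic Simple Tree Matching recurrence and a hypothetical `(tag, children)` tuple representation of DOM subtrees, a template could be compared with a candidate record like this:

```python
def simple_tree_matching(a, b):
    """Count the nodes in a maximum top-down matching between two trees.

    Each tree is a (tag, children) tuple, where children is a list of trees.
    This is the Simple Tree Matching recurrence, used here as an assumption;
    the patent abstract does not name a specific algorithm.
    """
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return 0  # roots differ, so nothing below them is matched
    m, n = len(kids_a), len(kids_b)
    # best[i][j]: best matching between the first i children of a and
    # the first j children of b, preserving sibling order.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = max(
                best[i - 1][j],
                best[i][j - 1],
                best[i - 1][j - 1]
                + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return best[m][n] + 1  # +1 for the matched roots


def tree_size(tree):
    """Number of nodes in a (tag, children) tree."""
    return 1 + sum(tree_size(child) for child in tree[1])


def similarity(template, candidate):
    """Matched nodes divided by the size of the larger tree (0.0 to 1.0)."""
    matched = simple_tree_matching(template, candidate)
    return matched / max(tree_size(template), tree_size(candidate))
```

A caller might treat any subtree whose `similarity` to the user-generated template exceeds some threshold as a matching data record; the threshold and the normalization are illustrative choices, not taken from the patent.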
-
Publication No.: US20080103886A1
Publication date: 2008-05-01
Application No.: US11553897
Application date: 2006-10-27
Applicant: Hua Li, Zheng Chen, Benyu Zhang, Hua-Jun Zeng, Jian Wang
Inventor: Hua Li, Zheng Chen, Benyu Zhang, Hua-Jun Zeng, Jian Wang
IPC classification: G06Q30/00
CPC classification: G06Q30/02, G06Q30/0275, G06Q30/0277
Abstract: A method and system for generating and using a combined model to identify whether a bid term is relevant to an advertisement is provided. A relevance system trains a combined model that includes an initial model and a decision tree model that are trained using features that represent relationships between bid terms and advertisements. The relevance system trains the initial model to map initial model features to a modeled relevance. The relevance system trains the decision tree model to map the decision tree features and the modeled relevance to a final relevance. The trained initial model and decision tree model represent the combined model. The relevance system then uses the combined model to determine the relevance of bid terms to advertisements.
-
Publication No.: US20070239643A1
Publication date: 2007-10-11
Application No.: US11378095
Application date: 2006-03-17
Applicant: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
Inventor: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
IPC classification: G06N3/00
CPC classification: G06N5/02, G06F17/30705
Abstract: Computer-readable media having computer-executable instructions and apparatuses categorize documents or corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing the multifactor structures. TSM is further supported by operations and presented tools, such as the High-Order Singular Value Decomposition (HOSVD) for a reduction of the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories. Consequently, a category is selected for the document or corpus of documents. Experimental results on the dataset for 20 Newsgroups suggest that TSM is advantageous to a Vector Space Model (VSM) for text classification.
-
Publication No.: US20070005649A1
Publication date: 2007-01-04
Application No.: US11173098
Application date: 2005-07-01
Applicant: Jian Wang, Fengping Zeng, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Chenxi Lin, Bing Sun
Inventor: Jian Wang, Fengping Zeng, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Chenxi Lin, Bing Sun
IPC classification: G06F17/00
CPC classification: G06F16/957
Abstract: The invention provides a method of creating contextual titles for web pages or documents. The method includes the extracting of phrases from a web page or document. The phrases are evaluated for use as contextual titles for the web page or document. The contextual title is utilized to access the web page or document by users.
-
Publication No.: US20060271834A1
Publication date: 2006-11-30
Application No.: US11136029
Application date: 2005-05-24
Applicant: Jian Wang, Hua-Jun Zeng, Chenxi Lin, Zheng Chen, Benyu Zhang, Bing Sun
Inventor: Jian Wang, Hua-Jun Zeng, Chenxi Lin, Zheng Chen, Benyu Zhang, Bing Sun
IPC classification: G06F17/00
CPC classification: G06F17/3089
Abstract: The invention provides a method of creating a personal home page containing information of interest assembled from various web sites. The method includes the partitioning of web pages into web blocks. Users may collect various web blocks from different web pages and utilize those web blocks to define the dynamic personal homepage. In addition, the web blocks may be tracked to update content in the personal home page based on corresponding changes in the original web page.
-
Publication No.: US07925644B2
Publication date: 2011-04-12
Application No.: US12038652
Application date: 2008-02-27
Applicant: Chenxi Lin, Lei Ji, HuaJun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Inventor: Chenxi Lin, Lei Ji, HuaJun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
CPC classification: G06F17/30675, G06Q10/10
Abstract: A method and system for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term. When a plurality of terms are received, optionally as a query, the system ranks, using an inverse document frequency algorithm, the plurality of terms for importance based on the document sets for the plurality of terms. Then a number of ranked terms are selected based on importance and a union set is formed based on the document sets associated with the selected number of ranked terms.
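The steps in this abstract (per-term document sets, importance ranking, union of the selected sets) can be sketched roughly as follows. The function names, the dictionary layout, and the smoothed IDF formula are illustrative assumptions; the abstract does not specify how scores or document frequencies are obtained.

```python
import math


def top_docs_per_term(scores, n):
    """Keep the n best-scoring documents for each term.

    scores: term -> {doc_id: score} (hypothetical precomputed scores).
    """
    return {
        term: set(sorted(docs, key=docs.get, reverse=True)[:n])
        for term, docs in scores.items()
    }


def idf(term, doc_freq, total_docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    return math.log(total_docs / (1 + doc_freq.get(term, 0)))


def candidate_union(query_terms, doc_sets, doc_freq, total_docs, k):
    """Rank query terms by IDF, keep the k most important ones, and
    return them with the union of their per-term document sets."""
    ranked = sorted(
        query_terms,
        key=lambda t: idf(t, doc_freq, total_docs),
        reverse=True,
    )
    selected = ranked[:k]
    union = set()
    for term in selected:
        union |= doc_sets.get(term, set())
    return selected, union
```

With a query such as `["the", "cheap", "flights"]` and `k=2`, a very common term like "the" would be dropped and only the document sets of the two rarer terms would be merged.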
-
Publication No.: US07822752B2
Publication date: 2010-10-26
Application No.: US11804627
Application date: 2007-05-18
Applicant: Chenxi Lin, Lei Ji, Huajun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Inventor: Chenxi Lin, Lei Ji, Huajun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
CPC classification: G06F17/30675
Abstract: Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
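One possible reading of this mechanism is sketched below. It assumes each index row is a pair of parallel lists (sorted document identifiers and precomputed scores), that importance comes from a supplied `idf` table, and that a document's final score is the sum of its scores in the narrowed rows. None of these details are confirmed by the abstract, and the names are hypothetical.

```python
from bisect import bisect_left


def rank_documents(index, idf, query_terms, top_terms=2):
    """Score documents using only the rows of the most important terms.

    index: term -> (sorted_doc_ids, scores), parallel lists (assumed layout).
    idf:   term -> importance weight (assumed to be precomputed).
    """
    known = [t for t in query_terms if t in index]
    important = sorted(known, key=lambda t: idf.get(t, 0.0), reverse=True)
    rows = [index[t] for t in important[:top_terms]]

    # Every document identifier that appears in a narrowed row is a candidate.
    candidates = sorted({doc_id for ids, _ in rows for doc_id in ids})

    # One forward-only pointer per row; the binary search starts from it,
    # so earlier parts of the row are never rescanned.
    pointers = [0] * len(rows)
    totals = []
    for doc_id in candidates:
        total = 0.0
        for k, (ids, scores) in enumerate(rows):
            pos = bisect_left(ids, doc_id, pointers[k])
            pointers[k] = pos
            if pos < len(ids) and ids[pos] == doc_id:
                total += scores[pos]
        totals.append((doc_id, total))

    # Highest total first; ties broken by document identifier.
    return sorted(totals, key=lambda item: (-item[1], item[0]))
```

Because candidates are visited in ascending order, each pointer only moves forward, which is what lets the bounded binary search skip over already-passed identifiers.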
-
Publication No.: US07693908B2
Publication date: 2010-04-06
Application No.: US11770358
Application date: 2007-06-28
Applicant: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
Inventor: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06Q30/02
Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
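The time-dependent branch of this abstract can be illustrated with a small sketch. It assumes periodicity is detected with sample autocorrelation and that the forecast is a per-phase average of past values; both are stand-in techniques chosen for illustration, and the threshold of 0.5 is arbitrary. The causal-relationship forecast for time-independent queries is not covered here.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a frequency series at the given lag."""
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no periodic structure
    cov = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return cov / var


def dominant_period(series, max_lag, threshold=0.5):
    """Return the lag with the strongest autocorrelation above the
    threshold, or None if the query looks time-independent."""
    best_lag, best = None, threshold
    for lag in range(2, max_lag + 1):
        r = autocorrelation(series, lag)
        if r > best:
            best_lag, best = lag, r
    return best_lag


def seasonal_forecast(series, period, steps):
    """Forecast each future step as the mean of past values that fall
    on the same phase of the period."""
    n = len(series)
    return [mean(series[(n + s) % period::period]) for s in range(steps)]
```

For a daily query count with a weekend peak, `dominant_period` would be expected to return 7, after which `seasonal_forecast` repeats the averaged weekly shape.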
-
Publication No.: US07685099B2
Publication date: 2010-03-23
Application No.: US11770445
Application date: 2007-06-28
Applicant: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
Inventor: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06Q30/02
Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
-
Publication No.: US07634471B2
Publication date: 2009-12-15
Application No.: US11392760
Application date: 2006-03-30
Applicant: Zheng Chen, Lei Li, Chenxi Lin, Qiaoling Liu, Jian Wang, Benyu Zhang
Inventor: Zheng Chen, Lei Li, Chenxi Lin, Qiaoling Liu, Jian Wang, Benyu Zhang
IPC classification: G06F17/30
CPC classification: G06F17/30112, G06F17/3012, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99937, Y10S707/99938
Abstract: Extraction of semantic information and the generation of semantic attributes allows for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.