-
Publication No.: US07555480B2
Publication date: 2009-06-30
Application No.: US11456753
Filing date: 2006-07-11
Applicant: Benyu Zhang, Chenxi Lin, Hua-Jun Zeng, Jian Wang, Ke Tang, Zheng Chen
Inventor: Benyu Zhang, Chenxi Lin, Hua-Jun Zeng, Jian Wang, Ke Tang, Zheng Chen
CPC classification: G06F17/30864, Y10S707/99935
Abstract: The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
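The abstract of US07555480B2 does not say which tree matching algorithm is used. As a rough illustration only, the sketch below uses the classic Simple Tree Matching dynamic program as a stand-in and scans a page tree for subtrees that resemble a user-generated template. The tuple representation `(tag, children)`, the names `simple_tree_match` and `find_matches`, and the 0.7 threshold are all assumptions made for this example, not details from the patent.

```python
def simple_tree_match(a, b):
    """Size of the largest order-preserving match between two trees.

    Each tree is a (tag, children) tuple. Roots with different tags
    do not match at all.
    """
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return 0
    rows, cols = len(kids_a), len(kids_b)
    # best[i][j]: best match using the first i children of a and first j of b
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            best[i][j] = max(
                best[i - 1][j],
                best[i][j - 1],
                best[i - 1][j - 1] + simple_tree_match(kids_a[i - 1], kids_b[j - 1]),
            )
    return best[rows][cols] + 1  # +1 for the matching roots


def count_nodes(tree):
    return 1 + sum(count_nodes(kid) for kid in tree[1])


def find_matches(template, page, threshold=0.7):
    """Return page subtrees whose normalized match score reaches the threshold."""
    hits = []

    def visit(node):
        size = max(count_nodes(template), count_nodes(node))
        if simple_tree_match(template, node) / size >= threshold:
            hits.append(node)
        else:
            for kid in node[1]:
                visit(kid)

    visit(page)
    return hits


# Hypothetical data: a template built from one selected record, and a page.
template = ("li", [("a", []), ("span", [])])
page = ("ul", [
    ("li", [("a", []), ("span", [])]),
    ("li", [("a", []), ("span", []), ("img", [])]),
    ("li", [("p", [])]),
])
matches = find_matches(template, page)
```

In this toy page the first two `li` records are returned and the third is rejected, because only its root tag agrees with the template.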
-
82.
Publication No.: US20090119284A1
Publication date: 2009-05-07
Application No.: US12145222
Filing date: 2008-06-24
Applicant: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
Inventor: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
CPC classification: G06F16/345, G06F16/951
Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
-
Publication No.: US07437382B2
Publication date: 2008-10-14
Application No.: US11130803
Filing date: 2005-05-16
Applicant: Benyu Zhang, Zheng Chen, Wensi Xi, Hua-Jun Zeng, Wei-Ying Ma
Inventor: Benyu Zhang, Zheng Chen, Wensi Xi, Hua-Jun Zeng, Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: H04L51/26, H04L51/16, H04L51/34, Y10S707/99933, Y10S707/99943
Abstract: A method and system for ranking messages of discussion threads based on relationships between messages and authors is provided. The ranking system defines an equation for attributes of a message and an author. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object, and different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks the messages based on attribute values.
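The abstract of US07437382B2 describes mutually dependent attribute equations for messages and authors that are iterated until they converge, but it does not give the equations. The sketch below is a minimal illustration of that general pattern under invented assumptions: a message's score depends on its author's score and on the mean score of its replies, an author's score is the mean score of that author's messages, and a damping factor below 1 guarantees convergence. The function name, the equations, and the parameters are all hypothetical.

```python
def rank_messages(author_of, parent_of, damping=0.5, tol=1e-9, max_iter=1000):
    """Rank messages by iterating coupled message/author scores to a fixed point.

    author_of: message id -> author id
    parent_of: message id -> id of the message it replies to, or None
    """
    replies = {m: [] for m in author_of}
    for m, parent in parent_of.items():
        if parent is not None:
            replies[parent].append(m)
    by_author = {}
    for m, a in author_of.items():
        by_author.setdefault(a, []).append(m)

    msg = {m: 1.0 for m in author_of}
    for _ in range(max_iter):
        # Author attribute: mean score of the author's messages (assumed equation).
        auth = {a: sum(msg[m] for m in ms) / len(ms) for a, ms in by_author.items()}
        new_msg = {}
        for m in author_of:
            kids = replies[m]
            reply_part = sum(msg[r] for r in kids) / len(kids) if kids else 0.0
            # Message attribute: damped mix of author score and reply scores.
            new_msg[m] = (1 - damping) + damping * 0.5 * (auth[author_of[m]] + reply_part)
        delta = max(abs(new_msg[m] - msg[m]) for m in msg)
        msg = new_msg
        if delta < tol:  # attribute values have converged
            break

    ranking = sorted(msg, key=lambda m: (-msg[m], m))
    return ranking, msg


# Hypothetical two-message thread: bob's m2 replies to alice's m1.
ranking, scores = rank_messages(
    author_of={"m1": "alice", "m2": "bob"},
    parent_of={"m1": None, "m2": "m1"},
)
```

With these toy equations the replied-to message `m1` converges to 8/9 and `m2` to 2/3, so `m1` ranks first.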
-
84.
Publication No.: US20080103886A1
Publication date: 2008-05-01
Application No.: US11553897
Filing date: 2006-10-27
Applicant: Hua Li, Zheng Chen, Benyu Zhang, Hua-Jun Zeng, Jian Wang
Inventor: Hua Li, Zheng Chen, Benyu Zhang, Hua-Jun Zeng, Jian Wang
IPC classification: G06Q30/00
CPC classification: G06Q30/02, G06Q30/0275, G06Q30/0277
Abstract: A method and system for generating and using a combined model to identify whether a bid term is relevant to an advertisement is provided. A relevance system trains a combined model that includes an initial model and a decision tree model that are trained using features that represent relationships between bid terms and advertisements. The relevance system trains the initial model to map initial model features to a modeled relevance. The relevance system trains the decision tree model to map the decision tree features and the modeled relevance to a final relevance. The trained initial model and decision tree model represent the combined model. The relevance system then uses the combined model to determine the relevance of bid terms to advertisements.
-
Publication No.: US07305389B2
Publication date: 2007-12-04
Application No.: US10826161
Filing date: 2004-04-15
Applicant: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
Inventor: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
IPC classification: G06F17/30
CPC classification: G06F17/30631, G06F17/30722, G06F17/30864, Y10S707/99935, Y10S707/99942
Abstract: Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance between respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata is indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
-
Publication No.: US07289985B2
Publication date: 2007-10-30
Application No.: US10826168
Filing date: 2004-04-15
Applicant: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
Inventor: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
CPC classification: G06F17/30864, G06F17/30616, G06F17/30899, Y10S707/917, Y10S707/99932, Y10S707/99933, Y10S707/99935
Abstract: Systems and methods for enhanced document retrieval are described. In one aspect, a search query from an end-user is received. Responsive to receiving the search query, search results are retrieved. The search results include an enhanced document and a set of non-enhanced documents. The enhanced document and the non-enhanced documents include term(s) of the search query. The enhanced document is derived from a base document. The base document was modified with metadata mined from one or more different documents. The metadata is associated with one or more respective references to the base document. The one or more different documents are independent of the base document.
-
Publication No.: US20070239643A1
Publication date: 2007-10-11
Application No.: US11378095
Filing date: 2006-03-17
Applicant: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
Inventor: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
IPC classification: G06N3/00
CPC classification: G06N5/02, G06F17/30705
Abstract: Computer-readable media having computer-executable instructions and apparatuses categorize documents or corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing the multifactor structures. TSM is further supported by operations and presented tools, such as the High-Order Singular Value Decomposition (HOSVD) for a reduction of the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories. Consequently, a category is selected for the document or corpus of documents. Experimental results on the dataset for 20 Newsgroups suggest that TSM is advantageous to a Vector Space Model (VSM) for text classification.
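To make the HOSVD step named in the abstract of US20070239643A1 more concrete, the following is a minimal NumPy sketch of a truncated higher-order SVD on a third-order tensor. It shows only the generic decomposition; how the patent builds the document tensor, chooses the reduced dimensions, or compares against category tensors is not stated in the abstract, so the helper names (`unfold`, `mode_product`, `hosvd_truncated`, `reconstruct`) and the example shapes are assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize the tensor so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply the tensor by a matrix along the given mode."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(moved, 0, mode)


def hosvd_truncated(tensor, ranks):
    """Return (core, factors) of a truncated HOSVD.

    ranks[n] is the number of leading left singular vectors kept for mode n.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the kept subspace
    return core, factors


def reconstruct(core, factors):
    """Map a core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical 3 x 4 x 5 tensor standing in for a document representation.
doc = np.arange(60, dtype=float).reshape(3, 4, 5)
small_core, small_factors = hosvd_truncated(doc, (2, 2, 2))
```

Keeping all singular vectors in every mode reproduces the input exactly, while smaller ranks give a reduced core tensor that could then be compared with similarly reduced category tensors.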
-
Publication No.: US20070005649A1
Publication date: 2007-01-04
Application No.: US11173098
Filing date: 2005-07-01
Applicant: Jian Wang, Fengping Zeng, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Chenxi Lin, Bing Sun
Inventor: Jian Wang, Fengping Zeng, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Chenxi Lin, Bing Sun
IPC classification: G06F17/00
CPC classification: G06F16/957
Abstract: The invention provides a method of creating contextual titles for web pages or documents. The method includes the extracting of phrases from a web page or document. The phrases are evaluated for use as contextual titles for the web page or document. The contextual title is utilized to access the web page or document by users.
-
Publication No.: US20060271834A1
Publication date: 2006-11-30
Application No.: US11136029
Filing date: 2005-05-24
Applicant: Jian Wang, Hua-Jun Zeng, Chenxi Lin, Zheng Chen, Benyu Zhang, Bing Sun
Inventor: Jian Wang, Hua-Jun Zeng, Chenxi Lin, Zheng Chen, Benyu Zhang, Bing Sun
IPC classification: G06F17/00
CPC classification: G06F17/3089
Abstract: The invention provides a method of creating a personal home page containing information of interest assembled from various web sites. The method includes the partitioning of web pages into web blocks. Users may collect various web blocks from different web pages and utilize those web blocks to define the dynamic personal homepage. In addition, the web blocks may be tracked to update content in the personal home page based on corresponding changes in the original web page.
-
Publication No.: US20060259480A1
Publication date: 2006-11-16
Application No.: US11125839
Filing date: 2005-05-10
Applicant: Benyu Zhang, Gui-Rong Xue, Hua-Jun Zeng, Wei-Ying Ma, Xue-Mei Jiang, Zheng Chen
Inventor: Benyu Zhang, Gui-Rong Xue, Hua-Jun Zeng, Wei-Ying Ma, Xue-Mei Jiang, Zheng Chen
IPC classification: G06F17/30
CPC classification: G06F17/30882, G06F17/30867, Y10S707/99935
Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.
-