-
Publication No.: US07565372B2
Publication date: 2009-07-21
Application No.: US11225861
Filing date: 2005-09-13
Applicant: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Hua Li
Inventor: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Hua Li
IPC classification: G06F17/00
CPC classification: G06F17/277, G06F17/30719, Y10S707/99943
Abstract: A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the document. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.
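The abstract above does not say which document model or which normalization is used. The following is a minimal sketch under two assumptions that are not from the patent: the document model is a unigram model estimated from the document itself, and "normalized" means the sentence log-probability divided by the sentence length. The names `tokenize` and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(sentences, k=1):
    """Return the k sentences with the highest length-normalized log-probability.

    Assumption (not from the patent): a unigram model estimated from the
    document itself, with normalization by sentence length.
    """
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    total = sum(counts.values())

    def score(sentence):
        toks = tokenize(sentence)
        if not toks:
            return float("-inf")
        # Average log-probability per token, so long sentences are not penalized.
        return sum(math.log(counts[t] / total) for t in toks) / len(toks)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

A plain unigram score like this favors sentences made of frequent words, so a practical variant would probably need stop-word handling. The same normalized score could be computed for a finished summary to compare candidate summaries, which is the evaluation use the abstract mentions.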
-
Publication No.: US20090119284A1
Publication date: 2009-05-07
Application No.: US12145222
Filing date: 2008-06-24
Applicant: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
Inventor: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
CPC classification: G06F16/345, G06F16/951
Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
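The summarize-then-classify pipeline in this abstract can be sketched as follows. This is an illustration only: the summarizer is a single frequency-based stand-in (the abstract speaks of combining several techniques without naming them here), and the classifier is a small multinomial Naive Bayes with add-one smoothing. The names `tokens`, `summarize`, and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def summarize(page_text, k=2):
    """Stand-in summarizer: keep the k sentences with the most frequent words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    freq = Counter(tokens(page_text))

    def score(sentence):
        toks = tokens(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    return " ".join(sorted(sentences, key=score, reverse=True)[:k])


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing, trained on summaries."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        def log_posterior(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            return prior + sum(math.log((counts[t] + 1) / denom) for t in tokens(text))

        return max(self.class_counts, key=log_posterior)
```

Both training and prediction run on the summaries rather than on the full page text, so boilerplate sentences such as navigation or contact lines have less influence on the class decision.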
-
Publication No.: US07529735B2
Publication date: 2009-05-05
Application No.: US11057100
Filing date: 2005-02-11
Applicant: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
Inventor: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
CPC classification: G06N5/022, G06F17/30616, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
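The abstract does not name a clustering algorithm. As a rough sketch, the simplest reading of "a relationship to the same objects" is to place two people in one cluster whenever they are linked, directly or through a chain, by a shared document. The code below does this with a union-find structure; that choice, and the names `cluster_people` and `related_people`, are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_people(relationships):
    """Group people who are connected through shared documents.

    relationships: iterable of (person, document) pairs.
    Returns a dict mapping each person to the frozenset of their cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_person_for_doc = {}
    for person, doc in relationships:
        find(person)  # register the person even if the document is new
        if doc in first_person_for_doc:
            parent[find(person)] = find(first_person_for_doc[doc])
        else:
            first_person_for_doc[doc] = person

    groups = defaultdict(set)
    for person in parent:
        groups[find(person)].add(person)
    return {p: frozenset(g) for g in groups.values() for p in g}


def related_people(clusters, name):
    """Names of the other people in the same cluster as `name`."""
    return sorted(clusters.get(name, frozenset()) - {name})
```

Connected components merge clusters aggressively, so a single widely shared document joins everyone who touched it. A weighted similarity measure would be a natural refinement, though the abstract does not specify one.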
-
Publication No.: US07437382B2
Publication date: 2008-10-14
Application No.: US11130803
Filing date: 2005-05-16
Applicant: Benyu Zhang, Zheng Chen, Wensi Xi, Hua-Jun Zeng, Wei-Ying Ma
Inventor: Benyu Zhang, Zheng Chen, Wensi Xi, Hua-Jun Zeng, Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: H04L51/26, H04L51/16, H04L51/34, Y10S707/99933, Y10S707/99943
Abstract: A method and system for ranking messages of discussion threads based on relationships between messages and authors is provided. The ranking system defines an equation for attributes of a message and an author. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object, and different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks the messages based on attribute values.
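The abstract does not give the equations themselves. To illustrate the iterate-until-convergence idea only, the sketch below uses two invented equations: an author's score is the mean score of that author's messages, and a message's score is a constant plus a damped mean of the scores of the authors who replied to it. The function name, both equations, and the `damping` parameter are assumptions.

```python
def rank_messages(author_of, reply_to, damping=0.5, tol=1e-9, max_iter=1000):
    """Rank messages by mutually dependent message and author scores.

    author_of: list where author_of[i] is the author of message i.
    reply_to:  dict mapping a reply's index to the index of its parent message.
    Returns (ranking, message_scores, author_scores).
    """
    n = len(author_of)
    repliers = [[] for _ in range(n)]
    for child, parent in reply_to.items():
        repliers[parent].append(child)
    authors = sorted(set(author_of))
    msg = [1.0] * n
    auth = {u: 1.0 for u in authors}

    for _ in range(max_iter):
        # Author equation: mean score of the author's own messages.
        new_auth = {}
        for u in authors:
            own = [msg[i] for i in range(n) if author_of[i] == u]
            new_auth[u] = sum(own) / len(own)
        # Message equation: baseline plus damped support from replying authors.
        new_msg = []
        for i in range(n):
            if repliers[i]:
                support = sum(new_auth[author_of[r]] for r in repliers[i]) / len(repliers[i])
            else:
                support = 0.0
            new_msg.append((1 - damping) + damping * support)
        delta = max(abs(a - b) for a, b in zip(new_msg, msg))
        msg, auth = new_msg, new_auth
        if delta < tol:  # attribute values have converged
            break

    ranking = sorted(range(n), key=lambda i: msg[i], reverse=True)
    return ranking, msg, auth
```

Because every update is an average scaled by a damping factor below 1, each pass shrinks the change from the previous pass, so the loop settles on a fixed point.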
-
Publication No.: US07305389B2
Publication date: 2007-12-04
Application No.: US10826161
Filing date: 2004-04-15
Applicant: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
Inventor: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
IPC classification: G06F17/30
CPC classification: G06F17/30631, G06F17/30722, G06F17/30864, Y10S707/99935, Y10S707/99942
Abstract: Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance between respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata are indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
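One way to picture the propagation step: text found near references to a document (for example, a sentence surrounding a link in some other source) is added to the document's term index, weighted by how relevant that text is to the document. The sketch below uses cosine similarity over term counts as the relevance measure. That measure, and the names `tokens`, `cosine`, and `enhance`, are illustrative assumptions rather than the method described in the patent.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def enhance(doc_text, reference_snippets):
    """Return a weighted term index for the document plus propagated metadata.

    reference_snippets: text found near references to the document in
    independent sources. Each snippet's terms are added with a weight equal
    to the snippet's similarity to the original document content.
    """
    doc_terms = Counter(tokens(doc_text))
    enhanced = Counter({t: float(c) for t, c in doc_terms.items()})
    for snippet in reference_snippets:
        snippet_terms = Counter(tokens(snippet))
        weight = cosine(doc_terms, snippet_terms)
        for term, count in snippet_terms.items():
            enhanced[term] += weight * count
    return enhanced
```

With this weighting, a snippet that shares no vocabulary with the document contributes nothing, while a related snippet can add new searchable terms that the original document never used.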
-
Publication No.: US07289985B2
Publication date: 2007-10-30
Application No.: US10826168
Filing date: 2004-04-15
Applicant: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
Inventor: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel B. Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
CPC classification: G06F17/30864, G06F17/30616, G06F17/30899, Y10S707/917, Y10S707/99932, Y10S707/99933, Y10S707/99935
Abstract: Systems and methods for enhanced document retrieval are described. In one aspect, a search query from an end-user is received. Responsive to receiving the search query, search results are retrieved. The search results include an enhanced document and a set of non-enhanced documents. The enhanced document and the non-enhanced documents include term(s) of the search query. The enhanced document is derived from a base document. The base document was modified with metadata mined from one or more different documents. The metadata is associated with one or more respective references to the base document. The one or more different documents are independent of the base document.
-
Publication No.: US20060259480A1
Publication date: 2006-11-16
Application No.: US11125839
Filing date: 2005-05-10
Applicant: Benyu Zhang, Gui-Rong Xue, Hua-Jun Zeng, Wei-Ying Ma, Xue-Mei Jiang, Zheng Chen
Inventor: Benyu Zhang, Gui-Rong Xue, Hua-Jun Zeng, Wei-Ying Ma, Xue-Mei Jiang, Zheng Chen
IPC classification: G06F17/30
CPC classification: G06F17/30882, G06F17/30867, Y10S707/99935
Abstract: A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.
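The abstract says the click-through triplets are smoothed but does not say how. A common way to cope with sparse counts is back-off interpolation, sketched below: the estimate for a (user, query, document) triplet mixes the user-specific click rate with the query-level rate and the global document popularity. The interpolation weights and the `ClickModel` class are hypothetical choices for illustration, not the smoothing method of the patent.

```python
from collections import Counter


class ClickModel:
    """Interpolated estimate of P(document | user, query) from click triplets."""

    def __init__(self, triplets, weights=(0.6, 0.3, 0.1)):
        # triplets: list of (user, query, document) tuples.
        self.weights = weights
        self.uqd = Counter(triplets)
        self.uq = Counter((u, q) for u, q, _ in triplets)
        self.qd = Counter((q, d) for _, q, d in triplets)
        self.q = Counter(q for _, q, _ in triplets)
        self.d = Counter(d for _, _, d in triplets)
        self.total = len(triplets)

    def probability(self, user, query, doc):
        def ratio(num, den):
            return num / den if den else 0.0

        w_user, w_query, w_global = self.weights
        return (w_user * ratio(self.uqd[(user, query, doc)], self.uq[(user, query)])
                + w_query * ratio(self.qd[(query, doc)], self.q[query])
                + w_global * ratio(self.d[doc], self.total))

    def rerank(self, user, query, docs):
        """Order result documents by estimated importance to this user."""
        return sorted(docs, key=lambda d: self.probability(user, query, d), reverse=True)
```

A user with click history for the query gets results ordered by their own behavior, while a user with no history falls back to what other users clicked for the same query.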
-
Publication No.: US20060112068A1
Publication date: 2006-05-25
Application No.: US10997749
Filing date: 2004-11-23
Applicant: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Ning Liu, Jun Yan
Inventor: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Ning Liu, Jun Yan
IPC classification: G06F17/30
CPC classification: G06F17/3069, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99937
Abstract: A method and system for determining similarity between items is provided. To calculate similarity scores for pairs of items, the similarity system initializes a similarity score for each pair of objects and each pair of features. The similarity system then iteratively calculates the similarity scores for each pair of objects based on the similarity scores of the pairs of features calculated during a previous iteration and calculates the similarity scores for each pair of features based on the similarity scores of the pairs of objects calculated during a previous iteration. The similarity system implements an algorithm that is based on a recursive definition of the similarities between objects and between features. The similarity system continues the iterations of recalculating the similarity scores until the similarity scores converge on a solution.
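The recursive definition in this abstract (objects are similar if their features are similar, and features are similar if the objects that have them are similar) can be sketched as below. The concrete update rule, which averages the other side's pairwise scores and scales by a `decay` factor, follows the well-known SimRank style and is an assumption. The abstract does not give the exact formula, and `co_similarity` is a hypothetical name.

```python
from itertools import product


def co_similarity(object_features, decay=0.8, tol=1e-6, max_iter=100):
    """Iteratively compute object-object and feature-feature similarity.

    object_features: dict mapping each object to the set of its features.
    Returns (object_sim, feature_sim), dicts keyed by (item, item) pairs.
    """
    objects = sorted(object_features)
    features = sorted({f for fs in object_features.values() for f in fs})
    feature_objects = {f: [o for o in objects if f in object_features[o]]
                       for f in features}

    def identity(items):
        # Initialization: an item is fully similar to itself, not to others.
        return {(a, b): 1.0 if a == b else 0.0 for a, b in product(items, repeat=2)}

    def update(items, neighbors, other_sim):
        new = {}
        for a, b in product(items, repeat=2):
            na, nb = list(neighbors[a]), list(neighbors[b])
            if a == b:
                new[(a, b)] = 1.0
            elif not na or not nb:
                new[(a, b)] = 0.0
            else:
                total = sum(other_sim[(x, y)] for x in na for y in nb)
                new[(a, b)] = decay * total / (len(na) * len(nb))
        return new

    obj_sim, feat_sim = identity(objects), identity(features)
    for _ in range(max_iter):
        # Both updates read only the scores from the previous iteration.
        new_obj = update(objects, object_features, feat_sim)
        new_feat = update(features, feature_objects, obj_sim)
        delta = max(max(abs(new_obj[k] - obj_sim[k]) for k in obj_sim),
                    max(abs(new_feat[k] - feat_sim[k]) for k in feat_sim))
        obj_sim, feat_sim = new_obj, new_feat
        if delta < tol:
            break
    return obj_sim, feat_sim
```

The sketch stores a score for every pair, so memory grows quadratically with the number of objects and features. That is acceptable for an illustration but would need pruning on real collections.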
-
Publication No.: US20060095430A1
Publication date: 2006-05-04
Application No.: US10978232
Filing date: 2004-10-29
Applicant: Hua-Jun Zeng, Zheng Chen, Benyu Zhang, Wei-Ying Ma, Guirong Xue
Inventor: Hua-Jun Zeng, Zheng Chen, Benyu Zhang, Wei-Ying Ma, Guirong Xue
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: The described systems, methods and data structures are directed to ranking Web pages with hierarchical considerations. The hierarchical structures and the linking relationships of the World Wide Web are used to provide a page importance ranking for Web searches. The linking relationships are aggregated to a high level node at each of the hierarchical structures. A link graph analysis is performed on the aggregated linking relationships to determine the importance of each node. The importance of each node may be propagated to pages associated with that node. For each page, the importance of that page and the importance of the node associated with the page are used to calculate the page importance ranking.
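A rough sketch of the two-level idea follows, with several assumptions that are not from the abstract: the high-level node for a page is its host name, the link graph analysis is ordinary PageRank over host-to-host links, and a page's share of its host's importance is proportional to its in-link count plus one. The names `pagerank` and `hierarchical_rank` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; edges maps (src, dst) to a link weight."""
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new = {node: (1 - damping + damping * dangling) / n for node in nodes}
        for (src, dst), weight in edges.items():
            new[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new
    return rank


def hierarchical_rank(links):
    """links: list of (source_url, target_url) pairs. Returns url -> score."""
    def host(url):
        return urlsplit(url).netloc

    pages = sorted({url for pair in links for url in pair})
    hosts = sorted({host(p) for p in pages})

    site_edges = defaultdict(float)  # page links aggregated to host level
    inlinks = defaultdict(int)
    for src, dst in links:
        inlinks[dst] += 1
        if host(src) != host(dst):
            site_edges[(host(src), host(dst))] += 1.0

    site_rank = pagerank(hosts, site_edges)

    # Propagate each host's importance to its pages by smoothed in-link share.
    host_total = defaultdict(int)
    for p in pages:
        host_total[host(p)] += inlinks[p] + 1
    return {p: site_rank[host(p)] * (inlinks[p] + 1) / host_total[host(p)]
            for p in pages}
```

Aggregating links to the host level means a page on an important site can rank well even with few direct in-links, which is the hierarchical effect the abstract describes.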
-
Publication No.: US20060005247A1
Publication date: 2006-01-05
Application No.: US10881867
Filing date: 2004-06-30
Applicant: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
Inventor: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
IPC classification: H04L9/00
CPC classification: G06F17/30616, G06F17/30628, G06F17/30678, G06F21/6245, H04L51/12, Y10S707/99933
Abstract: A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
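The abstract says detection is based on content similarity but does not state the measure. A minimal sketch, assuming bag-of-words cosine similarity and a fixed threshold (both are illustrative choices, as are the names `term_vector`, `cosine`, and `is_confidential`):

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def is_confidential(message, confidential_docs, threshold=0.6):
    """Flag a message that is close enough to any known confidential document."""
    vec = term_vector(message)
    return any(cosine(vec, term_vector(doc)) >= threshold
               for doc in confidential_docs)
```

Whole-document cosine similarity will miss a short confidential excerpt pasted into a long message. Comparing at sentence or paragraph granularity is one way such a sketch could be extended.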
-