-
Publication No.: US20060059121A1
Publication date: 2006-03-16
Application No.: US10930617
Filing date: 2004-08-31
Applicants: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
CPC classification: G06F16/313, G06F16/38
Abstract: A system that identifies a person associated with a document is provided. The system retrieves a name associated with a document and reduces the name to a canonical form. The system then compares the canonical form of the name to the canonical form of the names of known persons. If a match is not found, then the system indicates that the person whose name is associated with the document is a previously unknown person. If a match is found, then the system compares attributes of the document with attributes of documents associated with the matching known person. If those attributes are similar, then the system indicates that the person whose name is associated with the document is the matching known person. Otherwise, the system indicates that the person whose name is associated with the document is a previously unknown person.
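The two-stage check described in this abstract (canonical name match first, then document-attribute similarity) can be sketched as follows. This is a minimal illustration, not the patented method: the canonical form (lowercased, sorted name tokens), the Jaccard similarity over term sets, the 0.3 threshold, and the names `canonical`, `jaccard`, and `identify` are all assumptions made for the example.

```python
import re

def canonical(name: str) -> str:
    """Assumed canonical form: lowercase alphabetic tokens, sorted."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))

def jaccard(a: set, b: set) -> float:
    """Overlap of two term sets; stands in for 'attribute similarity'."""
    return len(a & b) / len(a | b) if a | b else 0.0

def identify(name, doc_terms, known, threshold=0.3):
    """Return the id of the matching known person, or None if the person
    is treated as previously unknown.
    known maps person id -> (name, terms from that person's documents)."""
    key = canonical(name)
    for pid, (known_name, known_terms) in known.items():
        # Stage 1: canonical names must match. Stage 2: attributes must be similar.
        if canonical(known_name) == key and jaccard(doc_terms, known_terms) >= threshold:
            return pid
    return None

known = {"p1": ("Ma, Wei-Ying", {"search", "ranking", "web"})}
same = identify("Wei-Ying Ma", {"web", "search", "clustering"}, known)
other = identify("Wei-Ying Ma", {"protein", "folding"}, known)
unknown = identify("Dou Shen", {"web", "search"}, known)
```

Here `same` resolves to the known person, while `other` (same name, dissimilar document) and `unknown` (no name match) are both treated as previously unknown persons.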
-
Publication No.: US20060036596A1
Publication date: 2006-02-16
Application No.: US10918242
Filing date: 2004-08-13
Applicants: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng, Dou Shen
Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng, Dou Shen
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06F17/30719
Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
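A rough sketch of the idea in this abstract: score each word with more than one technique, combine the scores to decide whether the word is "important", then rank sentences by the important words they contain. The two scoring techniques used here (relative frequency and membership in the title), the equal weights, the 0.5 cutoff, and the function names are assumptions for illustration only; the abstract does not specify them.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def summarize(sentences, title, k=1, cutoff=0.5):
    words = [tokenize(s) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    top = max(freq.values())
    title_words = set(tokenize(title))

    def score(w):
        # Assumed combination: half relative frequency, half title membership.
        return 0.5 * freq[w] / top + 0.5 * (w in title_words)

    important = {w for w in freq if score(w) >= cutoff}
    # Sentence significance = number of distinct important words it contains.
    significance = [len(important & set(ws)) for ws in words]
    ranked = sorted(range(len(sentences)), key=lambda i: -significance[i])
    # Keep the k most significant sentences, in document order.
    return [sentences[i] for i in sorted(ranked[:k])], important

sentences = [
    "Search engines return long lists.",
    "Clustering groups search results by topic.",
    "Users scan the lists slowly.",
]
summary, important = summarize(sentences, "Clustering search results")
```

With this toy input, the second sentence is selected because it contains three of the important words.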
-
Publication No.: US20060026152A1
Publication date: 2006-02-02
Application No.: US10889841
Filing date: 2004-07-13
Applicants: Hua-Jun Zeng, Qicai He, Guimei Liu, Zheng Chen, Benyu Zhang, Wei-Ying Ma
Inventors: Hua-Jun Zeng, Qicai He, Guimei Liu, Zheng Chen, Benyu Zhang, Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/30864, Y10S707/99931
Abstract: A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
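The co-occurrence step in this abstract can be illustrated with a small sketch. It assumes that keywords are words appearing in at least two snippets, that two keywords belong to the same topic when they co-occur in at least two snippets (merged with a simple union-find), and that a snippet is assigned to the topic it shares the most keywords with. The stop-word list, both thresholds, and the function name are hypothetical choices, not details from the source.

```python
import re
from collections import Counter
from itertools import combinations

STOP = {"the", "a", "of", "and", "for", "in", "to"}  # assumed stop-word list

def cluster_snippets(snippets, min_df=2, min_co=2):
    docs = [set(re.findall(r"[a-z]+", s.lower())) - STOP for s in snippets]
    df = Counter(w for d in docs for w in d)
    keywords = {w for w, n in df.items() if n >= min_df}

    # Count how often each pair of keywords appears in the same snippet.
    co = Counter()
    for d in docs:
        co.update(combinations(sorted(d & keywords), 2))

    # Merge keywords that co-occur often enough (union-find).
    parent = {w: w for w in keywords}
    def find(w):
        while parent[w] != w:
            w = parent[w]
        return w
    for (a, b), n in co.items():
        if n >= min_co:
            parent[find(a)] = find(b)

    groups = {}
    for w in keywords:
        groups.setdefault(find(w), set()).add(w)
    topics = sorted(groups.values(), key=sorted)

    # Assign each snippet to the topic with the largest keyword overlap.
    assignment = []
    for d in docs:
        best = max(range(len(topics)), key=lambda i: len(topics[i] & d), default=None)
        assignment.append(best if best is not None and topics[best] & d else None)
    return [sorted(t) for t in topics], assignment

snippets = [
    "Jaguar car dealer prices",
    "Used jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Rainforest habitat of big cats",
]
topics, assignment = cluster_snippets(snippets)
```

On this input the keywords split into a "car, jaguar" topic and a "habitat, rainforest" topic, and the third snippet lands in the second topic even though it also mentions "jaguar".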
-
Publication No.: US20050246410A1
Publication date: 2005-11-03
Application No.: US10836319
Filing date: 2004-04-30
Applicants: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
Inventors: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
CPC classification: G06F17/30719, G06F17/30864
Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
-
Publication No.: US20050234973A1
Publication date: 2005-10-20
Application No.: US10826160
Filing date: 2004-04-15
Applicants: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Ji-Rong Wen, Hang Li, Wei-Ying Ma, Gabor Hirschler, Kurt Samuelson
Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Ji-Rong Wen, Hang Li, Wei-Ying Ma, Gabor Hirschler, Kurt Samuelson
Abstract: Systems and methods for mining service requests for product support are described. In one aspect, unstructured service requests are converted to one or more structured answer objects. Each structured answer object includes hierarchically structured historic problem diagnosis data. In view of a product problem description, a set of the one or more structured answer data objects is identified. Each structured solution data object in the set includes term(s) and/or phrase(s) related to the product problem description. Historic and hierarchically structured problem diagnosis data from the set is provided to an end-user for product problem diagnosis.
-
Publication No.: US20050234953A1
Publication date: 2005-10-20
Application No.: US10826162
Filing date: 2004-04-15
Applicants: Benyu Zhang, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
Inventors: Benyu Zhang, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
CPC classification: G06F17/30687, G06F17/30663, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier. The trained similarity classifier having been trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined from the content, expanded, and category similarity measurements evaluating the multiple similarity scores in view of a trained relevance classifier model.
-
Publication No.: US20050234879A1
Publication date: 2005-10-20
Application No.: US10825894
Filing date: 2004-04-15
Applicants: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
CPC classification: G06F17/30864, G06F17/3064, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99937
Abstract: Systems and methods for related term suggestion are described. In one aspect, term clusters are generated as a function of calculated similarity of term vectors. Each term vector having been generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine. Responsive to receiving a term/phrase from an entity, the term/phrase is evaluated in view of terms/phrases in the term clusters to identify one or more related term suggestions.
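The term-vector clustering in this abstract might look roughly like the sketch below. It assumes each historical query's term vector is a bag of words built from its search-result snippets, that similarity is cosine similarity, and that clustering is a greedy single pass against each cluster's first member. The 0.5 threshold, the exact-match lookup in `suggest`, and all function names are assumptions for illustration rather than details from the source.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def build_clusters(results_by_query, threshold=0.5):
    # One term vector per historical query, built from its result snippets.
    vectors = {q: Counter(" ".join(snips).lower().split())
               for q, snips in results_by_query.items()}
    clusters = []  # each cluster is a list of queries; its first member is the seed
    for q in vectors:
        for c in clusters:
            if cosine(vectors[q], vectors[c[0]]) >= threshold:
                c.append(q)
                break
        else:
            clusters.append([q])
    return clusters

def suggest(term, clusters):
    # Suggest the other members of any cluster that contains the submitted term.
    return [q for c in clusters if term in c for q in c if q != term]

results = {
    "digital camera": ["compare camera lens prices", "camera lens reviews"],
    "slr lens": ["camera lens reviews", "lens prices"],
    "cheap flights": ["airline ticket deals", "compare airline prices"],
}
clusters = build_clusters(results)
related = suggest("digital camera", clusters)
```

The two camera-related queries end up in one cluster, so submitting "digital camera" yields "slr lens" as a suggestion, while "cheap flights" stays on its own.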
-
Publication No.: US07584100B2
Publication date: 2009-09-01
Application No.: US10880662
Filing date: 2004-06-30
Applicants: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
CPC classification: G06F17/3071, Y10S707/99931, Y10S707/99933
Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
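One way to picture a "generalized sentence" is to replace specific words with broader classes and then group documents whose generalized forms coincide. The sketch below does exactly that with a tiny hand-written class table. The table `CLASSES`, the `<NUM>` placeholder, and the rule that only identical patterns are clustered together are hypothetical simplifications; the abstract only says that patterns must be similar.

```python
import re
from collections import defaultdict

# Hypothetical word-to-class table used to abstract specific words.
CLASSES = {"monday": "<DAY>", "friday": "<DAY>", "alice": "<PERSON>", "bob": "<PERSON>"}

def generalize(sentence):
    """Map each token to its class (or keep it) to form a generalized sentence."""
    out = []
    for tok in re.findall(r"[a-z]+|\d+", sentence.lower()):
        out.append("<NUM>" if tok.isdigit() else CLASSES.get(tok, tok))
    return tuple(out)

def cluster_by_pattern(topic_sentences):
    """Group document ids whose topic sentences share a generalized pattern."""
    groups = defaultdict(list)
    for doc_id, sentence in topic_sentences.items():
        groups[generalize(sentence)].append(doc_id)
    return dict(groups)

docs = {
    "d1": "Meeting with Alice on Monday",
    "d2": "Meeting with Bob on Friday",
    "d3": "Invoice 42 overdue",
}
clusters = cluster_by_pattern(docs)
```

Documents `d1` and `d2` differ in every specific detail but share the pattern "meeting with <PERSON> on <DAY>", so they fall into the same cluster.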
-
Publication No.: US20090119284A1
Publication date: 2009-05-07
Application No.: US12145222
Filing date: 2008-06-24
Applicants: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
Inventors: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
CPC classification: G06F16/345, G06F16/951
Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
-
Publication No.: US07529735B2
Publication date: 2009-05-05
Application No.: US11057100
Filing date: 2005-02-11
Applicants: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
Inventors: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
CPC classification: G06N5/022, G06F17/30616, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
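The grouping of people by shared documents can be sketched as a connected-components computation over (person, document) pairs. Treating "having a relationship to the same objects" as "linked, directly or transitively, through at least one shared document" is an assumption made here for simplicity, as are the function names and the sample data.

```python
from collections import defaultdict

def cluster_people(relations):
    """relations: iterable of (person, document) pairs.
    Returns a dict mapping each person to the set of people in their cluster."""
    by_doc = defaultdict(set)
    for person, doc in relations:
        by_doc[doc].add(person)

    cluster_of = {}
    for people in by_doc.values():
        # Merge everyone on this document with any clusters they already belong to.
        merged = set(people)
        for p in people:
            merged |= cluster_of.get(p, set())
        for p in merged:
            cluster_of[p] = merged
    return cluster_of

def related_people(name, cluster_of):
    """Names of the other people in the same cluster as the given person."""
    return sorted(cluster_of.get(name, set()) - {name})

relations = [
    ("Ann", "spec.doc"), ("Bob", "spec.doc"),
    ("Bob", "plan.doc"), ("Cy", "plan.doc"),
    ("Dee", "memo.doc"),
]
cluster_of = cluster_people(relations)
ann_related = related_people("Ann", cluster_of)
```

Looking up "Ann" returns Bob and Cy: Bob shares a document with her directly, and Cy is reached through Bob. Dee, who shares no document with anyone, has no related people.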