Patent search: ap:("Neeraj Agrawal" OR "Sachindra Joshi" OR "Raghuram Krishnapuram" OR "Sumit Negi") AND inv:"Sachindra Joshi" (page 1)

1.

Patent application
Determining structural similarity in semi-structured documents (Active)

Publication number: US20050038785A1

Publication date: 2005-02-17

Application number: US10629133

Filing date: 2003-07-29

Applicants: Neeraj Agrawal, Sachindra Joshi, Raghuram Krishnapuram, Sumit Negi

Inventors: Neeraj Agrawal, Sachindra Joshi, Raghuram Krishnapuram, Sumit Negi

IPC classification: G06F17/22, G06F17/30

CPC classification: G06F17/30911, G06F17/2211, G06F17/2247, Y10S707/99932, Y10S707/99933, Y10S707/99936, Y10S707/99942

Abstract: Documents are represented based on their structure, which arises from the relationship between various elements in the document. After representing documents based on their structure in vector form, a method of measuring similarity between vectors is used to obtain the measure of structural similarity between two given documents.
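
The abstract does not say which structural features make up the vector or which vector similarity measure is applied. A minimal Python sketch, assuming each XML document is represented by counts of its root-to-element tag paths and compared with cosine similarity (both choices are illustrative assumptions, not taken from the patent; `path_vector` and `structural_similarity` are hypothetical names):

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_vector(xml_text: str) -> Counter:
    """Count every root-to-element tag path, e.g. 'book/author'.

    Tag paths as structural features are an assumption for illustration.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def structural_similarity(doc1: str, doc2: str) -> float:
    """Structural similarity of two XML documents in [0, 1]."""
    return cosine(path_vector(doc1), path_vector(doc2))


if __name__ == "__main__":
    a = "<book><title/><author/><author/></book>"
    b = "<book><title/><author/></book>"
    c = "<order><item/><price/></order>"
    print(round(structural_similarity(a, b), 3))  # 0.943
    print(structural_similarity(a, c))            # 0.0
```

The two `book` documents share every tag path and differ only in how often `book/author` occurs, so they score close to 1; the `order` document shares no path and scores 0.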

2.

Granted patent
Determining structural similarity in semi-structured documents (Active)

Publication number: US07203679B2

Publication date: 2007-04-10

Application number: US10629133

Filing date: 2003-07-29

Applicants: Neeraj Agrawal, Sachindra Joshi, Raghuram Krishnapuram, Sumit Negi

Inventors: Neeraj Agrawal, Sachindra Joshi, Raghuram Krishnapuram, Sumit Negi

IPC classification: G06F17/30

CPC classification: G06F17/30911, G06F17/2211, G06F17/2247, Y10S707/99932, Y10S707/99933, Y10S707/99936, Y10S707/99942

Abstract: Documents are represented based on their structure, which arises from the relationship between various elements in the document. After representing documents based on their structure in vector form, a method of measuring similarity between vectors is used to obtain the measure of structural similarity between two given documents.

3.

Granted patent
System and method for extraction of factoids from textual repositories (Lapsed)

Publication number: US08706730B2

Publication date: 2014-04-22

Application number: US11321177

Filing date: 2005-12-29

Applicants: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott R Holmes

Inventors: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott R Holmes

IPC classification: G06F17/30

CPC classification: G06F17/30864, G06F17/30705

Abstract: A method (400) is disclosed of extracting factoids from text repositories, with the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognize factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association to the given factoid category are extracted (420) from the documents or said document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. It is the extracted snippets that are the factoids associated with the given factoid category.
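
The abstract names the steps (training, collection 410, sentence extraction 420, classification 440) but not the classifier, the collection criterion or the "predetermined association". The Python sketch below is illustrative only: it assumes keyword matching for steps 410 and 420, a hand-rolled multinomial naive Bayes as a stand-in for classifier 230, and it returns whole sentences as the snippets. All names (`NaiveBayes`, `extract_factoids`, the `"factoid"` label) are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in for 230)."""

    def fit(self, sentences: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokens(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence: str) -> str:
        n = sum(self.class_counts.values())

        def log_score(c: str) -> float:
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / n)
            for w in tokens(sentence):
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            return score

        return max(self.class_counts, key=log_score)


def extract_factoids(repository: list[str], keywords: set[str], clf: NaiveBayes) -> list[str]:
    # Step 410 (assumed criterion): keep documents that mention a category keyword.
    docs = [d for d in repository if keywords & set(tokens(d))]
    # Step 420 (assumed criterion): split into sentences, keep those with a keyword.
    sentences = [
        s.strip()
        for d in docs
        for s in re.split(r"(?<=[.!?])\s+", d)
        if keywords & set(tokens(s))
    ]
    # Step 440: keep the sentences the classifier labels as factoids.
    return [s for s in sentences if clf.predict(s) == "factoid"]


if __name__ == "__main__":
    clf = NaiveBayes().fit(
        ["Curie was born in 1867.", "Turing was born in 1912.",
         "She felt born again.", "Born free is a song."],
        ["factoid", "factoid", "other", "other"],
    )
    repo = ["Gauss was born in 1777. He studied in Gottingen.",
            "She was born again. It was a good day.",
            "The weather is mild."]
    print(extract_factoids(repo, {"born"}, clf))  # ['Gauss was born in 1777.']
```

Both "born" sentences pass the keyword filter, but only the one resembling the positive training examples survives classification, which is the role the abstract gives to step 440 in a noisy environment.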

4.

Patent application
System and method for extraction of factoids from textual repositories (Lapsed)

Publication number: US20070162447A1

Publication date: 2007-07-12

Application number: US11321177

Filing date: 2005-12-29

Applicants: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes

Inventors: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes

IPC classification: G06F7/00

CPC classification: G06F17/30864, G06F17/30705

Abstract: A method (400) is disclosed of extracting factoids from text repositories, with the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognise factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association to the given factoid category are extracted (420) from the documents or said document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. It is the extracted snippets that are the factoids associated with the given factoid category.

5.

Granted patent
Clustering a collection using an inverted index of features (Active)

Publication number: US10083230B2

Publication date: 2018-09-25

Application number: US12966698

Filing date: 2010-12-13

Applicants: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng

Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F16/355, G06F16/285, G06F16/319

Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature, and a cluster of the data elements is created based on results returned in response to the query.
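
The clustering loop in this abstract is concrete enough to sketch. This Python version rests on two assumptions the abstract does not state: features are ranked by how many elements contain them (ties broken alphabetically), and a query that returns no new elements creates no cluster. Function names are hypothetical. Because an element is excluded as soon as any of its features has been selected, every element lands in exactly one cluster.

```python
from collections import defaultdict


def build_inverted_index(elements):
    """Map each feature to the set of ids of the elements containing it."""
    index = defaultdict(set)
    for elem_id, features in elements.items():
        for feature in features:
            index[feature].add(elem_id)
    return dict(index)


def cluster_with_inverted_index(elements):
    index = build_inverted_index(elements)
    # Assumed ranking: most frequent feature first, ties broken alphabetically.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))
    covered = set()  # elements having at least one previously selected feature
    clusters = {}
    for feature in ranked:
        # Query: elements with this feature and none of the earlier features.
        members = index[feature] - covered
        if members:
            clusters[feature] = members
        covered |= index[feature]
    return clusters


if __name__ == "__main__":
    docs = {
        "d1": {"python", "ml"},
        "d2": {"python", "web"},
        "d3": {"ml"},
        "d4": {"web", "css"},
        "d5": {"rust"},
    }
    print(cluster_with_inverted_index(docs))
```

With this data the ranked order is `ml`, `python`, `web`, `css`, `rust`, giving clusters `ml` = {d1, d3}, `python` = {d2}, `web` = {d4} and `rust` = {d5}; `css` yields nothing because d4 is already covered.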

6.

Patent application
Methods, apparatus and computer programs for characterizing web resources (Lapsed)

Publication number: US20060026496A1

Publication date: 2006-02-02

Application number: US10901275

Filing date: 2004-07-28

Applicants: Sachindra Joshi, Raghuram Krishnapuram, Shourya Roy

Inventors: Sachindra Joshi, Raghuram Krishnapuram, Shourya Roy

IPC classification: G06F17/21

CPC classification: G06F17/30864, G06F17/30896

Abstract: Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.
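
The abstract does not say which DOM nodes count as useful context for an anchor. A minimal sketch using Python's standard `html.parser`, assuming the context is the text of the innermost enclosing block element from a small hypothetical tag set (`BLOCK_TAGS`), and that a link is internal when it points to the same host as the page. Inbound links are left out because finding them requires crawling other pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical choice of DOM nodes whose text serves as anchor context.
BLOCK_TAGS = {"p", "li", "td", "div", "h1", "h2", "h3"}


def squash(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class LinkContextParser(HTMLParser):
    """Collects (url, anchor text, context text) for every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.blocks = []     # stack of open block elements: (text parts, links inside)
        self.current = None  # link whose anchor text is being read
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag in BLOCK_TAGS:
            self.blocks.append(([], []))
        elif tag == "a" and href:
            self.current = {"url": urljoin(self.base_url, href), "anchor": []}

    def handle_data(self, data):
        if self.current is not None:
            self.current["anchor"].append(data)
        if self.blocks:
            self.blocks[-1][0].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            link, self.current = self.current, None
            if self.blocks:
                self.blocks[-1][1].append(link)  # context is known when the block closes
            else:
                self.links.append((link["url"], squash(link["anchor"]), ""))
        elif tag in BLOCK_TAGS and self.blocks:
            parts, inner_links = self.blocks.pop()
            for link in inner_links:
                self.links.append((link["url"], squash(link["anchor"]), squash(parts)))


def characterize(html, page_url):
    """Group 'anchor | context' strings by link type (internal vs outbound)."""
    parser = LinkContextParser(page_url)
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    representation = {"internal": [], "outbound": []}
    for url, anchor, context in parser.links:
        kind = "internal" if urlparse(url).netloc == host else "outbound"
        representation[kind].append(f"{anchor} | {context}")
    return representation


if __name__ == "__main__":
    page = ('<ul><li>Read the <a href="/docs">user guide</a> first.</li>'
            '<li><a href="https://example.org/x">Partner site</a> for downloads</li></ul>')
    print(characterize(page, "https://mysite.com/index.html"))
```

Here the relative link `/docs` resolves to the page's own host and is filed as internal with the context "Read the user guide first.", while the `example.org` link is filed as outbound. The resulting strings could then be fed to any text clustering or classification step.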

7.

Patent application
CLUSTERING A COLLECTION USING AN INVERTED INDEX OF FEATURES (Pending, published)

Publication number: US20120150867A1

Publication date: 2012-06-14

Application number: US12966698

Filing date: 2010-12-13

Applicants: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng

Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30598, G06F17/30622

Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature, and a cluster of the data elements is created based on results returned in response to the query.

8.

Granted patent
Mining of generalized disjunctive association rules (Active)

Publication number: US06754651B2

Publication date: 2004-06-22

Application number: US09836118

Filing date: 2001-04-17

Applicants: Amit Anil Nanavati, Krishna Prasad Chitrapura, Sachindra Joshi, Raghuram Krishnapuram

Inventors: Amit Anil Nanavati, Krishna Prasad Chitrapura, Sachindra Joshi, Raghuram Krishnapuram

IPC classification: G06F17/30

CPC classification: G06F17/30539, G06F17/3061, Y10S707/954, Y10S707/961, Y10S707/99932, Y10S707/99933, Y10S707/99943, Y10S707/99945

Abstract: The present invention provides a system and a method for mining a new kind of association rules called disjunctive association rules, where the antecedent or the consequent of a rule may contain disjuncts of terms (X ∨ Y or X ⊕ Y). Such rules are a natural generalisation of the kind of rules that have been mined hitherto. Furthermore, disjunctive association rules are generalised in the sense that the algorithm also mines rules which have disjunctions of conjuncts (C → (A ∧ B) ∨ (D ∧ E)). Since the number of combinations of disjuncts is explosive, we use clustering to find a generalized subset. The said clustering is preferably performed using agglomerative clustering methods for finding the greedy subset.
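
A sketch of how one candidate disjunctive rule can be scored, assuming the usual support and confidence measures of association rule mining (the abstract does not define its measures) and representing each side of a rule as a disjunction of conjunctive itemsets. The clustering step that prunes the explosive number of disjunct combinations is not shown, and the function names are hypothetical.

```python
# A rule side is a list (disjunction) of frozensets (conjunctions of items).


def holds(transaction, expr, exclusive=False):
    """True if the transaction satisfies the disjunction `expr`.

    exclusive=True requires exactly one term to hold, which matches
    X ⊕ Y when the expression has two terms.
    """
    hits = sum(1 for term in expr if term <= transaction)
    return hits == 1 if exclusive else hits >= 1


def support(transactions, expr, exclusive=False):
    """Fraction of transactions satisfying `expr`."""
    return sum(holds(t, expr, exclusive) for t in transactions) / len(transactions)


def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    covered = [t for t in transactions if holds(t, antecedent)]
    both = [t for t in covered if holds(t, consequent)]
    supp = len(both) / len(transactions)
    conf = len(both) / len(covered) if covered else 0.0
    return supp, conf


if __name__ == "__main__":
    T = [{"C", "A", "B"}, {"C", "D", "E"}, {"C", "A"}, {"A", "B"}, {"C", "A", "B", "D", "E"}]
    # C -> (A ∧ B) ∨ (D ∧ E)
    rule = ([frozenset({"C"})], [frozenset({"A", "B"}), frozenset({"D", "E"})])
    print(rule_stats(T, *rule))  # (0.6, 0.75)
    a_or_d = [frozenset({"A"}), frozenset({"D"})]
    print(support(T, a_or_d))                  # 1.0  (A ∨ D)
    print(support(T, a_or_d, exclusive=True))  # 0.8  (A ⊕ D)
```

Four transactions contain C and three of those also contain A ∧ B or D ∧ E, hence confidence 0.75. The inclusive and exclusive supports differ only on the last transaction, which contains both A and D.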

9.

Granted patent
Methods, apparatus and computer programs for characterizing web resources (Lapsed)

Publication number: US07516397B2

Publication date: 2009-04-07

Application number: US10901275

Filing date: 2004-07-28

Applicants: Sachindra Joshi, Raghuram Krishnapuram, Shourya Roy

Inventors: Sachindra Joshi, Raghuram Krishnapuram, Shourya Roy

IPC classification: G06F17/00

CPC classification: G06F17/30864, G06F17/30896

Abstract: Methods, apparatus and computer programs are provided for characterizing Web-based information resources based on their interactions. A Web-based information resource is a single Web document or a collection of related Web documents. Unlike simple text documents, Web documents contain hyperlinks and other HTML tags. Different types of interactions, including inbound hyperlinks, outbound hyperlinks and internal links associated with a Web-based information resource, are used to characterize the Web-based information resource. A DOM tree representing the tag structure of a Web-based information resource is used to identify text items likely to be useful as context for a hyperlink anchor text, and the anchor text is combined with the context to generate a representation. The representation of Web-based information resources based on interactions can be used for clustering and classification, and in Web mining applications such as query disambiguation and automatic taxonomy generation.

10.

Granted patent
Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entrophy (Lapsed)

Publication number: US07320000B2

Publication date: 2008-01-15

Application number: US10309612

Filing date: 2002-12-04

Applicants: Krishna Prasad Chitrapura, Raghuram Krishnapuram, Sachindra Joshi

Inventors: Krishna Prasad Chitrapura, Raghuram Krishnapuram, Sachindra Joshi

IPC classification: G06F7/10

CPC classification: G06F17/30, Y10S707/99937

Abstract: A system and method for automated populating of an existing concept hierarchy of items with new items, using entropy as a measure of the correctness of a potential classification. User-defined concept hierarchies include, for example, document hierarchies such as directories for the Internet, library catalogues, patent databases and journals, and product hierarchies. These concept hierarchies can be huge and are usually maintained manually. An internet directory may have, for example, millions of Web sites, thousands of editors and hundreds of thousands of different categories. The method for populating a concept hierarchy includes calculating conditional ‘entropy’ values representing the randomness of distribution of classification attributes for the hierarchical set of classes if a new item is added to specific classes of the hierarchy and then selecting whichever class has the minimum randomness of distribution when calculated as a condition of insertion of the new data item.
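
A minimal sketch of the entropy criterion over a flat set of leaf classes. It assumes each item is a bag of attribute terms and that the system entropy is the size-weighted average of the per-class Shannon entropies of attribute frequencies; the abstract gives neither the exact formula nor how hierarchy levels enter, so both are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of an attribute frequency table."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values() if n)


def system_entropy(classes):
    """Size-weighted average of per-class entropies (assumed definition)."""
    grand_total = sum(sum(c.values()) for c in classes.values())
    return sum(sum(c.values()) / grand_total * entropy(c) for c in classes.values() if c)


def best_class(classes, new_item):
    """Tentatively add the item to each class; return the minimum-entropy choice."""
    scores = {}
    for name in classes:
        trial = {k: Counter(v) for k, v in classes.items()}  # copy, leave input intact
        trial[name].update(new_item)
        scores[name] = system_entropy(trial)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    classes = {
        "sports": Counter(["ball", "team", "team", "goal"]),
        "finance": Counter(["stock", "bank", "bank", "rate"]),
    }
    choice, scores = best_class(classes, ["team", "ball"])
    print(choice)  # sports
```

Adding the item to `sports` reinforces terms that class already has, giving a system entropy of about 1.48 bits, whereas adding it to `finance` introduces two new terms there and raises it to about 1.95 bits, so `sports` is selected.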
