Patent search ap:("Sachindra Joshi" OR "Shantanu Godbole") AND inv:"Sachindra Joshi" Page 4

31.

Granted patent
Cross-domain clusterability evaluation for cross-guided data clustering based on alignment between data domains (Lapsed)

Publication number: US08229929B2

Publication date: 2012-07-24

Application number: US12683095

Application date: 2010-01-06

Applicant: Jeffrey M. Achtermann, Indrajit Bhattacharya, Kevin W. English, Jr., Shantanu R. Godbole, Sachindra Joshi, Ashwin Srinivasan, Ashish Verma

Inventor: Jeffrey M. Achtermann, Indrajit Bhattacharya, Kevin W. English, Jr., Shantanu R. Godbole, Sachindra Joshi, Ashwin Srinivasan, Ashish Verma

IPC: G06F17/30

CPC classification number: G06F17/30598, G06F17/3071, G06F17/30864

Abstract: A system and associated method for evaluating cross-domain clusterability upon a target domain and a source domain. The cross-domain clusterability is calculated as a linear combination of a target clusterability and a source-target pair matchability, by use of a trade-off parameter that determines relative contribution of the target clusterability and the source-target pair matchability. The target clusterability quantifies how clusterable the target domain is. The source-target pair matchability is calculated as an average of a target-side matchability and a source-side matchability, which quantifies how well target centroids of the target domain are aligned with the source centroids and how well source centroids of the source domain are aligned with the target centroids, respectively.

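The abstract fixes only the structure of the score: a trade-off-weighted linear combination of target clusterability and pair matchability, where pair matchability averages a target-side and a source-side term. A minimal Python sketch of that structure follows, assuming the combination is `trade_off * clusterability + (1 - trade_off) * matchability` and that each one-sided matchability is the mean cosine similarity of each centroid to its best-aligned centroid in the other domain. Both choices, the treatment of target clusterability as a precomputed number, and all function names are hypothetical; the abstract does not specify them.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def side_matchability(from_centroids: List[Vector], to_centroids: List[Vector]) -> float:
    """Hypothetical one-sided matchability: mean, over from_centroids,
    of the similarity to the best-aligned centroid in to_centroids."""
    best = [max(cosine(c, d) for d in to_centroids) for c in from_centroids]
    return sum(best) / len(best)


def pair_matchability(target_centroids: List[Vector], source_centroids: List[Vector]) -> float:
    """Average of target-side and source-side matchability."""
    target_side = side_matchability(target_centroids, source_centroids)
    source_side = side_matchability(source_centroids, target_centroids)
    return (target_side + source_side) / 2.0


def cross_domain_clusterability(
    target_clusterability: float,
    target_centroids: List[Vector],
    source_centroids: List[Vector],
    trade_off: float,
) -> float:
    """Linear combination of target clusterability and source-target pair matchability."""
    if not 0.0 <= trade_off <= 1.0:
        raise ValueError("trade_off must lie in [0, 1]")
    matchability = pair_matchability(target_centroids, source_centroids)
    return trade_off * target_clusterability + (1.0 - trade_off) * matchability


# Usage: identical centroid sets are perfectly aligned.
centroids = [[1.0, 0.0], [0.0, 1.0]]
score = cross_domain_clusterability(0.6, centroids, centroids, trade_off=0.5)
```

With identical centroid sets the pair matchability is 1.0, so the usage line evaluates 0.5 × 0.6 + 0.5 × 1.0 = 0.8; setting `trade_off` to 1.0 ignores the source domain entirely.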

32.

Granted patent
Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entrophy (Lapsed)

Publication number: US07320000B2

Publication date: 2008-01-15

Application number: US10309612

Application date: 2002-12-04

Applicant: Krishna Prasad Chitrapura, Raghuram Krishnapuram, Sachindra Joshi

Inventor: Krishna Prasad Chitrapura, Raghuram Krishnapuram, Sachindra Joshi

IPC: G06F7/10

CPC classification number: G06F17/30, Y10S707/99937

Abstract: A system and method for automated populating of an existing concept hierarchy of items with new items, using entropy as a measure of the correctness of a potential classification. User-defined concept hierarchies include, for example, document hierarchies such as directories for the Internet, library catalogues, patent databases and journals, and product hierarchies. These concept hierarchies can be huge and are usually maintained manually. An internet directory may have, for example, millions of Web sites, thousands of editors and hundreds of thousands of different categories. The method for populating a concept hierarchy includes calculating conditional ‘entropy’ values representing the randomness of distribution of classification attributes for the hierarchical set of classes if a new item is added to specific classes of the hierarchy and then selecting whichever class has the minimum randomness of distribution when calculated as a condition of insertion of the new data item.

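The selection rule in the abstract (tentatively insert the new item into each class, compute the resulting entropy, keep the class with the minimum) can be sketched in Python. This minimal version assumes each class is summarized by a bag of attribute counts (for example, word counts), that "system entropy" is the size-weighted average of per-class Shannon entropies, and that a flat set of candidate classes is scored rather than the full hierarchy. These modeling choices and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the attribute distribution in one class."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def system_entropy(classes: Dict[str, Counter]) -> float:
    """Hypothetical system measure: per-class entropies weighted by class size."""
    grand_total = sum(sum(c.values()) for c in classes.values())
    if grand_total == 0:
        return 0.0
    return sum(sum(c.values()) / grand_total * entropy(c) for c in classes.values())


def best_class(classes: Dict[str, Counter], item: Counter) -> Optional[str]:
    """Try inserting the item into each class; keep the one with least system entropy."""
    best_name, best_value = None, math.inf
    for name, counts in classes.items():
        trial = dict(classes)        # shallow copy; other classes stay unchanged
        trial[name] = counts + item  # conditional state: item inserted into this class
        value = system_entropy(trial)
        if value < best_value:
            best_name, best_value = name, value
    return best_name


classes = {
    "sports": Counter({"ball": 5, "goal": 5}),
    "finance": Counter({"stock": 5, "bond": 5}),
}
print(best_class(classes, Counter({"stock": 1, "bond": 1})))  # finance
```

Placing the finance-like item under "finance" leaves both classes at one bit of entropy, whereas placing it under "sports" mixes vocabularies and raises that class's randomness, so the minimum-entropy choice matches intuition.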

33.

Granted patent
Methods, apparatus and computer programs for evaluating and using a resilient data representation (In force)

Publication number: US07254577B2

Publication date: 2007-08-07

Application number: US10880141

Application date: 2004-06-29

Applicant: Rahul Gupta, Sachindra Joshi, Raghuram Krishnapuram

Inventor: Rahul Gupta, Sachindra Joshi, Raghuram Krishnapuram

IPC: G06F17/30

CPC classification number: G06F17/30914, Y10S707/99932, Y10S707/99936, Y10S707/99942

Abstract: Provided are methods, apparatus and computer programs for evaluating the resilience, to structural changes in a data source, of a representative label representing a data element within the data source. Also disclosed are applications using a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source—such as within XML or HTML Web pages. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to enable selection of a label with the highest resilience score among a set of representative labels. The validated or selected representative label may then be used for data extraction, remaining usable despite the possibility of future changes to the structure of a Web page, or for template clustering/classification.


34.

Patent application
System and method for extraction of factoids from textual repositories (Lapsed)

Publication number: US20070162447A1

Publication date: 2007-07-12

Application number: US11321177

Application date: 2005-12-29

Applicant: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes

Inventor: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes

IPC: G06F7/00

CPC classification number: G06F17/30864, G06F17/30705

Abstract: A method (400) of extracting factoids from text repositories is disclosed, with the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognise factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association with the given factoid category are extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. These extracted snippets are the factoids associated with the given factoid category.


35.

Patent application
Methods, apparatus and computer programs for evaluating and using a resilient data representation (In force)

Publication number: US20060026157A1

Publication date: 2006-02-02

Application number: US10880141

Application date: 2004-06-29

Applicant: Rahul Gupta, Sachindra Joshi, Raghuram Krishnapuram

Inventor: Rahul Gupta, Sachindra Joshi, Raghuram Krishnapuram

IPC: G06F17/30

CPC classification number: G06F17/30914, Y10S707/99932, Y10S707/99936, Y10S707/99942

Abstract: Provided are methods, apparatus and computer programs for evaluating the resilience, to structural changes in a data source, of a representative label representing a data element within the data source. Also disclosed are applications using a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source - such as within XML or HTML Web pages. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to enable selection of a label with the highest resilience score among a set of representative labels. The validated or selected representative label may then be used for data extraction, remaining usable despite the possibility of future changes to the structure of a Web page, or for template clustering/classification.


36.

Granted patent
Dynamically detecting near-duplicate documents (In force)

Publication number: US09245007B2

Publication date: 2016-01-26

Application number: US12511175

Application date: 2009-07-29

Applicant: Sachindra Joshi, Kenney Ng, Sandeep Singh

Inventor: Sachindra Joshi, Kenney Ng, Sandeep Singh

IPC: G06F17/30

CPC classification number: G06F17/30675

Abstract: Techniques for detecting one or more documents that are duplicate or near-duplicate of a first document are provided. The techniques include obtaining a first document, obtaining one or more additional documents, retrieving a set of one or more document signatures for each document, and detecting one or more documents that are duplicate or near-duplicate of the first document by detecting each of the one or more additional documents that have at least a minimum number of signatures in common with the first document, wherein detecting each of the one or more additional documents that have at least a minimum number of signatures in common with the first document comprises dynamically using at least one of a user-configurable similarity definition and a user-configurable similarity threshold value.

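The abstract leaves open what a document signature is and how similarity is defined, requiring only that both the definition and the threshold be user-configurable. A minimal Python sketch, assuming signatures are truncated SHA-1 hashes of overlapping k-word shingles and that the similarity definition is any function of two signature sets compared against the threshold. The shingle scheme, the default overlap-count definition, and all names are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Set


def signatures(text: str, k: int = 3) -> Set[str]:
    """Hypothetical signature set: truncated SHA-1 hashes of overlapping k-word shingles."""
    words = text.lower().split()
    if not words:
        return set()
    span = min(k, len(words))  # very short documents yield a single shingle
    shingles = (" ".join(words[i:i + span]) for i in range(len(words) - span + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def overlap_count(a: Set[str], b: Set[str]) -> float:
    """Default similarity definition: number of signatures the two documents share."""
    return float(len(a & b))


def near_duplicates(
    first: str,
    others: Dict[str, str],
    threshold: float,
    similarity: Callable[[Set[str], Set[str]], float] = overlap_count,
    k: int = 3,
) -> List[str]:
    """IDs of documents whose similarity to `first` meets the user-set threshold."""
    reference = signatures(first, k)
    return [
        doc_id
        for doc_id, text in others.items()
        if similarity(reference, signatures(text, k)) >= threshold
    ]


first = "the quick brown fox jumps over the lazy dog"
others = {
    "a": "the quick brown fox jumps over the lazy cat",
    "b": "completely different text about patents and search",
}
print(near_duplicates(first, others, threshold=5))  # ['a']
```

With the default definition, `threshold` is the minimum number of shared signatures. Passing a Jaccard function such as `lambda a, b: len(a & b) / len(a | b)` instead turns the same threshold into a fractional cutoff, which illustrates swapping the similarity definition at call time.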

37.

Granted patent
Enhancing posted content in discussion forums (In force)

Publication number: US09189967B2

Publication date: 2015-11-17

Application number: US13600856

Application date: 2012-08-31

Applicant: Amit K. Singh, Rose Catherine Kanjirathinkal, Sachindra Joshi, Ankur Gandhe, Karthik Visweswariah

Inventor: Amit K. Singh, Rose Catherine Kanjirathinkal, Sachindra Joshi, Ankur Gandhe, Karthik Visweswariah

IPC: G09B7/02

CPC classification number: G09B7/02

Abstract: Methods and arrangements for enhancing content in discussion forums. Access to an online discussion is provided. A posting by an author participating in the discussion is accepted, and a recommendation is automatically produced for the author for amending the posting to increase the likelihood of response to the posting by other individuals participating in the discussion.

38.

Granted patent
Enhancing posted content in discussion forums (In force)

Publication number: US09189965B2

Publication date: 2015-11-17

Application number: US13538899

Application date: 2012-06-29

Applicant: Amit K. Singh, Rose Catherine Kanjirathinkal, Sachindra Joshi, Ankur Gandhe, Karthik Visweswariah

Inventor: Amit K. Singh, Rose Catherine Kanjirathinkal, Sachindra Joshi, Ankur Gandhe, Karthik Visweswariah

IPC: G09B7/02

CPC classification number: G09B7/02

Abstract: Methods and arrangements for enhancing content in discussion forums. Access to an online discussion is provided. A posting by an author participating in the discussion is accepted, and a recommendation is automatically produced for the author for amending the posting to increase the likelihood of response to the posting by other individuals participating in the discussion.


39.

Granted patent
Intent discovery in audio or text-based conversation (In force)

Publication number: US08983840B2

Publication date: 2015-03-17

Application number: US13526637

Application date: 2012-06-19

Applicant: Om D. Deshmukh, Sachindra Joshi, Saket Saurabh, Ashish Verma

Inventor: Om D. Deshmukh, Sachindra Joshi, Saket Saurabh, Ashish Verma

IPC: G10L15/18, G06F17/27

CPC classification number: G10L25/48, G10L15/02, G10L15/18, G10L15/1822, G10L15/26, G10L21/10

Abstract: Techniques, an apparatus and an article of manufacture identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry intent of the speaker.

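The abstract names two ingredients of a word's intent confidence score, its uniqueness in the conversation and how often it recurs later, but not how they combine. A minimal Python sketch, assuming uniqueness is an inverse-document-frequency term `log(N / df)` over the N utterances and that it is multiplied by the word's count in later utterances; an utterance's confidence is then the sum over its words, as the abstract states. Whitespace tokenization, the scoring formula, and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def rank_utterances(utterances: List[str]) -> List[Tuple[int, float]]:
    """Return (utterance index, intent confidence) pairs, highest confidence first."""
    tokens = [u.lower().split() for u in utterances]
    n = len(tokens)
    # Number of utterances containing each word (feeds the uniqueness term).
    doc_freq = Counter(w for toks in tokens for w in set(toks))
    scores = []
    for i, toks in enumerate(tokens):
        # How often each word occurs in the utterances after utterance i.
        later = Counter(w for t in tokens[i + 1:] for w in t)
        # Utterance confidence = sum of its words' (uniqueness * later count) scores.
        score = sum(math.log(n / doc_freq[w]) * later[w] for w in toks)
        scores.append(score)
    # sorted() is stable even with reverse=True, so ties keep chronological order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


conversation = [
    "hello",
    "i want to cancel my subscription",
    "sure why cancel the subscription",
    "the price is too high",
]
print([i for i, _ in rank_utterances(conversation)])  # [1, 2, 0, 3]
```

In this example the request to cancel ranks first because "cancel" and "subscription" are picked up again in the next turn, while the greeting and the final turn influence nothing that follows and score zero.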

40.

Granted patent
Systems and methods for discovering synonymous elements using context over multiple similar addresses (Lapsed)

Publication number: US08682898B2

Publication date: 2014-03-25

Application number: US12771543

Application date: 2010-04-30

Applicant: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah

Inventor: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah

IPC: G06F7/00, G06F17/00

CPC classification number: G06F17/2735, G06F17/2795

Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.

