Patent search: ap:"Sauraj Goswami" (page 1)

1.

Granted invention patent
Between matching [Lapsed]

Publication (announcement) No.: US08086597B2

Publication (announcement) date: 2011-12-27

Application No.: US11770573

Application date: 2007-06-28

Applicant: Andrey Balmin, Sauraj Goswami

Inventor: Andrey Balmin, Sauraj Goswami

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30938

Abstract: A query of at least one mark-up language document has a path expression comprising a conjunction, a first filter and a second filter. The first filter has a first probe. The second filter has a second probe. The first and second filters form a between filter having start and stop values specified by the first and second probes. A plan to process the query is generated based on, at least in part, a range defined by the start and stop values. An index of mark-up language documents is defined by another path expression; the index comprises values of mark-up language documents that satisfy the other path expression; the values are key values of the index. The plan is to perform a single scan of the key values from the start value to the stop value to identify at least one key value that satisfies the between filter.

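The single-scan idea in the abstract above can be illustrated with a minimal sketch. It assumes the index key values are held in a sorted Python list and that both bounds are inclusive; the names `between_scan` and `price_index` are hypothetical and do not come from the patent.

```python
from bisect import bisect_left, bisect_right

def between_scan(sorted_keys, start, stop):
    """Return every key k with start <= k <= stop in one contiguous scan."""
    lo = bisect_left(sorted_keys, start)   # first position whose key is >= start
    hi = bisect_right(sorted_keys, stop)   # first position whose key is > stop
    return sorted_keys[lo:hi]              # one pass over the range, no second scan

# Hypothetical key values taken from documents that satisfy the index path
price_index = [5, 12, 18, 25, 25, 40, 73]

# Two filters, [price >= 12] and [price <= 40], handled as one between filter
print(between_scan(price_index, 12, 40))   # -> [12, 18, 25, 25, 40]
```

Without the merge, each filter would be evaluated as its own open-ended scan and the two results intersected; treating the pair as one range bounds the scan on both sides.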

2.

Invention application
BETWEEN MATCHING [Lapsed]

Publication (announcement) No.: US20090006447A1

Publication (announcement) date: 2009-01-01

Application No.: US11770573

Application date: 2007-06-28

Applicant: Andrey Balmin, Sauraj Goswami

Inventor: Andrey Balmin, Sauraj Goswami

IPC classification: G06F17/30

CPC classification: G06F17/30938

Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that identify a range filter in a mark-up language query. In response to receiving a query of at least one mark-up language document, the query comprising a plurality of singleton filters, at least one group of the plurality of singleton filters is identified. Each group comprises at least two singleton filters, wherein each group is equivalent to a range filter having a start value and a stop value. The start value and stop value are based on at least two singleton filters of each group. A query plan is generated to process the query based on, at least in part, a range defined by the start value and the stop value of the at least two singleton filters of each group.

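As a rough illustration of grouping singleton filters into range filters, the sketch below pairs lower-bound and upper-bound filters on the same path. The tuple representation `(path, op, value)`, the function name `group_range_filters`, and the restriction to `>=` and `<=` are assumptions made for the example. It also assumes each path yields exactly one value per document, which is what makes the pair equivalent to a range.

```python
from collections import defaultdict

def group_range_filters(filters):
    """filters: list of (path, op, value) tuples with op in {'>=', '<='}.
    Returns {path: (start, stop)} for every path that has both bounds."""
    lower = defaultdict(list)
    upper = defaultdict(list)
    for path, op, value in filters:
        if op == ">=":
            lower[path].append(value)
        elif op == "<=":
            upper[path].append(value)
    ranges = {}
    for path in lower:
        if path in upper:
            # Tightest range: largest lower bound, smallest upper bound
            ranges[path] = (max(lower[path]), min(upper[path]))
    return ranges

# Hypothetical filters from a path expression
filters = [
    ("/order/price", ">=", 10),
    ("/order/qty", "<=", 5),
    ("/order/price", "<=", 50),
]
print(group_range_filters(filters))   # -> {'/order/price': (10, 50)}
```

The `/order/qty` filter has no matching lower bound, so it stays a singleton filter and is not reported as a range.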

3.

Invention application
LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES [In force]

Publication (announcement) No.: US20130191111A1

Publication (announcement) date: 2013-07-25

Application No.: US13550346

Application date: 2012-07-16

Applicant: Sauraj GOSWAMI

Inventor: Sauraj GOSWAMI

IPC classification: G06F17/28

CPC classification: G06F17/289, G06F17/275

Abstract: Multiple non-overlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.

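The second embodiment in the abstract above segments a document at transitions between character sets. The sketch below covers the segmentation step only and is an approximation: it uses the first word of each letter's Unicode character name (for example `LATIN` or `CYRILLIC`) as a stand-in for its character set, which is an assumption of this example and not the patent's method. Per-segment language scoring is left out.

```python
import unicodedata

def script_of(ch):
    """Rough script label: first word of the Unicode name, letters only."""
    if not ch.isalpha():
        return None   # spaces, digits and punctuation never trigger a transition
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def segment_by_script(text):
    """Split text into (script, segment) pairs at script transitions."""
    segments = []
    current, buf = None, []
    for ch in text:
        script = script_of(ch)
        if script is not None and current is not None and script != current:
            segments.append((current, "".join(buf).strip()))
            buf = []
        if script is not None:
            current = script
        buf.append(ch)
    if buf and current is not None:
        segments.append((current, "".join(buf).strip()))
    return segments

print(segment_by_script("Hello world Привет мир"))
# -> [('LATIN', 'Hello world'), ('CYRILLIC', 'Привет мир')]
```

Each segment could then be scored against only the candidate languages written in that script, which is the narrowing the abstract describes.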

4.

Granted invention patent
Generalized partition pruning in a database system [In force]

Publication (announcement) No.: US07461060B2

Publication (announcement) date: 2008-12-02

Application No.: US11242951

Application date: 2005-10-04

Applicant: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell

Inventor: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell

IPC classification: G06F7/00, G06F17/30, G06F17/00

CPC classification: G06F17/30454, G06F17/30492, Y10S707/99935, Y10S707/99943, Y10S707/99945

Abstract: Methods for executing a query on data that has been partitioned into a plurality of partitions are provided. The method includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The method further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.

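A pruning decision based on limit keys can be sketched as follows. The example assumes a single partitioning column, ascending limit keys where `limit_keys[i]` is the highest value stored in partition `i`, and an inclusive range predicate. The function name `partitions_to_scan` is hypothetical, and the patent's generalized multi-column case is not modeled.

```python
from bisect import bisect_left

def partitions_to_scan(limit_keys, lo, hi):
    """Return indices of partitions that can hold a value v with lo <= v <= hi.

    limit_keys[i] is the highest column value stored in partition i, so
    partition i holds values in the interval (limit_keys[i-1], limit_keys[i]].
    """
    if lo > hi:
        return []
    first = bisect_left(limit_keys, lo)   # first partition whose limit key >= lo
    last = bisect_left(limit_keys, hi)    # partition that would hold hi
    last = min(last, len(limit_keys) - 1) # hi may exceed the last limit key
    return list(range(first, last + 1))

# Hypothetical table with four partitions
limit_keys = [100, 200, 300, 400]

# Predicate: col BETWEEN 150 AND 250 touches only partitions 1 and 2
print(partitions_to_scan(limit_keys, 150, 250))   # -> [1, 2]
```

Partitions 0 and 3 are pruned without being read, because their limit keys alone show they cannot contain a qualifying value.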

5.

Granted invention patent
Automated identification of documents as not belonging to any language [In force]

Publication (announcement) No.: US08224642B2

Publication (announcement) date: 2012-07-17

Application No.: US12275027

Application date: 2008-11-20

Applicant: Sauraj Goswami

Inventor: Sauraj Goswami

IPC classification: G06F17/20

CPC classification: G06F17/275

Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.

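The abstract does not state how the impostor profile is compared with the test document's scores, so the following is only one plausible reading. It assumes the profile stores a mean and standard deviation of each impostor language's score and rejects the document when any score is more than `max_z` standard deviations from its mean. The function name, the z-score rule, and all numbers are hypothetical.

```python
def in_language_or_none(best_lang, impostor_scores, profile, max_z=3.0):
    """Return best_lang if the impostor scores look typical for it, else None.

    impostor_scores: {impostor_lang: score of the test document}
    profile: {impostor_lang: (mean, std)} observed on genuine best_lang documents
    """
    for lang, score in impostor_scores.items():
        mean, std = profile[lang]
        if abs(score - mean) / std > max_z:   # assumed decision rule
            return None                       # treat as "no language"
    return best_lang

# Hypothetical impostor profile for Portuguese: how Spanish and Italian
# models usually score documents that really are Portuguese
pt_profile = {"es": (-120.0, 10.0), "it": (-130.0, 12.0)}

print(in_language_or_none("pt", {"es": -118.0, "it": -125.0}, pt_profile))  # -> pt
print(in_language_or_none("pt", {"es": -300.0, "it": -310.0}, pt_profile))  # -> None
```

The second document scores far outside what genuine Portuguese text produces under the impostor models, so it is labeled as belonging to no language even though Portuguese was its most likely language.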

6.

Granted invention patent
Compressibility checking avoidance [Lapsed]

Publication (announcement) No.: US07840774B2

Publication (announcement) date: 2010-11-23

Application No.: US11233956

Application date: 2005-09-09

Applicant: Jeffrey Allen Berger, You-Chin Fuh, Sauraj Goswami, Balakrishna Raghavendra Iyer, Michael R. Shadduck, James Zu-Chia Teng, Stephen Walter Turnbaugh

Inventor: Jeffrey Allen Berger, You-Chin Fuh, Sauraj Goswami, Balakrishna Raghavendra Iyer, Michael R. Shadduck, James Zu-Chia Teng, Stephen Walter Turnbaugh

IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F13/12, G06F13/38

CPC classification: H03M7/30, G06F17/30336

Abstract: Various embodiments of a computer-implemented method, system and computer program product maintain a logical page having a predetermined size. Data is added to an uncompressed area of the logical page. The uncompressed area of the logical page is associated with an uncompressed area of a physical page. The logical page also has a compressed area associated with a compressed area of a physical page. In response to exhausting the uncompressed area, data in the uncompressed area is included in the compressed area. The uncompressed area is adjusted.

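The page behavior described above can be mimicked with a toy model. This sketch assumes rows are byte strings without newline characters, uses `zlib` as a stand-in compressor, and simply empties the uncompressed area after folding it into the compressed area. The class name `LogicalPage` and the capacity parameter are hypothetical, and the mapping to a fixed-size physical page is not modeled.

```python
import zlib

class LogicalPage:
    """Toy page: rows land in an uncompressed area; when that area is
    exhausted they are folded into the compressed area."""

    def __init__(self, uncompressed_capacity=64):
        self.uncompressed_capacity = uncompressed_capacity  # bytes (assumed)
        self.uncompressed = []   # rows not yet compressed
        self.compressed = b""    # zlib image of all folded rows

    def _compressed_rows(self):
        if not self.compressed:
            return []
        return zlib.decompress(self.compressed).split(b"\n")

    def add(self, row):
        used = sum(len(r) for r in self.uncompressed)
        # Fold only when the uncompressed area is non-empty and would overflow
        if self.uncompressed and used + len(row) > self.uncompressed_capacity:
            self._fold()
        self.uncompressed.append(row)

    def _fold(self):
        rows = self._compressed_rows() + self.uncompressed
        self.compressed = zlib.compress(b"\n".join(rows))
        self.uncompressed = []   # the "adjusted" uncompressed area in this toy

    def rows(self):
        return self._compressed_rows() + self.uncompressed

page = LogicalPage(uncompressed_capacity=10)
for row in [b"aaaa", b"bbbb", b"cccc", b"dddd"]:
    page.add(row)
print(page.rows())   # -> [b'aaaa', b'bbbb', b'cccc', b'dddd']
```

Compression runs once per fold instead of once per inserted row, so most inserts touch only the uncompressed area.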

7.

Granted invention patent
Generalized partition pruning in a database system [In force]

Publication (announcement) No.: US07970756B2

Publication (announcement) date: 2011-06-28

Application No.: US12268391

Application date: 2008-11-10

Applicant: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell

Inventor: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30454, G06F17/30492, Y10S707/99935, Y10S707/99943, Y10S707/99945

Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.

8.

Granted invention patent
Keymap order compression [In force]

Publication (announcement) No.: US07783855B2

Publication (announcement) date: 2010-08-24

Application No.: US11615699

Application date: 2006-12-22

Applicant: Sauraj Goswami, You-Chin Fuh, Michael R. Shadduck, James Zu-Chia Teng

Inventor: Sauraj Goswami, You-Chin Fuh, Michael R. Shadduck, James Zu-Chia Teng

IPC classification: G06F13/00, G06F13/28, G06F7/00, G06F17/00

CPC classification: H03M7/30, G06F17/30336

Abstract: Various embodiments of a computer-implemented method, system and computer program product are provided. A first plurality of key entries of a first index page are compressed in accordance with an order specified by a first keymap of the first index page. The first keymap also indicates respective positions of the key entries of the first plurality of key entries. A second keymap is generated indicating the order and also indicating respective post-compression positions of the key entries of the first plurality of key entries. The compressed first plurality of key entries is stored on a second index page with the second keymap.

9.

Invention application
LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES [In force]

Publication (announcement) No.: US20100125447A1

Publication (announcement) date: 2010-05-20

Application No.: US12274182

Application date: 2008-11-19

Applicant: Sauraj Goswami

Inventor: Sauraj Goswami

IPC classification: G06F17/20

CPC classification: G06F17/289, G06F17/275

Abstract: Multiple non-overlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.

10.

Invention application
INDEX EXPLOITATION [Lapsed]

Publication (announcement) No.: US20090006314A1

Publication (announcement) date: 2009-01-01

Application No.: US11770607

Application date: 2007-06-28

Applicant: Andrey Balmin, Sauraj Goswami

Inventor: Andrey Balmin, Sauraj Goswami

IPC classification: G06F7/00

CPC classification: G06F17/30929, G06F17/30979

Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.

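An index plan that returns a superset of the query result is typically paired with a residual check on the candidates. The sketch below shows that general pattern under simple assumptions: documents are dictionaries, the index maps one field's value to a set of document ids, and the part of the query the index cannot answer is passed in as a predicate. All names are hypothetical, and the construction of a maximal-index-satisfiable query tree is not modeled.

```python
def run_query(docs, index, index_key, residual):
    """Use the index to fetch candidates, then filter out false positives.

    index maps a key value to the ids of documents containing it; the lookup
    may return more documents than the query asks for.
    """
    candidates = index.get(index_key, set())   # superset of the true result
    return sorted(d for d in candidates if residual(docs[d]))

# Hypothetical documents and an index on the "author" field only
docs = {
    1: {"author": "Ann", "year": 2005},
    2: {"author": "Ann", "year": 2007},
    3: {"author": "Bob", "year": 2007},
}
author_index = {"Ann": {1, 2}, "Bob": {3}}

# Query: author = "Ann" AND year = 2007; the index covers only the author part
print(run_query(docs, author_index, "Ann", lambda d: d["year"] == 2007))  # -> [2]
```

The index narrows three documents to two candidates, and the residual predicate on `year` removes the remaining false positive.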
