Between matching
    1.
    Granted invention patent
    Between matching (status: lapsed)

    Publication No.: US08086597B2

    Publication date: 2011-12-27

    Application No.: US11770573

    Filing date: 2007-06-28

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30938

    Abstract: A query of at least one mark-up language document has a path expression comprising a conjunction, a first filter and a second filter. The first filter has a first probe. The second filter has a second probe. The first and second filters form a between filter having start and stop values specified by the first and second probes. A plan to process the query is generated based on, at least in part, a range defined by the start and stop values. An index of mark-up language documents is defined by another path expression; the index comprises values of mark-up language documents that satisfy the other path expression; the values are key values of the index. The plan is to perform a single scan of the key values from the start value to the stop value to identify at least one key value that satisfies the between filter.
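
    Illustrative sketch (not taken from the patent text): assuming the index's key values can be modeled as a sorted Python list and that both bounds are inclusive, the single scan from the start value to the stop value might look roughly like this. The name `between_scan` and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def between_scan(keys, start, stop):
    """Return the key values k with start <= k <= stop.

    `keys` is assumed to be a sorted list standing in for the index's
    key values. Inclusive bounds are an assumption of this sketch.
    """
    lo = bisect_left(keys, start)   # first position whose key is >= start
    hi = bisect_right(keys, stop)   # first position whose key is > stop
    return keys[lo:hi]              # one contiguous slice, i.e. a single scan


# Hypothetical index over a path such as /order/price
price_index = [5, 12, 18, 25, 25, 40, 73]
matches = between_scan(price_index, 12, 25)
```

    The point of the sketch is that two separate filters (one lower bound, one upper bound) collapse into one contiguous range over the sorted keys instead of two scans whose results are intersected.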

    BETWEEN MATCHING
    2.
    Invention patent application
    BETWEEN MATCHING (status: lapsed)

    Publication No.: US20090006447A1

    Publication date: 2009-01-01

    Application No.: US11770573

    Filing date: 2007-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30938

    Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that identify a range filter in a mark-up language query. In response to receiving a query of at least one mark-up language document, the query comprising a plurality of singleton filters, at least one group of the plurality of singleton filters is identified. Each group comprises at least two singleton filters, wherein each group is equivalent to a range filter having a start value and a stop value. The start value and stop value are based on at least two singleton filters of each group. A query plan is generated to process the query based on, at least in part, a range defined by the start value and the stop value of the at least two singleton filters of each group.
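
    Illustrative sketch (not taken from the patent text): one way to picture the grouping step is to pair lower-bound and upper-bound singleton filters that target the same path. The tuple representation, the restriction to `>=` and `<=`, the assumption that the filters are combined by conjunction, and the name `group_range_filters` are all assumptions made for this example.

```python
from collections import defaultdict


def group_range_filters(filters):
    """Group singleton filters on the same path into range filters.

    `filters` is a list of (path, op, value) tuples with op in {">=", "<="}.
    Returns {path: (start, stop)} for every path that has both a lower
    and an upper bound. Filters are assumed to be ANDed together.
    """
    lower = defaultdict(list)
    upper = defaultdict(list)
    for path, op, value in filters:
        if op == ">=":
            lower[path].append(value)
        elif op == "<=":
            upper[path].append(value)

    ranges = {}
    for path in lower:
        if path in upper:
            # Under a conjunction the tightest bounds define the range.
            ranges[path] = (max(lower[path]), min(upper[path]))
    return ranges


# Hypothetical singleton filters extracted from a query
filters = [
    ("/order/price", ">=", 10),
    ("/order/price", "<=", 50),
    ("/order/qty", ">=", 2),
]
ranges = group_range_filters(filters)
```

    Here only `/order/price` has both bounds, so it is the only path that yields a start and stop value a query plan could use as a range.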

    LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES
    3.
    Invention patent application
    LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES (status: in force)

    Publication No.: US20130191111A1

    Publication date: 2013-07-25

    Application No.: US13550346

    Filing date: 2012-07-16

    Applicant: Sauraj GOSWAMI

    Inventor: Sauraj GOSWAMI

    IPC classification: G06F17/28

    CPC classification: G06F17/289 G06F17/275

    Abstract: Multiple non-overlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypotheses that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
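
    Illustrative sketch (not taken from the patent text): the segmentation step of the second embodiment can be pictured as cutting the text wherever the character set changes. This example approximates "character set" by the first word of each character's Unicode name, which is an assumption of the sketch; per-segment language scoring is omitted. The names `script_of` and `segment_by_script` are hypothetical.

```python
import unicodedata


def script_of(ch):
    """Rough script label: first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def segment_by_script(text):
    """Split text wherever the script of alphabetic characters changes.

    Returns a list of (script, substring) pairs. Spaces, digits and
    punctuation stay attached to the segment they follow.
    """
    segments = []
    current_script = None
    start = 0
    for i, ch in enumerate(text):
        if not ch.isalpha():
            continue
        script = script_of(ch)
        if current_script is None:
            current_script = script
        elif script != current_script:
            # Transition between character sets: close the current segment.
            segments.append((current_script, text[start:i]))
            current_script, start = script, i
    if current_script is not None:
        segments.append((current_script, text[start:]))
    return segments


segments = segment_by_script("Hello world Привет мир")
```

    Each resulting segment could then be scored only against candidate languages written in that script, which is the narrowing the abstract describes.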

    Generalized partition pruning in a database system
    4.
    Granted invention patent
    Generalized partition pruning in a database system (status: in force)

    Publication No.: US07461060B2

    Publication date: 2008-12-02

    Application No.: US11242951

    Filing date: 2005-10-04

    IPC classification: G06F7/00 G06F17/30 G06F17/00

    Abstract: Methods for executing a query on data that has been partitioned into a plurality of partitions are provided. The method includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The method further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
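
    Illustrative sketch (not taken from the patent text): for a single partitioning column, a pruning decision based on limit key values might look like the following. The sketch assumes each partition's limit key is an inclusive upper bound, that partitions are ordered by ascending limit key, and that the predicate is a closed range; the multi-column case mentioned in the abstract is not covered. The name `prune_partitions` is hypothetical.

```python
def prune_partitions(limit_keys, lo, hi):
    """Return the indices of partitions that may hold values in [lo, hi].

    `limit_keys[i]` is assumed to be the inclusive upper limit of
    partition i, so partition i covers (limit_keys[i - 1], limit_keys[i]].
    """
    survivors = []
    prev_limit = None
    for i, limit in enumerate(limit_keys):
        above_lower = prev_limit is None or hi > prev_limit
        below_upper = lo <= limit
        if above_lower and below_upper:
            survivors.append(i)   # partition overlaps the predicate range
        prev_limit = limit
    return survivors


# Hypothetical limit keys for four partitions
limit_keys = [100, 200, 300, 400]
survivors = prune_partitions(limit_keys, 150, 250)
```

    Partitions whose key range cannot overlap the predicate are skipped without being read, which is the effect a pruning decision aims for.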

    Automated identification of documents as not belonging to any language
    5.
    Granted invention patent
    Automated identification of documents as not belonging to any language (status: in force)

    Publication No.: US08224642B2

    Publication date: 2012-07-17

    Application No.: US12275027

    Filing date: 2008-11-20

    Applicant: Sauraj Goswami

    Inventor: Sauraj Goswami

    IPC classification: G06F17/20

    CPC classification: G06F17/275

    Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.

    Generalized partition pruning in a database system
    7.
    Granted invention patent
    Generalized partition pruning in a database system (status: in force)

    Publication No.: US07970756B2

    Publication date: 2011-06-28

    Application No.: US12268391

    Filing date: 2008-11-10

    IPC classification: G06F7/00 G06F17/30

    Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.

    Keymap order compression
    8.
    Granted invention patent
    Keymap order compression (status: in force)

    Publication No.: US07783855B2

    Publication date: 2010-08-24

    Application No.: US11615699

    Filing date: 2006-12-22

    CPC classification: H03M7/30 G06F17/30336

    Abstract: Various embodiments of a computer-implemented method, system and computer program product are provided. A first plurality of key entries of a first index page are compressed in accordance with an order specified by a first keymap of the first index page. The first keymap also indicates respective positions of the key entries of the first plurality of key entries. A second keymap is generated indicating the order and also indicating respective post-compression positions of the key entries of the first plurality of key entries. The compressed first plurality of key entries is stored on a second index page with the second keymap.

    LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES
    9.
    Invention patent application
    LANGUAGE IDENTIFICATION FOR DOCUMENTS CONTAINING MULTIPLE LANGUAGES (status: in force)

    Publication No.: US20100125447A1

    Publication date: 2010-05-20

    Application No.: US12274182

    Filing date: 2008-11-19

    Applicant: Sauraj Goswami

    Inventor: Sauraj Goswami

    IPC classification: G06F17/20

    CPC classification: G06F17/289 G06F17/275

    Abstract: Multiple non-overlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypotheses that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.

    INDEX EXPLOITATION
    10.
    Invention patent application
    INDEX EXPLOITATION (status: lapsed)

    Publication No.: US20090006314A1

    Publication date: 2009-01-01

    Application No.: US11770607

    Filing date: 2007-06-28

    IPC classification: G06F7/00

    CPC classification: G06F17/30929 G06F17/30979

    Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.