SYSTEM, METHOD, AND APPARATUS FOR SCAN-SHARING FOR BUSINESS INTELLIGENCE QUERIES IN AN IN-MEMORY DATABASE
    1.
    Invention patent application (status: lapsed)

    Publication No.: US20110040744A1

    Publication date: 2011-02-17

    Application No.: US12539471

    Filing date: 2009-08-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30445

    Abstract: A computer-implemented method for scan sharing across multiple cores in a business intelligence (BI) query. The method includes receiving a plurality of BI queries, storing a block of data in a first cache, scanning the block of data in the first cache against a first batch of queries on a first processor core, and scanning the block of data against a second batch of queries on a second processor core. The first cache is associated with a first processor core. The block of data includes a subset of data stored in an in-memory database (IMDB). The first batch of queries includes two or more of the BI queries. The second batch of queries includes one or more of the BI queries that are not included in the first batch of queries.
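
A minimal Python sketch of the scan-sharing idea in this abstract, under stated assumptions: the table is a list of dicts, each query is a hypothetical (name, predicate) pair that counts matching rows, and worker threads stand in for processor cores. Python threads do not pin work to a core or control cache placement, so this shows only the control flow (one pass over each block serves every query in a batch), not the cache behavior the application targets.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_block(block, batch):
    """Scan one block once, evaluating every query in the batch on each row."""
    counts = {name: 0 for name, _ in batch}
    for row in block:                    # single pass over the block
        for name, predicate in batch:    # all queries in the batch share that pass
            if predicate(row):
                counts[name] += 1
    return counts


def shared_scan(table, batches, block_size=4):
    """Run each batch on its own worker; each worker walks the table block by block."""
    def run_batch(batch):
        totals = {name: 0 for name, _ in batch}
        for start in range(0, len(table), block_size):
            block = table[start:start + block_size]
            for name, n in scan_block(block, batch).items():
                totals[name] += n
        return totals

    results = {}
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        for totals in pool.map(run_batch, batches):
            results.update(totals)
    return results


# Hypothetical data and queries, for illustration only.
table = [{"region": r, "sales": s} for r, s in
         [("EU", 10), ("US", 25), ("EU", 40), ("APAC", 5), ("US", 60), ("EU", 15)]]

batch_1 = [("eu_rows", lambda row: row["region"] == "EU"),
           ("big_sales", lambda row: row["sales"] >= 25)]
batch_2 = [("us_rows", lambda row: row["region"] == "US")]

result = shared_scan(table, [batch_1, batch_2])
print(result)
```

The first batch holds two queries and the second holds a query not in the first, mirroring the batch structure the abstract describes.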


    COMPACT AGGREGATION WORKING AREAS FOR EFFICIENT GROUPING AND AGGREGATION USING MULTI-CORE CPUS
    2.
    Invention patent application (status: lapsed)

    Publication No.: US20120078980A1

    Publication date: 2012-03-29

    Application No.: US12889789

    Filing date: 2010-09-24

    IPC classification: G06F17/30, G06F12/08

    CPC classification: G06F17/30501, G06F17/30489

    Abstract: A system is described for creating compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs. The system implements operations including computing a running aggregate for a group within a business intelligence (BI) query, and identifying a location to store running aggregate information within an aggregation working area of a cache. The aggregation working area includes first and second data structures. The first data structure stores running aggregate information that is associated with a group that is accessed frequently relative to a threshold. The second data structure stores running aggregate information that is associated with a group that is accessed infrequently relative to the threshold. The operations also include storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group.
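
The two-structure working area can be sketched in Python as follows. This is an illustration under assumptions, not the claimed implementation: the aggregate is a running SUM, group frequencies are known up front (here counted from the rows themselves as a stand-in for whatever estimate a real system would use), and the class and attribute names are hypothetical. Frequently accessed groups get a slot in a small fixed-size array; everything else falls back to a hash table.

```python
from collections import Counter


class AggregationWorkingArea:
    """Running SUM per group, split into a hot and a cold structure."""

    def __init__(self, group_frequencies, threshold):
        # Groups accessed at least `threshold` times are characterized as hot.
        hot = [g for g, n in group_frequencies.items() if n >= threshold]
        self.hot_slot = {g: i for i, g in enumerate(hot)}  # group -> slot index
        self.hot_sums = [0] * len(hot)                     # compact fixed-size array
        self.cold_sums = {}                                # hash table for the rest

    def add(self, group, value):
        slot = self.hot_slot.get(group)
        if slot is not None:
            self.hot_sums[slot] += value
        else:
            self.cold_sums[group] = self.cold_sums.get(group, 0) + value

    def totals(self):
        merged = {g: self.hot_sums[i] for g, i in self.hot_slot.items()}
        merged.update(self.cold_sums)
        return merged


# Hypothetical input rows: (group key, value to aggregate).
rows = [("EU", 10), ("US", 5), ("EU", 7), ("APAC", 3), ("EU", 1), ("US", 2)]
frequencies = Counter(group for group, _ in rows)
area = AggregationWorkingArea(frequencies, threshold=2)
for group, value in rows:
    area.add(group, value)
print(area.totals())
```

Keeping the hot structure small is what would let it stay cache-resident on a real system; plain Python lists and dicts do not give that guarantee.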


    Method for Laying Out Fields in a Database in a Hybrid of Row-Wise and Column-Wise Ordering
    3.
    Invention patent application (status: in force)

    Publication No.: US20100042587A1

    Publication date: 2010-02-18

    Application No.: US12192504

    Filing date: 2008-08-15

    IPC classification: G06F17/30, G06F17/00

    CPC classification: G06F17/30315, G06F17/30519

    Abstract: A method, system, and article are provided for employment of a hybrid layout of representation of data objects in computer memory. Columns of the database are separated based upon a classification of the columns. A vertical partition in the form of a bank is provided to receive an assignment of one or more data objects identified in the columns. Each bank is sized to be a divisor of a size of an associated hardware register. Assignment of data objects to banks organizes the data in a manner that supports efficient query processing that mitigates the quantity of banks required to respond to the query.
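
One way to picture the bank assignment is the Python sketch below. It rests on several assumptions that the abstract does not state: a 64-bit register, bank widths limited to the byte-aligned divisors 8, 16, 32 and 64, columns described only by a hypothetical bit width, and a first-fit-decreasing packing heuristic chosen for brevity. The column classification step is not modeled.

```python
REGISTER_BITS = 64
BANK_WIDTHS = [8, 16, 32, 64]  # assumed byte-aligned divisors of the register size


def assign_banks(column_bits):
    """First-fit-decreasing packing of columns into banks of at most REGISTER_BITS."""
    banks = []  # each bank: {"columns": [...], "used": bits, "width": bits}
    for name, bits in sorted(column_bits.items(), key=lambda kv: -kv[1]):
        if bits > REGISTER_BITS:
            raise ValueError(f"column {name!r} is wider than a register")
        for bank in banks:
            if bank["used"] + bits <= REGISTER_BITS:
                bank["columns"].append(name)
                bank["used"] += bits
                break
        else:
            banks.append({"columns": [name], "used": bits})
    for bank in banks:
        # Shrink to the smallest allowed width that still holds the packed columns.
        bank["width"] = next(w for w in BANK_WIDTHS if w >= bank["used"])
    return banks


# Hypothetical columns with their encoded widths in bits.
columns = {"order_id": 32, "region": 5, "status": 3, "quantity": 12, "price": 40}
banks = assign_banks(columns)
for bank in banks:
    print(bank)
```

A query touching only `price`, `quantity`, `region`, and `status` would read a single bank here, which is the kind of reduction in banks touched per query that the abstract points to.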


    Method for laying out fields in a database in a hybrid of row-wise and column-wise ordering
    4.
    Granted invention patent (status: in force)

    Publication No.: US08099440B2

    Publication date: 2012-01-17

    Application No.: US12192504

    Filing date: 2008-08-15

    IPC classification: G06F7/00

    CPC classification: G06F17/30315, G06F17/30519

    Abstract: A method, system, and article are provided for employment of a hybrid layout of representation of data objects in computer memory. Columns of the database are separated based upon a classification of the columns. A vertical partition in the form of a bank is provided to receive an assignment of one or more data objects identified in the columns. Each bank is sized to be a divisor of a size of an associated hardware register. Assignment of data objects to banks organizes the data in a manner that supports efficient query processing that mitigates the quantity of banks required to respond to the query.


    Refining a dictionary for information extraction
    5.
    Granted invention patent (status: lapsed)

    Publication No.: US08775419B2

    Publication date: 2014-07-08

    Application No.: US13598946

    Filing date: 2012-08-30

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results.
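
The refinement loop can be illustrated with a small greedy sketch in Python. The abstract does not name the algorithm or the score, so the choices here are assumptions: the score is F1, recall is measured against the correct results in the initial extraction, each labeled result is attributed to exactly one dictionary entry, and entries are removed one at a time while the score keeps improving. All names and sample data are hypothetical.

```python
def f1(tp, fp, total_correct):
    """F1 score from true positives, false positives, and the initial correct count."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / total_correct
    return 2 * precision * recall / (precision + recall)


def removal_candidates(labeled_results, score_threshold):
    """Greedily pick dictionary entries whose removal raises F1 toward the threshold."""
    stats = {}  # entry -> [correct count, incorrect count]
    for entry, is_correct in labeled_results:
        stats.setdefault(entry, [0, 0])[0 if is_correct else 1] += 1
    total_correct = sum(c for c, _ in stats.values())
    tp = total_correct
    fp = sum(i for _, i in stats.values())
    score = f1(tp, fp, total_correct)
    candidates = []
    while score < score_threshold and stats:
        best = max(stats, key=lambda e: f1(tp - stats[e][0], fp - stats[e][1], total_correct))
        new_score = f1(tp - stats[best][0], fp - stats[best][1], total_correct)
        if new_score <= score:
            break  # no single removal helps any further
        c, i = stats.pop(best)
        tp, fp, score = tp - c, fp - i, new_score
        candidates.append(best)
    return candidates, score


# Hypothetical labeled output of a person-name extractor: (dictionary entry, is_correct).
labeled = ([("Washington", True)] + [("Washington", False)] * 3
           + [("Smith", True)] * 4
           + [("Jordan", True)] + [("Jordan", False)] * 2
           + [("Lee", True)] * 3 + [("Lee", False)])

candidates, score = removal_candidates(labeled, score_threshold=0.82)
print(candidates, round(score, 4))
```

Entries that produce mostly incorrect matches are proposed for removal first; entries such as `Smith`, whose removal would cost more recall than it gains in precision, stay in the dictionary.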


    DICTIONARY REFINEMENT FOR INFORMATION EXTRACTION
    6.
    Invention patent application (status: under examination, published)

    Publication No.: US20130318075A1

    Publication date: 2013-11-28

    Application No.: US13480974

    Filing date: 2012-05-25

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results.


    Automatic refinement of information extraction rules
    7.
    Granted invention patent (status: in force)

    Publication No.: US08417709B2

    Publication date: 2013-04-09

    Application No.: US12788407

    Filing date: 2010-05-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A method and system for automatically refining information extraction (IE) rules. A provenance graph for IE rules on a set of test documents is determined. The provenance graph indicates a sequence of evaluations of the IE rules that generates an output of each operator of the IE rules. Based on the provenance graph, high-level rule changes (HLCs) of the IE rules are determined. Low-level rule changes (LLCs) of the IE rules are determined to specify how to implement the HLCs. Each LLC specifies changing an operator's structure or inserting a new operator in between two operators. Based on how the LLCs affect the IE rules and previously received correct results of applying the rules on the test documents, a ranked list of the LLCs is determined. The IE rules are refined based on the ranked list.


    EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM
    8.
    Invention patent application (status: in force)

    Publication No.: US20120209844A1

    Publication date: 2012-08-16

    Application No.: US13413893

    Filing date: 2012-03-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30082, G06F17/30616

    Abstract: A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream.
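
The annotate-then-combine flow can be sketched in Python as follows. The details are assumptions made for illustration: annotators are modeled as regular expressions held in a hypothetical in-memory repository, tags are written as angle-bracket markup, streams are plain lists of strings, and "combining" is simple concatenation in stream order.

```python
import re

# Hypothetical annotator repository: tag name -> pattern describing what to extract.
ANNOTATORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
}


def annotate(text):
    """Replace each matched unstructured span with a tagged version of itself."""
    for tag, pattern in ANNOTATORS.items():
        text = pattern.sub(lambda m, tag=tag: f"<{tag}>{m.group(0)}</{tag}>", text)
    return text


def mashup(*streams):
    """Annotate every item of every input stream and merge them into one output."""
    return [annotate(item) for stream in streams for item in stream]


# Hypothetical input streams of unstructured text.
feed_a = ["Contact ann@example.com for details."]
feed_b = ["Call 555-0199 after noon.", "No contact data here."]

output = mashup(feed_a, feed_b)
for line in output:
    print(line)
```

Items with no matching annotator pass through unchanged, so the output stream still carries all of the input text alongside the tagged components.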


    REFINING A DICTIONARY FOR INFORMATION EXTRACTION
    10.
    Invention patent application (status: lapsed)

    Publication No.: US20130318076A1

    Publication date: 2013-11-28

    Application No.: US13598946

    Filing date: 2012-08-30

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results.
