Generating a predictive model from multiple data sources
    1.
    Granted invention patent
    Generating a predictive model from multiple data sources (status: in force)

    Publication number: US08996452B2

    Publication date: 2015-03-31

    Application number: US13545817

    Filing date: 2012-07-10

    IPC classification: G06F7/00 G06F17/00 G06Q10/06

    CPC classification: G06Q10/06

    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.
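
The selection step in this abstract can be illustrated with a minimal sketch. It assumes base models are plain callables, that "accuracy" means the fraction of correct predictions, that the best `top_k` models are kept, and that the kept models are combined by majority vote; none of these details are stated in the abstract, and the names `accuracy`, `build_ensemble` and `top_k` are hypothetical. The holdout sample is used here only to score the finished ensemble, which is also an assumption.

```python
from collections import Counter


def accuracy(model, samples):
    """Fraction of (features, label) pairs that the model predicts correctly."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def build_ensemble(base_models, validation, top_k):
    """Keep the top_k base models by validation accuracy (assumed rule)
    and combine them by majority vote (assumed combination method)."""
    ranked = sorted(base_models, key=lambda m: accuracy(m, validation), reverse=True)
    selected = ranked[:top_k]

    def ensemble(x):
        votes = Counter(m(x) for m in selected)
        return votes.most_common(1)[0][0]

    return ensemble, selected


# Toy data: each base model stands in for a model trained on one data source.
validation = [(1, 0), (2, 0), (3, 1), (4, 1)]
holdout = [(0, 0), (5, 1)]
base_models = [
    lambda x: int(x > 2),
    lambda x: int(x > 1),
    lambda x: 1,
    lambda x: int(x > 3),
]

ensemble, selected = build_ensemble(base_models, validation, top_k=3)
holdout_accuracy = accuracy(ensemble, holdout)
```

In this toy run the constant model has the lowest validation accuracy and is the one left out of the ensemble.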

    GENERATING A PREDICTIVE MODEL FROM MULTIPLE DATA SOURCES
    3.
    Invention patent application
    GENERATING A PREDICTIVE MODEL FROM MULTIPLE DATA SOURCES (status: in force)

    Publication number: US20120239613A1

    Publication date: 2012-09-20

    Application number: US13048536

    Filing date: 2011-03-15

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    CPC classification: G06Q10/06

    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

    Computing and applying order statistics for data preparation
    5.
    Granted invention patent
    Computing and applying order statistics for data preparation (status: in force)

    Publication number: US08868573B2

    Publication date: 2014-10-21

    Application number: US13444718

    Filing date: 2012-04-11

    IPC classification: G06F7/00

    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
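
The binning idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: every source is assumed to use the same fixed-width bins for a single numeric field, the "basic summary" of a bin is assumed to be its count, minimum and maximum, and the point estimate is assumed to be a linear interpolation inside the bin. The function names and the `width` parameter are hypothetical.

```python
import math


def summarize(values, width):
    """Single pass over one source: per-bin (count, min, max), keyed by bin index."""
    bins = {}
    for v in values:
        key = math.floor(v / width)
        if key in bins:
            count, lo, hi = bins[key]
            bins[key] = (count + 1, min(lo, v), max(hi, v))
        else:
            bins[key] = (1, v, v)
    return bins


def merge(summaries):
    """Combine the per-source bin summaries into one set of bins."""
    merged = {}
    for bins in summaries:
        for key, (count, lo, hi) in bins.items():
            if key in merged:
                c, l, h = merged[key]
                merged[key] = (c + count, min(l, lo), max(h, hi))
            else:
                merged[key] = (count, lo, hi)
    return merged


def approx_order_statistic(merged, rank):
    """Return (estimate, lower, upper) for the rank-th smallest value (1-based)."""
    total = sum(count for count, _, _ in merged.values())
    if not 1 <= rank <= total:
        raise ValueError("rank out of range")
    seen = 0
    for key in sorted(merged):  # bins in ascending value order
        count, lo, hi = merged[key]
        if seen + count >= rank:
            # The true value lies in this bin, so [lo, hi] bounds it.
            if count == 1:
                return lo, lo, hi
            fraction = (rank - seen - 1) / (count - 1)
            return lo + fraction * (hi - lo), lo, hi
        seen += count


# Two toy sources holding values of the same field.
source_a = [1, 4, 7, 12, 15]
source_b = [3, 8, 9, 14, 21]
merged = merge([summarize(source_a, 5), summarize(source_b, 5)])
estimate, lower, upper = approx_order_statistic(merged, rank=5)
true_value = sorted(source_a + source_b)[4]
```

Because the bins do not overlap, the bin that contains the target rank also contains the true order statistic, so that bin's minimum and maximum serve as the lower and upper error bounds.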

    COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION
    6.
    Invention patent application
    COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION (status: under examination, published)

    Publication number: US20130218908A1

    Publication date: 2013-08-22

    Application number: US13399838

    Filing date: 2012-02-17

    IPC classification: G06F17/30

    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.

    Step detection and step length estimation
    8.
    Granted invention patent
    Step detection and step length estimation (status: in force)

    Publication number: US08831909B2

    Publication date: 2014-09-09

    Application number: US13240743

    Filing date: 2011-09-22

    Abstract: Step detection and step length estimation techniques include detecting salient points in sensor data of one or more sensors. A step frequency is estimated based on a time interval between the detected salient points. A step length of the step may then be computed based on a nonlinear combination of the estimated step frequency and a function of the sensor data, and/or a step model. Alternatively, the step length of the step may be computed based on a combination of a nonlinear function of the estimated step frequency and a (linear or nonlinear) function of the sensor data, and/or a step model.
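
A rough sketch of the pipeline this abstract outlines is given below. It assumes the salient points are local maxima of an acceleration-like signal above a threshold, and it uses a made-up nonlinear combination (frequency multiplied by the fourth root of the signal swing) with a placeholder coefficient. The abstract specifies neither the salient-point rule nor the formula, so `detect_peaks`, `step_frequencies`, `step_length`, `threshold` and `coeff` are all hypothetical.

```python
def detect_peaks(samples, threshold):
    """Indices of local maxima above threshold, used here as the salient points."""
    return [
        i
        for i in range(1, len(samples) - 1)
        if samples[i] > threshold and samples[i - 1] < samples[i] >= samples[i + 1]
    ]


def step_frequencies(peaks, sample_rate_hz):
    """Step frequency (Hz) from the time interval between consecutive salient points."""
    return [sample_rate_hz / (b - a) for a, b in zip(peaks, peaks[1:])]


def step_length(frequency_hz, swing, coeff=0.3):
    """Hypothetical nonlinear combination of step frequency and a function
    of the sensor data (here the fourth root of the signal swing)."""
    return coeff * frequency_hz * swing ** 0.25


# Synthetic signal sampled at 8 Hz with one peak every 4 samples.
signal = [0, 1, 3, 1, 0, 1, 3, 1, 0, 1, 3, 1, 0]
peaks = detect_peaks(signal, threshold=2)
frequencies = step_frequencies(peaks, sample_rate_hz=8)
swing = max(signal) - min(signal)
lengths = [step_length(f, swing) for f in frequencies]
```

With real sensor data the coefficient would have to be calibrated per user or replaced by the step model the abstract mentions; the value used here is only a placeholder.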

    Semantic and Text Matching Techniques for Network Search
    9.
    Invention patent application
    Semantic and Text Matching Techniques for Network Search (status: in force)

    Publication number: US20110072021A1

    Publication date: 2011-03-24

    Application number: US12563357

    Filing date: 2009-09-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.
