Systems and methods for large-scale randomized optimization for problems with decomposable loss functions
    1.
    Granted patent
    Systems and methods for large-scale randomized optimization for problems with decomposable loss functions (in force)

    Publication No.: US08903748B2

    Publication date: 2014-12-02

    Application No.: US13169618

    Filing date: 2011-06-27

    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.
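
The decomposition described in this abstract can be illustrated with a toy problem. The following is a minimal sketch, not the patented method: it assumes a one-parameter least-squares model y = w * x, a fixed processing order for the strata, and hypothetical names (`stratum_loss`, `stratum_grad`, `stratified_descent`) and threshold values chosen purely for illustration. Each stratum loss is driven below its own threshold by gradient descent, and sweeps over the strata repeat until the overall loss falls below its threshold.

```python
def stratum_loss(w, stratum):
    """Squared-error loss of the model y = w * x on one stratum."""
    return sum((w * x - y) ** 2 for x, y in stratum)


def stratum_grad(w, stratum):
    """Derivative of stratum_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in stratum)


def stratified_descent(strata, w=0.0, lr=0.01, stratum_threshold=1e-10,
                       loss_threshold=1e-9, max_steps=1000, max_sweeps=100):
    """Lower each stratum loss in turn until the overall loss is small."""
    for _ in range(max_sweeps):
        # The overall loss is the sum of the stratum losses.
        if sum(stratum_loss(w, s) for s in strata) <= loss_threshold:
            break
        for stratum in strata:  # fixed order here; an assumption of this sketch
            steps = 0
            while (stratum_loss(w, stratum) > stratum_threshold
                   and steps < max_steps):
                w -= lr * stratum_grad(w, stratum)
                steps += 1
    return w


# Toy data generated from y = 3 * x, split into two strata.
data = [(float(x), 3.0 * x) for x in range(1, 7)]
strata = [data[:3], data[3:]]
w = stratified_descent(strata)
total = sum(stratum_loss(w, s) for s in strata)
```

Because every stratum in this toy data shares the minimizer w = 3, any processing order converges. In general the ordering of the strata and the time spent in each matter, which is the point the abstract emphasizes.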


    Systems and methods for large-scale randomized optimization for problems with decomposable loss functions
    3.
    Granted patent
    Systems and methods for large-scale randomized optimization for problems with decomposable loss functions (in force)

    Publication No.: US08983879B2

    Publication date: 2015-03-17

    Application No.: US13595618

    Filing date: 2012-08-27

    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.


    SYSTEMS AND METHODS FOR LARGE-SCALE RANDOMIZED OPTIMIZATION FOR PROBLEMS WITH DECOMPOSABLE LOSS FUNCTIONS
    4.
    Patent application
    SYSTEMS AND METHODS FOR LARGE-SCALE RANDOMIZED OPTIMIZATION FOR PROBLEMS WITH DECOMPOSABLE LOSS FUNCTIONS (under examination, published)

    Publication No.: US20120331025A1

    Publication date: 2012-12-27

    Application No.: US13595618

    Filing date: 2012-08-27

    IPC classification: G06F7/38

    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.


    Method for estimating the number of distinct values in a partitioned dataset
    5.
    Granted patent
    Method for estimating the number of distinct values in a partitioned dataset (in force)

    Publication No.: US07987177B2

    Publication date: 2011-07-26

    Application No.: US12022601

    Filing date: 2008-01-30

    IPC classification: G06F17/00 G06F17/30

    CPC classification: G06F17/30536 G06F17/30469

    Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
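
To make the idea of combinable synopses concrete, here is a minimal sketch that assumes a plain "k minimum values" (KMV) synopsis: keep the k smallest hash values of a partition, and estimate the distinct count as (k - 1) divided by the k-th smallest hash. The abstract does not spell out the synopsis layout, so this is an illustrative stand-in rather than the patented design. Only the union operation is shown, deletions and other multiset operations are not handled, and the helper names (`unit_hash`, `kmv_synopsis`, `combine_union`, `estimate_dv`) are hypothetical.

```python
import hashlib


def unit_hash(value):
    """Map a value to a pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def kmv_synopsis(values, k):
    """Keep the k smallest distinct hash values of a partition."""
    return sorted({unit_hash(v) for v in values})[:k]


def combine_union(syn_a, syn_b, k):
    """Synopsis of the union, built only from the two input synopses."""
    return sorted(set(syn_a) | set(syn_b))[:k]


def estimate_dv(synopsis, k):
    """Estimate the number of distinct values from a KMV synopsis."""
    if len(synopsis) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


k = 256
part_a = list(range(0, 6000))      # 6000 distinct values
part_b = list(range(4000, 10000))  # overlaps part_a on 2000 values
merged = combine_union(kmv_synopsis(part_a, k), kmv_synopsis(part_b, k), k)
direct = kmv_synopsis(part_a + part_b, k)
estimate = estimate_dv(merged, k)  # true distinct count is 10000
```

The merged synopsis equals the one computed directly from the combined data, because each of the k smallest hashes of the union must be among the k smallest hashes of the partition it came from. That property is what lets synopses be built per partition, in parallel, and combined later without revisiting the base data.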


    Method for Estimating the Number of Distinct Values in a Partitioned Dataset
    6.
    Patent application
    Method for Estimating the Number of Distinct Values in a Partitioned Dataset (in force)

    Publication No.: US20090192980A1

    Publication date: 2009-07-30

    Application No.: US12022601

    Filing date: 2008-01-30

    IPC classification: G06F7/00

    CPC classification: G06F17/30536 G06F17/30469

    Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.


    System and method for maintaining and utilizing Bernoulli samples over evolving multisets
    7.
    Granted patent
    System and method for maintaining and utilizing Bernoulli samples over evolving multisets (in force)

    Publication No.: US08140466B2

    Publication date: 2012-03-20

    Application No.: US12101985

    Filing date: 2008-04-24

    IPC classification: G06F15/00 G06F15/18

    Abstract: One embodiment of the present invention provides a method for incrementally maintaining a Bernoulli sample S with sampling rate q over a multiset R in the presence of update, delete, and insert transactions. The method includes processing items inserted into R using Bernoulli sampling and augmenting S with tracking counters during this processing. Items deleted from R are processed by using the tracking counters and by removing newly deleted items from S using a calculated probability while maintaining a degree of uniformity in S.


    SYSTEM AND METHOD FOR MAINTAINING AND UTILIZING BERNOULLI SAMPLES OVER EVOLVING MULTISETS
    8.
    Patent application
    SYSTEM AND METHOD FOR MAINTAINING AND UTILIZING BERNOULLI SAMPLES OVER EVOLVING MULTISETS (in force)

    Publication No.: US20090271421A1

    Publication date: 2009-10-29

    Application No.: US12101985

    Filing date: 2008-04-24

    IPC classification: G06F17/30

    Abstract: One embodiment of the present invention provides a method for incrementally maintaining a Bernoulli sample S with sampling rate q over a multiset R in the presence of update, delete, and insert transactions. The method includes processing items inserted into R using Bernoulli sampling and augmenting S with tracking counters during this processing. Items deleted from R are processed by using the tracking counters and by removing newly deleted items from S using a calculated probability while maintaining a degree of uniformity in S.


    Method for maintaining a sample synopsis under arbitrary insertions and deletions
    9.
    Granted patent
    Method for maintaining a sample synopsis under arbitrary insertions and deletions (in force)

    Publication No.: US07536403B2

    Publication date: 2009-05-19

    Application No.: US11615481

    Filing date: 2006-12-22

    IPC classification: G06F17/30

    Abstract: A method of incrementally maintaining a stable, bounded, uniform random sample S from a dataset R, in the presence of arbitrary insertions and deletions to the dataset R, and without accesses to the dataset R, comprises a random pairing method in which deletions are uncompensated until compensated by a subsequent insertion (randomly paired to the deletion) by including the insertion's item into S if and only if the uncompensated deletion's item was removed from S (i.e., was in S so that it could be removed). A method for resizing a sample to a new uniform sample of increased size while maintaining a bound on the sample size and balancing cost between dataset accesses and transactions to the dataset is also disclosed. A method for maintaining uniform, bounded samples for a dataset in the presence of growth in size of the dataset is additionally disclosed.
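
The random pairing idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: items are assumed distinct, deletions are assumed to target items currently in the dataset, and plain reservoir sampling is assumed whenever no deletion is awaiting compensation. The class name `RandomPairingSample` and the counter names `c1` and `c2` are hypothetical. The counters record uncompensated deletions that did and did not hit the sample, and a later insertion enters the sample with probability c1 / (c1 + c2).

```python
import random


class RandomPairingSample:
    """Bounded uniform sample maintained under insertions and deletions."""

    def __init__(self, max_size, seed=None):
        self.max_size = max_size
        self.sample = []
        self.dataset_size = 0
        self.c1 = 0  # uncompensated deletions of items that were in the sample
        self.c2 = 0  # uncompensated deletions of items that were not
        self.rng = random.Random(seed)

    def insert(self, item):
        self.dataset_size += 1
        if self.c1 + self.c2 == 0:
            # No pending deletions: ordinary reservoir sampling step.
            if len(self.sample) < self.max_size:
                self.sample.append(item)
            elif self.rng.random() < self.max_size / self.dataset_size:
                self.sample[self.rng.randrange(self.max_size)] = item
        elif self.rng.random() < self.c1 / (self.c1 + self.c2):
            # Paired with a deletion that removed an item from the sample.
            self.sample.append(item)
            self.c1 -= 1
        else:
            # Paired with a deletion that did not touch the sample.
            self.c2 -= 1

    def delete(self, item):
        self.dataset_size -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.c1 += 1
        else:
            self.c2 += 1


rp = RandomPairingSample(max_size=10, seed=1)
for i in range(100):
    rp.insert(i)
for i in range(0, 100, 2):   # delete the even items
    rp.delete(i)
for i in range(100, 150):    # 50 insertions compensate the 50 deletions
    rp.insert(i)
remaining = set(range(1, 100, 2)) | set(range(100, 150))
```

The sample never consults the dataset itself, only the stream of transactions and the two counters. Once every deletion has been compensated, the sample is back at its full bounded size.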


    Method for maintaining a sample synopsis under arbitrary insertions and deletions
    10.
    Granted patent
    Method for maintaining a sample synopsis under arbitrary insertions and deletions (in force)

    Publication No.: US07827211B2

    Publication date: 2010-11-02

    Application No.: US12054298

    Filing date: 2008-03-24

    IPC classification: G06F17/30

    Abstract: A method of incrementally maintaining a stable, bounded, uniform random sample S from a dataset R, in the presence of arbitrary insertions and deletions to the dataset R, and without accesses to the dataset R, comprises a random pairing method in which deletions are uncompensated until compensated by a subsequent insertion (randomly paired to the deletion) by including the insertion's item into S if and only if the uncompensated deletion's item was removed from S (i.e., was in S so that it could be removed). A method for resizing a sample to a new uniform sample of increased size while maintaining a bound on the sample size and balancing cost between dataset accesses and transactions to the dataset is also disclosed. A method for maintaining uniform, bounded samples for a dataset in the presence of growth in size of the dataset is additionally disclosed.
