Online analytic processing in the presence of uncertainties
    1.
    Patent application
    Status: Pending (published)

    Publication No.: US20070233651A1

    Publication date: 2007-10-04

    Application No.: US11395403

    Filing date: 2006-03-31

    IPC classification: G06F17/30

    CPC classification: G06F16/24556

    Abstract: Disclosed are embodiments of a method for online analytic processing of queries and, more particularly, of a method that extends the online analytic processing (OLAP) data model to represent ambiguity, such as imprecision and uncertainty, in data values. Specifically, the embodiments of the method incorporate a statistical model that allows uncertain measures to be modeled as conditional probabilities. Additionally, an embodiment of the method identifies natural query properties (e.g., consistency and faithfulness) and uses them to shed light on alternative query semantics. Lastly, an embodiment of the method introduces an allocation-based approach to the semantics of aggregation queries over such data.
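
    The following is a minimal Python sketch of the allocation idea. It rests on several assumptions: each imprecise fact is spread over the leaf cells it might belong to, and an aggregation query sums the allocated fractions that fall inside the query region. The fact layout, the (state, model) cells, and the uniform allocation policy are hypothetical illustrations. They are not the allocation policies described in the application.

```python
# Hypothetical fact table: each fact carries a measure and a "region", i.e. the
# leaf cells (state, model) it could belong to. A precise fact names one cell;
# an imprecise fact (e.g. model unknown) names several.
FACTS = [
    {"region": [("MA", "Civic")], "sales": 100.0},
    {"region": [("MA", "Civic"), ("MA", "Camry")], "sales": 60.0},
    {"region": [("NY", "Civic")], "sales": 40.0},
]


def uniform_allocation(fact):
    """Assumed policy: split a fact evenly across its candidate cells."""
    weight = 1.0 / len(fact["region"])
    return {cell: weight for cell in fact["region"]}


def allocated_sum(facts, query_cells, allocate=uniform_allocation):
    """SUM over query_cells, counting each fact in proportion to the
    allocation weight it places on cells inside the query region."""
    total = 0.0
    for fact in facts:
        for cell, weight in allocate(fact).items():
            if cell in query_cells:
                total += weight * fact["sales"]
    return total


print(allocated_sum(FACTS, {("MA", "Civic")}))                   # 130.0
print(allocated_sum(FACTS, {("MA", "Camry")}))                   # 30.0
print(allocated_sum(FACTS, {("MA", "Civic"), ("MA", "Camry")}))  # 160.0
```

    With this uniform allocation, the answers for (MA, Civic) and (MA, Camry) add up to the answer for MA as a whole (130.0 + 30.0 = 160.0). This matches one informal reading of a consistency property between a query and its sub-queries.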

    Method to hierarchical pooling of opinions from multiple sources
    2.
    Patent application
    Status: In force

    Publication No.: US20050114161A1

    Publication date: 2005-05-26

    Application No.: US10723471

    Filing date: 2003-11-26

    IPC classification: G06Q30/00 G06F17/60

    CPC classification: G06Q30/02 G06Q30/0282

    Abstract: Disclosed is a system, method, and program storage device for aggregating opinions. The method comprises consolidating a plurality of expressed opinions on various dimensions of topics as discrete probability distributions, generating an aggregate opinion as a single point probability distribution by minimizing a sum of weighted divergences between a plurality of the discrete probability distributions, and presenting the aggregate opinion as a Bayesian network. The divergences comprise Kullback-Leibler divergences, and the expressed opinions are generated by experts and comprise opinions on sentiments about products and services. Moreover, the aggregate opinion predicts the success of the products and services. Furthermore, the experts are arranged in a hierarchy of knowledge, wherein the knowledge comprises the various dimensions of topics on which opinions may be expressed.
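
    The abstract does not say which direction the Kullback-Leibler divergences run. Suppose each term is KL(expert || aggregate). Then the minimizer of the weighted sum has a closed form: the weighted average of the expert distributions, often called a linear opinion pool. With the opposite direction, the minimizer is instead a normalized weighted geometric mean. The Python sketch below assumes the first direction. The nested ("leaf" / "group") tree, the weights, and the function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def linear_pool(opinions, weights):
    """Distribution q minimizing sum_i w_i * KL(opinions[i] || q).

    Expanding the objective leaves -sum_j (sum_i w_i p_ij) log q_j plus a
    constant, which Gibbs' inequality minimizes at the normalized weighted mean.
    """
    total = sum(weights)
    return [
        sum(w * p[j] for p, w in zip(opinions, weights)) / total
        for j in range(len(opinions[0]))
    ]


def pool_hierarchy(node):
    """Pool a hypothetical expert tree bottom-up.

    A node is ("leaf", distribution) or ("group", [(weight, child), ...]).
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    weights = [w for w, _ in payload]
    children = [pool_hierarchy(child) for _, child in payload]
    return linear_pool(children, weights)


# Two experts on one topic dimension, pooled first, then combined with a third
# expert at the next level. Outcomes: (negative, neutral, positive).
TREE = ("group", [
    (2.0, ("group", [(1.0, ("leaf", [0.7, 0.2, 0.1])),
                     (1.0, ("leaf", [0.5, 0.3, 0.2]))])),
    (1.0, ("leaf", [0.1, 0.3, 0.6])),
])
print(pool_hierarchy(TREE))
```

    Because a linear pool of linear pools is itself a weighted average, the group weights in this sketch act as the total influence of each subtree.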

    Dynamic Resource Allocation Using Projected Future Benefits
    4.
    Patent application
    Status: Lapsed

    Publication No.: US20080033774A1

    Publication date: 2008-02-07

    Application No.: US11861663

    Filing date: 2007-09-26

    IPC classification: G06Q10/00

    Abstract: A method for server allocation in a Web server “farm” uses limited information about future loads to achieve close to the greatest possible revenue, under the assumption that revenue is proportional to server utilization and differentiated by customer class. The method of server allocation takes an approach of “discounting the future”. Specifically, when the policy faces a choice between a guaranteed immediate benefit and a potential future benefit, it decides by comparing the guaranteed benefit value with a discounted value of the potential future benefit. The discount factor is exponential in the number of time units it would take for the potential benefit to materialize. Future benefits are discounted because, by the time a benefit would materialize, conditions might change and the algorithm might decide to make another choice for a potential (even greater) benefit.
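
    The decision rule can be sketched in Python as follows. The sketch assumes a single per-time-unit discount factor gamma between 0 and 1. The value 0.8, the revenue figures, and the server-reassignment scenario are illustrative assumptions and do not come from the application.

```python
def discounted(benefit: float, delay: int, gamma: float) -> float:
    """Value today of a benefit expected `delay` time units from now."""
    return benefit * gamma ** delay


def prefer_immediate(immediate: float, future: float, delay: int,
                     gamma: float = 0.8) -> bool:
    """True if a guaranteed benefit now beats the discounted potential one.

    gamma (0 < gamma < 1) is a hypothetical per-time-unit discount factor, so
    the discount gamma ** delay is exponential in the number of time units.
    """
    return immediate >= discounted(future, delay, gamma)


# Hypothetical scenario: keeping a server on its current customer class earns
# 5 units now; moving it to another class might earn 10 units, but only after
# 3 (or 4) time units.
print(prefer_immediate(5.0, 10.0, delay=3))  # False: 10 * 0.8**3 = 5.12 > 5
print(prefer_immediate(5.0, 10.0, delay=4))  # True:  10 * 0.8**4 = 4.096 < 5
```

    A smaller gamma makes the policy more myopic, because distant benefits shrink faster.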

    Method of obtaining data samples from a data stream and of estimating the sortedness of the data stream based on the samples
    5.
    Patent application
    Status: In force

    Publication No.: US20070244891A1

    Publication date: 2007-10-18

    Application No.: US11405994

    Filing date: 2006-04-18

    IPC classification: G06F17/30

    CPC classification: G06F7/22 G06F17/30864

    Abstract: Disclosed is a method of scanning a data stream in a single pass to obtain uniform data samples from selected intervals. The method comprises randomly selecting elements from the stream for storage in one or more data buckets and then randomly selecting multiple samples from the bucket(s). Each sample is associated with a specified interval immediately preceding a selected point in time. The probability of selecting an element for storage in the bucket is balanced against the probability of selecting it for inclusion in a sample, so that every element scanned during the specified interval is included in the sample with equal probability. The samples can then be used to estimate the degree of sortedness of the stream. The estimate is based on counting how many elements in the sequence are the rightmost point of an interval in which a majority of the interval's elements are inverted with respect to that rightmost element.
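
    To make the counting criterion concrete, here is an exact, quadratic-time Python version that holds the whole sequence in memory. The application's aim is to estimate this count from single-pass samples instead. Two readings are assumptions here. First, "the interval's elements" is taken to mean the elements strictly before the rightmost point. Second, "majority" is taken to mean a strict majority. The function name is hypothetical.

```python
def inverted_majority_endpoints(seq):
    """Indices j such that, for some i < j, more than half of seq[i:j]
    (the interval's elements before its rightmost point) exceed seq[j]."""
    result = []
    for j in range(len(seq)):
        inverted = 0
        for i in range(j - 1, -1, -1):  # grow the interval [i, j] leftwards
            if seq[i] > seq[j]:
                inverted += 1
            if 2 * inverted > j - i:    # strict majority of the j - i elements
                result.append(j)
                break
    return result


print(inverted_majority_endpoints([1, 2, 3, 4]))  # []
print(inverted_majority_endpoints([4, 1, 2, 3]))  # [1]
print(inverted_majority_endpoints([3, 2, 1]))     # [1, 2]
```

    A sorted sequence yields no qualifying index. For [4, 1, 2, 3], only index 1 qualifies, and a single deletion (the leading 4) makes that sequence sorted.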

    System and method for detecting matches of small edit distance
    6.
    Patent application
    Status: Pending (published)

    Publication No.: US20070085716A1

    Publication date: 2007-04-19

    Application No.: US11241468

    Filing date: 2005-09-30

    IPC classification: H03M7/30

    CPC classification: G06F16/90344

    Abstract: A system and method of approximating edit distance for a set of character strings in a database includes producing a representative sketch for each of the character strings and approximating an edit distance between two selected character strings based only on the representative sketch for each of the selected character strings. The character strings may comprise text, and the method may further comprise encoding positions of substrings in the text using anchors, wherein the anchors comprise identical substrings occurring at nearby positions in the two input character strings. A set of anchors may be used in a correlated manner, so that character strings with a sufficiently small edit distance are likely to use the same sequence of anchors. The character strings may be substantially non-repetitive. The representative sketch of a first character string is preferably constructed without knowledge of a second character string. The size of the representative sketch may be constant.
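
    For reference, the quantity the sketches approximate is the ordinary (Levenshtein) edit distance. The standard dynamic-programming computation below needs both strings at once and takes time proportional to the product of their lengths. The approach in the abstract instead compares only precomputed, per-string sketches. This Python sketch computes the exact distance and does not implement the anchor-based sketches. The threshold helper `is_small_edit_match` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


def is_small_edit_match(a: str, b: str, k: int) -> bool:
    """Hypothetical helper: do a and b match within edit distance k?"""
    return edit_distance(a, b) <= k


print(edit_distance("kitten", "sitting"))           # 3
print(is_small_edit_match("anchor", "anchors", 1))  # True
```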
