METHOD AND APPARATUS FOR ADAPTIVE LOAD SHEDDING
    51.
    Patent application
    Status: Lapsed

    Publication No.: US20090049187A1

    Publication date: 2009-02-19

    Application No.: US12165524

    Filing date: 2008-06-30

    IPC classification: G06F15/16

    CPC classification: H04L49/90

    Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream, and the values of the individual tuples with respect to an expected output.
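
The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the tuple layout `(timestamp, join key)`, the helper names `shed_and_join` and `budget_for_rate`, and the idea of ranking window tuples with a caller-supplied score are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical tuple layout: (timestamp, join key).
StreamTuple = Tuple[float, int]


def budget_for_rate(capacity_per_sec: float, arrival_rate_per_sec: float,
                    window_size: int) -> int:
    """How many window tuples can be examined per arriving tuple.

    The faster tuples arrive, the fewer comparisons each one may use.
    """
    if arrival_rate_per_sec <= 0:
        return window_size
    per_tuple = int(capacity_per_sec / arrival_rate_per_sec)
    return max(0, min(window_size, per_tuple))


def shed_and_join(
    window: Iterable[StreamTuple],
    probe: StreamTuple,
    budget: int,
    score: Callable[[StreamTuple, StreamTuple], float],
) -> List[Tuple[StreamTuple, StreamTuple]]:
    """Join `probe` against at most `budget` tuples of the opposite window.

    Window tuples are ranked by `score` (higher means more promising);
    tuples outside the budget are ignored, which is the shedding step.
    """
    ranked = sorted(window, key=lambda t: score(probe, t), reverse=True)
    selected = ranked[:budget]
    return [(probe, t) for t in selected if t[1] == probe[1]]
```

A score such as `lambda p, t: -abs(p[0] - t[0])` favors tuples that are close in time to the probe. A score based on expected output value could be swapped in without changing the rest of the sketch.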

    System and method for load shedding in data mining and knowledge discovery from stream data
    52.
    Granted patent
    Status: In force

    Publication No.: US07493346B2

    Publication date: 2009-02-17

    Application No.: US11058944

    Filing date: 2005-02-16

    IPC classification: G06F12/00 G06F17/30 G06F9/46

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

    SYSTEMS AND METHODS FOR STRUCTURAL CLUSTERING OF TIME SEQUENCES
    53.
    Patent application
    Status: Under examination (published)

    Publication No.: US20080275671A1

    Publication date: 2008-11-06

    Application No.: US12115824

    Filing date: 2008-05-06

    IPC classification: G06F15/00

    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
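
As a rough illustration of the feature-then-distance pipeline in this abstract, the sketch below computes three simple structural features and returns the k closest matches to a query. The abstract does not name the features or the distance, so the feature set (mean, standard deviation, lag-1 autocorrelation), the Euclidean distance, and the function names are assumptions. Partitioning into clusters is not shown.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def structural_features(series: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, population standard deviation, lag-1 autocorrelation)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0 or len(series) < 2:
        return (mu, sigma, 0.0)
    cov = sum((a - mu) * (b - mu) for a, b in zip(series, series[1:])) / len(series)
    return (mu, sigma, cov / (sigma * sigma))


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the feature vectors of two series."""
    return math.dist(structural_features(a), structural_features(b))


def k_closest(query: Sequence[float],
              candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Names of the k candidate series whose features are nearest the query."""
    ordered = sorted(candidates, key=lambda name: feature_distance(query, candidates[name]))
    return ordered[:k]
```

Because the distance is computed on features rather than raw samples, series of different lengths can be compared directly. In practice the features would usually be normalized so that no single one dominates the distance.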

    Content based method for product-peer filtering
    55.
    Granted patent
    Status: In force

    Publication No.: US06356879B2

    Publication date: 2002-03-12

    Application No.: US09169029

    Filing date: 1998-10-09

    IPC classification: G06F17/60

    Abstract: The present invention derives product characterizations for products offered at an e-commerce site based on the text descriptions of the products provided at the site. A customer characterization is generated for any customer browsing the e-commerce site. The characterizations include an aggregation of derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed by clustering customers having similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.

    HMC: A hybrid mirror-and-chained data replication method to support high data availability for disk arrays
    56.
    Granted patent
    Status: Lapsed

    Publication No.: US5559764A

    Publication date: 1996-09-24

    Application No.: US292640

    Filing date: 1994-08-18

    Abstract: A method of distributing a set of data among a plurality of disks, which provides for load balancing in the event of a disk failure. In accordance with the method, the disks in an array are divided into a number of clusters. The blocks of data are then stored in each cluster such that each cluster contains a complete set of the data and such that data block placement in each cluster is a unique permutation of the data block placement in the other clusters. In the event of a disk failure, data block accesses to the failed disk are redirected to a disk in the other cluster having a copy of the data block, and further accesses to the disks that remain operational are rebalanced.
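
One way to picture the placement rule is the sketch below. The abstract does not give the actual permutation, so the rotation used here (cluster `c` shifts each block by `c` times its stripe number) is an assumption chosen only to show the property that matters: blocks on one failed disk are spread over several disks of another cluster. The function names are hypothetical, and the sketch assumes no more clusters than disks per cluster.

```python
from typing import Dict, List, Tuple

DiskId = Tuple[int, int]  # (cluster index, disk index within the cluster)


def place_blocks(num_blocks: int, disks_per_cluster: int,
                 num_clusters: int) -> Dict[DiskId, List[int]]:
    """Map each (cluster, disk) to the block ids it stores.

    Every cluster holds a complete copy of the data. Cluster c places a
    block on disk (offset + c * stripe) mod disks_per_cluster, so each
    cluster uses a different permutation of the same blocks.
    """
    layout: Dict[DiskId, List[int]] = {
        (c, d): [] for c in range(num_clusters) for d in range(disks_per_cluster)
    }
    for block in range(num_blocks):
        stripe, offset = divmod(block, disks_per_cluster)
        for c in range(num_clusters):
            layout[(c, (offset + c * stripe) % disks_per_cluster)].append(block)
    return layout


def redirect(block: int, failed: DiskId, disks_per_cluster: int,
             num_clusters: int) -> DiskId:
    """Pick the disk in the next cluster that holds a copy of `block`."""
    stripe, offset = divmod(block, disks_per_cluster)
    target = (failed[0] + 1) % num_clusters
    return (target, (offset + target * stripe) % disks_per_cluster)
```

With 16 blocks, 4 disks per cluster, and 2 clusters, the four blocks on disk 0 of cluster 0 land on four different disks of cluster 1, so the redirected load is shared rather than doubled on a single mirror disk.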

    Frame sampling scheme for video scanning in a video-on-demand system
    57.
    Granted patent
    Status: Lapsed

    Publication No.: US5521630A

    Publication date: 1996-05-28

    Application No.: US222781

    Filing date: 1994-04-04

    Abstract: A system and method for performing variable speed scanning or browsing, wherein a user controls the playout speed of a movie, which does not require additional disk or network bandwidth resources. In a preferred embodiment, the method provides for scanning operations for a Motion Picture Experts Group (MPEG) video stream. The method satisfies the constraints of the MPEG decoder (in the user's set-top box) and requires a minimum of additional system resources. The embodiments of the present invention include (a) a storage method, (b1) a segment sampling method, (b2) a segment placement method, and (c) a playout method, where (b1) and (b2) are two alternatives for segment selection. Thus, two sets of solutions are provided to support variable speed scanning in a disk-array-based video server: one using (a), (b1) and (c), and the other using (a), (b2) and (c).

    System and method for adaptive pruning
    58.
    Granted patent
    Status: Lapsed

    Publication No.: US08301584B2

    Publication date: 2012-10-30

    Application No.: US10737123

    Filing date: 2003-12-16

    IPC classification: G06F7/00 G06F3/00

    CPC classification: G06F17/30539 G06F17/30598

    Abstract: Disclosed is a method and structure for searching data in databases using an ensemble of models. First the invention performs training. This training orders models within the ensemble in order of prediction accuracy and joins different numbers of models together to form sub-ensembles. The models are joined together in the sub-ensemble in the order of prediction accuracy. Next in the training process, the invention calculates confidence values of each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each of the sub-ensembles is variable depending upon the level of confidence, while, to the contrary, the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence. As the level of confidence is raised, a sub-ensemble that has more models will be selected and as the level of confidence is lowered, a sub-ensemble that has fewer models will be selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
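
The training and selection steps in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: models are plain callables returning a class label, predictions are combined by majority vote, and confidence is taken to be the fraction of validation examples on which a sub-ensemble agrees with the full ensemble. The names `vote`, `train_pruning`, and `select_size` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: maps an example to a class label.
Model = Callable[[object], int]


def vote(models: Sequence[Model], example: object) -> int:
    """Majority vote; ties go to the label produced first."""
    return Counter(m(example) for m in models).most_common(1)[0][0]


def train_pruning(models: Sequence[Model], accuracies: Sequence[float],
                  validation: Sequence[object]
                  ) -> Tuple[List[Model], List[Tuple[int, float]]]:
    """Rank models by accuracy and score every prefix sub-ensemble.

    Returns the ranked models and a table of (size, confidence), where
    confidence is the agreement rate with the full ensemble.
    """
    order = sorted(range(len(models)), key=lambda i: accuracies[i], reverse=True)
    ranked = [models[i] for i in order]
    full = [vote(ranked, x) for x in validation]
    table = []
    for size in range(1, len(ranked) + 1):
        sub = [vote(ranked[:size], x) for x in validation]
        agree = sum(s == f for s, f in zip(sub, full)) / len(validation)
        table.append((size, agree))
    return ranked, table


def select_size(table: Sequence[Tuple[int, float]], confidence: float) -> int:
    """Smallest sub-ensemble size whose confidence meets the target."""
    for size, agree in table:
        if agree >= confidence:
            return size
    return table[-1][0]
```

At prediction time one would call `vote(ranked[:select_size(table, c)], example)`. Raising `c` can only keep or increase the selected size, which matches the behavior the abstract describes.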

    System and Method for Classifying Data Streams with Very Large Cardinality
    59.
    Patent application
    Status: Lapsed

    Publication No.: US20120166382A1

    Publication date: 2012-06-28

    Application No.: US13400863

    Filing date: 2012-02-21

    IPC classification: G06N5/02

    CPC classification: G06N99/005 G06K9/6267

    Abstract: An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to the associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
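
The per-class sketch tables and the minimum-value lookup resemble a count-min sketch, so the classification flow can be approximated as below. Several details are assumptions because the abstract does not specify them: the hash construction (salted BLAKE2b), the table dimensions, and the definition of discriminatory power as the largest class share of a pattern's estimated counts. The names `SketchTable` and `classify` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


class SketchTable:
    """Count-min style sketch: `depth` parallel hash tables of `width` counters."""

    def __init__(self, depth: int = 4, width: int = 1024) -> None:
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, pattern: str) -> int:
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(pattern.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, pattern: str) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, pattern)] += 1

    def estimate(self, pattern: str) -> int:
        # The lowest counter across rows is the least inflated by collisions.
        return min(self.rows[r][self._index(r, pattern)] for r in range(self.depth))


def classify(sketches: Dict[str, SketchTable], patterns: Iterable[str],
             threshold: float = 0.6) -> str:
    """Sum estimates of sufficiently discriminative patterns per class."""
    totals = {label: 0 for label in sketches}
    for pattern in patterns:
        estimates = {label: s.estimate(pattern) for label, s in sketches.items()}
        seen = sum(estimates.values())
        if seen == 0:
            continue
        power = max(estimates.values()) / seen  # assumed definition
        if power > threshold:
            for label, value in estimates.items():
                totals[label] += value
    return max(totals, key=totals.get)
```

Training consists of calling `add` on the sketch of the known class for every attribute pattern of a labeled object. Patterns that occur about equally often in every class fall below the threshold and do not influence the decision.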
