Method and apparatus for adaptive load shedding
    11.
    Granted invention patent
    Status: Lapsed

    Publication No.: US08117331B2

    Publication date: 2012-02-14

    Application No.: US12165524

    Filing date: 2008-06-30

    IPC classification: G06F15/16

    CPC classification: H04L49/90

    Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream, and the values of the individual tuples with respect to an expected output.
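
The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the budget rule (`capacity / arrival_rate`), the scoring function, and all names below are assumptions made for illustration only.

```python
from collections import deque


def shed_budget(arrival_rate, capacity):
    """Fraction of window tuples that can be processed (hypothetical rule)."""
    if arrival_rate <= 0:
        return 1.0
    return min(1.0, capacity / arrival_rate)


def select_tuples(window, fraction, score):
    """Keep the top-scoring fraction of the window; the rest are ignored."""
    keep = int(len(window) * fraction)
    ranked = sorted(window, key=score, reverse=True)
    return ranked[:keep]


def join(selected, other_window, key):
    """Join the selected tuples against another stream's window on equal keys."""
    return [(a, b) for a in selected for b in other_window if key(a) == key(b)]


# A sliding window holding the 4 most recent tuples of the form (key, value).
window = deque(maxlen=4)
for item in [("a", 1), ("b", 5), ("c", 3), ("d", 2), ("e", 4)]:
    window.append(item)  # the oldest tuple, ("a", 1), is evicted

# Tuples arrive twice as fast as they can be processed, so half are shed.
fraction = shed_budget(arrival_rate=200.0, capacity=100.0)

# Assumed scoring: the tuple's value stands in for its expected output.
selected = select_tuples(window, fraction, score=lambda t: t[1])

other_window = [("b", 10), ("e", 20), ("c", 30)]
matches = join(selected, other_window, key=lambda t: t[0])
```

Here only the two highest-scoring tuples take part in the join. A real system would recompute the budget as arrival rates and delays change.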

    Method, apparatus and computer program product for preserving privacy in data mining
    12.
    Granted invention patent
    Status: In force

    Publication No.: US07904471B2

    Publication date: 2011-03-08

    Application No.: US11836171

    Filing date: 2007-08-09

    IPC classification: G06F17/30

    Abstract: Privacy in data mining of sparse high dimensional data records is preserved by transforming the data records into anonymized data records. This transformation involves creating a sketch-based private representation of each data record, each data record containing only a small number of non-zero attribute values in relation to the high dimensionality of the data records.
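
The abstract does not say how the sketch is built. As a rough illustration of what a sketch of a sparse record can look like, the following Python example projects a record onto pseudo-random +1/-1 vectors. This generic random-sign projection is an assumption, not the patent's construction; `sign`, `sketch`, and the seed format are hypothetical.

```python
import random


def sign(component, attribute, seed=0):
    """Pseudo-random +1 or -1 for a (sketch component, attribute) pair."""
    rng = random.Random(f"{seed}:{component}:{attribute}")
    return 1 if rng.random() < 0.5 else -1


def sketch(record, num_components, seed=0):
    """Project a sparse record onto num_components random sign vectors.

    record maps attribute index -> non-zero value, so the cost depends on
    the number of non-zero attributes, not on the full dimensionality.
    """
    return [
        sum(value * sign(j, attr, seed) for attr, value in record.items())
        for j in range(num_components)
    ]


# Two sparse records drawn from a very high-dimensional attribute space.
record_a = {3: 2, 1000000: 5}
record_b = {3: 1, 7: 4}

sketch_a = sketch(record_a, num_components=8)
sketch_b = sketch(record_b, num_components=8)
```

Each record is reduced to eight numbers regardless of how many attributes the data set has. The projection is linear, so the sketch of a sum of records equals the sum of their sketches.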

    Integrity assurance of query result from database service provider
    13.
    Granted invention patent
    Status: In force

    Publication No.: US07870398B2

    Publication date: 2011-01-11

    Application No.: US11626847

    Filing date: 2007-01-25

    IPC classification: G06F12/14 G06F7/00

    Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set encrypted using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.
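
The final check in this abstract can be sketched in Python as follows, assuming both results have already been decrypted. The function name, the record layout, and the example predicate are hypothetical; encryption and query issuing are out of scope here.

```python
def is_result_valid(primary_result, secondary_result, satisfies_query):
    """Return False if any decrypted secondary record that satisfies the
    substantive query is missing from the decrypted primary result."""
    primary = set(primary_result)
    return all(
        record in primary
        for record in secondary_result
        if satisfies_query(record)
    )


# Assumed substantive query: records (id, salary) with salary above 50.
def salary_above_50(record):
    return record[1] > 50


secondary_result = [(2, 80), (9, 10)]  # planted subset of the primary data

honest_primary = [(1, 60), (2, 80), (5, 75)]
tampered_primary = [(1, 60), (5, 75)]  # the provider dropped record (2, 80)

honest_ok = is_result_valid(honest_primary, secondary_result, salary_above_50)
tampered_ok = is_result_valid(tampered_primary, secondary_result, salary_above_50)
```

A `False` result corresponds to the "data invalid" notification. A `True` result only means that no omission was detected among the records covered by the secondary data set.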

    Peer-to-peer multi-party voice-over-IP services
    14.
    Granted invention patent
    Status: In force

    Publication No.: US07849138B2

    Publication date: 2010-12-07

    Application No.: US12038386

    Filing date: 2008-02-27

    IPC classification: G06F15/16 H04L12/16

    Abstract: A system and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.

    Systems and methods for sequential modeling in less than one sequential scan
    15.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07822730B2

    Publication date: 2010-10-26

    Application No.: US11931129

    Filing date: 2007-10-31

    IPC classification: G06F17/30

    CPC classification: G06N99/005 Y10S707/99931

    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
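
To show why Hoeffding's inequality permits stopping before a full scan, the Python sketch below computes the standard one-sided Hoeffding bound for the mean of bounded observations. How the patent applies the bound to its learners is not stated in the abstract; the function names and example parameters are assumptions.

```python
import math


def hoeffding_epsilon(value_range, n, delta):
    """Error bound such that, with probability at least 1 - delta, the true
    mean exceeds the mean of n independent observations by at most epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range, epsilon, delta):
    """Smallest n for which the bound above is no larger than epsilon."""
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


# Observations in [0, 1], target error 0.05, confidence 99%.
n_required = samples_needed(value_range=1.0, epsilon=0.05, delta=0.01)
bound_at_n = hoeffding_epsilon(value_range=1.0, n=n_required, delta=0.01)
```

The required sample size depends only on the value range, the tolerance, and the confidence level, not on the size of the data set. An estimate can therefore be accepted after reading a fraction of a very large stream.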

    Systems and Methods for Metadata Embedding in Streaming Medical Data
    16.
    Invention application
    Status: In force

    Publication No.: US20090226056A1

    Publication date: 2009-09-10

    Application No.: US12042961

    Filing date: 2008-03-05

    IPC classification: G06K9/00

    Abstract: Systems and methods for embedding metadata such as personal patient information within actual medical data signals obtained from a patient are provided, wherein two watermarks, a robust watermark and a fragile watermark, are embedded in a given medical data signal. The robust watermark includes a binary coded representation of the metadata that is incorporated into the frequency domain of the medical data signal using discrete Fourier transformations and additive embedding. Error correcting code can also be added to the binary representation of the metadata using Hamming coding. A given robust watermark can be incorporated multiple times in the medical data signal. The fragile watermark is added on top of the modified medical signal containing the robust watermark in the spatial domain of the modified medical signal. The fragile watermark utilizes a hash function to generate random sequences that are incorporated through the medical data signal.
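
The abstract mentions Hamming coding without naming a specific code. As an illustration, the Python sketch below uses the classic Hamming(7,4) code, which is an assumption: it protects each group of four metadata bits so that one flipped bit per codeword can be corrected. The watermark embedding itself is not shown.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Encode four metadata bits, flip one bit in transit, and recover them.
metadata_bits = [1, 0, 1, 1]
sent = hamming74_encode(metadata_bits)
received = list(sent)
received[4] ^= 1  # simulate a single-bit distortion of the watermark
recovered = hamming74_decode(received)
```

A code of this kind lets the embedded metadata survive a limited amount of signal distortion, at the cost of three extra bits per four data bits.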

    SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA
    17.
    Invention application
    Status: In force

    Publication No.: US20090187914A1

    Publication date: 2009-07-23

    Application No.: US12372568

    Filing date: 2009-02-17

    IPC classification: G06F9/46 G06N5/02

    CPC classification: G06K9/6297 H04L43/028

    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
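
The prediction and decision steps can be illustrated with a small Python sketch. It assumes a first-order Markov chain over discrete feature states and a hypothetical benefit table; the abstract specifies neither, and the QoD metric is not modeled here.

```python
def predict_distribution(current, transition):
    """One Markov step: next[j] = sum over i of current[i] * transition[i][j]."""
    num_states = len(transition[0])
    return [
        sum(current[i] * transition[i][j] for i in range(len(current)))
        for j in range(num_states)
    ]


def best_decision(distribution, benefit):
    """Pick the decision with the highest expected benefit.

    benefit maps decision -> list of payoffs, one per feature state
    (a hypothetical table used for illustration).
    """
    def expected(decision):
        return sum(p * b for p, b in zip(distribution, benefit[decision]))

    return max(benefit, key=expected)


# The last observed feature state was state 0; state 1 is a rare event.
current = [1.0, 0.0]
transition = [[0.9, 0.1],
              [0.5, 0.5]]
predicted = predict_distribution(current, transition)

benefit = {"normal": [1.0, -5.0], "alert": [-1.0, 5.0]}
decision = best_decision(predicted, benefit)
```

When an element is shed and its features are never observed, a decision can still be made from the predicted distribution. In this example the expected benefit favors `"normal"`.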

    Systems and methods for simultaneous summarization of data cube streams
    18.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07505876B2

    Publication date: 2009-03-17

    Application No.: US11620679

    Filing date: 2007-01-07

    IPC classification: G06F15/00 G06F17/30

    Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, the present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.

    Space and time efficient XML graph labeling
    19.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07492727B2

    Publication date: 2009-02-17

    Application No.: US11396502

    Filing date: 2006-03-31

    IPC classification: H04L12/28

    CPC classification: H04L45/48 H04L45/02

    Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
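
The interval-based half of the scheme can be sketched in Python. The example below assigns preorder intervals to a spanning tree and answers tree reachability with a containment test. Its handling of non-tree links is a simplified search over the link list, not the patent's non-tree labels; all names are hypothetical.

```python
def interval_labels(tree, root):
    """Assign each node a half-open interval (start, end) by preorder DFS.

    tree maps node -> list of children in the spanning tree.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels


def tree_reaches(labels, u, v):
    """u reaches v along tree edges iff v's start lies inside u's interval."""
    start_u, end_u = labels[u]
    return start_u <= labels[v][0] < end_u


def reaches(labels, links, u, v, _seen=None):
    """Reachability using tree intervals plus a list of non-tree links (a, b)."""
    if tree_reaches(labels, u, v):
        return True
    seen = _seen if _seen is not None else set()
    for a, b in links:
        if (a, b) not in seen and tree_reaches(labels, u, a):
            seen.add((a, b))
            if reaches(labels, links, b, v, seen):
                return True
    return False


# Spanning tree r -> {a, b}, a -> c, b -> d, plus one non-tree link c -> d.
tree = {"r": ["a", "b"], "a": ["c"], "b": ["d"]}
links = [("c", "d")]
labels = interval_labels(tree, "r")

a_reaches_d = reaches(labels, links, "a", "d")  # via a -> c, then link c -> d
b_reaches_c = reaches(labels, links, "b", "c")  # no path exists
```

The containment test answers tree reachability in constant time per query. The cost of handling non-tree links depends on how many there are, which is why a dual-labeling scheme suits graphs that are close to trees.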

    Method to Continuously Diagnose and Model Changes of Real-Valued Streaming Variables
    20.
    Invention application
    Status: Under examination (published)

    Publication No.: US20090043715A1

    Publication date: 2009-02-12

    Application No.: US12060932

    Filing date: 2008-04-02

    Applicant: Wei Fan, Philip S. Yu

    Inventor: Wei Fan, Philip S. Yu

    IPC classification: G06F15/18

    CPC classification: G06N20/00

    Abstract: The method trains an inductive model to output multiple models from the inductive model and trains an error correlation model to estimate an average output of predictions made by the multiple models. Then the method can determine an error estimation of each of the multiple models using the error correlation model.
