1. System and method for efficiently performing similarity searches of structural data
    Invention patent (granted); status: in force

    Publication number: US09165042B2

    Publication date: 2015-10-20

    Application number: US11096165

    Filing date: 2005-03-31

    IPC classification: G06F7/00 G06F17/30 G06F19/00

    CPC classification: G06F17/30536 G06F19/705

    Abstract: Techniques for similarity searching are provided. Structural data in a database is searched against one or more structural queries. A desired minimum degree of similarity between the one or more queries and the structural data in the database is first specified. One or more indices are then used to exclude from consideration any structural data in the database that does not share the minimum degree of similarity with one or more of the queries.
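    The filter-and-verify idea in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each structure is reduced to a set of features (for example labeled edges) and that similarity is Jaccard similarity; the patent abstract does not specify either choice, and all function names are hypothetical. Because a Jaccard similarity of at least t implies that a candidate shares at least ceil(t * |Q|) features with the query Q, an inverted feature index can exclude every structure with fewer shared features before any exact comparison is made.

```python
import math
from collections import defaultdict


def build_index(db):
    """Inverted index: feature -> ids of the structures that contain it."""
    index = defaultdict(set)
    for sid, features in db.items():
        for f in features:
            index[f].add(sid)
    return index


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def similarity_search(db, index, query, threshold):
    """Ids whose Jaccard similarity to `query` is >= threshold (threshold > 0)."""
    # Jaccard >= t implies |A & Q| >= t * |A | Q| >= t * |Q|, so any structure
    # sharing fewer features is excluded without being examined.
    # The small epsilon guards against float round-up in the bound.
    min_shared = math.ceil(threshold * len(query) - 1e-9)
    hits = defaultdict(int)
    for f in query:
        for sid in index.get(f, ()):
            hits[sid] += 1
    candidates = [sid for sid, n in hits.items() if n >= min_shared]
    # Verify the survivors with the exact measure.
    return sorted(sid for sid in candidates if jaccard(db[sid], query) >= threshold)
```

    Structures that share no feature with the query are never touched at all, and those that share too few are discarded on their hit count alone; only the remaining candidates pay for the exact similarity computation.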

2. System and method for analyzing streams and counting stream items on multi-core processors
    Invention patent (granted); status: lapsed

    Publication number: US08321579B2

    Publication date: 2012-11-27

    Application number: US11828732

    Filing date: 2007-07-26

    IPC classification: G06F15/16

    CPC classification: G06F17/18

    Abstract: Systems and methods for parallel stream item counting are disclosed. A data stream is partitioned into portions and the portions are assigned to a plurality of processing cores. A sequential kernel is executed at each processing core to compute a local count for items in an assigned portion of the data stream for that processing core. The counts are aggregated for all the processing cores to determine a final count for the items in the data stream. A frequency-aware counting method (FCM) for data streams includes dynamically capturing relative frequency phases of items from a data stream and placing the items in a sketch structure using a plurality of hash functions where a number of hash functions is based on the frequency phase of the item. A zero-frequency table is provided to reduce errors due to absent items.
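    The partition, local-count, and aggregate pipeline described above can be sketched in Python as follows, assuming the stream is an in-memory list and using exact counts as the sequential kernel. The function names are hypothetical. Threads are used only to keep the example portable: in CPython the global interpreter lock prevents them from counting in parallel, so a genuinely multi-core version would use processes (for example concurrent.futures.ProcessPoolExecutor) or native code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(stream, n_parts):
    """Split the stream into n_parts contiguous portions of near-equal size."""
    size, extra = divmod(len(stream), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(stream[start:end])
        start = end
    return parts


def local_count(portion):
    """Sequential kernel: exact item counts for one assigned portion."""
    return Counter(portion)


def parallel_count(stream, n_workers=4):
    portions = partition(stream, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local_counts = list(pool.map(local_count, portions))
    total = Counter()
    for counts in local_counts:
        total.update(counts)  # aggregate the local counts into the final count
    return total
```

    Because counting is additive, aggregating the per-portion counters gives exactly the count a single sequential pass would produce, whatever the partitioning.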

3. System and method for indexing type-annotated web documents
    Invention application; status: pending (published)

    Publication number: US20090049035A1

    Publication date: 2009-02-19

    Application number: US11891921

    Filing date: 2007-08-14

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F16/951

    Abstract: Methods and apparatus generate an index for use in a document retrieval system where the index is organized by type and keyword. Redundancy in the index is reduced by organizing type entries in a hierarchy of internal and leaf nodes. Determining whether to generate an inverted list for a type is based on the position of the type in the hierarchy; generally inverted lists are generated only for types corresponding to leaf nodes. Redundancy is further reduced by re-using inverted lists generated for keywords for types when there is an overlap between keywords and types. Search performance using the document retrieval index is improved by adding entries corresponding to combinations of keywords and types. The intersections of inverted lists associated with the keywords and types comprising the combinations are determined and added to the index for use in search operations. Determining whether to add an entry for a keyword-type combination is made on a cost-benefit analysis dependent, at least in part, on the proximity of the keyword to type in documents containing the combination.
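    A minimal Python sketch of the core structure, under assumptions not stated in the abstract: documents are annotated with leaf types only, the type hierarchy is a plain dict from an internal type to its children, and the class and method names are hypothetical. Only leaf types get inverted lists; an internal type's postings are computed as the union of its leaves' lists, and a chosen keyword-type combination can be materialized as a precomputed intersection. The cost-benefit decision and the reuse of keyword lists for overlapping types are not modeled here.

```python
from collections import defaultdict


def leaf_types(t, hierarchy):
    """Return the leaf types under t (t itself if it has no children)."""
    children = hierarchy.get(t)
    if not children:
        return [t]
    out = []
    for child in children:
        out.extend(leaf_types(child, hierarchy))
    return out


class TypeKeywordIndex:
    def __init__(self, hierarchy):
        self.hierarchy = hierarchy
        self.keyword_lists = defaultdict(set)  # keyword -> doc ids
        self.type_lists = defaultdict(set)     # leaf type -> doc ids
        self.combo_lists = {}                  # (keyword, type) -> doc ids

    def add_document(self, doc_id, keywords, types):
        # Documents carry leaf types only; internal types get no list of their own.
        for kw in keywords:
            self.keyword_lists[kw].add(doc_id)
        for t in types:
            self.type_lists[t].add(doc_id)

    def postings_for_type(self, t):
        # An internal type's postings are the union of its leaf types' lists.
        result = set()
        for leaf in leaf_types(t, self.hierarchy):
            result |= self.type_lists.get(leaf, set())
        return result

    def add_combination(self, keyword, t):
        # Materialize the intersection for a selected keyword-type pair.
        self.combo_lists[(keyword, t)] = (
            self.keyword_lists.get(keyword, set()) & self.postings_for_type(t)
        )

    def search(self, keyword, t):
        if (keyword, t) in self.combo_lists:
            return self.combo_lists[(keyword, t)]
        return self.keyword_lists.get(keyword, set()) & self.postings_for_type(t)
```

    Storing lists only at the leaves avoids duplicating a document id at every ancestor type, at the price of a union when an internal type is queried; materialized combinations trade index space for skipping the intersection at query time.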

4. Systems and methods for providing real-time classification of continuous data streams
    Invention patent (granted); status: in force

    Publication number: US07937269B2

    Publication date: 2011-05-03

    Application number: US11208893

    Filing date: 2005-08-22

    IPC classification: G10L15/06 G10L15/00 G10L13/00

    CPC classification: G10L15/063 G10L17/00

    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
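    The micro-clustering step can be illustrated with a minimal Python sketch in the style of CluStream cluster features, where each micro-cluster keeps only a point count, a per-dimension linear sum, and a per-dimension squared sum, so it can be updated incrementally. Everything beyond that is an assumption, not the patent's method: the absorb-or-create boundary rule, the parameter values, merging the two closest micro-clusters to stay within a budget, and scoring a segment's profile against each class model by weighted nearest-centroid distance. All names are hypothetical.

```python
import math


class MicroCluster:
    """Cluster feature: point count, per-dimension linear sum and squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of the absorbed points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def micro_cluster(points, max_clusters=10, boundary=2.0, min_radius=1.0):
    """Incrementally cluster one stream segment into at most max_clusters."""
    clusters = []
    for p in points:
        if clusters:
            nearest = min(clusters, key=lambda c: dist(c.centroid(), p))
            limit = boundary * max(nearest.radius(), min_radius)
            if dist(nearest.centroid(), p) <= limit:
                nearest.absorb(p)
                continue
        clusters.append(MicroCluster(p))
        if len(clusters) > max_clusters:
            # Stay within budget by merging the two closest micro-clusters.
            pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]].centroid(),
                                                  clusters[ij[1]].centroid()))
            clusters[i].merge(clusters.pop(j))
    return clusters


def classify_segment(segment, models):
    """Label a segment with the class whose training micro-clusters lie closest."""
    profile = micro_cluster(segment)

    def score(label):
        centroids = [c.centroid() for c in models[label]]
        # Size-weighted mean distance from profile centroids to the class's nearest centroid.
        total = sum(c.n * min(dist(c.centroid(), m) for m in centroids) for c in profile)
        return total / sum(c.n for c in profile)

    return min(models, key=score)
```

    Training models and segment profiles are built with the same routine here, which keeps the sketch short; a fuller version would also age out stale micro-clusters as the training model is updated online.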

6. Systems and methods for resource-adaptive workload management
    Invention patent (granted); status: lapsed

    Publication number: US07379953B2

    Publication date: 2008-05-27

    Application number: US11063168

    Filing date: 2005-02-22

    IPC classification: G06F17/30

    Abstract: Systems and methods are provided for resource adaptive workload management. In a method thereof, at least one execution objective is received for at least one of a plurality of queries under execution. A progress status of, and an amount of resource consumed by, each of the plurality of queries are monitored. A remaining resource requirement for each of the plurality of queries is estimated, based on the progress status of, and the amount of resource consumed by, each of the plurality of queries. Resource allocation is adjusted based on the at least one execution objective and the estimates of the remaining resource requirements.
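    The estimate-then-reallocate loop in this abstract can be sketched in Python under two simplifying assumptions that the abstract does not state: the remaining requirement is extrapolated linearly from progress (a query that has used C units to reach fraction p of its work needs about C * (1 - p) / p more), and each execution objective is reduced to a numeric weight. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryState:
    name: str
    progress: float      # fraction of the query's work completed, in (0, 1]
    consumed: float      # resource units consumed so far
    weight: float = 1.0  # hypothetical numeric encoding of the execution objective


def remaining_requirement(q):
    """Linear extrapolation: consumed / progress estimates the total cost."""
    if q.progress <= 0:
        raise ValueError(f"{q.name}: no progress yet, cannot extrapolate")
    return q.consumed * (1 - q.progress) / q.progress


def allocate(queries, capacity):
    """Split capacity in proportion to weight * estimated remaining requirement."""
    demand = {q.name: q.weight * remaining_requirement(q) for q in queries}
    total = sum(demand.values())
    if total == 0:
        return {name: 0.0 for name in demand}
    return {name: capacity * d / total for name, d in demand.items()}
```

    For example, a query halfway done after 10 units and one 80% done after 40 units both need about 10 more units; if the first has three times the weight, it receives 75% of the capacity. Re-running allocate as progress is monitored gives the adaptive behaviour.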

9. SYSTEM AND METHOD FOR ANALYZING STREAMS AND COUNTING STREAM ITEMS ON MULTI-CORE PROCESSORS
    Invention application; status: lapsed

    Publication number: US20090031175A1

    Publication date: 2009-01-29

    Application number: US11828732

    Filing date: 2007-07-26

    IPC classification: G06F11/00 G06F9/06

    CPC classification: G06F17/18

    Abstract: Systems and methods for parallel stream item counting are disclosed. A data stream is partitioned into portions and the portions are assigned to a plurality of processing cores. A sequential kernel is executed at each processing core to compute a local count for items in an assigned portion of the data stream for that processing core. The counts are aggregated for all the processing cores to determine a final count for the items in the data stream. A frequency-aware counting method (FCM) for data streams includes dynamically capturing relative frequency phases of items from a data stream and placing the items in a sketch structure using a plurality of hash functions where a number of hash functions is based on the frequency phase of the item. A zero-frequency table is provided to reduce errors due to absent items.
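    The frequency-aware counting method (FCM) named in this abstract can be sketched as a Count-Min-style sketch in which the number of hash functions used per update depends on the item's frequency phase. The details here are assumptions rather than the patented design: items whose estimate has reached a hypothetical hot_threshold update only d_min rows, other items update d_max rows, and the zero-frequency table is modeled as a Bloom-filter-like bit array that lets the sketch answer 0 for items it has never seen. The class name, parameters, and default values are hypothetical, and counts are assumed to be positive integers.

```python
import hashlib


def _hash(item, seed, width):
    """Deterministic seeded hash of `item` into range(width)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % width


class FrequencyAwareSketch:
    def __init__(self, width=256, d_max=6, d_min=2, hot_threshold=50,
                 zero_width=1024, zero_hashes=3):
        self.width, self.d_max, self.d_min = width, d_max, d_min
        self.hot_threshold = hot_threshold
        self.table = [[0] * width for _ in range(d_max)]
        self.zero_table = [False] * zero_width
        self.zero_hashes = zero_hashes

    def _cells(self, item, d):
        # Row r uses hash function r, so the rows used for any d are nested.
        return [(r, _hash(item, r, self.width)) for r in range(d)]

    def _min_over(self, item, d):
        return min(self.table[r][c] for r, c in self._cells(item, d))

    def _zero_slots(self, item):
        return [_hash(item, f"z{s}", len(self.zero_table)) for s in range(self.zero_hashes)]

    def add(self, item, count=1):
        # Frequency phase: rows 0..d_min-1 are updated on every insert, so this
        # estimate never decreases and a hot item never turns cold again.
        hot = self._min_over(item, self.d_min) >= self.hot_threshold
        d = self.d_min if hot else self.d_max  # hot items use fewer hash functions
        for r, c in self._cells(item, d):
            self.table[r][c] += count
        for slot in self._zero_slots(item):
            self.zero_table[slot] = True

    def estimate(self, item):
        # Zero-frequency table: any unset slot proves the item was never added.
        if not all(self.zero_table[slot] for slot in self._zero_slots(item)):
            return 0
        est = self._min_over(item, self.d_min)
        if est < self.hot_threshold:
            # Still cold now, hence cold at every insert: all d_max rows hold its count.
            est = self._min_over(item, self.d_max)
        return est
```

    Because the phase estimate only grows, an item that is still below the threshold at query time was cold for every one of its inserts, so taking the minimum over all d_max rows remains an upper bound on its true count, just as in a plain Count-Min sketch. Hot items touch fewer counters, which limits how much they inflate the counters shared with rare items.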
