Reducing human overhead in text categorization
    1.
    Patent application
    Reducing human overhead in text categorization (granted)

    Publication No.: US20070183655A1

    Publication date: 2007-08-09

    Application No.: US11350701

    Filing date: 2006-02-09

    Applicant: Arnd Konig, Eric Brill

    Inventor: Arnd Konig, Eric Brill

    IPC classification: G06K9/62

    CPC classification: G06K9/6282

    Abstract: A unique multi-stage classification system and method that facilitates reducing human resources or costs associated with text classification while still obtaining a desired level of accuracy is provided. The multi-stage classification system and method involve a pattern-based classifier and a machine learning classifier. The pattern-based classifier is trained on discriminative patterns identified by humans rather than by machines, which allows a smaller training set to be employed. Given humans' superior abilities to reason over text, discriminative patterns can be more accurately and more readily identified by them. Unlabeled items can be initially processed by the pattern-based classifier, and if no pattern match exists, the unlabeled data can then be processed by the machine learning classifier. By employing the classifiers in this manner, less human involvement is required in the classification process. Moreover, classification accuracy is maintained and/or improved.
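The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the regular expressions in `PATTERNS`, the labels, the toy training data, and the small naive Bayes class used as the machine learning stage are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical human-identified discriminative patterns (regex, label).
PATTERNS = [
    (re.compile(r"\bunsubscribe\b", re.I), "spam"),
    (re.compile(r"\bmeeting (agenda|minutes)\b", re.I), "work"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (stand-in learner)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, model):
    """Stage 1: pattern match. Stage 2: fall back to the learned model."""
    for pattern, label in PATTERNS:
        if pattern.search(text):
            return label, "pattern"
    return model.predict(text), "learned"


# Toy training set, invented for illustration only.
model = NaiveBayes().fit(
    ["win cash prize now", "cheap pills win big",
     "project deadline review", "quarterly budget review"],
    ["spam", "spam", "work", "work"],
)
```

Calling `classify("Click to unsubscribe", model)` is resolved by the pattern stage, while a text with no matching pattern, such as `"budget review tomorrow"`, falls through to the learned model.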


    DATABASE CONFIGURATION ANALYSIS
    2.
    Patent application
    DATABASE CONFIGURATION ANALYSIS (granted)

    Publication No.: US20070174335A1

    Publication date: 2007-07-26

    Application No.: US11275657

    Filing date: 2006-01-20

    IPC classification: G06F17/00 G06F7/00

    CPC classification: G06F17/30306

    Abstract: To determine a configuration for a database system, a plurality of queries may be sampled from a representative workload using statistical inference to compute the probability of correctly selecting one of a plurality of evaluation configurations. The probability of correctly selecting may determine which and/or how many queries to sample, and/or may be compared to a target probability threshold to determine if more queries must be sampled. The configuration from the plurality of configurations with the lowest estimated cost of executing the representative workload may be determined based on the probability of selecting correctly. Estimator variance may be reduced through a stratified sampling scheme that leverages commonality, such as an average cost of execution, between queries based on query templates. The applicability of the Central Limit Theorem may be verified and used to determine which and/or how many queries to sample.
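One way to picture the sampling loop in the abstract is the sketch below, which compares two candidate configurations only. It is an illustration under stated assumptions, not the method claimed in the application: `cost_a` and `cost_b` are hypothetical per-query cost functions, queries are stratified by a template name supplied by the caller, the probability of correct selection uses a plain normal approximation, the finite population correction is ignored, and the check that the Central Limit Theorem applies is omitted.

```python
import random
from statistics import NormalDist, mean, variance


def prob_correct_selection(diffs, sizes):
    """Stratified estimate of the mean cost difference (A minus B) and the
    normal-approximation probability that its sign is the true sign."""
    total = sum(sizes.values())
    est = var = 0.0
    for template, d in diffs.items():
        weight = sizes[template] / total
        est += weight * mean(d)
        if len(d) > 1:  # sample variance needs at least two points
            var += weight * weight * variance(d) / len(d)
    if var == 0.0:
        return est, (1.0 if est != 0 else 0.5)
    return est, NormalDist().cdf(abs(est) / var ** 0.5)


def choose_config(workload, cost_a, cost_b, target=0.95, batch=5, seed=0):
    """workload maps a template name to a non-empty list of queries.
    Samples `batch` queries per template per round (use batch >= 2) until
    the estimated probability of correct selection reaches `target` or
    the workload is exhausted. Returns (winner, probability, n_sampled)."""
    rng = random.Random(seed)
    remaining = {t: rng.sample(qs, len(qs)) for t, qs in workload.items()}
    sizes = {t: len(qs) for t, qs in workload.items()}
    diffs = {t: [] for t in workload}
    while True:
        for template, queries in remaining.items():
            for _ in range(min(batch, len(queries))):
                q = queries.pop()
                diffs[template].append(cost_a(q) - cost_b(q))
        est, prob = prob_correct_selection(diffs, sizes)
        exhausted = not any(remaining.values())
        if prob >= target or exhausted:
            winner = "A" if est < 0 else "B"
            return winner, prob, sum(len(d) for d in diffs.values())
```

Weighting each template's mean by its share of the workload is what makes the estimate stratified: queries from the same template tend to have similar costs, so the per-stratum variances are small and fewer samples are needed to reach the target probability.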


    Database monitoring system
    4.
    Patent application
    Database monitoring system (granted)

    Publication No.: US20050192921A1

    Publication date: 2005-09-01

    Application No.: US10788077

    Filing date: 2004-02-26

    IPC classification: G06F7/00 G06F17/30

    Abstract: A framework is provided within a database system for specifying database monitoring rules that will be evaluated as part of the execution code path of database events being monitored. The occurrence of a selected database event triggers a rule that evaluates some parameter of an object related to the event against a condition in the rule. If the condition is met, a specified action is taken that can alter the execution of the database event or database system performance. Lightweight aggregation tables are utilized to enable aggregation of object parameter values so that presently occurring events can be compared to a summary of the object parameter values from previously occurring database events. Signatures are assigned to queries based on the structure of the query plan so that information in the lightweight aggregation tables can be grouped according to query signature.
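The rule, aggregation table, and signature ideas in the abstract can be illustrated with the small in-memory sketch below. Everything here is hypothetical: the plan representation (nested dicts with `op`, `arg`, and `children` keys), the `Rule` and `Monitor` classes, and the event name are invented for the example and do not describe the patented system or any real database API.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


def plan_signature(plan):
    """Hash only the operator tree shape, ignoring constants and arguments,
    so queries with the same plan structure share one signature."""
    def shape(node):
        kids = ",".join(shape(c) for c in node.get("children", []))
        return f"{node['op']}({kids})"
    return hashlib.sha1(shape(plan).encode()).hexdigest()[:12]


@dataclass
class Rule:
    name: str
    event: str                               # event type that triggers the rule
    condition: Callable[[dict, dict], bool]  # (query, summary) -> bool
    action: Callable[[dict], None]


class Monitor:
    def __init__(self):
        self.rules = defaultdict(list)
        # Lightweight aggregation table: one running summary per signature.
        self.agg = defaultdict(lambda: {"count": 0, "total": 0.0})

    def add_rule(self, rule):
        self.rules[rule.event].append(rule)

    def on_event(self, event, query):
        """Evaluate rules against the summary of earlier events, then fold
        the current event into the summary. Returns names of fired rules."""
        summary = self.agg[plan_signature(query["plan"])]
        fired = []
        for rule in self.rules[event]:
            if rule.condition(query, summary):
                rule.action(query)
                fired.append(rule.name)
        summary["count"] += 1
        summary["total"] += query["duration"]
        return fired
```

A rule such as "flag a query that runs more than five times longer than the historical average for its signature" then becomes a condition over `query["duration"]` and the `count` and `total` fields of the summary.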
