Adaptive Bayes feature extraction
    61.
    Granted invention patent (status: lapsed)

    Publication No.: US07961955B1

    Publication date: 2011-06-14

    Application No.: US12011518

    Filing date: 2008-01-28

    IPC classification: G06K9/62

    Abstract: A system and method for extracting “discriminately informative features” from input patterns. The features provide accurate discrimination between two classes, a class-of-interest and a class-other, while reducing the number of features, under the condition where training samples, or otherwise, are provided a priori only for the class-of-interest. This eliminates the requirement for any a priori knowledge of the other classes in the input data set while exploiting the potentially robust and powerful feature extraction capability provided by fully supervised feature extraction approaches. The system and method extract discriminating features by exploiting the ability of the adaptive Bayes classifier to define an optimal Bayes decision boundary between the class-of-interest and class-other using only labeled samples from the class-of-interest and unlabeled samples from the data to be classified. Optimal features are derived from vectors normal to the decision boundary defined by the adaptive Bayes classifier.
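
The final step of the abstract, deriving features from vectors normal to the decision boundary, can be illustrated with a minimal sketch. It is not the patented method: the `posterior` function below is a hypothetical linear-logistic stand-in for the posterior that an adaptive Bayes classifier would estimate, and the helper names (`boundary_point`, `unit_normal`, `boundary_feature_matrix`, `dominant_eigenvector`) are assumptions made for illustration. The sketch locates points where the posterior equals 0.5, averages the outer products of the unit normals there, and takes the dominant eigenvector as the extracted feature direction.

```python
import math

def posterior(x):
    # Hypothetical stand-in for P(class-of-interest | x); an adaptive Bayes
    # classifier would estimate this from labeled and unlabeled samples.
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] + 4.0 * x[1] - 2.0)))

def boundary_point(inside, outside, post, iters=50):
    """Bisect between a point with post > 0.5 and a point with post < 0.5."""
    a, b = list(inside), list(outside)
    for _ in range(iters):
        mid = [(p + q) / 2.0 for p, q in zip(a, b)]
        if post(mid) > 0.5:
            a = mid
        else:
            b = mid
    return [(p + q) / 2.0 for p, q in zip(a, b)]

def unit_normal(x, post, h=1e-5):
    """Unit vector along the central-difference gradient of the posterior."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += h
        lo[i] -= h
        grad.append((post(hi) - post(lo)) / (2.0 * h))
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / norm for g in grad]

def boundary_feature_matrix(inside_pts, outside_pts, post):
    """Average outer product of unit normals over sampled boundary points."""
    dim = len(inside_pts[0])
    matrix = [[0.0] * dim for _ in range(dim)]
    count = 0
    for a in inside_pts:
        for b in outside_pts:
            n = unit_normal(boundary_point(a, b, post), post)
            for i in range(dim):
                for j in range(dim):
                    matrix[i][j] += n[i] * n[j]
            count += 1
    return [[v / count for v in row] for row in matrix]

def dominant_eigenvector(matrix, iters=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    v = [1.0] * len(matrix)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy points on either side of the (assumed) boundary 3*x0 + 4*x1 - 2 = 0.
inside_pts = [(2.0, 2.0), (1.0, 1.0)]
outside_pts = [(-1.0, -1.0), (0.0, 0.0)]
feature = dominant_eigenvector(
    boundary_feature_matrix(inside_pts, outside_pts, posterior))
```

Because the stand-in boundary is a straight line, every normal points the same way and a single feature direction, proportional to (3, 4), carries all the discriminating information. With a curved boundary, several eigenvectors with non-negligible eigenvalues would be retained.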

    Method and System for Data Classification Using a Self-Organizing Map
    63.
    Invention patent application (status: in force)

    Publication No.: US20080071708A1

    Publication date: 2008-03-20

    Application No.: US11467344

    Filing date: 2006-08-25

    IPC classification: G06F15/18

    Abstract: The described embodiments relate to methods and systems for data classification using a self-organizing map. Certain embodiments relate to a method of labeling data for training a classifier, comprising: obtaining data, the data comprising labeled data and unlabeled data; generating a self-organizing map of the data; and labeling at least some of the unlabeled data based on proximity of the unlabeled data to labeled data within the self-organizing map to generate self-labeled data. The method may include training a classifier based on the labeled and self-labeled data. Other embodiments relate to systems and computer readable media configured to perform, or allow performance of, the method embodiments.
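
The labeling steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the map is a one-dimensional chain of nodes, "proximity within the self-organizing map" is taken to mean grid distance between best-matching units (ties broken by input-space distance), and the function names (`bmu`, `train_som`, `self_label`), the toy data, and the hyperparameters are all hypothetical.

```python
import math
import random

def bmu(weights, x):
    """Index of the best-matching unit (closest node) for sample x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_nodes=4, epochs=50, seed=0):
    """Train a 1-D chain of nodes on all data, labeled and unlabeled alike."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        lr = 0.5 * frac                      # learning rate
        radius = max(n_nodes / 2.0 * frac, 0.5)  # neighborhood width
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            c = bmu(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - c) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def self_label(labeled, unlabeled, weights):
    """Give each unlabeled sample the label of the nearest labeled sample on the map."""
    labeled_units = [(bmu(weights, x), x, y) for x, y in labeled]
    labels = []
    for u in unlabeled:
        cu = bmu(weights, u)
        # Nearest on the map grid first; input-space distance breaks ties.
        _, _, label = min(labeled_units,
                          key=lambda t: (abs(t[0] - cu), math.dist(t[1], u)))
        labels.append(label)
    return labels

# Toy data: one labeled sample per class plus a few unlabeled samples.
labeled = [((0.0, 0.1), "A"), ((10.0, 9.9), "B")]
unlabeled = [(0.3, 0.2), (0.1, 0.4), (9.7, 10.2), (9.9, 9.6)]
weights = train_som([x for x, _ in labeled] + unlabeled)
self_labels = self_label(labeled, unlabeled, weights)
# Labeled and self-labeled data together form the classifier training set.
training_set = labeled + list(zip(unlabeled, self_labels))
```

The combined `training_set` is what a downstream classifier would be trained on, as the abstract describes. A production system would more likely use a two-dimensional map and a confidence threshold, so that unlabeled samples far from any labeled sample remain unlabeled.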

    LEARNING LANGUAGE REPRESENTATION WITH LOGICAL INDUCTIVE BIAS

    Publication No.: US20240193399A1

    Publication date: 2024-06-13

    Application No.: US18077723

    Filing date: 2022-12-08

    Inventor: Jianshu CHEN

    IPC classification: G06N3/04

    CPC classification: G06N3/04 G06K9/6259

    Abstract: A method including receiving input comprising natural language texts; pre-training a First-Order Logic Network (FOLNet) neural network model on unlabeled texts included in the natural language texts, the FOLNet neural network model comprising a plurality of layers; processing the input through the plurality of layers of the FOLNet neural network model; encoding a logical inductive bias using the FOLNet neural network model; outputting one or more tensors based on the logical inductive bias; and predicting an outcome using the one or more tensors.

    Sequential Synthesis and Selection for Feature Engineering

    Publication No.: US20240013089A1

    Publication date: 2024-01-11

    Application No.: US17859978

    Filing date: 2022-07-07

    Inventor: Michael Langford

    IPC classification: G06N20/00 G06K9/62

    CPC classification: G06N20/00 G06K9/6259

    Abstract: Systems and methods, as described herein, relate to sequential synthesis and selection for feature engineering. A dataset may be associated with a label defining a machine-learning target attribute and a received operation that can be applied to at least one of the existing features of the dataset. One or more potential features may be generated by applying the operation to one or more existing features. For each of the one or more potential features, a feature importance algorithm may be applied to the respective feature along with the one or more existing features, generating a respective feature importance value. Respective feature importance values may be generated for each of the one or more existing features based on applying the feature importance algorithm and used to sort the potential features. For each potential feature, a level of correlation to each of the one or more existing features may be determined to ensure that it is under a threshold level, so that new features heavily correlated with existing ones are avoided.
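
The synthesize, rank, and filter loop described in the abstract can be sketched as below. The abstract does not name a feature importance algorithm, so this sketch substitutes the absolute Pearson correlation with the target as a simple stand-in. The function names (`pearson`, `synthesize_and_select`, `multiply`), the 0.95 threshold, and the toy dataset are assumptions for illustration only.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def synthesize_and_select(features, target, operation, max_corr=0.95):
    """Apply `operation` to feature pairs, rank by importance, drop redundant ones."""
    # 1. Synthesis: one candidate per pair of existing features.
    candidates = {}
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        name = f"{operation.__name__}({name_a},{name_b})"
        candidates[name] = [operation(x, y) for x, y in zip(a, b)]
    # 2. Importance (stand-in: |correlation with target|), sorted descending.
    ranked = sorted(candidates,
                    key=lambda n: abs(pearson(candidates[n], target)),
                    reverse=True)
    # 3. Selection: keep candidates not heavily correlated with existing features.
    kept = []
    for name in ranked:
        column = candidates[name]
        if all(abs(pearson(column, existing)) < max_corr
               for existing in features.values()):
            kept.append((name, abs(pearson(column, target))))
    return kept

def multiply(x, y):
    return x * y

# Toy dataset: the target is the product of features "a" and "b".
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, -1, 3, -2, 1, -3],
    "scale": [2, 2, 2, 2, 2, 2],
}
target = [2, -2, 9, -8, 5, -18]
selected = synthesize_and_select(features, target, multiply)
```

In this toy run, `multiply(a,scale)` and `multiply(b,scale)` are rescaled copies of existing features and are rejected by the correlation check, leaving only `multiply(a,b)`. Running the function repeatedly, with kept candidates added to `features` each round, would give the sequential behavior the title refers to.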

    CHANGE MANAGEMENT PROCESS FOR IDENTIFYING INCONSISTENCIES FOR IMPROVED PROCESSING EFFICIENCY

    Publication No.: US20230342351A1

    Publication date: 2023-10-26

    Application No.: US17660698

    Filing date: 2022-04-26

    Applicant: Truist Bank

    Inventor: Gregory Wright

    Abstract: A system for determining whether inconsistencies exist in an entity's shared databases using a machine learning model. The system includes a repository having a plurality of databases that store data and information in a format accessible to users, and a back-end server operatively coupled to the repository and being responsive to the data and information from all of the databases. The back-end server includes a processor for processing the data and information, a communications interface communicatively coupled to the processor, and a memory device storing data and executable code. The code causes the processor to collect data and information from the databases, store the collected data and information in the memory device, process the stored data and information through the machine learning model to determine whether inconsistencies in the data exist in the databases, and transmit a communication on the interface identifying whether inconsistencies do exist in the databases.

    Resume Document Parsing using Computer Vision and Optical Character Recognition with Reblocking Feedback

    Publication No.: US20230215206A1

    Publication date: 2023-07-06

    Application No.: US17331463

    Filing date: 2021-05-26

    Applicant: Indeed, Inc.

    IPC classification: G06K9/00 G06K9/62

    Abstract: Systems and methods are disclosed for parsing resume documents using computer vision and optical character recognition technology in combination with a user feedback interface system to facilitate user feedback to improve the overall processing quality of the resumes that are imported into computer resume processing systems. In at least one embodiment, the system and method prompt a user to upload an input resume document, which is processed with a first parsing pass to generate initial resume data by extracting a plurality of resume text blocks. Further processing identifies an initial set of bounding blocks and visually displays the initial resume data for user review and feedback to regroup one or more of the initial set of bounding blocks into a regrouped bounding block. Additional processing consolidates into a group text block each of the resume text blocks corresponding to the regrouped one or more of the initial set of bounding blocks.
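
The regrouping and consolidation step at the end of the abstract can be sketched as below. This is an illustrative guess at the data flow, not the disclosed system: the `TextBlock` type, the `regroup` function, the pixel box convention, and the sample resume text are all hypothetical. User feedback is modeled simply as a set of indices of the bounding blocks the user chose to merge.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

@dataclass
class TextBlock:
    text: str
    box: Box

def regroup(blocks: List[TextBlock], selected: Set[int]) -> List[TextBlock]:
    """Merge the user-selected blocks into one regrouped block with joined text."""
    # Order the selected blocks top-to-bottom, then left-to-right.
    chosen = sorted((blocks[i] for i in selected),
                    key=lambda b: (b.box[1], b.box[0]))
    # The regrouped bounding block is the union of the selected boxes.
    merged_box = (
        min(b.box[0] for b in chosen),
        min(b.box[1] for b in chosen),
        max(b.box[2] for b in chosen),
        max(b.box[3] for b in chosen),
    )
    # Consolidate the corresponding text blocks into a single group text block.
    group = TextBlock("\n".join(b.text for b in chosen), merged_box)
    untouched = [b for i, b in enumerate(blocks) if i not in selected]
    return untouched + [group]

# Hypothetical output of a first parsing pass.
initial = [
    TextBlock("Jane Doe", (40, 20, 300, 50)),
    TextBlock("Experience", (40, 100, 200, 125)),
    TextBlock("Acme Corp, 2019 to 2023", (40, 130, 420, 155)),
]
# The user indicates that blocks 1 and 2 belong together.
regrouped = regroup(initial, {1, 2})
```

Here the heading and its entry end up in one block spanning (40, 100, 420, 155), while the name block is left unchanged. A real system would likely rerun field extraction on the consolidated text after each regrouping.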