PRIOR INJECTIONS FOR SEMI-LABELED SAMPLES
    Invention publication

    Publication No.: US20240143996A1

    Publication date: 2024-05-02

    Application No.: US17977964

    Filing date: 2022-10-31

    Applicant: Intuit Inc.

    Inventor: Itay MARGOLIN

    IPC classification: G06N3/08 G06K9/62

    CPC classification: G06N3/08 G06K9/6259

    Abstract: Systems and methods for training machine learning models are disclosed. An example method includes receiving a semi-labeled set of training samples including a first set of training samples, where each training sample in the first set is assigned a known label, and a second set of training samples, where each training sample in the second set has an unknown label, determining a first loss component, the first loss component providing a loss associated with the first set, determining a second loss component, the second loss component having a value which increases based on a difference between a distribution of individually predicted values of at least the second set and an expected overall distribution of at least the second set, and training the machine learning model, based on the first loss component and the second loss component, to predict labels for unlabeled input data.
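    The two loss components can be illustrated for a binary classifier. This is a minimal sketch under stated assumptions, not the patented method: the first component is taken to be mean binary cross-entropy on the labeled set, and the second component (the "prior injection") is assumed to be the squared gap between the mean predicted positive rate on the unlabeled set and an expected positive rate. The function names and the `weight` parameter are hypothetical.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    # Binary cross-entropy for one prediction; clip p to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def labeled_loss(preds: Sequence[float], labels: Sequence[int]) -> float:
    # First loss component: mean cross-entropy over samples with known labels.
    return sum(bce(p, y) for p, y in zip(preds, labels)) / len(preds)


def prior_loss(unlabeled_preds: Sequence[float], expected_rate: float) -> float:
    # Second loss component (assumption): grows with the gap between the
    # distribution of predictions, summarized by its mean, and the expected rate.
    mean_pred = sum(unlabeled_preds) / len(unlabeled_preds)
    return (mean_pred - expected_rate) ** 2


def total_loss(labeled_preds: Sequence[float], labels: Sequence[int],
               unlabeled_preds: Sequence[float], expected_rate: float,
               weight: float = 1.0) -> float:
    # Hypothetical combined objective that a training loop would minimize.
    return labeled_loss(labeled_preds, labels) + weight * prior_loss(unlabeled_preds, expected_rate)
```

    Summarizing the predicted distribution by its mean keeps the sketch short; a histogram-based divergence would be a drop-in alternative for the second component.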

    SYSTEMS AND METHODS FOR HARDWARE ACCELERATION OF MASKING AND NORMALIZING DATA WITH A TRIANGULAR INPUT MASK

    Publication No.: US20230376663A1

    Publication date: 2023-11-23

    Application No.: US17751014

    Filing date: 2022-05-23

    Inventor: Jinwen XI

    IPC classification: G06F30/331 G06F9/30 G06K9/62

    Abstract: A field programmable gate array including a configurable interconnect fabric connecting logic blocks implementing a circuit to: receive input data including data values organized into rows and columns, each row having N data values; select R[i] unmasked data values of a row of the input data in accordance with a mask and an index i of the row; select N−R[i] unmasked data values of another row of the input data in accordance with the mask and an index of the other row; merge the R[i] unmasked data values of the row and the N−R[i] data values of the other row into a combined data vector of N data values; and compute R[i] normalized values based on the R[i] unmasked data values of the combined data vector and N−R[i] normalized values based on the N−R[i] data values of the combined data vector to generate N normalized data values.
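    The row-merging idea can be sketched in software. This assumes, without confirmation from the abstract, a lower-triangular (causal) mask that keeps the first i + 1 values of row i, and softmax as the normalization. Under that mask, row i pairs with row j = N − 2 − i because R[i] + R[j] = N, so the combined vector has no padded lanes. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(values: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def unmasked_count(i: int) -> int:
    # Assumption: causal mask including the diagonal, so R[i] = i + 1.
    return i + 1


def pack_and_normalize(data: List[List[float]], i: int, j: int) -> Tuple[List[float], List[float]]:
    n = len(data[i])
    r_i, r_j = unmasked_count(i), unmasked_count(j)
    if r_i + r_j != n:
        raise ValueError("the two rows must exactly fill one N-wide vector")
    # Merge R[i] values of row i with N - R[i] values of row j into N lanes.
    combined = data[i][:r_i] + data[j][:r_j]
    # Normalize each segment of the combined vector independently.
    return softmax(combined[:r_i]), softmax(combined[r_i:])
```

    For N = 5, rows (0, 3) and (1, 2) would share a vector each, while the full row 4 is normalized on its own.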

    AUTOMATIC ANOMALY THRESHOLDING FOR MACHINE LEARNING

    Publication No.: US20230244754A1

    Publication date: 2023-08-03

    Application No.: US17590489

    Filing date: 2022-02-01

    Applicant: ServiceNow, Inc.

    Inventor: Lorne Schell

    IPC classification: G06K9/62

    Abstract: A program is provided to automatically train a machine learning model for detecting anomalies using a training dataset. The machine learning model is automatically applied to a validation dataset to determine anomaly detection results. A histogram of the anomaly detection results of the machine learning model is automatically generated. The histogram is automatically analyzed, and a first peak and a second peak of the histogram are automatically identified. A threshold activation of the machine learning model is automatically determined based at least in part on the automatically identified second peak of the histogram.
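    The histogram-and-peaks step can be sketched as follows. The abstract does not say how the peaks are chosen or how the threshold is derived from the second peak, so two assumptions are made here: the "first" and "second" peaks are the two tallest local maxima ordered left to right, and the threshold is the centre of the emptiest bin below the second peak (ties resolved toward the second peak). Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram(scores: Sequence[float], num_bins: int) -> Tuple[List[int], List[float]]:
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal scores
    counts = [0] * num_bins
    for s in scores:
        counts[min(int((s - lo) / width), num_bins - 1)] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return counts, edges


def find_peaks(counts: List[int]) -> List[int]:
    # A non-empty bin strictly above its left neighbour and not below its right one.
    peaks = []
    for k, c in enumerate(counts):
        left = counts[k - 1] if k > 0 else -1
        right = counts[k + 1] if k + 1 < len(counts) else -1
        if c > 0 and c > left and c >= right:
            peaks.append(k)
    return peaks


def anomaly_threshold(scores: Sequence[float], num_bins: int = 10) -> float:
    counts, edges = histogram(scores, num_bins)
    peaks = find_peaks(counts)
    if len(peaks) < 2:
        raise ValueError("expected at least two histogram peaks")
    # Two tallest peaks, ordered left to right (assumption).
    first, second = sorted(sorted(peaks, key=lambda k: counts[k], reverse=True)[:2])
    # Emptiest bin between the peaks, scanning down from the second peak.
    valley = min(range(second - 1, first, -1), key=lambda k: counts[k])
    return (edges[valley] + edges[valley + 1]) / 2
```

    Scores above the returned value would be flagged as anomalous; two peaks can never sit in adjacent bins under this peak rule, so the valley range is never empty.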

    JOINT PERSONALIZED SEARCH AND RECOMMENDATION WITH HYPERGRAPH CONVOLUTIONAL NETWORKS

    Publication No.: US20230195809A1

    Publication date: 2023-06-22

    Application No.: US17882922

    Filing date: 2022-08-08

    Applicant: NAVER CORPORATION

    IPC classification: G06F16/9535 G06N3/04 G06K9/62

    Abstract: A method of training a hypergraph convolutional network (HGCN) includes: receiving training data including search instances and recommendation instances; constructing a hypergraph from the training data, where each node of the hypergraph represents one of a user profile, a query term, and a content item, and where the hypergraph represents each of the search instances and each of the recommendation instances as a hyperedge linking corresponding ones of the nodes; initializing base embeddings associated with the hypergraph nodes; propagating the base embeddings through one or more convolutional layers of the HGCN to obtain, for each of the convolutional layers, respective embeddings of the nodes of the hypergraph; computing, based on the base embeddings and the respective embeddings obtained from each of the one or more convolutional layers, a first loss and a second loss; and selectively updating ones of the base embeddings based on the first and second losses.
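    The propagation step can be sketched with a parameter-free hypergraph convolution: each hyperedge averages its member nodes, then each node averages the messages of its incident hyperedges. This is an assumption in the spirit of simplified (LightGCN-style) convolutions, not the claimed layer; the final mean over base and layer embeddings is likewise assumed, and the two losses are omitted because the abstract does not define them. Function names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def propagate(embeddings: List[Vector], hyperedges: List[List[int]]) -> List[Vector]:
    # Node -> hyperedge: each hyperedge message is the mean of its member nodes.
    edge_msgs = [mean_vectors([embeddings[n] for n in edge]) for edge in hyperedges]
    # Hyperedge -> node: each node averages messages of the hyperedges containing it.
    new: List[Vector] = []
    for node, emb in enumerate(embeddings):
        incident = [edge_msgs[e] for e, edge in enumerate(hyperedges) if node in edge]
        new.append(mean_vectors(incident) if incident else list(emb))
    return new


def hgcn_embeddings(base: List[Vector], hyperedges: List[List[int]], num_layers: int) -> List[Vector]:
    layers = [base]
    for _ in range(num_layers):
        layers.append(propagate(layers[-1], hyperedges))
    # Assumption: final embedding is the mean of the base and every layer output.
    return [mean_vectors([layer[n] for layer in layers]) for n in range(len(base))]
```

    A search instance would be a hyperedge over a user, a query term and an item; a recommendation instance would link only a user and an item.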

    REDUCING MACHINE-LEARNING MODEL COMPLEXITY WHILE MAINTAINING ACCURACY TO IMPROVE PROCESSING SPEED

    Publication No.: US20190213475A1

    Publication date: 2019-07-11

    Application No.: US15866970

    Filing date: 2018-01-10

    Applicant: Red Hat, Inc.

    IPC classification: G06N3/08 G06N3/04 G06K9/62

    Abstract: One example of the present disclosure can include a computing device identifying parameters for configuring a machine-learning model. The computing device can then determine descriptor values for multiple versions of the machine-learning model by, for each parameter in the group of parameters: (i) adjusting the parameter's value to generate a modified version of the machine-learning model; (ii) training the modified version of the machine-learning model to determine a likelihood function for the modified version of the machine-learning model; and (iii) determining a descriptor value for the modified version of the machine-learning model using the number of parameters in the group of parameters and the likelihood function. The computing device can then select a particular version of the machine-learning model based on the particular version having the lowest descriptor value among all the descriptor values. The computing device can execute the particular version of the machine-learning model to perform a task.
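    Steps (i) to (iii) and the final selection can be sketched as a loop. The abstract only says the descriptor uses the parameter count and the likelihood; an AIC-style score, 2k − 2 ln L, is assumed here. The `adjust` and `fit` callables are hypothetical stand-ins for the parameter change and the training run.

```python
import math
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def descriptor_value(num_params: int, log_likelihood: float) -> float:
    # Assumption: AIC-style descriptor, 2k - 2 ln L; lower is better.
    return 2 * num_params - 2 * log_likelihood


def select_version(params: Config,
                   adjust: Callable[[str, float], float],
                   fit: Callable[[Config], float]) -> Tuple[Config, float]:
    best_cfg, best_score = dict(params), math.inf
    for name in params:
        version = dict(params)
        version[name] = adjust(name, version[name])    # (i) modified version
        log_l = fit(version)                           # (ii) train, get log-likelihood
        score = descriptor_value(len(params), log_l)   # (iii) descriptor value
        if score < best_score:
            best_cfg, best_score = version, score
    return best_cfg, best_score
```

    Because k is the size of the parameter group, it is the same for every version here, so the ranking reduces to likelihood; a per-version parameter count could be passed instead if versions differ in size.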

    METHOD AND APPARATUS FOR IDENTIFICATION OF FRAUDULENT CLICK ACTIVITY

    Publication No.: US20180253755A1

    Publication date: 2018-09-06

    Application No.: US15971614

    Filing date: 2018-05-04

    IPC classification: G06Q30/02 G06K9/62 G06F15/18

    Abstract: This application discloses a method and an apparatus for reducing advertisement fraud. A training sample set including multiple training samples is obtained. At least one of the multiple training samples, associated with a fraudulent training user, includes a training click log associated with clicking one or more advertisements by the fraudulent training user. Feature information is extracted from the training sample set. The fraudulent training user and the feature information are associated with a fraudulent user type. A positive sample associated with the feature information is formed based on the at least one of the multiple training samples. A fraudulent user identification model associated with the fraudulent user type is trained based on at least the positive sample. Further, a sample to be identified, associated with a user to be identified, is received. Whether the user is a fraudulent user is determined using the fraudulent user identification model.
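    The pipeline (click-log features, training on positive and negative samples, identifying a new user) can be sketched with logistic regression. The two features (clicks per minute and share of repeat clicks on the same advertisement), the use of logistic regression, and all function names are assumptions for illustration; the abstract does not specify the features or the model.

```python
import math
from typing import List, Sequence, Tuple

Click = Tuple[float, str]  # (timestamp in seconds, advertisement id)


def extract_features(click_log: Sequence[Click]) -> List[float]:
    # Hypothetical features: click rate and share of repeat clicks on one ad.
    times = sorted(t for t, _ in click_log)
    minutes = max((times[-1] - times[0]) / 60.0, 1.0)  # floor window at one minute
    rate = len(click_log) / minutes
    repeat_share = 1.0 - len({ad for _, ad in click_log}) / len(click_log)
    return [rate, repeat_share]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_model(logs: Sequence[Sequence[Click]], labels: Sequence[int],
                lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    # Logistic regression fitted with plain stochastic gradient descent.
    xs = [extract_features(log) for log in logs]
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def is_fraudulent(model: Tuple[List[float], float], click_log: Sequence[Click],
                  threshold: float = 0.5) -> bool:
    w, b = model
    x = extract_features(click_log)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold
```

    Usage: label logs of known fraudulent users 1 (positive samples) and ordinary users 0, call `train_model`, then pass a new user's click log to `is_fraudulent`.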