INTERACTIVE MACHINE LEARNING OPTIMIZATION

    Publication No.: US20220309391A1

    Publication Date: 2022-09-29

    Application No.: US17216475

    Filing Date: 2021-03-29

    IPC Classification: G06N20/00 G06N5/04 G06F16/28

    Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: examining an enterprise dataset, the enterprise dataset defined by enterprise-collected data; selecting one or more synthetic datasets in dependence on the examining, the one or more synthetic datasets including data other than data collected by the enterprise; training a set of predictive models using data of the one or more synthetic datasets to provide a set of trained predictive models; testing the set of trained predictive models with use of holdout data of the one or more synthetic datasets; and presenting prompting data on a displayed user interface of a developer user in dependence on result data resulting from the testing, the prompting data prompting the developer user to direct action with respect to one or more models of the set of predictive models.
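
The train, test, and prompt steps in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy synthetic data generator, the two model classes, the 80/20 holdout split, and the 0.8 accuracy cutoff used to word the prompt. The step of selecting a synthetic dataset by examining the enterprise dataset is not shown.

```python
import random
from collections import Counter

def make_synthetic_rows(n, seed=0):
    """Hypothetical synthetic dataset: one feature x in [0, 1), label 1 when x > 0.5."""
    rng = random.Random(seed)
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]

class MajorityModel:
    """Baseline that always predicts the most common training label."""
    name = "majority"
    def fit(self, rows):
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

class ThresholdModel:
    """Picks the candidate threshold with the best training accuracy."""
    name = "threshold"
    def fit(self, rows):
        candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
        self.t = max(candidates, key=lambda t: sum(int(x > t) == y for x, y in rows))
        return self
    def predict(self, x):
        return int(x > self.t)

def holdout_accuracy(model, holdout):
    """Fraction of holdout rows the model labels correctly."""
    return sum(model.predict(x) == y for x, y in holdout) / len(holdout)

def build_prompts(scores, keep_at=0.8):
    """Turn test results into prompt lines for the developer, best model first."""
    prompts = []
    for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        action = "deploy" if acc >= keep_at else "retrain or discard"
        prompts.append(f"{name}: holdout accuracy {acc:.2f}; suggested action: {action}")
    return prompts

rows = make_synthetic_rows(100)
train, holdout = rows[:80], rows[80:]           # holdout data comes from the synthetic set
models = [MajorityModel().fit(train), ThresholdModel().fit(train)]
scores = {m.name: holdout_accuracy(m, holdout) for m in models}
for line in build_prompts(scores):
    print(line)
```

In a real user interface the prompt lines would be rendered as controls the developer can act on; here they are printed as plain text.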

    EFFICIENT TECHNIQUES FOR DETERMINING THE BEST DATA IMPUTATION ALGORITHMS

    Publication No.: US20210357781A1

    Publication Date: 2021-11-18

    Application No.: US16875533

    Filing Date: 2020-05-15

    IPC Classification: G06N5/04 G06F17/17

    Abstract: A processing system, a computer program product, and a method for efficiently determining a best imputation algorithm from a plurality of imputation algorithms. A method includes: providing a plurality of imputation algorithms; providing a time parameter tmax to limit an amount of time spent determining a best imputation algorithm; maintaining past information i on accuracy and execution time for at least one of the imputation algorithms; using said information i to compute a utility score for each of the at least one of the imputation algorithms; and testing imputation algorithms and associated parameters in an order based on said utility scores.
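
A minimal sketch of testing algorithms in utility order under a time limit follows. The abstract does not say how the utility score is computed, so the formula used here (past accuracy divided by past execution time) is an assumption, as are the algorithm names, the history values, and the rule of skipping any algorithm whose past execution time would exceed the remaining budget. Execution times are simulated numbers rather than wall-clock measurements so that the example is deterministic.

```python
def utility(accuracy, seconds):
    """Assumed utility: past accuracy per second of past execution time."""
    return accuracy / max(seconds, 1e-9)

def search_best(history, evaluate, tmax):
    """Test algorithms in descending utility order without exceeding tmax seconds.

    history maps name -> (past accuracy, past seconds); evaluate(name) returns
    the (accuracy, seconds) observed for a fresh test of that algorithm.
    """
    order = sorted(history, key=lambda name: utility(*history[name]), reverse=True)
    spent, best, tested = 0.0, None, []
    for name in order:
        expected_seconds = history[name][1]
        if spent + expected_seconds > tmax:
            continue                      # would likely exceed the remaining budget
        accuracy, seconds = evaluate(name)
        spent += seconds
        tested.append(name)
        history[name] = (accuracy, seconds)   # keep the past information up to date
        if best is None or accuracy > best[1]:
            best = (name, accuracy)
    return best, tested

# Hypothetical past information and simulated fresh measurements.
history = {"mean": (0.70, 1.0), "knn": (0.85, 5.0), "iterative": (0.90, 30.0)}
measured = {"mean": (0.72, 1.0), "knn": (0.86, 5.0), "iterative": (0.91, 30.0)}
best, tested = search_best(history, lambda name: measured[name], tmax=10.0)
print(best, tested)
```

With a budget of 10 seconds, the slow "iterative" entry is never tested even though its past accuracy is the highest, which is the trade-off a time parameter introduces.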

    RECTIFYING LABELS IN TRAINING DATASETS IN MACHINE LEARNING

    Publication No.: US20240256943A1

    Publication Date: 2024-08-01

    Application No.: US18103057

    Filing Date: 2023-01-30

    IPC Classification: G06N20/00

    CPC Classification: G06N20/00

    Abstract: A method includes obtaining, by a processor set, labeled training data associated with a system; identifying, by the processor set, a first region and a second region in the labeled training data, wherein the first region is associated with a failure of the system and the second region is exclusive of the first region; and creating, by the processor set, re-labeled training data by altering one or more labels of the labeled training data in the first region based on data in the second region.
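
One possible reading of the re-labeling step is sketched below. The abstract does not specify how data in the second region drives the label changes, so the rule here is an assumption: compute the mean and standard deviation of a single sensor value over the second region, then mark any point in the first (failure) region that lies more than k standard deviations from that mean as "anomalous". The sample values, label names, and k = 3 are likewise hypothetical.

```python
from statistics import mean, pstdev

def relabel(values, labels, failure_start, failure_end, k=3.0):
    """Return new labels; only indices in [failure_start, failure_end) can change."""
    first_region = range(failure_start, failure_end)
    # Second region: every index outside the failure window.
    second_values = [v for i, v in enumerate(values) if i not in first_region]
    mu, sigma = mean(second_values), pstdev(second_values)
    new_labels = list(labels)
    for i in first_region:
        if abs(values[i] - mu) > k * sigma:
            new_labels[i] = "anomalous"
    return new_labels

# Hypothetical readings; the system failed somewhere in indices 6..9.
values = [10, 11, 9, 10, 10, 11, 9, 10, 25, 30]
labels = ["normal"] * len(values)
print(relabel(values, labels, failure_start=6, failure_end=10))
```

Labels outside the failure window are copied unchanged, which keeps the second region as the reference for what normal behavior looks like.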

    DETERMINING THE BEST DATA IMPUTATION ALGORITHMS

    Publication No.: US20210357794A1

    Publication Date: 2021-11-18

    Application No.: US16875450

    Filing Date: 2020-05-15

    Abstract: A processing system, a computer program product, and a method for determining a best imputation algorithm from a plurality of imputation algorithms. A method includes: providing a plurality of imputation algorithms; defining a data analytics task in which at least one step of the data analytics task includes determining at least one missing data value by imputation; executing the data analytics task multiple times wherein each execution of the data analytics task uses a data imputation algorithm of the plurality of data imputation algorithms to determine at least one missing data value; determining an error for each execution of the data analytics task; and selecting an imputation algorithm which results in a least error for the data analytics task.
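
The selection loop described in this abstract can be sketched as follows. The analytics task (estimating a column mean), the three imputers, the toy column, and the use of a known reference value to measure error are all assumptions chosen to keep the example small; the patent does not prescribe them.

```python
from statistics import mean, median

def fill_with(column, value):
    """Replace missing entries (None) with a single fill value."""
    return [value if v is None else v for v in column]

def observed(column):
    """Values that are actually present in the column."""
    return [v for v in column if v is not None]

# Hypothetical candidate imputation algorithms.
IMPUTERS = {
    "mean": lambda col: fill_with(col, mean(observed(col))),
    "median": lambda col: fill_with(col, median(observed(col))),
    "zero": lambda col: fill_with(col, 0),
}

def analytics_task(column):
    """Assumed task: estimate the column mean from the completed data."""
    return sum(column) / len(column)

def select_imputer(column, reference):
    """Run the task once per imputer and return the name with the least error."""
    errors = {}
    for name, impute in IMPUTERS.items():
        result = analytics_task(impute(column))
        errors[name] = abs(result - reference)
    best = min(errors, key=errors.get)
    return best, errors

truth = [1, 2, 3, 4, 100]             # full data, known only in this toy setup
column = [1, 2, None, 4, 100]         # same data with one value masked out
best, errors = select_imputer(column, reference=analytics_task(truth))
print(best, errors)
```

Because the column contains an outlier, filling with the mean distorts the task result more than filling with the median does, so the ranking depends on the task and the data rather than on the imputer alone.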