GENERATION OF DATA TRANSFORMATIONS USING FINGERPRINTS

    Publication number: US20240202573A1

    Publication date: 2024-06-20

    Application number: US18067871

    Filing date: 2022-12-19

    IPC classification: G06N20/00 G06N3/004

    CPC classification: G06N20/00 G06N3/004

    Abstract: A method, computer program product, and computer system for transforming sets of source data having different formats into respective sets of target data having the same format. N source patterns are determined, respectively describing N different formats in which N sets of source data items are formatted, where N≥1. A target pattern is determined that describes a target format in which target data items are formatted. N graphs are generated, respectively describing transformations of the N source patterns to the target pattern. Each graph includes multiple transformation paths, each of which transforms the source pattern to the target pattern by mapping source strings in the source pattern to each target string in the target pattern. A single transformation path is selected from the multiple transformation paths of each graph, resulting in N selected transformation paths.
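    The abstract does not say how a pattern is derived or how one path is chosen. The Python sketch below illustrates the idea under stated assumptions: a pattern (fingerprint) is obtained by collapsing digit runs to `D` and letter runs to `A`; each target token may be copied from any source token with the same value or emitted as a constant, so one example pair yields several transformation paths; and the selected path is the one with the fewest constants. The names `fingerprint`, `transformation_paths`, `select_path`, `apply_path`, and `transform` are hypothetical, and this is not the patented method.

```python
import itertools
import re

# Hypothetical tokenizer: a run of digits, a run of ASCII letters, or any single other character.
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|.")


def tokenize(s):
    return TOKEN_RE.findall(s)


def fingerprint(s):
    """Format pattern: digit runs become 'D', letter runs become 'A', separators are kept."""
    return "".join(
        "D" if t.isdigit() else "A" if t.isalpha() else t for t in tokenize(s)
    )


def transformation_paths(source, target):
    """Enumerate every path that builds the target tokens from source tokens or constants."""
    src = tokenize(source)
    choices = []
    for tok in tokenize(target):
        options = [("src", i) for i, s in enumerate(src) if s == tok]
        options.append(("const", tok))  # emitting a literal is always possible
        choices.append(options)
    return list(itertools.product(*choices))


def select_path(paths):
    """Assumed criterion: keep the path that copies the most tokens from the source."""
    return min(paths, key=lambda path: sum(kind == "const" for kind, _ in path))


def apply_path(path, value):
    src = tokenize(value)
    return "".join(src[x] if kind == "src" else x for kind, x in path)


# N = 2 source formats, one example each, both mapped to a YYYY-MM-DD target format.
examples = {"12/19/2022": "2022-12-19", "19.12.2022": "2022-12-19"}
selected = {
    fingerprint(s): select_path(transformation_paths(s, t)) for s, t in examples.items()
}


def transform(value):
    return apply_path(selected[fingerprint(value)], value)


print(len(transformation_paths("12/19/2022", "2022-12-19")))  # 8
print(transform("03/07/2021"))  # 2021-03-07
print(transform("05.03.2021"))  # 2021-03-05
```

    Keying the selected paths by fingerprint means every value that shares a source format reuses the same path; a value in an unseen format raises `KeyError` in this sketch.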

    PRIVACY-PRESERVING CLASS LABEL STANDARDIZATION IN FEDERATED LEARNING SETTINGS

    Publication number: US20230177113A1

    Publication date: 2023-06-08

    Application number: US17540660

    Filing date: 2021-12-02

    IPC classification: G06K9/62 G06N20/00

    Abstract: Methods, systems, and computer program products for privacy-preserving class label standardization in federated learning settings are provided herein. A computer-implemented method includes determining, using one or more data privacy-preserving techniques, a signature for each of one or more classes of data for each of multiple client devices within a federated learning environment; identifying one or more signature matches across at least a portion of the multiple client devices; generating one or more class labels for the one or more classes of data associated with the one or more signature matches; labeling, across the at least a portion of the multiple client devices, the one or more classes of data associated with the one or more signature matches with the one or more generated class labels; and performing one or more automated actions based at least in part on the one or more labeled classes of data.
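    The abstract leaves the privacy-preserving technique open. As a minimal sketch, assume each client shares only a noisy mean feature vector per local class (so raw data points stay on the device), and a coordinator greedily matches signatures by cosine similarity and assigns shared labels. The Gaussian noise is illustrative and is not a calibrated differential-privacy mechanism; the function names, the `noise_scale` and `threshold` parameters, and the `class_<n>` label scheme are hypothetical. Unlike the abstract, this sketch also labels classes that match nothing, by giving them a new shared label.

```python
import math
import random


def class_signature(vectors, noise_scale, rng):
    """Noisy mean feature vector of one local class; only this leaves the client.
    The noise is illustrative, not a calibrated differential-privacy mechanism."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return [m + rng.gauss(0.0, noise_scale) for m in mean]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def standardize_labels(client_signatures, threshold=0.95):
    """Greedily assign each (client, local class) to the first shared label whose
    reference signature is similar enough; otherwise create a new shared label."""
    references = []  # list of (shared_label, reference_signature)
    mapping = {}  # (client, local_label) -> shared_label
    for client, signatures in client_signatures.items():
        for local_label, sig in signatures.items():
            for shared_label, ref in references:
                if cosine(sig, ref) >= threshold:
                    mapping[(client, local_label)] = shared_label
                    break
            else:  # no reference matched: start a new shared label
                shared_label = f"class_{len(references)}"
                references.append((shared_label, sig))
                mapping[(client, local_label)] = shared_label
    return mapping


# Two clients name the same underlying classes differently.
clients = {
    "A": {"cat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]], "dog": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]},
    "B": {"feline": [[1.0, 0.0, 0.05]], "canine": [[0.05, 1.0, 0.0]]},
}
rng = random.Random(0)
signatures = {
    client: {label: class_signature(vs, 0.01, rng) for label, vs in classes.items()}
    for client, classes in clients.items()
}
labels = standardize_labels(signatures)
print(labels)
```

    With this data, client B's `feline` and `canine` receive the same shared labels as client A's `cat` and `dog`, even though the local names never leave the clients' label spaces in comparable form.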

    DATA QUALITY ASSESSMENT FOR UNSUPERVISED MACHINE LEARNING

    Publication number: US20220405631A1

    Publication date: 2022-12-22

    Application number: US17353978

    Filing date: 2021-06-22

    IPC classification: G06N20/00 G06N5/04

    Abstract: Techniques for qualitatively assessing unlabeled data in an unsupervised machine learning environment are disclosed. In one example, a method comprises the following steps. A dataset of unlabeled data points is converted into a graph structure. Nodes of the graph structure represent the unlabeled data points in the dataset, and weighted edges between at least a portion of the nodes represent similarity between the unlabeled data points represented by those nodes. A metric is computed for each node of the graph structure. The value of the metric for a given node measures the dissimilarity between the unlabeled data point of that node and the unlabeled data points of one or more other nodes. A subset of the dataset is generated by removing one or more unlabeled data points from the dataset based on one or more values of the computed metric.
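    A minimal sketch of the graph-based assessment, assuming a Gaussian-kernel similarity for edge weights, pruning of near-zero edges, and a per-node dissimilarity metric defined as one minus the mean weight of the node's k strongest edges. The abstract does not name a specific metric; `sigma`, `min_weight`, `k`, and `threshold` are hypothetical parameters, and the function names are illustrative only.

```python
import math


def build_similarity_graph(points, sigma=1.0, min_weight=1e-3):
    """Nodes are point indices; an edge's weight is a Gaussian kernel of Euclidean distance.
    Edges weaker than min_weight are dropped."""
    graph = {i: {} for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            w = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma**2))
            if w >= min_weight:
                graph[i][j] = w
                graph[j][i] = w
    return graph


def dissimilarity(graph, node, k=2):
    """1 minus the mean weight of the node's k strongest edges; an isolated node scores 1."""
    weights = sorted(graph[node].values(), reverse=True)[:k]
    if not weights:
        return 1.0
    return 1.0 - sum(weights) / len(weights)


def prune(points, threshold=0.9, sigma=1.0, k=2):
    """Return the points whose dissimilarity is at most threshold, plus all scores."""
    graph = build_similarity_graph(points, sigma=sigma)
    scores = {i: dissimilarity(graph, i, k=k) for i in graph}
    kept = [p for i, p in enumerate(points) if scores[i] <= threshold]
    return kept, scores


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept, scores = prune(points)
print(kept)  # [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
```

    The far-away point `(5.0, 5.0)` has no edge above `min_weight`, so it scores 1.0 and is removed, while the tightly clustered points score close to 0 and are kept.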

    DATA TRANSFORMATION METHODOLOGY USING GENERATED PROGRAM CODE AND TOKEN MAPPINGS

    Publication number: US11928126B1

    Publication date: 2024-03-12

    Application number: US17821309

    Filing date: 2022-08-22

    IPC classification: G06F16/25 G06F16/84

    CPC classification: G06F16/258 G06F16/86

    Abstract: A computer-implemented method transforms data. In response to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens among the mappable tokens; the initial mappings map the common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string by applying user input to the set of initial mappings. The computer system then generates program code that uses the set of user mappings to transform input strings into output strings.
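    A minimal sketch of the token-mapping flow, under stated assumptions: tokens are alphanumeric runs; initial mappings cover output tokens that occur verbatim in the input; a user mapping supplies what the initial mappings miss (here, taking the first letter of a token); and the generated program code is Python source text that is then compiled with `exec`. The operation names `copy` and `initial` and all function names are hypothetical, not taken from the patent.

```python
import re

TOKEN = r"[A-Za-z0-9]+"  # hypothetical token definition: alphanumeric runs

# Hypothetical mapping operations, rendered as Python expressions over the input tokens t.
OPS = {"copy": "t[{i}]", "initial": "t[{i}][0]"}


def initial_mappings(input_str, output_str):
    """Common tokens: each output token that appears verbatim in the input is copied from it."""
    in_tokens = re.findall(TOKEN, input_str)
    return {
        tok: ("copy", in_tokens.index(tok))
        for tok in re.findall(TOKEN, output_str)
        if tok in in_tokens
    }


def generate_program(output_str, mappings):
    """Emit Python source for transform(s); unmapped pieces of the output become literals."""
    pieces = [p for p in re.split(f"({TOKEN})", output_str) if p]
    exprs = [
        OPS[mappings[p][0]].format(i=int(mappings[p][1])) if p in mappings else repr(p)
        for p in pieces
    ]
    body = " + ".join(exprs) or "''"
    return (
        "def transform(s):\n"
        f"    t = re.findall({TOKEN!r}, s)\n"
        f"    return {body}\n"
    )


# Example: "John Smith" -> "Smith, J.". "Smith" is a common token; "J" needs a user mapping.
mappings = initial_mappings("John Smith", "Smith, J.")
mappings.update({"J": ("initial", 0)})  # user input refining the initial mappings
code = generate_program("Smith, J.", mappings)
print(code)

# The generated source holds only fixed templates, int indexes, and repr() literals.
namespace = {"re": re}
exec(code, namespace)
transform = namespace["transform"]
print(transform("Ada Lovelace"))  # Lovelace, A.
```

    Generating source text rather than a closure keeps the output inspectable and reusable outside the generator; an unknown operation name raises `KeyError` during generation instead of producing broken code.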