-
Publication No.: US11640446B2
Publication date: 2023-05-02
Application No.: US17407181
Filing date: 2021-08-19
Inventors: Mandis Beigi, Jacob Aptekar, Afrah Shafquat, Jason Mezey
IPC classification: G06F21/62, G06F16/27, G06F18/214, G06F18/2133, G06F18/2135, G06F18/21, G06F18/2137
Abstract: A method for generating a synthetic dataset from an original dataset includes encoding categorical features of the original dataset, embedding the encoded dataset in a low-dimensional space, selecting a seed record from the embedded dataset, identifying a plurality of nearest neighbor records to the seed record, generating a new record by randomly selecting features from the plurality of nearest neighbor records, and concatenating the new record into the synthetic dataset. For a synthetic dataset that contains N records, which may be the same as or different from the number of records in the original dataset, the selecting, identifying, generating, and concatenating operations operate a total of N times on the records in the embedded dataset.
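The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say which encoding, embedding, or distance measure is used, so this sketch assumes one-hot encoding, a random Gaussian projection as a stand-in for the low-dimensional embedding, and Euclidean distance. It also assumes that the seed is excluded from its own neighbor set, that new records take their values from the neighbors' original (unencoded) features, and that the original dataset has at least two records. All function and parameter names (`one_hot_encode`, `random_projection`, `nearest_neighbors`, `synthesize`, `k`, `dim`) are hypothetical.

```python
import math
import random


def one_hot_encode(records, categorical_cols):
    """Encode categorical columns as one-hot vectors; pass numeric columns through."""
    # Collect the sorted category levels of every categorical column.
    levels = {c: sorted({r[c] for r in records}) for c in categorical_cols}
    encoded = []
    for r in records:
        vec = []
        for c, value in enumerate(r):
            if c in levels:
                vec.extend(1.0 if value == lv else 0.0 for lv in levels[c])
            else:
                vec.append(float(value))
        encoded.append(vec)
    return encoded


def random_projection(vectors, dim, rng):
    """Assumed stand-in embedding: project onto `dim` random Gaussian directions."""
    width = len(vectors[0])
    basis = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(dim)]
    return [[sum(b * x for b, x in zip(row, v)) for row in basis] for v in vectors]


def nearest_neighbors(embedded, seed_idx, k):
    """Indices of the k records closest to the seed (seed itself excluded)."""
    seed = embedded[seed_idx]
    others = [i for i in range(len(embedded)) if i != seed_idx]
    others.sort(key=lambda i: math.dist(seed, embedded[i]))
    return others[:k]


def synthesize(records, categorical_cols, n_out, k=3, dim=2, seed=0):
    """Build n_out synthetic records; n_out may differ from len(records)."""
    rng = random.Random(seed)
    embedded = random_projection(one_hot_encode(records, categorical_cols), dim, rng)
    n_features = len(records[0])
    synthetic = []
    for _ in range(n_out):
        # Select a seed record, then find its neighbors in the embedded space.
        seed_idx = rng.randrange(len(records))
        neighbors = nearest_neighbors(embedded, seed_idx, k)
        # Copy each feature from a randomly chosen neighbor's original record.
        new_record = tuple(records[rng.choice(neighbors)][c] for c in range(n_features))
        # Concatenate the new record into the synthetic dataset.
        synthetic.append(new_record)
    return synthetic


if __name__ == "__main__":
    # Made-up toy data: (age, sex, group); columns 1 and 2 are categorical.
    data = [(34, "F", "A"), (36, "F", "B"), (51, "M", "A"),
            (49, "M", "B"), (29, "F", "A"), (62, "M", "C")]
    for row in synthesize(data, categorical_cols={1, 2}, n_out=4):
        print(row)
```

Because each feature is drawn independently from the neighbor set, a synthetic record can mix values from several original records. In this sketch the numeric column is left unscaled, so it dominates the distances; a real pipeline would likely normalize features before embedding.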
-
Publication No.: US20230060848A1
Publication date: 2023-03-02
Application No.: US17407181
Filing date: 2021-08-19
Inventors: Mandis Beigi, Jacob Aptekar, Afrah Shafquat, Jason Mezey
IPC classification: G06K9/62
Abstract: A method for generating a synthetic dataset from an original dataset includes encoding categorical features of the original dataset, embedding the encoded dataset in a low-dimensional space, selecting a seed record from the embedded dataset, identifying a plurality of nearest neighbor records to the seed record, generating a new record by randomly selecting features from the plurality of nearest neighbor records, and concatenating the new record into the synthetic dataset. For a synthetic dataset that contains N records, which may be the same as or different from the number of records in the original dataset, the selecting, identifying, generating, and concatenating operations operate a total of N times on the records in the embedded dataset.
-