-
Publication No.: US11768903B2
Publication date: 2023-09-26
Application No.: US16906077
Filing date: 2020-06-19
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Chao-Min Chang , Ying-Chen Yu , June-Ray Lin , Kuei-Ching Lee , Curtis C H Wei
IPC: G06F16/95 , G06F16/955 , G06N20/00 , G06N5/04 , G06F40/30 , G06F16/951 , G06F40/169 , G06F40/295 , G06F40/205
CPC classification number: G06F16/955 , G06F16/951 , G06F40/30 , G06N5/04 , G06N20/00 , G06F40/169 , G06F40/205 , G06F40/295
Abstract: A computer-implemented method for automatically adjusting a Uniform Resource Locator (URL) seed list. The method includes crawling for documents based on a seed URL list. The method generates relations data from the documents using a Natural Language Processing (NLP) model. The method analyzes the relations data using an auto-seed model. The method modifies the seed URL list.
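The abstract names four steps (crawl, extract relations, analyze, modify the seed list) but does not say how the auto-seed model decides. The following is a minimal sketch of the last two steps only, under stated assumptions: the relations data is reduced to hypothetical (source_url, mentioned_url) pairs, and a simple counting rule stands in for the auto-seed model. The names `adjust_seed_list`, `host_of`, and `add_threshold` are illustrative and do not come from the patent.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host part of a URL."""
    return urlsplit(url).netloc.lower()


def adjust_seed_list(seed_urls, relations, add_threshold=2):
    """Return a modified seed list.

    relations: iterable of (source_url, mentioned_url) pairs, a hypothetical
    stand-in for the relations data an NLP model would extract from the
    crawled documents.
    """
    seed_hosts = {host_of(u) for u in seed_urls}
    # Count how often each non-seed host is mentioned by crawled documents.
    mentions = Counter(
        host_of(target) for _, target in relations
        if host_of(target) not in seed_hosts
    )
    # Count how many relations each crawled host produced.
    yields = Counter(host_of(source) for source, _ in relations)

    # Assumed rule: keep only seeds whose host produced at least one relation.
    kept = [u for u in seed_urls if yields[host_of(u)] > 0]
    # Assumed rule: add hosts that were mentioned often enough.
    added = sorted(
        f"https://{h}/" for h, n in mentions.items() if n >= add_threshold
    )
    return kept + added
```

In this sketch a seed that yields no relations is dropped and a frequently mentioned host is promoted to a seed; a learned model could replace either rule without changing the surrounding loop.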
-
Publication No.: US11663402B2
Publication date: 2023-05-30
Application No.: US16934220
Filing date: 2020-07-21
Applicant: International Business Machines Corporation
Inventor: Chao-Min Chang , Kuei-Ching Lee , Ci-Hao Wu , Chia-Heng Lin
IPC: G06F40/242 , G06F40/295 , G06F40/58
CPC classification number: G06F40/242 , G06F40/295 , G06F40/58
Abstract: An approach for a fast and accurate word embedding model, “desc2vec,” for out-of-dictionary (OOD) words with a model learning from the dictionary descriptions of the word is disclosed. The approach includes determining that a target text element is not in a set of reference text elements, information describing the target text element is obtained. The information comprises a set of descriptive text elements. A set of vectorized representations for the set of descriptive text elements is determined. A target vectorized representation for the target text element is determined based on the set of vectorized representations using a machine learning model. The machine learning model is trained to represent a predetermined association between the set of vectorized representations for the set of descriptive text elements describing the target text element and the target vectorized representation.
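The flow described in the abstract (check the dictionary, fetch a description, vectorize the descriptive words, map them to one target vector) can be sketched as follows. This is an illustration under assumptions, not the patented model: the trained machine learning model is replaced by a plain element-wise mean, and `embed_word`, `embeddings`, and `descriptions` are hypothetical names.

```python
def embed_word(word, embeddings, descriptions):
    """Return a vector for word, falling back to its description if needed.

    embeddings: dict mapping known words to equal-length lists of floats.
    descriptions: dict mapping a word to the list of words describing it.
    """
    # In-dictionary words use their existing vector directly.
    if word in embeddings:
        return embeddings[word]
    # Collect vectors for descriptive words that are themselves known.
    vectors = [
        embeddings[w] for w in descriptions.get(word, []) if w in embeddings
    ]
    if not vectors:
        raise KeyError(f"no usable description for {word!r}")
    # Stand-in for the trained model (an assumption): element-wise mean.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

A learned mapping would presumably weight the descriptive words unequally; the mean is used here only because it needs no training data.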
-
Publication No.: US20200125681A1
Publication date: 2020-04-23
Application No.: US16167653
Filing date: 2018-10-23
Applicant: International Business Machines Corporation
Inventor: June-Ray Lin , Curtis CH Wei , Hsieh-Lung Yang , Ying-Chen Yu , Chia-Heng Lin , Ci-Hao Wu , Chen-Yu Huang , Kuei-Ching Lee
Abstract: A method, computer program product, and computing system device for receiving, on a computing device, a plurality of webpages. At least one webpage may be filtered from the plurality of webpages into at least one set of webpages using a decision tree algorithm. At least one remaining webpage may be filtered from the plurality of webpages into the at least one set of webpages using a supported vector machine (SVM) algorithm.
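The two-stage filtering in the abstract can be illustrated with a small sketch: a decision tree sorts the pages it can decide, and the remaining pages go to a second classifier. Everything specific below is an assumption made for illustration: the features `word_count` and `keyword_hits`, the thresholds, the set names, and the fixed weights. The second stage uses a linear decision function of the kind a trained linear support vector machine (SVM) produces, rather than a trained model.

```python
def tree_stage(page):
    """Tiny hand-written decision tree; returns a label or None if undecided."""
    if page["word_count"] < 50:
        return "discard"
    if page["keyword_hits"] >= 5:
        return "keep"
    return None  # undecided: leave the page for the second stage


def svm_stage(page, weights=(0.01, 1.0), bias=-3.0):
    """Classify by the sign of w.x + b, as a trained linear SVM would."""
    score = (
        weights[0] * page["word_count"]
        + weights[1] * page["keyword_hits"]
        + bias
    )
    return "keep" if score >= 0 else "discard"


def filter_pages(pages):
    """Sort pages into sets, using the tree first and the SVM stage second."""
    sets = {"keep": [], "discard": []}
    for page in pages:
        label = tree_stage(page) or svm_stage(page)
        sets[label].append(page["url"])
    return sets
```

Running the cheap rule-based stage first means the second classifier only sees the pages the tree could not decide.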
-
Publication No.: US20220027557A1
Publication date: 2022-01-27
Application No.: US16934220
Filing date: 2020-07-21
Applicant: International Business Machines Corporation
Inventor: Chao-Min Chang , Kuei-Ching Lee , Ci-Hao Wu , Chia-Heng Lin
IPC: G06F40/242 , G06N20/00 , G06F40/58 , G06F40/295 , G06K9/62
Abstract: An approach for a fast and accurate word embedding model, “desc2vec,” for out-of-dictionary (OOD) words with a model learning from the dictionary descriptions of the word is disclosed. The approach includes determining that a target text element is not in a set of reference text elements, information describing the target text element is obtained. The information comprises a set of descriptive text elements. A set of vectorized representations for the set of descriptive text elements is determined. A target vectorized representation for the target text element is determined based on the set of vectorized representations using a machine learning model. The machine learning model is trained to represent a predetermined association between the set of vectorized representations for the set of descriptive text elements describing the target text element and the target vectorized representation.
-
Publication No.: US11017083B2
Publication date: 2021-05-25
Application No.: US16162597
Filing date: 2018-10-17
Applicant: International Business Machines Corporation
Inventor: Ci-Hao Wu , Ying-Chen Yu , June-Ray Lin , Hsieh-Lung Yang , Chen-Yu Huang , Chia-Heng Lin , Kuei-Ching Lee
IPC: G06F21/56 , G06N3/04 , G06N3/08 , G06F16/901 , G06F40/289
Abstract: Provided are systems, methods, and media for multiphase graph partitioning for malware entity detection. An example method includes receiving an input string associated with the malware entity. A determination is made as to whether the input string includes a symbolic word, a non-symbolic word, a symbolic phrase, or a non-symbolic phrase. A branching graph is formed based on a combination of the input string and a plurality of stored strings that are each associated with the malware entity to determine whether the input string is a valid detection name of the malware entity, in which the branching graph is formed by at least performing a first graph partitioning stage and a second graph partitioning stage. The input string is then labeled based on the formed branching graph and then outputted to a malware detection engine.
-
Publication No.: US10762155B2
Publication date: 2020-09-01
Application No.: US16167653
Filing date: 2018-10-23
Applicant: International Business Machines Corporation
Inventor: June-Ray Lin , Curtis CH Wei , Hsieh-Lung Yang , Ying-Chen Yu , Chia-Heng Lin , Ci-Hao Wu , Chen-Yu Huang , Kuei-Ching Lee
IPC: G06F16/00 , G06F16/9535 , G06K9/62 , G06F16/335 , G06F16/955
Abstract: A method, computer program product, and computing system device for receiving, on a computing device, a plurality of webpages. At least one webpage may be filtered from the plurality of webpages into at least one set of webpages using a decision tree algorithm. At least one remaining webpage may be filtered from the plurality of webpages into the at least one set of webpages using a supported vector machine (SVM) algorithm.
-