-
Publication No.: US20220027557A1
Publication Date: 2022-01-27
Application No.: US16934220
Application Date: 2020-07-21
Applicant: International Business Machines Corporation
Inventor: Chao-Min Chang, Kuei-Ching Lee, Ci-Hao Wu, Chia-Heng Lin
IPC: G06F40/242, G06N20/00, G06F40/58, G06F40/295, G06K9/62
Abstract: An approach for a fast and accurate word embedding model, “desc2vec,” for out-of-dictionary (OOD) words, in which the model learns from the dictionary descriptions of a word, is disclosed. The approach includes determining that a target text element is not in a set of reference text elements and, in response, obtaining information describing the target text element. The information comprises a set of descriptive text elements. A set of vectorized representations for the set of descriptive text elements is determined. A target vectorized representation for the target text element is then determined from the set of vectorized representations using a machine learning model. The machine learning model is trained to represent a predetermined association between the vectorized representations of the descriptive text elements describing the target text element and the target vectorized representation.
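The abstract does not say which machine learning model maps the description vectors to the target vector. A minimal sketch, assuming a linear map applied to the average of the description-word vectors and trained by stochastic gradient descent on in-dictionary words whose own vectors are known; the toy embeddings and all function names (`mean_vector`, `train_linear_map`, `embed_ood`) are hypothetical and are not desc2vec's actual architecture:

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(words: List[str], emb: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the description words found in the dictionary."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        raise ValueError("no description word has a known embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def train_linear_map(pairs: List[Tuple[Vector, Vector]], dim: int,
                     lr: float = 0.1, epochs: int = 2000) -> Matrix:
    """Hypothetical model: fit W so that W @ mean(description vectors) is close
    to the word's own vector, via SGD on the squared error."""
    w = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = matvec(w, x)
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    w[i][j] -= lr * err * x[j]  # gradient of 0.5 * err**2
    return w


def embed_ood(description: List[str], emb: Dict[str, Vector], w: Matrix) -> Vector:
    """Predict a vector for a word that is absent from the embedding dictionary."""
    return matvec(w, mean_vector(description, emb))


# Toy data (hypothetical): in-dictionary words with both a vector and a description.
EMB = {"a": [1.0, 0.0], "b": [0.0, 1.0], "ab": [1.0, 1.0], "aa": [2.0, 0.0]}
DESCRIPTIONS = {"ab": ["a", "b"], "aa": ["a", "a"]}

W = train_linear_map(
    [(mean_vector(d, EMB), EMB[word]) for word, d in DESCRIPTIONS.items()], dim=2
)
ood_vector = embed_ood(["b", "b"], EMB, W)  # the OOD word "bb" is not in EMB
print(ood_vector)
```

Because each toy target is exactly twice its averaged description vector, the learned map converges to twice the identity, so the OOD word described as `["b", "b"]` receives a vector close to `[0, 2]`. A real system would start from pretrained, much higher-dimensional embeddings.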
-
Publication No.: US11017083B2
Publication Date: 2021-05-25
Application No.: US16162597
Application Date: 2018-10-17
Applicant: International Business Machines Corporation
Inventor: Ci-Hao Wu, Ying-Chen Yu, June-Ray Lin, Hsieh-Lung Yang, Chen-Yu Huang, Chia-Heng Lin, Kuei-Ching Lee
IPC: G06F21/56, G06N3/04, G06N3/08, G06F16/901, G06F40/289
Abstract: Provided are systems, methods, and media for multiphase graph partitioning for malware entity detection. An example method includes receiving an input string associated with a malware entity and determining whether the input string includes a symbolic word, a non-symbolic word, a symbolic phrase, or a non-symbolic phrase. A branching graph is then formed from a combination of the input string and a plurality of stored strings, each associated with the malware entity, to determine whether the input string is a valid detection name of the malware entity; forming the branching graph includes at least a first graph partitioning stage and a second graph partitioning stage. The input string is then labeled based on the branching graph and output to a malware detection engine.
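Only the first determination step is concrete enough to sketch without guessing at the graph construction. A minimal sketch of that four-way classification, assuming hypothetical rules (a phrase has more than one whitespace-separated token; a string is symbolic if it contains a character that is neither a word character nor whitespace, such as `.`, `:` or `/`); the name `classify_detection_string` is illustrative, and the two partitioning stages are not shown:

```python
import re
from enum import Enum


class StringKind(Enum):
    SYMBOLIC_WORD = "symbolic word"
    NON_SYMBOLIC_WORD = "non-symbolic word"
    SYMBOLIC_PHRASE = "symbolic phrase"
    NON_SYMBOLIC_PHRASE = "non-symbolic phrase"


# Hypothetical rule: any character that is neither a word character
# (letter, digit, underscore) nor whitespace counts as a symbol.
_SYMBOL = re.compile(r"[^\w\s]")


def classify_detection_string(s: str) -> StringKind:
    """Hypothetical rule: more than one whitespace-separated token means a phrase."""
    tokens = s.split()
    if not tokens:
        raise ValueError("empty input string")
    symbolic = _SYMBOL.search(s) is not None
    if len(tokens) > 1:
        return StringKind.SYMBOLIC_PHRASE if symbolic else StringKind.NON_SYMBOLIC_PHRASE
    return StringKind.SYMBOLIC_WORD if symbolic else StringKind.NON_SYMBOLIC_WORD


for name in ["Trojan.Win32.Agent", "WannaCry",
             "Ransom:Win32/WannaCrypt variant", "Emotet banking trojan"]:
    print(f"{name!r}: {classify_detection_string(name).value}")
```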
-
Publication No.: US10762155B2
Publication Date: 2020-09-01
Application No.: US16167653
Application Date: 2018-10-23
Applicant: International Business Machines Corporation
Inventor: June-Ray Lin, Curtis CH Wei, Hsieh-Lung Yang, Ying-Chen Yu, Chia-Heng Lin, Ci-Hao Wu, Chen-Yu Huang, Kuei-Ching Lee
IPC: G06F16/00, G06F16/9535, G06K9/62, G06F16/335, G06F16/955
Abstract: A method, computer program product, and computing system are disclosed for receiving, on a computing device, a plurality of webpages. At least one webpage may be filtered from the plurality of webpages into at least one set of webpages using a decision tree algorithm. At least one remaining webpage may then be filtered from the plurality of webpages into the at least one set of webpages using a support vector machine (SVM) algorithm.
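A minimal two-stage sketch of this filtering, assuming hypothetical page features (link density and text ratio), a hand-built decision tree that labels only clear-cut pages, and a linear SVM trained with Pegasos-style subgradient descent on the hinge loss for the pages the tree leaves undecided; the features, thresholds, labels, and URLs are illustrative and not taken from the patent:

```python
from typing import Dict, List, Optional, Tuple

Features = Tuple[float, float]  # hypothetical: (link_density, text_ratio)


def tree_filter(x: Features) -> Optional[str]:
    """Hypothetical hand-built decision tree: labels only the clear-cut pages."""
    link_density, text_ratio = x
    if link_density > 0.8:
        return "navigation"
    if text_ratio > 0.8:
        return "article"
    return None  # undecided: leave the page for the SVM stage


def train_linear_svm(data: List[Tuple[Features, int]], lam: float = 0.01,
                     epochs: int = 200) -> List[float]:
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.
    Labels are +1 (article) / -1 (navigation); the last weight is a bias term."""
    w = [0.0, 0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for (f1, f2), y in data:
            t += 1
            eta = 1.0 / (lam * t)
            xa = (f1, f2, 1.0)
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularization
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def svm_label(w: List[float], x: Features) -> str:
    score = w[0] * x[0] + w[1] * x[1] + w[2]
    return "article" if score >= 0 else "navigation"


def filter_pages(pages: Dict[str, Features], w: List[float]) -> Dict[str, List[str]]:
    """Stage 1: decision tree; stage 2: SVM for the pages the tree left undecided."""
    sets: Dict[str, List[str]] = {"article": [], "navigation": []}
    for url, x in pages.items():
        label = tree_filter(x) or svm_label(w, x)
        sets[label].append(url)
    return sets


TRAIN = [((0.2, 0.7), 1), ((0.7, 0.2), -1), ((0.3, 0.6), 1), ((0.6, 0.3), -1)]
W_SVM = train_linear_svm(TRAIN)
PAGES = {
    "https://example.com/menu": (0.9, 0.1),    # decided by the tree
    "https://example.com/story": (0.1, 0.9),   # decided by the tree
    "https://example.com/post": (0.1, 0.75),   # falls through to the SVM
    "https://example.com/links": (0.75, 0.1),  # falls through to the SVM
}
RESULT = filter_pages(PAGES, W_SVM)
print(RESULT)
```

Letting a cheap tree settle the obvious cases and reserving the SVM for ambiguous ones is one plausible reading of why the abstract orders the two algorithms this way; the abstract itself does not state the rationale.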
-
Publication No.: US20190303501A1
Publication Date: 2019-10-03
Application No.: US15936666
Application Date: 2018-03-27
Applicant: International Business Machines Corporation
Inventor: Chen-Yu Huang, Sheng-Wei Lee, June-Ray Lin, Ci-Hao Wu, Hsieh-Lung Yang, Ying-Chen Yu
Abstract: A method, computer system, and computer program product for crawling and extracting main content from a web page are provided. The present invention may include retrieving an HTML document associated with a web page. The present invention may then include identifying at least one entry point in the retrieved HTML document by utilizing a self-adaptive entry point locator. The present invention may also include extracting a main content article associated with the retrieved HTML document based on the identified at least one entry point. The present invention may further include presenting the extracted main content to a user.
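The abstract does not describe how the self-adaptive entry point locator works. A minimal sketch using Python's standard `html.parser`, assuming a simple hypothetical heuristic in its place: the entry point is the container element (`body`, `main`, `article`, `section`, or `div`) whose own paragraphs (those with no closer enclosing container) hold the most text, and the main content is those paragraphs. It assumes well-formed HTML, and the class and function names are illustrative:

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical set of tags treated as candidate content containers.
CONTAINERS = {"body", "main", "article", "section", "div"}


class EntryPointLocator(HTMLParser):
    """Collects paragraph text per container element (assumes well-formed HTML)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[int] = []                  # ids of currently open containers
        self.paragraphs: Dict[int, List[str]] = {}  # container id -> its paragraphs
        self.next_id = 0
        self.in_p = False
        self.buf: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in CONTAINERS:
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "p":
            self.in_p, self.buf = True, []

    def handle_endtag(self, tag: str) -> None:
        if tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self.buf).strip()
            if text and self.stack:
                # Attribute the paragraph to its nearest enclosing container.
                self.paragraphs.setdefault(self.stack[-1], []).append(text)

    def handle_data(self, data: str) -> None:
        if self.in_p:
            self.buf.append(data)


def extract_main_content(html: str) -> str:
    locator = EntryPointLocator()
    locator.feed(html)
    locator.close()
    if not locator.paragraphs:
        return ""
    # Entry point: the container whose own paragraphs hold the most characters.
    entry = max(locator.paragraphs,
                key=lambda cid: sum(map(len, locator.paragraphs[cid])))
    return "\n".join(locator.paragraphs[entry])


SAMPLE = """<html><body>
<div id="nav"><p>Home</p><p>About</p></div>
<div id="story"><p>First paragraph of the story.</p><p>Second paragraph with more detail.</p></div>
</body></html>"""

print(extract_main_content(SAMPLE))
```

A fixed "most paragraph text wins" rule is not self-adaptive; a closer approximation of the patent's wording would presumably adjust the scoring per site, which this sketch does not attempt.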
-