-
1.
Publication No.: US20190294995A1
Publication Date: 2019-09-26
Application No.: US16359336
Filing Date: 2019-03-20
Applicant: TELEFONICA, S.A.
Inventor: Antonio Pastor Perales, Diego R. Lopez, Alberto Mozo Velasco, Sandra Gomez Canaval
Abstract: A system and method for training and validating ML algorithms in real networks, including: generating synthetic traffic and receiving it along with real traffic; aggregating the received traffic into network flows by using metadata and transforming them to generate a first dataset readable by the ML algorithm, comprising features defined by the metadata; labelling the traffic and selecting a subset of the features from the labelled dataset used in an iterative training to generate a trained model; filtering out a part of real traffic to obtain a second labelled dataset; and selecting a subset of features from the second labelled dataset used for validating the trained model by comparing predicted results for the trained model and the labels; repeating the steps with a different subset of features to generate another trained model until results are positive in terms of precision or accuracy.
-
Publication No.: US11301778B2
Publication Date: 2022-04-12
Application No.: US16359336
Filing Date: 2019-03-20
Applicant: TELEFONICA, S.A.
Inventor: Antonio Pastor Perales, Diego R. Lopez, Alberto Mozo Velasco, Sandra Gomez Canaval
IPC: G06N20/00, G06N5/04, H04L43/026, H04L43/028, H04L43/067, H04L43/12, H04L41/16, H04L41/14, H04L47/10
Abstract: A system and method for training and validating ML algorithms in real networks, including: generating synthetic traffic and receiving it along with real traffic; aggregating the received traffic into network flows by using metadata and transforming them to generate a first dataset readable by the ML algorithm, comprising features defined by the metadata; labelling the traffic and selecting a subset of the features from the labelled dataset used in an iterative training to generate a trained model; filtering out a part of real traffic to obtain a second labelled dataset; and selecting a subset of features from the second labelled dataset used for validating the trained model by comparing predicted results for the trained model and the labels; repeating the steps with a different subset of features to generate another trained model until results are positive in terms of precision or accuracy.
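Illustrative sketch (not part of the patent record): the loop described in the abstract could be approximated as below. This is a minimal sketch under stated assumptions, not the patented implementation. The packet fields (`flow`, `size`, `label`), the derived features (`pkt_count`, `total_bytes`, `mean_size`), the nearest-centroid classifier standing in for the ML algorithm, and the `target` accuracy threshold are all hypothetical choices made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_flows(packets):
    """Group packets by flow key and derive per-flow features.

    The feature set here is an assumption; the abstract only says that
    features are defined by flow metadata.
    """
    flows = defaultdict(list)
    for p in packets:
        flows[p["flow"]].append(p)
    rows = []
    for key, pkts in flows.items():
        sizes = [p["size"] for p in pkts]
        rows.append({
            "flow": key,
            "pkt_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": sum(sizes) / len(sizes),
            "label": pkts[0]["label"],  # label assumed known per flow
        })
    return rows


def train_centroids(rows, features):
    """Stand-in for the ML algorithm: one mean feature vector per label."""
    sums = defaultdict(lambda: [0.0] * len(features))
    counts = defaultdict(int)
    for r in rows:
        counts[r["label"]] += 1
        for i, f in enumerate(features):
            sums[r["label"]][i] += r[f]
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in counts}


def predict(model, row, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist(centroid):
        return sum((row[f] - centroid[i]) ** 2 for i, f in enumerate(features))
    return min(model, key=lambda lab: dist(model[lab]))


def accuracy(model, rows, features):
    """Fraction of rows whose predicted label matches the stored label."""
    hits = sum(predict(model, r, features) == r["label"] for r in rows)
    return hits / len(rows)


def search_feature_subsets(train_rows, valid_rows, all_features, target=0.9):
    """Train on the first dataset, validate on the second, and retry with
    a different feature subset until accuracy reaches the target."""
    for k in range(1, len(all_features) + 1):
        for subset in combinations(all_features, k):
            model = train_centroids(train_rows, subset)
            acc = accuracy(model, valid_rows, subset)
            if acc >= target:
                return subset, model, acc
    return None, None, 0.0
```

In this sketch the first dataset (`train_rows`) would come from mixed synthetic and real traffic, and the second (`valid_rows`) from the filtered portion of real traffic. Exhaustive subset enumeration is used only because it is simple to read; it grows exponentially with the number of features.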
-