ROBUSTNESS TO ADVERSARIAL BEHAVIOR FOR TEXT CLASSIFICATION MODELS

    Publication No.: US20210334459A1

    Publication date: 2021-10-28

    Application No.: US17239284

    Application date: 2021-04-23

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text classification machine learning model. One of the methods includes training a model having a plurality of parameters and configured to generate a classification of a text sample comprising a plurality of words by processing a model input that includes a combined feature representation of the plurality of words in the text sample, wherein the training comprises receiving a text sample and a target classification for the text sample; generating a plurality of perturbed combined feature representations; determining, based on the plurality of perturbed combined feature representations, a region in the embedding space; and determining an update to the parameters based on an adversarial objective that encourages the model to assign the target classification for the text sample for all of the combined feature representations in the region in the embedding space.
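The training steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the combined representation is formed, how perturbations are generated, what shape the region takes, or what the model architecture is. The sketch therefore assumes an average of word embeddings, single-word substitutions, an axis-aligned bounding box, and a linear classifier. All function names are hypothetical. It computes a worst-case cross-entropy over the box, which upper-bounds the loss at every point in the region; the parameter update itself (for example, a gradient step on this loss) is left out.

```python
import math

def combined_representation(word_vectors):
    """Average the word embeddings into one combined feature vector (assumed choice)."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]

def perturbed_representations(word_vectors, substitutes):
    """Return the original representation plus one per single-word substitution.

    `substitutes` maps a word position to a list of replacement embeddings
    (an assumed perturbation scheme, e.g. synonym swaps).
    """
    reps = [combined_representation(word_vectors)]
    for pos, candidates in substitutes.items():
        for cand in candidates:
            swapped = list(word_vectors)
            swapped[pos] = cand
            reps.append(combined_representation(swapped))
    return reps

def bounding_box(reps):
    """Smallest axis-aligned box in embedding space containing all representations."""
    lower = [min(col) for col in zip(*reps)]
    upper = [max(col) for col in zip(*reps)]
    return lower, upper

def worst_case_margins(weights, biases, lower, upper, target):
    """For a linear classifier, maximize logit_k - logit_target over the box.

    Each coordinate is set to the box corner that favors class k over the
    target, so the result bounds the margin for every point in the box.
    """
    w_t, b_t = weights[target], biases[target]
    margins = []
    for k, (w_k, b_k) in enumerate(zip(weights, biases)):
        if k == target:
            margins.append(0.0)
            continue
        value = b_k - b_t
        for wk, wt, lo, hi in zip(w_k, w_t, lower, upper):
            diff = wk - wt
            value += diff * (hi if diff > 0 else lo)
        margins.append(value)
    return margins

def adversarial_loss(weights, biases, lower, upper, target):
    """Cross-entropy on worst-case margins: an upper bound on the loss in the box."""
    z = worst_case_margins(weights, biases, lower, upper, target)
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

# Toy example: two words with 2-d embeddings, one substitute for word 0.
words = [[1.0, 0.0], [0.0, 1.0]]
subs = {0: [[3.0, 0.0]]}
lower, upper = bounding_box(perturbed_representations(words, subs))
loss = adversarial_loss([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], lower, upper, target=0)
print(lower, upper, loss)
```

In this toy case the box is `[0.5, 0.5]` to `[1.5, 0.5]` and the worst-case loss is `log 2` (about 0.693), attained at the unperturbed representation. Minimizing such a bound pushes the model to assign the target classification everywhere in the region, which is the behavior the adversarial objective in the abstract describes.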
