Publication number: EP4105815A1
Publication date: 2022-12-21
Application number: EP21179303.9
Filing date: 2021-06-14
Applicant: Onfido Ltd
Inventors: Botros, Philip; Sabathe, Romain; Christiansen, Lewis; Bonev, Slavi; Annunziata, Roberto; Mahadevan, Mohan
Abstract: A computer-implemented method of training a machine learning model for detecting anomalies in images of documents of a class of documents is provided. The method comprises obtaining, for each document, at least one first digital image of a first set of digital images of documents within the class of documents. Each first digital image is an image of a region of the respective document comprising a portion of or the whole respective document, and the first set of digital images comprises at least one digital image of a document of the class of documents containing an anomaly and at least one digital image of a document of the class of documents not containing an anomaly. The method further comprises applying a plurality of signal processing algorithms to each of the first digital images to generate a respective signal for each first digital image of the first set and each signal processing algorithm, and evaluating a discriminative power of each signal processing algorithm. The discriminative power is indicative of the power of the signals generated with the respective signal processing algorithm to discriminate digital images of documents of the class of documents containing an anomaly from digital images of documents of the class of documents not containing an anomaly.
The method further comprises selecting, based on at least the discriminative power of the respective signal processing algorithms, one or more of the plurality of signal processing algorithms. Input data for the machine learning model is then generated using one or more respective signals generated by applying the selected one or more signal processing algorithms to each of a plurality of second digital images. Each second digital image is an image of the region of a respective document of a second set of digital images of documents within the class of documents, and the second set of digital images comprises at least one digital image of a document of the class of documents containing an anomaly and at least one digital image of a document of the class of documents not containing an anomaly. The machine learning model is trained using the input data to produce output data indicative of whether a digital image of a document of the class of documents contains an anomaly or not. Optionally, the first set of digital images of documents is the same as the second set of digital images of documents or different from it, for example a subset of it.
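
The steps in the abstract (generate signals, score each algorithm's discriminative power, select algorithms, build input data, train) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three toy signal processing algorithms, the use of a rank-based AUC as the discriminative power, the top-k selection rule, the nearest-centroid stand-in for the machine learning model, and the synthetic images.

```python
import random
from statistics import mean, pvariance

# Hypothetical signal processing algorithms (assumptions, not from the patent).
# Each maps an image, given as a list of pixel rows, to one scalar signal.
def mean_intensity(img):
    return mean(p for row in img for p in row)

def intensity_variance(img):
    return pvariance([p for row in img for p in row])

def gradient_energy(img):
    # Mean squared difference between horizontally adjacent pixels.
    return mean((row[i + 1] - row[i]) ** 2 for row in img for i in range(len(row) - 1))

ALGORITHMS = {
    "mean_intensity": mean_intensity,
    "intensity_variance": intensity_variance,
    "gradient_energy": gradient_energy,
}

def discriminative_power(signals, labels):
    """Assumed metric: rank-based AUC folded into [0.5, 1], so direction is ignored."""
    pos = [s for s, y in zip(signals, labels) if y == 1]
    neg = [s for s, y in zip(signals, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)

def select_algorithms(images, labels, k=2):
    """Score every algorithm on the first set and keep the k highest-scoring names."""
    power = {
        name: discriminative_power([algo(img) for img in images], labels)
        for name, algo in ALGORITHMS.items()
    }
    ranked = sorted(power, key=power.get, reverse=True)
    return ranked[:k], power

def make_input_data(images, selected):
    """One feature vector per image, built only from the selected algorithms' signals."""
    return [[ALGORITHMS[name](img) for name in selected] for img in images]

def train_centroids(features, labels):
    """Stand-in model: the mean feature vector of each class (0 = no anomaly, 1 = anomaly)."""
    model = {}
    for cls in (0, 1):
        rows = [f for f, y in zip(features, labels) if y == cls]
        model[cls] = [mean(col) for col in zip(*rows)]
    return model

def predict(model, feature):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(model, key=lambda cls: sq_dist(model[cls]))

def synthetic_image(rng, anomalous, size=8):
    # Synthetic data only: anomalous images carry much stronger pixel noise.
    noise = 0.3 if anomalous else 0.02
    return [[0.5 + rng.uniform(-noise, noise) for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
labels = [0, 1] * 20
first_set = [synthetic_image(rng, y) for y in labels]   # used to score the algorithms
second_set = [synthetic_image(rng, y) for y in labels]  # used to train the model

selected, power = select_algorithms(first_set, labels, k=2)
model = train_centroids(make_input_data(second_set, selected), labels)
```

In this sketch the first and second sets are drawn separately, which corresponds to the "different from" option in the abstract; passing the same list for both would correspond to the "same as" option. A real system would likely use richer signals and a stronger classifier, and the choice of AUC as the scoring metric is only one plausible reading of "discriminative power".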