Abstract:
File risk and malware detection and classification can be enhanced by applying machine learning to the output of content disarm and reconstruction (CDR). Correlations can be discovered or analyzed between individual elements of such output, which can include an XML report. Such correlations can provide useful threat intelligence and help validate content disarm and reconstruction. A method can include training machine learning algorithms on a dataset derived from CDR results for test files labelled as malicious or not malicious; instructing the algorithms to predict probabilities; and determining the correlation between the report items and malware (for example, using feature importances and the SHAP value method).
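The train, predict and rank steps above can be illustrated with a minimal sketch. It is not the method of the abstract: the abstract names feature importances and SHAP values, whereas this example swaps in a hand-written logistic regression and ranks report items by the magnitude of their learned coefficients, which is a much cruder stand-in. The report item names, the training rows and the labels are all invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical CDR report items (1 = the item appeared in a file's report).
FEATURES = ["macro_removed", "embedded_file_removed", "metadata_removed"]

# Made-up training set: one row per test file, label 1 = malicious.
ROWS = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
        [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def predict_proba(weights: Sequence[float], bias: float, row: Sequence[int]) -> float:
    """Predicted probability that a file with this report row is malicious."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows: List[List[int]], labels: List[int],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        # Prediction errors are computed once per epoch, before any update.
        errors = [predict_proba(weights, bias, r) - y for r, y in zip(rows, labels)]
        for j in range(len(weights)):
            grad = sum(e * r[j] for e, r in zip(errors, rows)) / len(rows)
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / len(rows)
    return weights, bias


def rank_items(weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Order report items by absolute coefficient (a stand-in for importance)."""
    return sorted(zip(FEATURES, weights), key=lambda fw: abs(fw[1]), reverse=True)


WEIGHTS, BIAS = train(ROWS, LABELS)
```

In this toy dataset only `macro_removed` tracks the label, so `rank_items(WEIGHTS)` places it first. A real pipeline would use a held-out test set and an established explanation method rather than raw coefficients.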
Abstract:
A method or system for receiving an electronic file containing content data in a predetermined data format. The method comprises the steps of: receiving the electronic file; determining the data format; parsing the content data to determine whether it conforms to the predetermined data format; and, if the content data does conform to the predetermined data format, regenerating the parsed data to create a regenerated electronic file in the data format.
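The receive, parse, conformance-check and regenerate sequence can be sketched as follows. The `KV1` format (a header line followed by `key=value` lines of printable ASCII) is a made-up toy format used only so the example is self-contained; the function names are likewise hypothetical. The output is rebuilt from the parsed pairs, never copied from the input bytes.

```python
import re
from typing import List, Optional, Tuple

MAGIC = "KV1"  # hypothetical header line identifying the toy format
LINE_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=([\x20-\x7e]*)")


def parse(content: str) -> Optional[List[Tuple[str, str]]]:
    """Return the parsed key/value pairs, or None if the content does not conform."""
    lines = content.split("\n")
    if lines[0] != MAGIC:
        return None
    pairs = []
    for line in lines[1:]:
        if line == "":
            continue  # blank lines are tolerated but not carried over
        match = LINE_RE.fullmatch(line)
        if match is None:
            return None  # anything outside the format makes the file nonconforming
        pairs.append((match.group(1), match.group(2)))
    return pairs


def regenerate(pairs: List[Tuple[str, str]]) -> str:
    """Write a fresh file from parsed data only."""
    return "\n".join([MAGIC] + [f"{key}={value}" for key, value in pairs]) + "\n"


def receive(content: str) -> Optional[str]:
    """Regenerate a conforming file; return None for a nonconforming one."""
    parsed = parse(content)
    return None if parsed is None else regenerate(parsed)
```

For example, `receive("KV1\nname=report\n\nsize=42")` yields a normalized file, while input containing a line such as `<script>` is rejected.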
Abstract:
A system and method for calculating a risk assessment for an electronic file is described. A database of checks, organized into categories, can be used to scan electronic files. The categories of checks can include weights assigned to them. An analyser can analyse electronic files using the checks. Issues identified by the analyser can be weighted using the weights to determine a risk assessment for the electronic file.
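A minimal sketch of the weighted scoring described above, assuming that each identified issue contributes the weight of its category and that the contributions are summed. The abstract does not specify how weights are combined, so the summation, the category names, the weights and the three checks are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Check:
    name: str
    category: str
    finds_issue: Callable[[bytes], bool]  # True when the check flags the file


# Hypothetical category weights.
CATEGORY_WEIGHTS: Dict[str, float] = {
    "active_content": 5.0,
    "structure": 2.0,
    "metadata": 0.5,
}

# Hypothetical database of checks, each assigned to one category.
CHECKS: List[Check] = [
    Check("has_script", "active_content", lambda d: b"<script" in d.lower()),
    Check("has_null_byte", "structure", lambda d: b"\x00" in d),
    Check("has_author_field", "metadata", lambda d: b"author:" in d.lower()),
]


def analyse(data: bytes) -> List[Check]:
    """Run every check and return those that identified an issue."""
    return [check for check in CHECKS if check.finds_issue(data)]


def risk_assessment(data: bytes) -> float:
    """Sum the category weight of each identified issue."""
    return sum(CATEGORY_WEIGHTS[check.category] for check in analyse(data))
```

With these assumed weights, a file that trips the script check and the author check scores 5.0 + 0.5 = 5.5, and a file that trips no checks scores 0.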
Abstract:
A method for resisting the spread of unwanted code and data without scanning incoming electronic files for unwanted code and data. The method, performed by a computer system, includes: receiving, at the computer system, an incoming electronic file containing content data encoded and arranged in accordance with a predetermined file type corresponding to a set of rules; determining a purported predetermined file type of the incoming electronic file by analysing the encoded and arranged content data, the purported predetermined file type and the associated set of rules specifying allowable content data for the purported predetermined file type; parsing the content data by dividing the content data into separate parts in accordance with a predetermined data format identified by the associated set of rules corresponding to the purported predetermined file type; determining nonconforming data in the content data by identifying content data that does not conform to the purported predetermined file format; and, if the separate parts of the content data do conform to the predetermined data format, regenerating the allowable parsed content data to create a substitute regenerated electronic file in the purported predetermined file type by extracting the separate parts that do conform and putting them into the substitute regenerated electronic file.
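One way to picture these steps is with PNG, whose layout (an 8-byte signature followed by chunks of length, type, payload and CRC-32) makes "dividing the content data into separate parts" concrete. This is a sketch under stated assumptions, not the patented rule set: the allow-list of chunk types is an illustrative policy, the helper names are hypothetical, and a fuller implementation would also validate the contents of each chunk.

```python
import struct
import zlib
from typing import List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
SIGNATURES = {"png": PNG_SIGNATURE, "pdf": b"%PDF-", "zip": b"PK\x03\x04"}
# Illustrative policy: only these chunk types count as allowable content.
ALLOWED_PNG_CHUNKS = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def purported_type(data: bytes) -> Optional[str]:
    """Determine the purported file type from the content itself."""
    for name, signature in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk with a freshly computed CRC."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def split_png(data: bytes) -> Optional[List[Tuple[bytes, bytes, bool]]]:
    """Divide the data into (type, payload, crc_ok) parts; None if truncated."""
    pos, parts = len(PNG_SIGNATURE), []
    while pos < len(data):
        if pos + 8 > len(data):
            return None
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            return None
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        parts.append((ctype, payload, crc == zlib.crc32(ctype + payload) & 0xFFFFFFFF))
        pos = end
    return parts


def regenerate(data: bytes) -> Optional[bytes]:
    """Build a substitute PNG from conforming parts only."""
    if purported_type(data) != "png":
        return None  # this sketch only carries rules for PNG
    parts = split_png(data)
    if parts is None:
        return None
    kept = [(t, p) for t, p, crc_ok in parts if crc_ok and t in ALLOWED_PNG_CHUNKS]
    return PNG_SIGNATURE + b"".join(make_chunk(t, p) for t, p in kept)
```

Under this policy a `tEXt` chunk, or a chunk whose CRC does not match, is left out of the substitute file, and each retained chunk is rewritten rather than copied byte for byte.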
Abstract:
A method or system for receiving an incoming electronic file containing content data in a predetermined data format. The method includes: receiving an incoming electronic file containing content data encoded and arranged in accordance with a predetermined file type; determining a purported predetermined file type of the incoming electronic file and an associated set of rules specifying allowable content data; determining at least an allowable portion of the content data that conforms with the set of rules corresponding to the determined purported predetermined file type; extracting that allowable portion of content data from the incoming electronic file; creating a substitute electronic file in the purported predetermined file type, said substitute electronic file containing the extracted allowable content data; forwarding the substitute electronic file; and, if a portion, part or whole of the content data does not conform, forwarding the incoming electronic file only when the intended recipient approves the electronic file at the time of receipt.
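The forwarding decision at the end of this method can be sketched as below. The `Delivery` type, the `forward` function and the `approve` callback are hypothetical names; how the recipient is actually asked for approval is outside the scope of the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Delivery:
    substitute: bytes          # always forwarded
    original: Optional[bytes]  # forwarded only with recipient approval


def forward(incoming: bytes, substitute: bytes, has_nonconforming: bool,
            approve: Callable[[bytes], bool]) -> Delivery:
    """Forward the substitute file; release the original only if approved.

    `approve` stands in for asking the intended recipient at the time of
    receipt. It is consulted only when some content did not conform.
    """
    release_original = has_nonconforming and approve(incoming)
    return Delivery(substitute, incoming if release_original else None)
```

Because `and` short-circuits, the recipient is not prompted for files whose content conformed in full.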
Abstract:
A system for processing a file using a file issue exclusion policy to manage risk is disclosed. If a file does not conform to a set of rules and would otherwise be quarantined, a file issue exclusion policy can be reviewed. If the file issue exclusion policy indicates that the reason why the file did not conform to the set of rules is acceptable, the file can be delivered to the recipient despite not conforming to the set of rules.
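The policy lookup can be sketched as follows, assuming the policy is a set of (file type, issue) pairs that have been judged acceptable and that a file is delivered only when every reason for nonconformance is covered. The issue identifiers, the policy entries and the all-issues-must-be-excluded rule are assumptions made for illustration.

```python
from enum import Enum
from typing import FrozenSet, Iterable, Tuple


class Disposition(Enum):
    DELIVER = "deliver"
    QUARANTINE = "quarantine"


# Hypothetical policy: (file type, issue id) pairs accepted despite nonconformance.
EXCLUSION_POLICY: FrozenSet[Tuple[str, str]] = frozenset({
    ("pdf", "nonstandard_font_table"),
    ("docx", "unknown_custom_property"),
})


def decide(file_type: str, issues: Iterable[str],
           policy: FrozenSet[Tuple[str, str]] = EXCLUSION_POLICY) -> Disposition:
    """Deliver if the file conforms or every issue is excluded by the policy."""
    if all((file_type, issue) in policy for issue in issues):
        return Disposition.DELIVER
    return Disposition.QUARANTINE
```

A file with no issues is delivered because `all` over an empty sequence is true; a single issue missing from the policy is enough to quarantine the file.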