Abstract:
Computer-implemented systems and methods are disclosed for identifying and categorizing electronic documents through machine learning. In accordance with some embodiments, a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm. The trained document categorizer may categorize electronic documents in a large corpus of electronic documents. Performance metrics associated with the trained document categorizer may be tracked, and additional seed sets of categorized electronic documents may be used to improve the performance of the document categorizer by retraining it on subsequent seed sets. Additional seed sets and categorizations may be iterated through until a desired document categorization performance is reached.
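The retrain-and-measure loop described in this abstract can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the token-count categorizer stands in for whatever machine learning algorithm an embodiment uses, accuracy on a held-out labeled set stands in for the tracked performance metrics, and the names `TokenCountCategorizer`, `accuracy`, and `train_until_target` are all hypothetical.

```python
from collections import Counter, defaultdict


class TokenCountCategorizer:
    """Toy stand-in for a machine-learning document categorizer (hypothetical)."""

    def __init__(self):
        # category -> how often each token appeared in seed documents of that category
        self.counts = defaultdict(Counter)

    def train(self, seed_set):
        # seed_set: iterable of (text, category) pairs; training is cumulative.
        for text, category in seed_set:
            self.counts[category].update(text.lower().split())

    def categorize(self, text):
        tokens = text.lower().split()
        # Score each known category by how often it has seen the document's tokens.
        scores = {c: sum(cnt[t] for t in tokens) for c, cnt in self.counts.items()}
        # Ties are broken alphabetically so the result is deterministic.
        return max(sorted(scores), key=scores.get)


def accuracy(categorizer, labeled_docs):
    """Assumed performance metric: fraction of labeled documents categorized correctly."""
    hits = sum(categorizer.categorize(text) == cat for text, cat in labeled_docs)
    return hits / len(labeled_docs)


def train_until_target(seed_sets, validation, target):
    """Retrain on successive seed sets until the metric reaches the target."""
    categorizer = TokenCountCategorizer()
    history = []
    for seed_set in seed_sets:
        categorizer.train(seed_set)
        history.append(accuracy(categorizer, validation))
        if history[-1] >= target:
            break
    return categorizer, history


seed_sets = [
    [("invoice payment due", "finance")],
    [("contract clause liability", "legal")],
]
validation = [("payment invoice", "finance"), ("liability clause", "legal")]
categorizer, history = train_until_target(seed_sets, validation, target=1.0)
print(history)
```

The loop stops as soon as the metric meets the target, so later seed sets are only consumed when the earlier ones were not enough.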
Abstract:
Systems and methods are provided for identifying and compiling information relating to an entity for investigative analysis. The system may comprise one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to search, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records, identify from the first plurality of records a subset of records that match the known characteristics with substantial confidence, compile the subset of records to form a unified record representing the entity, and conduct a second search with information from the unified record to obtain a second plurality of search results.
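The search, filter, compile, and re-search sequence in this abstract can be sketched as below. The sketch rests on several assumptions: records are plain dictionaries, "confidence" is the fraction of known characteristics a record reproduces exactly, and the threshold value is arbitrary. The function names (`search`, `match_confidence`, `investigate`) are hypothetical and not taken from the source.

```python
def search(records, characteristics):
    """Return records that share at least one field value with the characteristics."""
    return [
        r for r in records
        if any(r.get(k) == v for k, v in characteristics.items())
    ]


def match_confidence(record, known):
    """Assumed confidence measure: fraction of known characteristics matched exactly."""
    matches = sum(1 for k, v in known.items() if record.get(k) == v)
    return matches / len(known)


def investigate(records, known, threshold=0.75):
    # First search: use the known characteristics of the entity.
    first = search(records, known)
    # Keep only records that match with sufficient confidence.
    confident = [r for r in first if match_confidence(r, known) >= threshold]
    # Compile the confident records into one unified record for the entity.
    unified = dict(known)
    for record in confident:
        for key, value in record.items():
            unified.setdefault(key, value)
    # Second search: use only the information the unified record added.
    new_info = {k: v for k, v in unified.items() if k not in known}
    second = search(records, new_info)
    return unified, second


records = [
    {"name": "A. Smith", "city": "Oslo", "phone": "555-0100"},
    {"name": "B. Jones", "city": "Oslo"},
    {"name": "C. Lee", "city": "Bergen", "phone": "555-0100"},
]
known = {"name": "A. Smith", "city": "Oslo"}
unified, second = investigate(records, known)
print(unified)
print([r["name"] for r in second])
```

In this toy data the phone number learned from the unified record surfaces a record ("C. Lee") that the first search could not reach, which is the point of the second pass.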
Abstract:
Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated, each comprising a property of the seed entity or of one of the entities associated with the seed entity (seed-linked entities). Preferably, the collection of search queries includes ones representing different properties of the seed entity and properties of different seed-linked entities. Optionally, the collection of search queries is optimized to reduce search burden. Searches can then be conducted with the search queries in one or more data sources to obtain a plurality of search results, wherein each search result comprises a hit entity and one or more entities associated with the hit entity (hit-linked entities). For each of the search results, a score can be determined taking as input (a) a likelihood of match between the seed entity and the hit entity or between a seed-linked entity and a hit-linked entity, (b) the presence of a new entity in the search result not present in the search queries, or a difference between the new entity and an entity present in the search queries, and (c) a characteristic of the new entity in the search result. Based on the scores, high-priority search results can be presented to a user for further analysis.
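One way to picture the three scoring inputs (a), (b), and (c) is a weighted sum, sketched below. The abstract does not say how the inputs are combined, so the linear form, the weights, the `[0, 1]` ranges, and the names `SearchResult`, `score`, and `prioritize` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    hit_entity: str
    match_likelihood: float       # input (a), assumed to lie in [0, 1]
    new_entities: tuple           # input (b): entities absent from the search queries
    new_entity_value: float       # input (c): assumed numeric characteristic in [0, 1]


def score(result, weights=(0.5, 0.3, 0.2)):
    """Hypothetical linear combination of the three inputs."""
    w_match, w_new, w_char = weights
    has_new = 1.0 if result.new_entities else 0.0
    return (
        w_match * result.match_likelihood
        + w_new * has_new
        # The characteristic only counts when a new entity is actually present.
        + w_char * result.new_entity_value * has_new
    )


def prioritize(results, top_n=2):
    """Return the highest-scoring results for presentation to a user."""
    return sorted(results, key=score, reverse=True)[:top_n]


results = [
    SearchResult("A", match_likelihood=0.9, new_entities=(), new_entity_value=0.0),
    SearchResult("B", match_likelihood=0.8, new_entities=("X",), new_entity_value=0.5),
    SearchResult("C", match_likelihood=0.2, new_entities=("Y",), new_entity_value=1.0),
]
top = prioritize(results)
print([r.hit_entity for r in top])
```

With these assumed weights, a strong match that adds nothing new (result A) ranks below weaker matches that introduce new entities, which reflects the abstract's emphasis on surfacing previously unseen information.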
Abstract:
Computer-implemented systems and methods are disclosed for automatically generating and displaying a chronology of events, where events may be represented by data objects in one or more databases. Events/data objects may be identified as relevant to an investigation or analysis based on specified criteria. A timeline may be generated based on the identified set of relevant events, and interactive user interfaces may be generated and displayed that present the events as a timeline and a list. Events may be selected from the timeline or the list, may be identified as key events in the chronology, and additional events related to a selected event may be determined and added to the chronology. Timelines may be compared to other data sets, including other timelines, other event lists, and other relevant data.
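The chronology-building steps in this abstract can be sketched as below, leaving out the interactive user interface. The sketch assumes events are data objects carrying a date, a set of tags, and identifiers of related events, and that the "specified criteria" reduce to a single required tag. The `Event` fields and the functions `build_chronology` and `add_related` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    event_id: str
    when: date
    tags: frozenset
    related_ids: tuple = ()


def build_chronology(events, required_tag):
    """Select events matching the criterion and order them in time."""
    relevant = [e for e in events if required_tag in e.tags]
    return sorted(relevant, key=lambda e: e.when)


def add_related(chronology, all_events, selected_id):
    """Add events related to a selected event, keeping chronological order."""
    by_id = {e.event_id: e for e in all_events}
    present = {e.event_id for e in chronology}
    selected = by_id[selected_id]
    extra = [
        by_id[i] for i in selected.related_ids
        if i in by_id and i not in present
    ]
    return sorted(chronology + extra, key=lambda e: e.when)


events = [
    Event("e1", date(2020, 3, 1), frozenset({"fraud"}), related_ids=("e3",)),
    Event("e2", date(2020, 1, 15), frozenset({"fraud"})),
    Event("e3", date(2020, 2, 10), frozenset({"travel"})),
]
chronology = build_chronology(events, "fraud")
expanded = add_related(chronology, events, "e1")
print([e.event_id for e in chronology])
print([e.event_id for e in expanded])
```

Event `e3` does not meet the original criterion but enters the chronology because it is related to the selected event `e1`, mirroring the abstract's step of adding related events.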