Method and system for automated data curation

    Publication No.: US11947901B2

    Publication date: 2024-04-02

    Application No.: US17810750

    Filing date: 2022-07-05

    Inventor: Trevor McGuire

    Abstract: A method for facilitating automated data curation in real time is disclosed. The method includes retrieving electronic documents from a source; converting the electronic documents into data sets, the data sets corresponding to a predetermined format; preprocessing the data sets to identify linguistic units, the linguistic units relating to paragraphs and sentences; extracting, using a model, attributes based on the linguistic units, the attributes relating to a key detail in the electronic documents; and generating, in real time, messages based on the extracted attributes. Additionally, the electronic documents include a corporate action prospectus that provides information for a corresponding corporate event, the information including term and condition information, date information, and restriction information.
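The curation steps in this abstract can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: document retrieval is omitted, the "predetermined format" is assumed to be a list of paragraphs, and two hand-written regular expressions (the hypothetical `ATTRIBUTE_PATTERNS`) stand in for the extraction model. The attribute names `record_date` and `ratio` are illustrative corporate-action details, not taken from the patent.

```python
import re
from dataclasses import dataclass

# Hypothetical regex patterns standing in for the trained extraction model;
# a real system would use a learned model rather than hand-written rules.
ATTRIBUTE_PATTERNS = {
    "record_date": re.compile(r"record date[^.]*?(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "ratio": re.compile(r"ratio of (\d+\s*:\s*\d+)", re.IGNORECASE),
}


@dataclass
class Message:
    attribute: str  # name of the extracted key detail
    value: str      # extracted value
    sentence: str   # sentence the value came from, kept for traceability


def to_data_set(raw_text: str) -> list[str]:
    """Convert a raw document into the assumed predetermined format: a list of paragraphs."""
    return [p.strip() for p in raw_text.split("\n\n") if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def extract_attributes(sentences: list[str]):
    """Yield one Message per attribute pattern that matches a sentence."""
    for sentence in sentences:
        for name, pattern in ATTRIBUTE_PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                yield Message(name, match.group(1), sentence)


def curate(raw_text: str) -> list[Message]:
    """Convert, segment and extract; messages are produced paragraph by paragraph."""
    messages = []
    for paragraph in to_data_set(raw_text):
        messages.extend(extract_attributes(split_sentences(paragraph)))
    return messages


prospectus = (
    "The company announces a rights issue. The record date is 2024-05-01.\n\n"
    "Shareholders may subscribe at a ratio of 1:4 new shares."
)
for message in curate(prospectus):
    print(message.attribute, message.value)
```

Because extraction runs paragraph by paragraph, messages could be emitted as each paragraph arrives, which is one way to approach the real-time requirement; the sketch simply collects them into a list. Replacing the regular expressions with a trained model would change only `extract_attributes`.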

    PRIVATE DECISION TREE EVALUATION USING AN ARITHMETIC CIRCUIT

    Publication No.: US20230379135A1

    Publication date: 2023-11-23

    Application No.: US18221665

    Filing date: 2023-07-13

    Applicant: SAP SE

    Abstract: A non-interactive protocol is provided for evaluating machine learning models such as decision trees. A client can delegate the evaluation of a machine learning model, such as a decision tree, to a server by sending an encrypted input and receiving only the encryption of the result. The inputs can be encoded as vectors of integers using their binary representation. The server can then evaluate the machine learning model using a homomorphic arithmetic circuit. The homomorphic arithmetic circuit provides an implementation that requires fewer multiplications than a Boolean comparison circuit. Efficient data representations are then combined with different algorithmic optimizations to keep the computational overhead and the communication cost low. Related apparatus, systems, techniques and articles are also described.
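The core idea, comparing bit-encoded inputs with a circuit built only from additions and multiplications, can be illustrated in plaintext. This sketch performs no encryption: in the actual protocol the client's bits would be homomorphically encrypted and the server would run the same arithmetic on ciphertexts. The comparison formula and the path-indicator tree evaluation below are a standard construction chosen for illustration; the patent's circuit and optimizations may differ, and `greater_than`, `evaluate`, `Leaf` and `Node` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


def to_bits(value: int, width: int) -> list[int]:
    """Binary representation of value as `width` bits, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def greater_than(x_bits: list[int], y_bits: list[int]) -> int:
    """Return 1 if x > y, else 0, using only + and * on 0/1 integers.

    x > y iff at some bit x_i = 1 and y_i = 0 while all more significant bits
    are equal; at most one term of the sum can be 1.
    """
    result = 0
    prefix_equal = 1  # 1 while all more significant bits seen so far are equal
    for x, y in zip(x_bits, y_bits):
        result += prefix_equal * x * (1 - y)
        prefix_equal *= 1 - x - y + 2 * x * y  # equals 1 iff x == y
    return result


@dataclass
class Leaf:
    label: int


@dataclass
class Node:
    feature: int    # index of the input feature to compare
    threshold: int  # server-side threshold
    left: "Tree"    # followed when feature value <= threshold
    right: "Tree"   # followed when feature value > threshold


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, feature_bits: list[list[int]], width: int, path: int = 1) -> int:
    """Sum of label * path indicator over all leaves; only the reached leaf has indicator 1.

    Every node is visited, so the sequence of operations does not depend on the branch taken.
    """
    if isinstance(tree, Leaf):
        return path * tree.label
    b = greater_than(feature_bits[tree.feature], to_bits(tree.threshold, width))
    return (evaluate(tree.left, feature_bits, width, path * (1 - b))
            + evaluate(tree.right, feature_bits, width, path * b))


tree = Node(0, 10, Leaf(0), Node(1, 3, Leaf(1), Leaf(2)))
client_input = [to_bits(v, 8) for v in (12, 5)]  # would be encrypted in the real protocol
print(evaluate(tree, client_input, 8))  # 2
```

Each comparison costs a few multiplications per bit, and the result is a single integer, which is what the server would return to the client as a ciphertext in the encrypted setting.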

    COMPUTER ARCHITECTURE FOR STRING SEARCHING

    Publication No.: US20230036196A1

    Publication date: 2023-02-02

    Application No.: US17386477

    Filing date: 2021-07-27

    Abstract: An embodiment of the present invention is a prime representation data structure in a computer architecture. The prime representation data structure has a plurality of records, where each record contains a prime representation and the prime representation is a product of two or more selected prime factors. Each selected prime factor is associated with an n-gram of a domain representation of a domain string. The domain representation of the domain string is a domain string of ordered, contiguous domain characters, and each n-gram is a subset of n of the ordered, contiguous domain characters in the domain string. The computer architecture performs string searching and includes one or more central processing units (CPUs) with one or more operating systems, one or more input/output device interfaces, one or more memories, and one or more input/output devices. The architecture further includes the prime representation data structure, one or more prime target query data structures, and a search process performed by one or more of the CPUs. The CPUs can be organized in a hierarchical structure. The prime target query data structure has one or more target prime queries. Each target prime query is the product of one or more target selected prime factors, and each target selected prime factor is associated with a target n-gram of a target domain representation of a target domain string. The search process determines whether one or more of the target selected prime factors is common with one of the selected prime factors. Through this efficient test, the computer system can determine whether one or more small strings are included in one or more large strings.
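A small sketch of the prime-representation idea, assuming bigrams (n = 2), a fresh prime for each distinct n-gram, and a product that counts repeated n-grams. `PrimeNgramIndex`, `shares_ngram` and `search` are hypothetical names. A gcd greater than 1 corresponds to the abstract's test for a common prime factor; divisibility of the domain product by the full query product is a stronger filter. Because n-gram products ignore order, the sketch confirms each candidate with an ordinary substring check.

```python
from itertools import count
from math import gcd, prod


class PrimeNgramIndex:
    """Hypothetical index that assigns a distinct prime to every distinct n-gram."""

    def __init__(self, n: int = 2):
        self.n = n
        self.prime_of: dict[str, int] = {}
        self._primes = self._prime_stream()

    @staticmethod
    def _prime_stream():
        """Yield 2, 3, 5, 7, ... by trial division against earlier primes."""
        found: list[int] = []
        for candidate in count(2):
            if all(candidate % p for p in found if p * p <= candidate):
                found.append(candidate)
                yield candidate

    def ngrams(self, text: str) -> list[str]:
        """Ordered, contiguous n-character substrings of text."""
        return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

    def representation(self, text: str) -> int:
        """Product of the primes of all n-grams in text, counting repeats."""
        factors = []
        for gram in self.ngrams(text):
            if gram not in self.prime_of:
                self.prime_of[gram] = next(self._primes)
            factors.append(self.prime_of[gram])
        return prod(factors)


def shares_ngram(domain_rep: int, query_rep: int) -> bool:
    """True if at least one target prime factor is also a domain prime factor."""
    return gcd(domain_rep, query_rep) > 1


def search(index: PrimeNgramIndex, domain_strings: list[str], target: str) -> list[str]:
    """Return the domain strings that contain target."""
    records = [(s, index.representation(s)) for s in domain_strings]  # precomputable
    query = index.representation(target)
    hits = []
    for text, rep in records:
        # Divisibility: every target n-gram occurs in text at least as often.
        # n-grams ignore order, so confirm candidates with an exact check.
        if rep % query == 0 and target in text:
            hits.append(text)
    return hits


index = PrimeNgramIndex(n=2)
print(search(index, ["banana", "bandana", "cabana", "dog"], "nan"))  # ['banana']
print(shares_ngram(index.representation("dog"), index.representation("nan")))  # False
```

`bandana` and `cabana` pass the divisibility test but fail the exact check, which illustrates why the prime test works best as a fast filter in front of a conventional comparison.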

    Text document categorization using rules and document fingerprints

    Publication No.: US11557141B2

    Publication date: 2023-01-17

    Application No.: US16720289

    Filing date: 2019-12-19

    Abstract: Methods, apparatuses, and storage media storing instructions for classifying text documents are provided. A plurality of text documents is obtained. The plurality of text documents is classified into one or more document categories based on a plurality of classification rules. Each of the one or more document categories includes one or more first text documents of the plurality of text documents. A second text document of the plurality of text documents is classified, based on the plurality of classification rules, as belonging to none of the one or more document categories. One or more document fingerprints are generated for respective first text documents in the one or more document categories. The second text document is then classified into one of the one or more document categories based on the one or more document fingerprints.
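The two-stage classification can be sketched as follows. The abstract does not say what the rules or fingerprints look like, so this sketch assumes keyword rules (the hypothetical `RULES`), fingerprints made of CRC32 hashes of overlapping three-word shingles, Jaccard similarity for matching, and a hypothetical `min_similarity` cut-off below which a document stays unclassified.

```python
import re
import zlib

# Hypothetical rules: a document falls into a category when it contains all of its keywords.
RULES = {
    "invoice": {"invoice", "amount", "due"},
    "contract": {"agreement", "parties", "term"},
}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rule_category(text: str):
    """Return the first category whose keywords all appear in text, else None."""
    words = set(tokens(text))
    for category, keywords in RULES.items():
        if keywords <= words:
            return category
    return None


def fingerprint(text: str, k: int = 3) -> set[int]:
    """Set of CRC32 hashes of overlapping k-word shingles."""
    toks = tokens(text)
    return {zlib.crc32(" ".join(toks[i:i + k]).encode()) for i in range(len(toks) - k + 1)}


def jaccard(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(documents: dict[str, str], min_similarity: float = 0.2) -> dict:
    labels = {}
    fingerprints = []  # (category, fingerprint) for each rule-classified document
    unmatched = []
    for doc_id, text in documents.items():
        category = rule_category(text)
        if category is None:
            unmatched.append(doc_id)
        else:
            labels[doc_id] = category
            fingerprints.append((category, fingerprint(text)))
    # Second pass: compare each unmatched document with the fingerprints of classified ones.
    for doc_id in unmatched:
        fp = fingerprint(documents[doc_id])
        score, category = max(((jaccard(fp, f), c) for c, f in fingerprints), default=(0.0, None))
        labels[doc_id] = category if score >= min_similarity else None
    return labels


docs = {
    "d1": "Invoice 1001: the total amount is due within 30 days of receipt.",
    "d2": "This agreement between the parties sets the term of service.",
    "d3": "Bill 1002: the total amount is payable within 30 days of receipt.",
    "d4": "Weather is sunny today.",
}
print(classify(docs))  # {'d1': 'invoice', 'd2': 'contract', 'd3': 'invoice', 'd4': None}
```

`d3` matches no rule because it says "Bill" and "payable", but it shares five of its ten shingles with `d1` (Jaccard of about 0.33), so the fingerprint stage assigns it to `invoice`; `d4` resembles nothing and stays unclassified.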