SYSTEM AND METHOD FOR UNSUPERVISED TEXT NORMALIZATION USING DISTRIBUTED REPRESENTATION OF WORDS

    Publication No.: US20230075113A1

    Publication Date: 2023-03-09

    Application No.: US18055338

    Filing Date: 2022-11-14

    IPC Classification: G06F40/232 G06F40/58

    Abstract: Disclosed are a system, method, and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version.
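The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes each word is a single toy vector rather than a vector path, the embedding values in `VECTORS` are made up, and the similarity cost (an equal-weight mix of cosine similarity and `difflib` string similarity) is a hypothetical stand-in for whatever cost the application defines.

```python
import math
from difflib import SequenceMatcher

# Toy embeddings with made-up values; a real system would learn them from a corpus.
VECTORS = {
    "tmrw":     [0.90, 0.10, 0.20],
    "tomorrow": [0.88, 0.12, 0.21],
    "today":    [0.70, 0.30, 0.10],
    "tomato":   [0.10, 0.90, 0.30],
    "later":    [0.75, 0.20, 0.30],
}
CANONICAL = {"tomorrow", "today", "tomato", "later"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def n_best(word, n=3):
    """Return the n canonical words whose vectors lie closest to `word`."""
    scored = [(cosine(VECTORS[word], VECTORS[c]), c) for c in CANONICAL]
    return [c for _, c in sorted(scored, reverse=True)[:n]]


def similarity_cost(noisy, candidate):
    """Hypothetical cost (lower is better): 1 minus the mean of the
    contextual (cosine) and lexical (string) similarity."""
    lexical = SequenceMatcher(None, noisy, candidate).ratio()
    contextual = cosine(VECTORS[noisy], VECTORS[candidate])
    return 1.0 - 0.5 * (lexical + contextual)


def normalize(word, n=3):
    """Pick the lowest-cost candidate among the n nearest neighbors."""
    return min(n_best(word, n), key=lambda c: similarity_cost(word, c))
```

With these toy vectors, `normalize("tmrw")` selects `"tomorrow"`: it is both the nearest neighbor in the vector space and the closest string match among the three candidates.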

    INK DATA MODIFICATION METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM THEREOF

    Publication No.: US20220415070A1

    Publication Date: 2022-12-29

    Application No.: US17903804

    Filing Date: 2022-09-06

    Abstract: An ink data modification or correction method, and an information processing device and a program for implementing the method, are provided, which allow automatic correction of ink data that includes a spelling error in a handwritten character string. An ink data modification method according to the present disclosure includes determining a modification method for ink data by detecting a spelling error included in a handwritten character string represented by the ink data, and modifying the ink data by manipulating it on the basis of the determined modification method. For example, the determined modification method may be to add a missing character, to delete a superfluous character, or to correct a typo by replacing an erroneous character with a correct character.
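The three modification types named in this abstract (add, delete, replace) correspond to character-level edit operations. The sketch below is only an illustration under stated assumptions: it works on already-recognized text rather than on ink strokes, the intended spelling is supplied by the caller, and `modification_plan` is a hypothetical helper built on Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def modification_plan(written, intended):
    """List the character-level edits that turn `written` into `intended`.

    Each entry starts with the operation name and the position in `written`
    where it applies, followed by the characters involved.
    """
    plan = []
    matcher = SequenceMatcher(None, written, intended)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":      # a character is missing from the handwriting
            plan.append(("add", i1, intended[j1:j2]))
        elif tag == "delete":    # a superfluous character was written
            plan.append(("delete", i1, written[i1:i2]))
        elif tag == "replace":   # a wrong character was written
            plan.append(("replace", i1, written[i1:i2], intended[j1:j2]))
    return plan
```

For instance, `modification_plan("helo", "hello")` yields a single "add" entry, which a real implementation would then have to translate into inserting strokes at the matching position in the ink data.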

    Method and apparatus for error correction of numerical contents in text, and storage medium

    Publication No.: US11526657B2

    Publication Date: 2022-12-13

    Application No.: US17375225

    Filing Date: 2021-07-14

    Abstract: This application discloses a method, an apparatus, and an electronic device for error correction of numerical contents in a text, and relates to artificial intelligence fields such as natural language processing and deep learning. The method comprises: obtaining a target text to be processed; determining original numerical contents included in the target text; determining target types corresponding to the original numerical contents; and performing error correction on each original numerical content in the error correction manner corresponding to its target type. Error correction is thus performed according to the type of each numerical content and covers not only errors in numerical format but also logical errors in the numerical content, which improves numerical error correction capability and thereby the recall of detecting and correcting wrong values.

    SYSTEM AND METHOD FOR IMPROVING CHATBOT TRAINING DATASET

    Publication No.: US20220309247A1

    Publication Date: 2022-09-29

    Application No.: US17347773

    Filing Date: 2021-06-15

    Abstract: The present invention provides for improving a training dataset by identifying errors in the training dataset and generating improvement recommendations. In operation, the present invention provides for identifying and correcting duplicate utterances in a training dataset comprising utterance-intent pairs. Further, a plurality of natural language ML models are trained with the corrected training dataset to obtain a diverse set of trained ML models. Each utterance of the training dataset is fed as input to the trained ML models, and a probability of error associated with each utterance-intent pair of the training dataset is evaluated based on analysis of the respective intent predictions received from each of the trained ML models. Furthermore, spelling errors in the dataset are identified and data imbalances in the training dataset are evaluated. Finally, a set of improvement recommendations for each utterance-intent pair is generated based on the evaluated probability of error, spelling errors, duplicate utterances, and data imbalances.
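Two of the steps in this abstract, duplicate detection and the per-pair probability of error, can be sketched briefly. The abstract does not say how either is computed, so both functions below are assumptions: duplicates are matched after case and whitespace normalization, and the error probability is taken to be the fraction of model predictions that disagree with the labelled intent. The trained models themselves are left out; their predictions are passed in as a plain list.

```python
from collections import defaultdict


def find_duplicates(pairs):
    """Group the indices of pairs whose utterances are identical after
    lowercasing and collapsing whitespace (an assumed notion of duplicate)."""
    groups = defaultdict(list)
    for idx, (utterance, _intent) in enumerate(pairs):
        key = " ".join(utterance.lower().split())
        groups[key].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def error_probability(label, predictions):
    """Fraction of model predictions that disagree with the labelled intent
    (an assumed scoring rule, one prediction per trained model)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return sum(p != label for p in predictions) / len(predictions)
```

A pair labelled `"book"` for which two of four models predict `"cancel"` would score 0.5 under this rule, which a recommendation step could flag for manual review.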

    SYSTEM AND METHOD FOR GENERATING TEST DOCUMENT FOR CONTEXT SENSITIVE SPELLING ERROR CORRECTION

    Publication No.: US20220164530A1

    Publication Date: 2022-05-26

    Application No.: US17327072

    Filing Date: 2021-05-21

    Abstract: Disclosed is a system for generating test documents for context-sensitive spelling error correction. The system includes: an input unit that receives an error-free document from which an error document is to be generated; an error target word segment test unit that checks the possibility of an error in each word segment by sequentially examining the word segments of all sentences in the input document and searching for candidate words that appear at the corresponding position together with the surrounding context; an error word candidate selection unit that selects error word candidates from the candidate words found at the corresponding position by considering edit distances to the correct word and keyboard typographical errors; and an error word determination and presentation unit that calculates the probabilities of each error word candidate with its surrounding context and determines the error word of the highest priority as the final error word.
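The candidate selection unit in this abstract considers edit distance and keyboard typographical errors. A minimal sketch of that one step follows, under several assumptions: only single-letter substitutions (edit distance one) are generated, the `ADJACENT` map is a hypothetical and deliberately partial QWERTY neighborhood, and the context probabilities used by the final determination unit are not modelled.

```python
# Partial QWERTY adjacency map (hypothetical; only a few keys for illustration).
ADJACENT = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl",
    "r": "etdf", "m": "njk", "n": "bmhj", "t": "ryfg",
}


def keyboard_confusions(word, vocabulary):
    """Real words reachable from `word` by replacing one letter with a key
    adjacent to it, i.e. plausible context-sensitive (real-word) typos."""
    found = set()
    for pos, ch in enumerate(word):
        for neighbor in ADJACENT.get(ch, ""):
            candidate = word[:pos] + neighbor + word[pos + 1:]
            if candidate in vocabulary:
                found.add(candidate)
    return sorted(found)
```

Restricting candidates to words in the vocabulary is what makes the generated errors context-sensitive: the substituted word is itself valid, so only the surrounding context reveals the mistake.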