Invention Grant
- Patent Title: System and method for unsupervised text normalization using distributed representation of words
- Application No.: US14506156
- Application Date: 2014-10-03
- Publication No.: US10083167B2
- Publication Date: 2018-09-25
- Inventor: Vivek Kumar Rangarajan Sridhar
- Applicant: AT&T Intellectual Property I, L.P.
- Applicant Address: US GA Atlanta
- Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
- Current Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
- Current Assignee Address: US GA Atlanta
- Main IPC: G06F17/27
- IPC: G06F17/27 ; G06F17/28 ; G06Q50/00

Abstract:
A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model to yield a vector space path, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify the word associated with the best path as the canonical version of the noisy word.
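The retrieve-then-rerank flow in the abstract (find the noisy word's vector, take its n-best neighbors, pick the lowest-cost candidate) can be illustrated with a minimal sketch. The abstract does not define the similarity cost, so this sketch assumes one that blends cosine similarity between word vectors with a character-level similarity from Python's `difflib.SequenceMatcher`, standing in for the non-canonical spelling component. The toy vectors, the canonical lexicon, the weight `alpha`, and all function names are hypothetical. Context is handled only implicitly, on the assumption that real vectors would be trained on the contexts in which words occur.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy word vectors. A real system would learn these from a large
# social-media corpus, so each vector reflects the contexts the word appears in.
EMBEDDINGS = {
    "tmrw": [0.90, 0.10, 0.05],
    "tomorrow": [0.88, 0.12, 0.07],
    "today": [0.70, 0.30, 0.10],
    "morning": [0.60, 0.20, 0.40],
    "pizza": [0.05, 0.90, 0.30],
}

# Hypothetical lexicon of canonical spellings (an assumption, not from the patent).
CANONICAL = ["tomorrow", "today", "morning", "pizza"]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def n_best_neighbors(word, n):
    """Return the n canonical words whose vectors are closest to `word`."""
    target = EMBEDDINGS[word]
    scored = [(cosine(target, EMBEDDINGS[c]), c) for c in CANONICAL if c in EMBEDDINGS]
    scored.sort(reverse=True)  # highest cosine similarity first
    return [c for _, c in scored[:n]]


def similarity_cost(noisy, candidate, alpha=0.5):
    """Assumed cost: 1 minus a weighted blend of vector and spelling similarity."""
    semantic = cosine(EMBEDDINGS[noisy], EMBEDDINGS[candidate])
    lexical = SequenceMatcher(None, noisy, candidate).ratio()
    return 1.0 - (alpha * semantic + (1.0 - alpha) * lexical)


def normalize(noisy, n=3, alpha=0.5):
    """Map a noisy token to the lowest-cost word among its n-best neighbors."""
    if noisy in CANONICAL or noisy not in EMBEDDINGS:
        return noisy  # already canonical, or no vector to search from
    candidates = n_best_neighbors(noisy, n)
    return min(candidates, key=lambda c: similarity_cost(noisy, c, alpha))


print(normalize("tmrw"))  # prints tomorrow
```

With these toy values, the three nearest canonical neighbors of "tmrw" are "tomorrow", "today", and "morning", and the character-level term further favors "tomorrow". Raising `alpha` weights the distributional evidence more heavily than the spelling evidence.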
Public/Granted literature
- US20160098386A1 SYSTEM AND METHOD FOR UNSUPERVISED TEXT NORMALIZATION USING DISTRIBUTED REPRESENTATION OF WORDS Public/Granted day:2016-04-07