System and method for building diverse language models

Invention Grant

US09396183B2 System and method for building diverse language models (In force)

Patent Title: System and method for building diverse language models
Application No.: US14797680

Application Date: 2015-07-13
Publication No.: US09396183B2

Publication Date: 2016-07-19
Inventor: Luciano De Andrade Barbosa, Srinivas Bangalore
Applicant: AT&T Intellectual Property I, L.P.
Applicant Address: US GA Atlanta
Assignee: AT&T Intellectual Property I, L.P.
Current Assignee: AT&T Intellectual Property I, L.P.
Current Assignee Address: US GA Atlanta
Main IPC: G06F17/21
IPC: G06F17/21; G06F17/27; G06F17/28; G10L15/06

Abstract:

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
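
The perplexity-guided selection described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes an add-one smoothed unigram model as a stand-in for the "current language model", whitespace tokenization, and a fixed per-cycle budget. The names `UnigramModel`, `crawl_cycle`, and `budget` are hypothetical and do not appear in the patent record.

```python
import math
from collections import Counter


class UnigramModel:
    """Add-one smoothed unigram model (illustrative stand-in, not the patent's model)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, tokens):
        """Fold newly crawled tokens into the model."""
        self.counts.update(tokens)
        self.total += len(tokens)

    def perplexity(self, tokens):
        """Per-token perplexity of `tokens` under the current model."""
        if not tokens:
            return 0.0
        vocab_size = len(self.counts) + 1  # one extra slot for unseen words
        log_prob = 0.0
        for tok in tokens:
            p = (self.counts[tok] + 1) / (self.total + vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(tokens))


def crawl_cycle(model, candidates, budget):
    """Select the `budget` highest-perplexity documents, then update the model.

    High perplexity is used here as the novelty signal: such documents
    contain vocabulary the current model predicts poorly.
    """
    ranked = sorted(
        candidates,
        key=lambda doc: model.perplexity(doc.split()),
        reverse=True,
    )
    selected = ranked[:budget]
    for doc in selected:
        model.update(doc.split())
    return selected


if __name__ == "__main__":
    model = UnigramModel()
    model.update("the cat sat on the mat".split())
    pool = ["the cat sat", "quantum chromodynamics lattice"]
    print(crawl_cycle(model, pool, budget=1))
```

In this sketch the model produced by one cycle ranks the candidates for the next, mirroring the statement that a language model from a previous cycle guides the following one. A real crawler would have to estimate novelty before fetching a page, for example from link context, which this sketch does not attempt.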

Public/Granted literature

US20150339292A1 SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS Public/Granted day: 2015-11-26
