SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS
    Patent application (status: granted)

    Publication number: US20150339292A1

    Publication date: 2015-11-26

    Application number: US14797680

    Filing date: 2015-07-13

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls a set of documents in a network of interconnected devices, such as via a crawler operating on a computing device, according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
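The perplexity-guided visitation policy described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a unigram model with add-one smoothing and whitespace tokenization, and the function names (`train_unigram`, `perplexity`, `rank_by_novelty`) are hypothetical. Candidate pages that score a high perplexity under the current model are ranked first, on the assumption that their vocabulary is more likely to fill gaps.

```python
import math
from collections import Counter


def train_unigram(docs):
    """Count word occurrences over the documents from previous crawl cycles."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def perplexity(counts, text):
    """Perplexity of `text` under an add-one-smoothed unigram model (an assumption)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # an empty page carries no novelty signal in this sketch
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = 0.0
    for tok in tokens:
        log_prob += math.log((counts[tok] + 1) / (total + vocab))
    return math.exp(-log_prob / len(tokens))


def rank_by_novelty(counts, candidates):
    """Order candidate URLs so the highest-perplexity pages are visited first."""
    return sorted(
        candidates,
        key=lambda url: perplexity(counts, candidates[url]),
        reverse=True,
    )


# Hypothetical data: a tiny "current model" and two candidate pages.
model = train_unigram(["the cat sat on the mat", "the dog sat"])
pages = {
    "familiar": "the cat sat",
    "novel": "quantum chromodynamics lattice",
}
order = rank_by_novelty(model, pages)
```

In a full crawl cycle, the pages fetched in this order would be added to the training data and the model rebuilt before the next cycle, which is how a model from one cycle guides the next.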


    SYSTEM AND METHOD FOR LOCATING BILINGUAL WEB SITES

    Publication number: US20170091178A1

    Publication date: 2017-03-30

    Application number: US15294883

    Filing date: 2016-10-17

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for bootstrapping a language translation system. A system configured to practice the method performs a bidirectional web crawl to identify a bilingual website. The system analyzes data on the bilingual website to make a classification decision about whether the root of the bilingual website is an entry point for the bilingual website. The bilingual site can contain pairs of parallel pages. Each pair can include a first web page in a first language and a second web page in a second language, where a first portion of the first web page corresponds to a second portion of the second web page. The system then analyzes the first and second web pages to identify corresponding information pairs in the first and second languages, and extracts the corresponding information pairs from the first and second web pages for use in a language translation model.
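The abstract does not say how pairs of parallel pages are found. As an illustration only, the sketch below uses a common URL-matching heuristic, swapped in here as an assumption rather than taken from the patent: two URLs are treated as a candidate pair when they differ only in a path segment that holds a language code. The function names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def language_key(url, languages):
    """Return (language, key), where key is the URL with its language segment masked."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i, seg in enumerate(segments):
        if seg in languages:
            masked = segments[:i] + ["*"] + segments[i + 1:]
            return seg, parts.netloc + "/".join(masked)
    return None, None  # no language marker found in the path


def find_parallel_pairs(urls, first_lang, second_lang):
    """Group URLs by masked key and keep groups that contain both languages."""
    by_key = {}
    for url in urls:
        lang, key = language_key(url, {first_lang, second_lang})
        if lang is not None:
            by_key.setdefault(key, {})[lang] = url
    return [
        (group[first_lang], group[second_lang])
        for group in by_key.values()
        if first_lang in group and second_lang in group
    ]


# Hypothetical crawl output from a single site.
crawled = [
    "http://example.com/en/about",
    "http://example.com/fr/about",
    "http://example.com/en/contact",
]
pairs = find_parallel_pairs(crawled, "en", "fr")
```

Candidate pairs found this way would still need the content-level analysis the abstract describes, since matching URLs do not guarantee that the page contents correspond.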
