Method and apparatus for distribution-based language model adaptation
    3.
    Granted invention patent
    Legal status: in force

    Publication No.: US07254529B2

    Publication date: 2007-08-07

    Application No.: US11225543

    Filing date: 2005-09-13

    IPC classification: G06F17/27 G06F17/28 G10L15/00

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., a task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., an out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
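
The weighting idea in this abstract can be illustrated with a small sketch. The abstract does not give the weighting function, so the damped frequency ratio used here (exponent `alpha`, smoothing constant `floor`), the bigram order, and all function names are assumptions chosen for illustration, not the patented method.

```python
from collections import Counter

def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))

def relative_freq(counts):
    """Convert raw counts to relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def adapt_counts(small_tokens, large_tokens, alpha=0.5, floor=1e-6):
    """Weight each large-set bigram count by a damped ratio of its
    relative frequency in the small set to that in the large set.
    The ratio form, alpha, and floor are illustrative assumptions."""
    small = Counter(bigrams(small_tokens))
    large = Counter(bigrams(large_tokens))
    f_small = relative_freq(small)
    f_large = relative_freq(large)
    weighted = {}
    for gram, count in large.items():
        ratio = (f_small.get(gram, 0.0) + floor) / (f_large[gram] + floor)
        weighted[gram] = count * ratio ** alpha
    return weighted

def conditional_probs(weighted):
    """Normalize weighted counts into P(word | previous word)."""
    totals = Counter()
    for (prev, _), w in weighted.items():
        totals[prev] += w
    return {(prev, word): w / totals[prev]
            for (prev, word), w in weighted.items()}

# Toy data: a tiny in-domain set and a larger out-of-domain set.
small = "book a flight to boston".split()
large = "read a book and book a flight or book a hotel".split()
probs = conditional_probs(adapt_counts(small, large))
print(probs[("a", "flight")], probs[("a", "hotel")])
```

In this toy run, bigrams that also occur in the small set keep most of their probability mass, while bigrams seen only in the large set are strongly discounted.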


    Language input system for mobile devices
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US07277732B2

    Publication date: 2007-10-02

    Application No.: US09843358

    Filing date: 2001-04-24

    IPC classification: H04B1/38

    Abstract: A language system facilitates entry of an input string into a mobile device using discrete keys on a keypad, such as a 10-key keypad. The numeric keys have associated letters of an alphabet. The key input is representative of one or more Chinese phonetic characters. Based on this input string, the language system derives the most likely corresponding Chinese language characters intended by the user. The language system uses multiple different search engines and language models to aid in deriving the most probable Chinese language characters. When the language system recognizes possible Chinese language characters, the mobile device displays them for user selection and/or for further input of one or more Chinese phonetic characters. In this manner, the language system adopts a modeless entry methodology that eliminates conventional mode switching between input and selection operations.
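
A minimal sketch of the digit-to-character step follows. It assumes the standard telephone letter layout, a single hypothetical toy lexicon (`LEXICON`) with made-up probabilities, and a plain unigram ranking; the abstract describes multiple search engines and language models, which this sketch does not attempt to reproduce.

```python
from itertools import product

# Standard telephone keypad letter assignment.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

# Hypothetical toy lexicon: pinyin syllable -> (character, probability).
# The probabilities are invented for illustration only.
LEXICON = {
    "ni": [("你", 0.6), ("泥", 0.05)],
    "mi": [("米", 0.2), ("迷", 0.1)],
}

def candidates(digits):
    """Expand a digit string into every letter sequence it could spell,
    keep those found in the lexicon, and rank characters by probability.
    Digits without letters (e.g. "0", "1") yield no candidates."""
    letter_sets = [KEYPAD.get(d, "") for d in digits]
    ranked = []
    for letters in product(*letter_sets):
        syllable = "".join(letters)
        for char, prob in LEXICON.get(syllable, []):
            ranked.append((prob, char, syllable))
    ranked.sort(reverse=True)
    return [(char, syllable) for _, char, syllable in ranked]

# The keys 6 and 4 can spell both "ni" and "mi".
print(candidates("64"))
```

Because the ranked list mixes characters from every syllable the digits could spell, the user can either pick a displayed character or keep typing, which is the modeless behavior the abstract describes.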


    Method and apparatus for distribution-based language model adaptation

    Publication No.: US07043422B2

    Publication date: 2006-05-09

    Application No.: US09945930

    Filing date: 2001-09-04

    IPC classification: G06F17/27

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., a task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., an out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.

    Method and apparatus for distribution-based language model adaptation

    Publication No.: US20060009965A1

    Publication date: 2006-01-12

    Application No.: US11225543

    Filing date: 2005-09-13

    IPC classification: G06F17/27

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., a task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., an out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.

    Statistical approach to large-scale image annotation
    7.
    Granted invention patent
    Legal status: in force

    Publication No.: US08594468B2

    Publication date: 2013-11-26

    Application No.: US13406804

    Filing date: 2012-02-28

    IPC classification: G06K9/60

    CPC classification: G06K9/00684 G06K2209/27

    Abstract: Statistical approaches to large-scale image annotation are described. Generally, the annotation technique includes compiling visual features and textual information from a number of images, hashing the images' visual features, and clustering the images based on their hash values. An example system builds statistical language models from the clustered images and annotates an image by applying one of the statistical language models.
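
The hash, cluster, and annotate pipeline can be sketched as follows. The hash function (thresholding each feature dimension at its mean), the unigram word-count model per cluster, and the toy feature vectors are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter, defaultdict

def hash_features(features, thresholds):
    """Binary hash: one bit per dimension, set when the value exceeds
    that dimension's threshold (an assumed, very simple hash)."""
    return "".join("1" if v > t else "0" for v, t in zip(features, thresholds))

def build_models(images):
    """Cluster (features, text) pairs by hash value and build a
    unigram word-count model for each cluster."""
    dims = len(images[0][0])
    thresholds = [sum(f[i] for f, _ in images) / len(images)
                  for i in range(dims)]
    clusters = defaultdict(Counter)
    for features, text in images:
        key = hash_features(features, thresholds)
        clusters[key].update(text.lower().split())
    return thresholds, clusters

def annotate(features, thresholds, clusters, top_k=2):
    """Annotate a new image with the most frequent words of the
    cluster its hash falls into; return [] for an unseen hash."""
    model = clusters.get(hash_features(features, thresholds))
    if not model:
        return []
    return [word for word, _ in model.most_common(top_k)]

# Toy data: two-dimensional "visual features" with surrounding text.
images = [
    ([0.9, 0.1], "sunset beach"),
    ([0.8, 0.2], "beach sunset sea"),
    ([0.1, 0.9], "forest trail"),
    ([0.2, 0.8], "forest trees"),
]
thresholds, clusters = build_models(images)
print(annotate([0.85, 0.15], thresholds, clusters))
```

Grouping by exact hash value keeps clustering linear in the number of images, which is one plausible reason to hash before clustering at large scale.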


    Estimating word correlations from images
    8.
    Granted invention patent
    Legal status: in force

    Publication No.: US08457416B2

    Publication date: 2013-06-04

    Application No.: US11956333

    Filing date: 2007-12-13

    IPC classification: G06K9/72

    CPC classification: G06F17/30247 G06F17/30731

    Abstract: Word correlations are estimated using a content-based method, which uses visual features of image representations of the words. The image representations of the subject words may be generated by retrieving images from data sources (such as the Internet) using image search with the subject words as query words. One aspect of the techniques is based on calculating the visual distance or visual similarity between the sets of retrieved images corresponding to each query word. The other is based on calculating the visual consistency among the set of retrieved images corresponding to a conjunctive query word. Combining the content-based method with a text-based method may produce even better results.
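
The two measures named in this abstract can be sketched on toy feature vectors. The specific formulas are assumptions: visual distance is taken as the mean pairwise Euclidean distance between two image sets, and visual consistency as an inverse of the mean pairwise distance within one set. The abstract does not define either measure.

```python
import math
from itertools import combinations, product

def visual_distance(set_a, set_b):
    """Mean Euclidean distance over all cross-set image pairs
    (an assumed definition of visual distance)."""
    pairs = list(product(set_a, set_b))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def visual_consistency(images):
    """Score in (0, 1]: higher when the images retrieved for a
    conjunctive query (e.g. "word1 word2") look alike.
    Requires at least two images."""
    pairs = list(combinations(images, 2))
    spread = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return 1.0 / (1.0 + spread)

# Toy feature vectors standing in for retrieved images per query word.
tiger = [(0.9, 0.1), (0.8, 0.2)]
lion = [(0.85, 0.15), (0.8, 0.1)]
car = [(0.1, 0.9), (0.2, 0.8)]

print(visual_distance(tiger, lion) < visual_distance(tiger, car))
print(visual_consistency(tiger + lion) > visual_consistency(tiger + car))
```

Under these assumed definitions, a smaller distance or a higher consistency score suggests a stronger correlation between the two words.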


    Bipartite graph reinforcement modeling to annotate web images
    9.
    Granted invention patent
    Legal status: in force

    Publication No.: US08321424B2

    Publication date: 2012-11-27

    Application No.: US11848157

    Filing date: 2007-08-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30265 G06F17/30864

    Abstract: Systems and methods for bipartite graph reinforcement modeling to annotate web images are described. In one aspect, the systems and methods implement bipartite graph reinforcement modeling operations to identify a set of annotations that are relevant to a Web image. The systems and methods annotate the Web image with the identified annotations. The systems and methods then index the annotated Web image. Responsive to receiving an image search query from a user, wherein the image search query comprises information relevant to at least a subset of the identified annotations, an image search engine service presents the annotated Web image to the user.


    CLASSIFICATION OF IMAGES AS ADVERTISEMENT IMAGES OR NON-ADVERTISEMENT IMAGES
    10.
    Invention patent application
    Legal status: in force

    Publication No.: US20110058734A1

    Publication date: 2011-03-10

    Application No.: US12945635

    Filing date: 2010-11-12

    IPC classification: G06K9/62

    CPC classification: G06Q30/02 G06Q30/0277

    Abstract: An advertisement image classification system trains a binary classifier to classify images as advertisement images or non-advertisement images and then uses the binary classifier to classify images of web pages as advertisement images or non-advertisement images. During a training phase, the classification system generates training data of feature vectors representing the images and labels indicating whether an image is an advertisement image or a non-advertisement image. The classification system trains a binary classifier to classify images using the training data. During a classification phase, the classification system inputs a web page with an image and generates a feature vector for the image. The classification system then applies the trained binary classifier to the feature vector to generate a score indicating whether the image is an advertisement image or a non-advertisement image.
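
The training and classification phases can be sketched with a small logistic-regression classifier. The abstract does not name the classifier type or the features, so the choice of logistic regression, the two hypothetical features (a normalized aspect ratio and an external-link flag), and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(vectors, labels, epochs=500, lr=0.5):
    """Training phase: fit weights and bias by stochastic gradient
    descent on the logistic loss (label 1 = advertisement image)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            pred = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = pred - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def score(model, x):
    """Classification phase: return a score in (0, 1); values above
    0.5 indicate an advertisement image."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

# Hypothetical features: [normalized aspect ratio, links off-site (0/1)].
vectors = [[0.9, 1.0], [0.8, 1.0], [0.2, 0.0], [0.3, 0.0]]
labels = [1, 1, 0, 0]
model = train(vectors, labels)
print(score(model, [0.85, 1.0]) > 0.5, score(model, [0.25, 0.0]) < 0.5)
```

Returning a score rather than a hard label matches the abstract's wording and lets a caller pick its own decision threshold.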
