Joint ranking model for multilingual web search
    5.
    Invention grant
    Legal status: in force

    Publication number: US08326785B2

    Publication date: 2012-12-04

    Application number: US12241078

    Application date: 2008-09-30

    CPC classification: G06F17/30675

    Abstract: A classifier is built to rank documents of different languages retrieved for a query, based at least in part on their similarity to other documents and on the relevance of those other documents to the query. A joint ranking model, e.g., one based on a Boltzmann machine, represents the content similarity among documents and helps determine the joint relevance probability of a set of documents. Relevant documents in one language are thus leveraged to improve the relevance estimates for documents in other languages. In one aspect, a hidden layer of units (neurons) represents clusters (corresponding to relevant topics) among the retrieved documents, an output layer represents the relevant documents and their features, and edges represent the relationships between clusters and documents.
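
    To make the joint relevance idea concrete, here is a minimal sketch, not the patented model, assuming a restricted Boltzmann machine with one binary relevance unit per document and one binary hidden unit per topic cluster. With the hidden units summed out, a joint labeling v has unnormalized probability exp(b·v) multiplied by (1 + exp(c_j + W_j·v)) for each cluster j. The per-document biases (standing in for feature evidence), the cluster weights and the function names are hypothetical, and exact enumeration over all 2^n labelings is only practical for a handful of documents.

```python
import itertools
import math

def unnormalized_prob(v, visible_bias, hidden_bias, weights):
    """P*(v) of a binary RBM with the hidden (cluster) units summed out."""
    log_p = sum(b * x for b, x in zip(visible_bias, v))
    for c, row in zip(hidden_bias, weights):
        activation = c + sum(w * x for w, x in zip(row, v))
        log_p += math.log1p(math.exp(activation))  # sum over h_j in {0, 1}
    return math.exp(log_p)

def relevance_marginals(visible_bias, hidden_bias, weights):
    """Exact P(document i is relevant), enumerating all 2^n joint labelings."""
    n = len(visible_bias)
    configs = list(itertools.product((0, 1), repeat=n))
    probs = [unnormalized_prob(v, visible_bias, hidden_bias, weights) for v in configs]
    z = sum(probs)
    return [sum(p for v, p in zip(configs, probs) if v[i]) / z for i in range(n)]

# Hypothetical example: doc 0 (English) has strong feature evidence, doc 1 (Chinese)
# has weak evidence, doc 2 (Chinese) is off-topic. One cluster unit links docs 0 and 1.
biases = [3.0, -0.5, -0.5]
linked = relevance_marginals(biases, hidden_bias=[-2.0], weights=[[2.0, 2.0, 0.0]])
unlinked = relevance_marginals(biases, hidden_bias=[-2.0], weights=[[0.0, 0.0, 0.0]])
print(linked, unlinked)
```

    In this toy setting, the shared cluster raises the weakly scored Chinese document's marginal relevance above what its own evidence gives (sigmoid(-0.5), about 0.38), while the document with no cluster link is unaffected.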

    Processing collocation mistakes in documents
    6.
    Invention grant
    Legal status: in force

    Publication number: US07574348B2

    Publication date: 2009-08-11

    Application number: US11177136

    Application date: 2005-07-08

    IPC classification: G06F17/27

    Abstract: A sentence is accessed and at least one query is generated based on the sentence. The at least one query can be compared to text within a collection of documents, for example by using a web search engine. Collocation errors in the sentence can then be detected and/or corrected based on this comparison between the at least one query and the text within the collection of documents.
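
    The query-and-compare step can be sketched as follows. This is an illustration under assumptions, not the patented procedure: queries are taken to be every run of three adjacent words, an in-memory list of documents stands in for a web search engine, and a query that matches no document is flagged as a possible collocation error. All function names are hypothetical.

```python
import re

def generate_queries(sentence, window=3):
    """Turn every run of `window` adjacent words into an exact-phrase query."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]

def phrase_hits(query, documents):
    """Number of documents containing the query as a whole-word phrase
    (a stand-in for a search engine's hit count)."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b")
    return sum(1 for doc in documents if pattern.search(doc.lower()))

def detect_collocation_errors(sentence, documents, min_hits=1):
    """Flag generated queries whose hit count falls below `min_hits`."""
    return [q for q in generate_queries(sentence) if phrase_hits(q, documents) < min_hits]

collection = [
    "She did a great job.",
    "He made a mistake yesterday.",
    "They made a mistake again.",
]
print(detect_collocation_errors("She did a mistake.", collection))
```

    With this tiny collection, "did a mistake" is flagged because no document contains it, whereas "made a mistake" occurs twice; a practical system would need a far larger corpus and a better-tuned threshold.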

    Web-based collocation error proofing
    7.
    Invention application
    Legal status: in force

    Publication number: US20080133444A1

    Publication date: 2008-06-05

    Application number: US11633788

    Application date: 2006-12-05

    IPC classification: G06N7/02 G06F17/30 G06F3/048

    Abstract: Collocation errors can be proofed automatically using local and network-based corpora, including the Web. For example, according to one illustrative method, one or more collocations from a text sample are compared with a corpus, such as the content of the Web, and each collocation is checked to determine whether it is disfavored in the corpus. An output device indicates whether the collocations are disfavored in the corpus. Additional steps may then be taken, such as searching for potentially proper word collocations and presenting them via a user output.
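
    A sketch of the proofing loop follows, assuming two-word collocations, a small in-memory corpus in place of the Web, "disfavored" meaning a corpus count below a threshold, and a hand-made table of alternative first words for generating suggestions. None of these details come from the patent, and the names are hypothetical.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count adjacent word pairs in a list of sentences (stand-in for Web counts)."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def proof_collocation(collocation, counts, alternatives, min_count=1):
    """Return (is_disfavored, suggestions) for a two-word collocation."""
    first, second = collocation
    if counts[(first, second)] >= min_count:
        return False, []
    # Swap in alternative first words, keep attested candidates, most frequent first.
    candidates = [(alt, second) for alt in alternatives.get(first, [])]
    attested = [c for c in candidates if counts[c] >= min_count]
    attested.sort(key=lambda c: counts[c], reverse=True)
    return True, [" ".join(c) for c in attested]

corpus = [
    "I drink strong tea every day",
    "strong tea keeps me awake",
    "a powerful engine",
    "potent tea is rare",
]
counts = bigram_counts(corpus)
alternatives = {"powerful": ["strong", "potent", "mighty"]}  # hypothetical table
print(proof_collocation(("powerful", "tea"), counts, alternatives))
```

    Here "powerful tea" is reported as disfavored and the attested alternatives are offered in order of corpus frequency, mirroring the indicate-then-suggest flow described in the abstract.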

    Compression of logs of language data
    8.
    Invention application
    Legal status: pending (published)

    Publication number: US20050203934A1

    Publication date: 2005-09-15

    Application number: US10796644

    Application date: 2004-03-09

    CPC classification: H03M7/30

    Abstract: A method and apparatus for compressing query logs is provided. Multiple user-specifiable levels of compression include character-based compression, token-based compression, and subsumption. An efficient method for performing subsumption is also provided. The compressed query logs are then used to train a statistical process, such as a help function for a computer operating system.
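
    The three compression levels could look roughly like the sketch below. The abstract names the levels but does not define them, so these readings are assumptions: character-level compression case-folds and normalizes whitespace before merging duplicates, token-level compression treats a query as an unordered set of tokens, and subsumption drops any query whose token set is strictly contained in another logged query's. Visiting sets from largest to smallest is our own simple choice, not the patent's efficient subsumption method; the function names are hypothetical.

```python
from collections import Counter

def char_level(queries):
    """Level 1: case-fold, collapse whitespace, then merge identical queries."""
    return Counter(" ".join(q.lower().split()) for q in queries)

def token_level(char_counts):
    """Level 2: treat each query as an unordered token set and merge equal sets."""
    merged = Counter()
    for query, n in char_counts.items():
        merged[frozenset(query.split())] += n
    return merged

def subsume(token_counts):
    """Level 3: drop token sets strictly contained in another logged set.

    Sets are visited from largest to smallest, so each one only needs to be
    compared with the sets already kept, all of which are at least as large.
    """
    kept = {}
    for tokens in sorted(token_counts, key=len, reverse=True):
        if not any(tokens < other for other in kept):
            kept[tokens] = token_counts[tokens]
    return kept

def compress(queries, level=3):
    """Apply compression up to the user-specified level (1, 2 or 3)."""
    counts = char_level(queries)
    if level >= 2:
        counts = token_level(counts)
    if level >= 3:
        counts = subsume(counts)
    return counts

log = ["Print Preview", "print  preview", "preview print", "print", "change font size"]
print(compress(log))
```

    At level 3 the five logged queries shrink to two entries: the merged "print preview" set (count 3) and "change font size", with the lone "print" query subsumed.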

    Automatic text generation
    9.
    Invention application
    Legal status: pending (published)

    Publication number: US20050033713A1

    Publication date: 2005-02-10

    Application number: US10887058

    Application date: 2004-07-08

    CPC classification: G06F17/2881 G06F9/453

    Abstract: A text generator automatically generates a text document based on an author's actions on a user interface. To generate the text document, the author activates a recording component, which records the author's actions on the user interface. Based on the recorded actions, a text generation component searches a text database and identifies entries that match the recorded actions. The matching text is then combined to form a text document that provides instructions or other information to a user. While the text document is being generated, the text can be edited as desired with an editor, for example to make the document easier to understand.
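
    A minimal sketch of the record, match and combine pipeline, assuming actions are recorded as (kind, target, value) records and the text database is a dictionary of format templates keyed by action kind. The class, template and function names are hypothetical; actions with no matching entry are simply skipped here, leaving the author to fill them in with an editor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UIAction:
    """One recorded user-interface action (hypothetical record format)."""
    kind: str        # e.g. "click" or "type"
    target: str      # name of the UI element acted on
    value: str = ""  # text entered, if any

class Recorder:
    """Recording component: appends each action the author performs."""
    def __init__(self):
        self.actions = []

    def record(self, action):
        self.actions.append(action)

# Hypothetical text database: one instruction template per action kind.
TEXT_DATABASE = {
    "click": "Click {target}.",
    "type": "In the {target} box, type {value}.",
}

def generate_text(actions, database=TEXT_DATABASE):
    """Look up a matching entry for each recorded action and combine the results."""
    steps = []
    for action in actions:
        template = database.get(action.kind)
        if template is None:
            continue  # no matching entry; the author can add text in the editor
        steps.append(template.format(target=action.target, value=action.value))
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))

recorder = Recorder()
recorder.record(UIAction("click", "Start"))
recorder.record(UIAction("type", "Search", "notepad"))
recorder.record(UIAction("drag", "Window"))  # no template, so it is skipped
print(generate_text(recorder.actions))
```

    Keeping the templates in a plain dictionary makes the "text database" easy to extend: supporting a new action kind only requires adding one entry.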

    Method and apparatus for tone-sensitive acoustic modeling
    10.
    Invention grant
    Legal status: lapsed

    Publication number: US5884261A

    Publication date: 1999-03-16

    Application number: US271639

    Application date: 1994-07-07

    Abstract: Tone-sensitive acoustic models are generated by first generating acoustic vectors that represent the input data. The input data is divided into multiple frames, and an acoustic vector is generated for each frame to represent the input data over that frame. A tone-sensitive parameter is then generated for each frame to indicate the tone of the input data in that frame. Two embodiments for generating the tone-sensitive parameters are described. In the first, a pitch detector may be used to calculate a pitch for each frame; if no pitch can be detected for a particular frame, a pitch is created for that frame from the pitch values of the surrounding frames. In the second, the cross covariance between the autocorrelation coefficients of each frame and those of its successive frame may be computed and used as the tone-sensitive parameter. A feature vector is then created for each frame by appending that frame's tone-sensitive parameter to its acoustic vector. Finally, acoustic models representing the input data are created from these feature vectors.
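
    Both embodiments can be sketched in plain Python. The pitch detector itself is not shown: its per-frame output is assumed to be a list with None wherever no pitch was detected, and gaps are filled by linear interpolation between the nearest detected frames (edge gaps copy the nearest value), which is one reasonable reading of "based on the pitch values of surrounding frames". For the second embodiment, summarizing the cross covariance by the lag at which it peaks, and the order and max_lag defaults, are likewise assumptions. All function names are hypothetical.

```python
def fill_missing_pitch(pitches):
    """First embodiment: replace undetected pitches (None) using surrounding frames."""
    known = [i for i, p in enumerate(pitches) if p is not None]
    if not known:
        raise ValueError("no frame has a detected pitch")
    filled = list(pitches)
    for i, p in enumerate(pitches):
        if p is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # leading gap: copy the first detected pitch
            filled[i] = pitches[right]
        elif right is None:     # trailing gap: copy the last detected pitch
            filled[i] = pitches[left]
        else:                   # interior gap: linear interpolation
            t = (i - left) / (right - left)
            filled[i] = pitches[left] + t * (pitches[right] - pitches[left])
    return filled

def autocorrelation(frame, order):
    """Autocorrelation coefficients r[0..order] of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]

def cross_covariance(a, b, lag):
    """Cross covariance of sequences a and b, with b shifted by `lag`."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    pairs = [(a[n], b[n + lag]) for n in range(len(a)) if 0 <= n + lag < len(b)]
    return sum((x - ma) * (y - mb) for x, y in pairs) / len(pairs)

def tone_from_autocorrelation(frame, next_frame, order=8, max_lag=2):
    """Second embodiment: cross covariance of successive frames' autocorrelation
    coefficients, reduced (by assumption) to the lag where it peaks."""
    ra = autocorrelation(frame, order)
    rb = autocorrelation(next_frame, order)
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_covariance(ra, rb, lag))

def append_tone(acoustic_vectors, tone_params):
    """Feature vector = the frame's acoustic vector with its tone parameter appended."""
    return [list(vec) + [tone] for vec, tone in zip(acoustic_vectors, tone_params)]
```

    Either embodiment's per-frame output can be passed to append_tone as tone_params; the resulting feature vectors are what the acoustic models would then be trained on.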
