Retrieval using a generalized sentence collocation
    1.
    Granted patent
    Retrieval using a generalized sentence collocation (active)

    Publication number: US08484014B2

    Publication date: 2013-07-09

    Application number: US12362428

    Filing date: 2009-01-29

    IPC classification: G06F17/30 G06F17/27 G06F17/21

    CPC classification: G06F17/30684 G10L15/19

    Abstract: A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents.
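
    The retrieval step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the toy corpus, the hand-assigned tags, and the function name retrieve are all hypothetical, and "collocated" is simplified here to "occurs in the same sentence".

```python
from typing import Dict, List, Tuple

# A sentence is a list of (word, part_of_speech) pairs. The tags below are
# hand-assigned for illustration; a real system would use a POS tagger.
TaggedSentence = List[Tuple[str, str]]

# Hypothetical toy corpus: document id -> list of tagged sentences.
DOCS: Dict[str, List[TaggedSentence]] = {
    "doc1": [[("she", "PRON"), ("made", "VERB"), ("a", "DET"), ("decision", "NOUN")]],
    "doc2": [[("the", "DET"), ("decision", "NOUN"), ("was", "VERB"), ("final", "ADJ")]],
    "doc3": [[("a", "DET"), ("difficult", "ADJ"), ("decision", "NOUN")]],
}


def retrieve(word: str, pos: str, docs: Dict[str, List[TaggedSentence]]) -> List[str]:
    """Return ids of documents with a sentence that contains `word`
    together with some other word tagged as `pos`."""
    hits = []
    for doc_id, sentences in docs.items():
        for sentence in sentences:
            has_word = any(w == word for w, _ in sentence)
            has_pos = any(w != word and p == pos for w, p in sentence)
            if has_word and has_pos:
                hits.append(doc_id)
                break  # one matching sentence is enough for this document
    return hits


if __name__ == "__main__":
    # Query: the word "decision" collocated with any adjective.
    print(retrieve("decision", "ADJ", DOCS))
```

    In this toy corpus, the query ("decision", ADJ) matches doc2 and doc3 but not doc1, whose only sentence contains no adjective.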

    RETRIEVAL USING A GENERALIZED SENTENCE COLLOCATION
    2.
    Patent application
    RETRIEVAL USING A GENERALIZED SENTENCE COLLOCATION (active)

    Publication number: US20100114574A1

    Publication date: 2010-05-06

    Application number: US12362428

    Filing date: 2009-01-29

    IPC classification: G10L15/04

    CPC classification: G06F17/30684 G10L15/19

    Abstract: A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents.

    Adaptive pattern learning for bilingual data mining
    3.
    Granted patent
    Adaptive pattern learning for bilingual data mining (active)

    Publication number: US08275604B2

    Publication date: 2012-09-25

    Application number: US12406722

    Filing date: 2009-03-18

    CPC classification: G06F17/28 G06F17/2827

    Abstract: Embodiments for the adaptive learning of translation layout patterns to mine bilingual data are disclosed. In accordance with at least one embodiment, the adaptive learning of patterns to mine bilingual data includes processing a bilingual web page into a Document Object Model (DOM) tree. The embodiment further includes linking the bilingual snippet pairs of each node into a plurality of bilingual snippet pairs. The embodiment also includes determining one or more best fit candidate patterns based on the plurality of translation snippets via a Support Vector Machine classifier. The embodiment additionally includes mining one or more translation pairs from the bilingual web page using the one or more best fit candidate patterns. The translation pairs are further stored in a data storage. The one or more translation pairs include at least one of a term pair, a phrase pair, or a sentence pair.

    ADAPTIVE PATTERN LEARNING FOR BILINGUAL DATA MINING
    4.
    Patent application
    ADAPTIVE PATTERN LEARNING FOR BILINGUAL DATA MINING (active)

    Publication number: US20100241416A1

    Publication date: 2010-09-23

    Application number: US12406722

    Filing date: 2009-03-18

    IPC classification: G06F17/28

    CPC classification: G06F17/28 G06F17/2827

    Abstract: Embodiments for the adaptive learning of translation layout patterns to mine bilingual data are disclosed. In accordance with at least one embodiment, the adaptive learning of patterns to mine bilingual data includes processing a bilingual web page into a Document Object Model (DOM) tree. The embodiment further includes linking the bilingual snippet pairs of each node into a plurality of bilingual snippet pairs. The embodiment also includes determining one or more best fit candidate patterns based on the plurality of translation snippets via a Support Vector Machine classifier. The embodiment additionally includes mining one or more translation pairs from the bilingual web page using the one or more best fit candidate patterns. The translation pairs are further stored in a data storage. The one or more translation pairs include at least one of a term pair, a phrase pair, or a sentence pair.

    Interactive Multilingual Word-Alignment Techniques
    5.
    Patent application
    Interactive Multilingual Word-Alignment Techniques (active)

    Publication number: US20110246173A1

    Publication date: 2011-10-06

    Application number: US12753023

    Filing date: 2010-04-01

    IPC classification: G06F17/28

    CPC classification: G06F17/2827 G06F17/2854

    Abstract: Techniques for interactively presenting word-alignments of multilingual translations and automatically improving those translations based upon user feedback are described herein. With one or more implementations of the techniques described herein, a word-alignment user-interface (UI) concurrently displays a pair of bilingual sentences, where one is a translation of the other, and interactively highlights linked (i.e., “word-aligned”) words and phrases of the pair. Other implementations of the techniques described herein offer an option for a user to provide feedback about the existing word-alignments or realign the words or phrases. In still other described implementations, word-alignment is automatically improved based upon that user feedback.

    Interactive multilingual word-alignment techniques
    6.
    Granted patent
    Interactive multilingual word-alignment techniques (active)

    Publication number: US08930176B2

    Publication date: 2015-01-06

    Application number: US12753023

    Filing date: 2010-04-01

    IPC classification: G06F17/28

    CPC classification: G06F17/2827 G06F17/2854

    Abstract: Techniques for interactively presenting word-alignments of multilingual translations and automatically improving those translations based upon user feedback are described herein. With one or more implementations of the techniques described herein, a word-alignment user-interface (UI) concurrently displays a pair of bilingual sentences, where one is a translation of the other, and interactively highlights linked (i.e., “word-aligned”) words and phrases of the pair. Other implementations of the techniques described herein offer an option for a user to provide feedback about the existing word-alignments or realign the words or phrases. In still other described implementations, word-alignment is automatically improved based upon that user feedback.

    TARGET BASED INDEXING OF MICRO-BLOG CONTENT
    7.
    Patent application
    TARGET BASED INDEXING OF MICRO-BLOG CONTENT (under examination, published)

    Publication number: US20130159277A1

    Publication date: 2013-06-20

    Application number: US13326028

    Filing date: 2011-12-14

    IPC classification: G06F17/30

    Abstract: Target based indexing of micro-blog content may include extracting, labeling, and indexing data contained in micro-blog entries. For example, by adapting natural language processing (NLP) technologies to a micro-blog entry, data is extracted in order to create an index. In one embodiment, a search engine may access the index in order to return results of a search query. In another embodiment, a user interface may display micro-blog entries categorically, allowing the user to access micro-blog entries by event, quote, opinion, or other category.

    Learning-Based Processing of Natural Language Questions
    8.
    Patent application
    Learning-Based Processing of Natural Language Questions (under examination, published)

    Publication number: US20140006012A1

    Publication date: 2014-01-02

    Application number: US13539674

    Filing date: 2012-07-02

    IPC classification: G06F17/27

    Abstract: Techniques described enable answering a natural language question using machine learning-based methods to gather and analyze evidence from web searches. A received natural language question is analyzed to extract query units and to determine a question type, answer type, and/or lexical answer type using rules-based heuristics and/or machine learning trained classifiers. Query generation templates are employed to generate a plurality of ranked queries to be used to gather evidence to determine the answer to the natural language question. Candidate answers are extracted from the results based on the answer type and/or lexical answer type, and ranked using a ranker previously trained offline. Confidence levels are calculated for the candidate answers and top answer(s) may be provided to the user if the confidence levels of the top answer(s) surpass a threshold.
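
    The final step of this abstract, returning the top answer(s) only when their confidence surpasses a threshold, might look like the sketch below. The function name select_answers, the top_k parameter, and the sample scores are assumptions made for illustration; the abstract does not say how confidence levels are computed.

```python
from typing import List, Tuple


def select_answers(
    scored: List[Tuple[str, float]], threshold: float, top_k: int = 1
) -> List[str]:
    """Rank candidate answers by confidence and keep the top_k answers
    whose confidence is strictly greater than the threshold."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [answer for answer, confidence in ranked[:top_k] if confidence > threshold]


if __name__ == "__main__":
    # Hypothetical candidates with made-up confidence levels.
    candidates = [("Lyon", 0.40), ("Paris", 0.92), ("Nice", 0.10)]
    print(select_answers(candidates, threshold=0.5))
```

    If no candidate clears the threshold, the function returns an empty list, which corresponds to the system declining to answer.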

    Generating Chinese language banners
    9.
    Granted patent
    Generating Chinese language banners (active)

    Publication number: US08862459B2

    Publication date: 2014-10-14

    Application number: US13087407

    Filing date: 2011-04-15

    Abstract: Embodiments are disclosed for automatically generating a banner given a first scroll sentence and a second scroll sentence of a Chinese couplet. The first and/or second scroll sentence can be generated by an automatic computer system or by a human (e.g., manually generated and then provided as input to an automated banner generation system) or obtained from any source (e.g., a book) and provided as input. In one embodiment, an information retrieval process is utilized to identify banner candidates that best match the first and second scroll sentences. In one embodiment, candidate banners are automatically generated. In one embodiment, a ranking model is applied in order to rank banner candidates derived from the banner search and generation processes. One or more banners are then selected from the ranked banner candidates.

    Generating Chinese language banners
    10.
    Granted patent
    Generating Chinese language banners (active)

    Publication number: US08000955B2

    Publication date: 2011-08-16

    Application number: US11788448

    Filing date: 2007-04-20

    IPC classification: G06F17/27

    Abstract: Embodiments are disclosed for automatically generating a banner given a first scroll sentence and a second scroll sentence of a Chinese couplet. The first and/or second scroll sentence can be generated by an automatic computer system or by a human (e.g., manually generated and then provided as input to an automated banner generation system) or obtained from any source (e.g., a book) and provided as input. In one embodiment, an information retrieval process is utilized to identify banner candidates that best match the first and second scroll sentences. In one embodiment, candidate banners are automatically generated. In one embodiment, a ranking model is applied in order to rank banner candidates derived from the banner search and generation processes. One or more banners are then selected from the ranked banner candidates.
