Definition extraction
    1.
    Invention application
    Definition extraction (Granted)

    Publication No.: US20070027863A1

    Publication date: 2007-02-01

    Application No.: US11194873

    Filing date: 2005-08-01

    IPC classification: G06F17/30

    Abstract: A method of identifying definitions in documents includes receiving text units as an input. Which of the text units includes a cue phrase is then identified. For text units identified as including a cue phrase, localized parsing is performed around the cue phrase to determine whether the text unit including the cue phrase contains a definition.
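The two stages in this abstract (filter text units by cue phrase, then inspect only the text around the cue) can be sketched as follows. This is a minimal illustration, not the patented method: the cue-phrase list is hypothetical, sentences are assumed to be the text units, and a regular expression stands in for the localized parser.

```python
import re

# Hypothetical cue phrases; the abstract does not list the ones actually used.
CUE_PHRASES = ("is defined as", "refers to", "is a", "is an")

# Stand-in for "localized parsing": a short term (up to four words) before the
# cue and a non-empty description after it. A real system would parse here.
TERM = r"(?P<term>(?:[A-Za-z][\w-]*\s){0,3}[A-Za-z][\w-]*)"


def find_definition(text_unit):
    """Return (term, description) if the unit looks like a definition, else None."""
    unit = text_unit.strip()
    for cue in CUE_PHRASES:
        if cue not in unit:
            continue  # cheap filter: only units containing a cue are examined
        pattern = re.compile(
            r"^(?:The\s|A\s|An\s)?" + TERM + r"\s" + re.escape(cue) + r"\s(?P<desc>.+?)\.?$"
        )
        match = pattern.match(unit)
        if match:
            return match.group("term"), match.group("desc")
    return None
```

The substring check is deliberately cheap so that the more expensive pattern match runs only on units that already contain a cue.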

    Integration of Flex and Yacc into a linguistic services platform for named entity recognition
    2.
    Invention application
    Integration of Flex and Yacc into a linguistic services platform for named entity recognition (Under examination, published)

    Publication No.: US20060047690A1

    Publication date: 2006-03-02

    Application No.: US10939300

    Filing date: 2004-09-10

    IPC classification: G06F17/00

    CPC classification: G06F17/278

    Abstract: Method of integrating Flex and Yacc (or their respective equivalents) into a named entity recognition engine used as a component of a general text processing system is provided. The named entity recognition engine adds results into a central representation or lattice for use by various subsequent applications. The applications can configure which named entity classes or types are recognized via an application program interface. The text processing system configures input and output through the lattice for Flex and Yacc to maintain high performance. Optionally, the text processing system minimizes expensive lexicon look-up by maximizing named entity constituents matched by Flex-generated recognizers.
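A rough sketch of the architecture described here: recognizers write typed spans into a shared lattice, and the calling application chooses which entity classes to enable. The `Lattice` class, the `RECOGNIZERS` table, and the `recognize` function are all hypothetical names, and Python regular expressions stand in for Flex-generated scanners.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Lattice:
    """Shared representation: the input text plus typed spans added by components."""
    text: str
    spans: list = field(default_factory=list)  # entries are (start, end, entity_class)

    def add(self, start, end, entity_class):
        self.spans.append((start, end, entity_class))


# Hypothetical recognizers keyed by entity class.
RECOGNIZERS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


def recognize(lattice, enabled_classes):
    """Run only the recognizers the application asked for and record their matches."""
    for entity_class in enabled_classes:
        for match in RECOGNIZERS[entity_class].finditer(lattice.text):
            lattice.add(match.start(), match.end(), entity_class)
    return lattice
```

Later components would read `lattice.spans` rather than rescanning the text, which is the role the abstract assigns to the central representation.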

    Named entity recognition using compiler methods
    3.
    Invention application
    Named entity recognition using compiler methods (Under examination, published)

    Publication No.: US20060047500A1

    Publication date: 2006-03-02

    Application No.: US10930131

    Filing date: 2004-08-31

    IPC classification: G06F17/27

    CPC classification: G06F17/278

    Abstract: Methods of identifying named entities in natural language text using machine or computer compiler tools are provided. A lexical analyzer generator such as Flex or Lex or an equivalent tool can be used to generate a recognizer for named entities, such as digits, date expressions, and email or web addresses. Alternatively, a parser generator, such as Yacc or Bison or an equivalent tool can be used to generate a recognizer for other named entities, such as person and company names. Further, a lexical analyzer generated by Flex, Lex, or its equivalent is used in combination with a parser generated by Yacc, Bison, or its equivalent to identify named entities. Multiple lexical analyzers and/or parsers identify one or more classes of named entities, such as email addresses or person names. In many embodiments, recognized named entities can be used to construct at least one index of web pages or documents including named entities that can be accessed by a natural language processing application.
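The combination of a lexical analyzer with a parser can be illustrated with a hand-written analogue. In this sketch a regular-expression tokenizer plays the role of a Flex-generated scanner, and a small loop implements one grammar rule that a Yacc-generated parser might apply. The token classes and the rule `PERSON -> TITLE CAP+` are assumptions made for illustration and are not taken from the patent.

```python
import re

# Token classes a Flex-style scanner might emit (hypothetical set).
TOKEN_RE = re.compile(
    r"(?P<TITLE>\b(?:Dr|Mr|Ms|Prof)\.)"
    r"|(?P<CAP>\b[A-Z][a-z]+\b)"
    r"|(?P<WORD>\b\w+\b)"
)


def tokenize(text):
    """Lexical step: turn text into (kind, value) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def parse_person_names(tokens):
    """Grammar step, hand-written instead of Yacc-generated: PERSON -> TITLE CAP+."""
    names, i = [], 0
    while i < len(tokens):
        if tokens[i][0] == "TITLE":
            j = i + 1
            while j < len(tokens) and tokens[j][0] == "CAP":
                j += 1
            if j > i + 1:  # at least one capitalized word follows the title
                names.append(" ".join(value for _, value in tokens[i:j]))
            i = j
        else:
            i += 1
    return names
```

Splitting the work this way keeps the token patterns simple: the scanner only classifies words, while the rule decides which token sequences form an entity.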

    Creating a document index from a flex- and Yacc-generated named entity recognizer
    4.
    Invention application
    Creating a document index from a flex- and Yacc-generated named entity recognizer (Under examination, published)

    Publication No.: US20060047691A1

    Publication date: 2006-03-02

    Application No.: US10954610

    Filing date: 2004-09-30

    IPC classification: G06F17/00

    CPC classification: G06F17/278

    Abstract: Methods of constructing a document index including named entity information generated by at least one tool associated with parsing computer programs are presented. The methods include using a lexical analyzer generator, e.g. Flex, and/or a parser generator, e.g. Yacc, to generate named entity recognizers. The named entity recognizers are used to identify named entities in documents, in particular, very large document sets such as web pages available on the Internet. The identified named entities are stored as named entity annotations in the document index. Also, methods of performing searches using the document index are presented. The searches are performed based on queries that can be received on an application programming interface (API). Relevant documents are obtained using the named entity annotations, which can be returned across the API. Also presented are associated computer readable media.
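Storing named entity annotations in a document index and querying them can be sketched as an inverted index keyed by entity class and entity text. The function names and the single e-mail recognizer are hypothetical, and a regular expression again stands in for a Flex- or Yacc-generated recognizer.

```python
import re
from collections import defaultdict

# Hypothetical recognizer for one entity class.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def build_index(documents):
    """Map (entity_class, entity_text) annotations to the ids of documents containing them."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for match in EMAIL_RE.finditer(text):
            index[("EMAIL", match.group().lower())].add(doc_id)
    return index


def search(index, entity_class, entity_text):
    """Return the sorted ids of documents annotated with the given entity."""
    return sorted(index.get((entity_class, entity_text.lower()), set()))
```

For example, indexing `{1: "Contact Ann@Example.com", 2: "No address here"}` and searching for the class `"EMAIL"` with the text `"ann@example.com"` returns only document 1, because entity text is lowercased at both indexing and query time.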

    MULTI-LINGUAL WORD HYPHENATION USING INDUCTIVE MACHINE LEARNING ON TRAINING DATA
    6.
    Invention application
    MULTI-LINGUAL WORD HYPHENATION USING INDUCTIVE MACHINE LEARNING ON TRAINING DATA (Granted)

    Publication No.: US20090182550A1

    Publication date: 2009-07-16

    Application No.: US12015489

    Filing date: 2008-01-16

    IPC classification: G06F17/28

    CPC classification: G06F17/26

    Abstract: Tools and techniques are described for providing multi-lingual word hyphenation using inductive machine learning on training data. Methods provided by these techniques may receive training data that includes hyphenated words, and may inductively generate hyphenation patterns that represent substrings of these words. The hyphenation patterns may include the substrings and hyphenation codes associated with characters occurring in the substrings. The methods may receive induction parameters applicable to generating the hyphenation patterns, and may store the hyphenation patterns into a language-specific lexicon file. These methods may also receive requests to hyphenate input words that occur in a human language, and may evaluate how to process the request based on the language. The methods may search for hyphenation patterns occurring in the input words, with the hyphenation patterns being stored in the lexicon file. Finally, the methods may respond to the request, indicating whether the hyphenation patterns occurred in the input words.
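The lookup half of this abstract (searching an input word for stored patterns made of substrings plus hyphenation codes) resembles the pattern format used by TeX's hyphenation algorithm. The sketch below assumes that format, which the abstract does not confirm: digits between letters are codes, the highest code at each position wins, and odd codes permit a break. The patterns in the usage note are illustrative only, and the induction step that would generate them from training data is not shown.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter codes."""
    letters, codes = "", [0]
    for ch in pattern:
        if ch.isdigit():
            codes[-1] = int(ch)
        else:
            letters += ch
            codes.append(0)
    return letters, codes


def hyphenate(word, patterns):
    """Insert hyphens where the highest code from any matching pattern is odd."""
    padded = "." + word.lower() + "."          # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)           # scores[k] is the gap before padded[k]
    for letters, codes in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, code in enumerate(codes):
                scores[start + offset] = max(scores[start + offset], code)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(word)):
        if scores[i + 1] % 2 == 1:             # gap between word[i-1] and word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)
```

With the single pattern `"hy3ph"`, `hyphenate("hyphen", ["hy3ph"])` yields `"hy-phen"`. Adding a competing pattern with a higher even code at the same position, such as `"y4p"`, suppresses that break.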

    Efficient charge pump apparatus and method
    7.
    Invention application
    Efficient charge pump apparatus and method (Granted)

    Publication No.: US20060202828A1

    Publication date: 2006-09-14

    Application No.: US11434894

    Filing date: 2006-05-17

    IPC classification: G08B13/14

    Abstract: An identification (ID) tag includes a substrate having an input capable of receiving a high frequency signal. For instance, the high frequency signal can be a radio frequency (RF) signal that is generated as part of a radio frequency (RF) ID system. A first charge pump is coupled to the input and is configured to convert the high frequency signal to a substantially direct current (DC) voltage. A data recovery circuit is coupled to the input and is capable of recovering data from the high frequency signal. A back scatter switch is coupled to the input and is capable of modifying an impedance of the input, responsive to a control signal. A state machine is disposed on the substrate and is responsive to the data recovered by the second charge pump, where the state machine is capable of generating the control signal for the back scatter switch in response to the data. The DC voltage from the first charge pump is capable of providing a voltage supply for at least one of the data recovery circuit, the back scatter switch, and the state machine. The data recovery circuit includes a second charge pump that is capable of operating on the high frequency signal simultaneously with the first charge pump. In other words, the first charge pump can generate the supply voltage for the ID tag from the high frequency signal, while the second charge pump simultaneously retrieves the data from the high frequency signal. The first charge pump also includes a means for limiting the amplitude of the DC voltage by reducing the charge pump efficiency, once a threshold voltage is reached.

    Efficient language identification
    8.
    Invention application
    Efficient language identification (Granted)

    Publication No.: US20060184357A1

    Publication date: 2006-08-17

    Application No.: US11056707

    Filing date: 2005-02-11

    IPC classification: G06F17/27

    Abstract: A system and methods of language identification of natural language text are presented. The system includes stored expected character counts and variances for a list of characters found in a natural language. Expected character counts and variances are stored for multiple languages to be considered during language identification. At run-time, one or more languages are identified for a text sample based on comparing actual and expected character counts. The present methods can be combined with upstream analyzing of Unicode ranges for characters in the text sample to limit the number of languages considered. Further, n-gram methods can be used in downstream processing to select the most probable language from among the languages identified by the present system and methods.
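Comparing actual character counts against stored expected counts and variances can be sketched as a variance-scaled distance per language. The scoring formula, the profile layout, and the numbers in `PROFILES` are assumptions for illustration: the expected counts are rough letter frequencies and the variances are invented. The abstract does not state how the comparison is actually computed.

```python
from collections import Counter

# Hypothetical profiles: character -> (expected count per 100 letters, variance).
PROFILES = {
    "english": {"e": (12.7, 4.0), "t": (9.1, 3.0), "w": (2.4, 1.5), "z": (0.1, 0.2)},
    "polish": {"e": (7.7, 3.0), "t": (4.0, 2.0), "w": (4.7, 2.0), "z": (5.6, 2.5)},
}


def score(text, profile):
    """Mean squared deviation of actual from expected counts, scaled by variance."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    scale = 100.0 / len(letters)  # normalize the sample to counts per 100 letters
    total = 0.0
    for ch, (expected, variance) in profile.items():
        actual = counts[ch] * scale
        total += (actual - expected) ** 2 / variance
    return total / len(profile)


def identify(text, profiles=PROFILES):
    """Return language names ordered from best to worst match (lower score is better)."""
    return sorted(profiles, key=lambda lang: score(text, profiles[lang]))
```

Returning a ranked list rather than a single winner matches the abstract's note that a downstream n-gram step may choose among several candidate languages.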
