Abstract:
A method of identifying definitions in documents includes receiving text units as input. The text units that include a cue phrase are then identified. For each text unit identified as including a cue phrase, localized parsing is performed around the cue phrase to determine whether that text unit contains a definition.
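A minimal Python sketch of the three steps follows. It assumes a small hand-picked cue-phrase list and a very shallow "localized parse" that looks only at the clause around the cue. The cue phrases, the pronoun filter, and the names `identify_definitions` and `local_parse` are hypothetical illustrations, not the method's actual rules.

```python
import re

# Hypothetical cue phrases; the abstract does not say which phrases are used.
CUE_PHRASES = ("is defined as", "refers to", "is called", "means")
# Left-context words suggesting that no term is being defined (assumption).
NON_TERMS = {"it", "this", "that", "which", "he", "she", "they"}

# One case-insensitive pattern; search() returns the leftmost cue phrase in a unit.
CUE_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CUE_PHRASES)) + r")\b", re.IGNORECASE)


def local_parse(unit, start, end, max_term_words=5):
    """Parse only the clause around the cue: a short term on the left, a body on the right."""
    # Left context runs back to the nearest clause boundary (or the start of the unit).
    term = re.split(r"[,;:]", unit[:start])[-1].split()
    body = unit[end:].strip().rstrip(".").split()
    if not 1 <= len(term) <= max_term_words:
        return None
    if any(word.lower() in NON_TERMS for word in term):
        return None
    if len(body) < 2:
        return None
    return " ".join(term), " ".join(body)


def identify_definitions(text_units):
    """Return (term, definition) pairs for the text units that contain a definition."""
    results = []
    for unit in text_units:
        cue = CUE_RE.search(unit)
        if cue is None:  # no cue phrase: the unit is never parsed
            continue
        parsed = local_parse(unit, cue.start(), cue.end())
        if parsed is not None:
            results.append(parsed)
    return results
```

Only units that pass the cheap cue-phrase test reach the parser, and the parser never looks beyond the clause containing the cue, which keeps the cost per unit small.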
Abstract:
A method is provided for integrating Flex and Yacc (or their respective equivalents) into a named entity recognition engine used as a component of a general text processing system. The named entity recognition engine adds its results to a central representation, or lattice, for use by various subsequent applications. Through an application program interface, the applications can configure which named entity classes or types are recognized. To maintain high performance, the text processing system configures the input and output of Flex and Yacc through the lattice. Optionally, the text processing system minimizes expensive lexicon look-ups by maximizing the number of named entity constituents matched by Flex-generated recognizers.
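The following minimal Python sketch illustrates the engine's role. It assumes a lattice that simply stores overlapping labeled spans and an API call that selects entity classes. Python regular expressions stand in for the Flex- and Yacc-generated recognizers, and the names `Lattice`, `NamedEntityEngine`, `configure`, and `run` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    text: str


@dataclass
class Lattice:
    """Shared central representation: overlapping labeled spans over one text (hypothetical)."""
    text: str
    spans: list = field(default_factory=list)

    def add(self, start, end, label):
        self.spans.append(Span(start, end, label, self.text[start:end]))


# Regexes stand in for Flex-generated recognizers (assumption; not Flex output).
RECOGNIZERS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


class NamedEntityEngine:
    def __init__(self):
        self.enabled = set(RECOGNIZERS)

    def configure(self, entity_types):
        """API call: an application selects which entity classes are recognized."""
        unknown = set(entity_types) - set(RECOGNIZERS)
        if unknown:
            raise ValueError(f"unknown entity types: {sorted(unknown)}")
        self.enabled = set(entity_types)

    def run(self, lattice):
        """Add a span to the shared lattice for every match of every enabled class."""
        for label in sorted(self.enabled):
            for match in RECOGNIZERS[label].finditer(lattice.text):
                lattice.add(match.start(), match.end(), label)
        return lattice
```

Because the lattice keeps overlapping spans, also enabling `NUMBER` would add separate spans for the digit runs inside the date. Choosing among competing analyses is left to the subsequent applications.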
Abstract:
Methods of identifying named entities in natural language text using machine or computer compiler tools are provided. A lexical analyzer generator, such as Flex, Lex, or an equivalent tool, can be used to generate a recognizer for named entities such as digits, date expressions, and email or web addresses. Alternatively, a parser generator, such as Yacc, Bison, or an equivalent tool, can be used to generate a recognizer for other named entities, such as person and company names. Further, a lexical analyzer generated by Flex, Lex, or an equivalent can be used in combination with a parser generated by Yacc, Bison, or an equivalent to identify named entities. Multiple lexical analyzers and/or parsers can each identify one or more classes of named entities, such as email addresses or person names. In many embodiments, recognized named entities can be used to construct at least one index of web pages or documents that include named entities, and that index can be accessed by a natural language processing application.
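Flex resolves ambiguity by taking the longest match at each position and, on a tie, the rule listed first. The Python sketch below imitates that behavior with ordinary regular expressions so that it runs without Flex. The rule names and patterns are illustrative assumptions. A real Flex specification would compile all the rules into a single DFA instead of trying each rule in turn.

```python
import re

# Rules in priority order, as in a Flex specification (patterns are illustrative).
RULES = [
    ("DATE", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")),
    ("URL", re.compile(r"https?://\S+")),
    ("DIGITS", re.compile(r"\d+")),
    ("WORD", re.compile(r"\w+")),
    ("OTHER", re.compile(r"\S")),  # catch-all, so every non-space position matches
]
SKIP = re.compile(r"\s+")
ENTITY_LABELS = {"DATE", "EMAIL", "URL", "DIGITS"}


def scan(text):
    """Tokenize like Flex: the longest match wins, and the earlier rule breaks ties."""
    pos, tokens = 0, []
    while pos < len(text):
        ws = SKIP.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for label, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best; ties keep the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (label, m)
        label, m = best
        tokens.append((label, m.group()))
        pos = m.end()
    return tokens


def named_entities(text):
    """Keep only the tokens whose rule names a named entity class."""
    return [tok for tok in scan(text) if tok[0] in ENTITY_LABELS]
```

For example, `555` matches both `DIGITS` and `WORD` with the same length, so the earlier `DIGITS` rule wins, while `12/31/2024` is claimed by `DATE` because that match is longer than the `DIGITS` match `12`.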
Abstract:
Methods are presented for constructing a document index that includes named entity information generated by at least one tool associated with parsing computer programs. The methods include using a lexical analyzer generator (e.g., Flex) and/or a parser generator (e.g., Yacc) to generate named entity recognizers. The named entity recognizers are used to identify named entities in documents, in particular in very large document sets such as the web pages available on the Internet. The identified named entities are stored as named entity annotations in the document index. Methods of performing searches using the document index are also presented. The searches are based on queries that can be received through an application programming interface (API). Relevant documents are obtained using the named entity annotations and can be returned across the API. Associated computer-readable media are also presented.
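The following is a minimal sketch of a named entity index. It assumes that annotations arrive from the recognizers as `(entity_type, entity_text)` pairs and that a query is a set of such pairs, all of which must match. `EntityIndex`, `search`, and `search_all` are hypothetical names standing in for the API.

```python
from collections import defaultdict


class EntityIndex:
    """Inverted index from (entity type, lower-cased entity text) to document IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_document(self, doc_id, annotations):
        """Store the named entity annotations produced for one document."""
        for entity_type, text in annotations:
            self._postings[(entity_type, text.lower())].add(doc_id)

    def search(self, entity_type, text):
        """Return the sorted IDs of documents annotated with one entity."""
        return sorted(self._postings.get((entity_type, text.lower()), set()))

    def search_all(self, *entities):
        """Return the sorted IDs of documents annotated with every (type, text) pair given."""
        if not entities:
            return []
        postings = [self._postings.get((t, x.lower()), set()) for t, x in entities]
        return sorted(set.intersection(*postings))
```

Keying on the entity type as well as the text lets a query distinguish, for instance, a person named "Paris" from the city.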
Abstract:
When a new data item is added to a project data store, a synchronization framework triggers an analysis module, which runs a series of analysis feature extractors on the new content. The analysis module may analyze the data item and extract features of interest from it. The analysis uses natural language processing, among other technologies, to provide automatic or semi-automatic extraction of information. The extracted features of interest are saved as metadata within the project data store and are associated with the data item from which they were extracted. The analysis module may also be used to glean additional information from content that is already in the project data store.
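The following minimal Python sketch illustrates the trigger-and-extract flow. It assumes a simple callback in place of the synchronization framework and two toy regex extractors. `ProjectDataStore`, `AnalysisModule`, and the extractor functions are hypothetical; a real system would plug in natural language processing components.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataItem:
    item_id: str
    content: str
    metadata: dict = field(default_factory=dict)


# Toy extractors (hypothetical); each returns the features it finds in the text.
def extract_emails(text):
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


def extract_dates(text):
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


class ProjectDataStore:
    """Holds data items and, standing in for the sync framework, notifies subscribers."""

    def __init__(self):
        self.items = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def add(self, item):
        self.items[item.item_id] = item
        for callback in self._subscribers:
            callback(item)


class AnalysisModule:
    def __init__(self, extractors):
        self.extractors = extractors  # name -> extractor function

    def on_new_item(self, item):
        """Run every extractor and save non-empty results as metadata on the same item."""
        for name, extract in self.extractors.items():
            features = extract(item.content)
            if features:
                item.metadata[name] = features

    def reanalyze(self, store):
        """Glean information from content already in the store, e.g. after adding an extractor."""
        for item in store.items.values():
            self.on_new_item(item)
```

Wiring looks like `store.subscribe(module.on_new_item)`. After that, every `store.add(...)` runs the extractors and attaches their output to the item it came from.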
Abstract:
Tools and techniques are described for providing multilingual word hyphenation using inductive machine learning on training data. Methods provided by these techniques may receive training data that includes hyphenated words and may inductively generate hyphenation patterns that represent substrings of these words. The hyphenation patterns may include the substrings and hyphenation codes associated with characters occurring in the substrings. The methods may receive induction parameters applicable to generating the hyphenation patterns, and may store the hyphenation patterns in a language-specific lexicon file. These methods may also receive requests to hyphenate input words that occur in a human language, and may determine how to process each request based on that language. The methods may search the input words for hyphenation patterns stored in the lexicon file. Finally, the methods may respond to the request, indicating whether the hyphenation patterns occurred in the input words.
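The abstract does not specify the pattern format. The sketch below assumes Liang-style patterns (the scheme used by TeX), in which a digit between letters is a hyphenation code: odd values allow a break, even values forbid it, and the highest value at each position wins. It shows only the look-up side, not the induction step, and the patterns are invented for the example.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'mp2u' into its letters ('mpu') and codes ([0, 0, 2, 0])."""
    letters, codes = [], [0]
    for ch in pattern:
        if ch.isdigit():
            codes[-1] = int(ch)  # code for the gap before the next letter
        else:
            letters.append(ch)
            codes.append(0)
    return "".join(letters), codes


def hyphenate(word, patterns, min_left=2, min_right=2):
    """Split a word at the gaps whose highest matching code is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)   # points[m] = code for the gap before padded[m]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            codes = table.get(padded[i:j])
            if codes is not None:
                for k, code in enumerate(codes):
                    points[i + k] = max(points[i + k], code)
    pieces, start = [], 0
    for pos in range(1, len(word)):
        # The gap before word[pos] is the gap before padded[pos + 1].
        if points[pos + 1] % 2 == 1 and min_left <= pos <= len(word) - min_right:
            pieces.append(word[start:pos])
            start = pos
    pieces.append(word[start:])
    return pieces
```

With the made-up patterns `["m1p", "p1u", "mp2u", "t1e"]`, `hyphenate("computer", ...)` yields `['com', 'put', 'er']`. Dropping `mp2u` leaves the odd code from `p1u` in force and yields `['com', 'p', 'ut', 'er']`, which is why the even, inhibiting codes matter.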
Abstract:
An identification (ID) tag includes a substrate having an input capable of receiving a high-frequency signal, for instance a radio frequency (RF) signal generated as part of a radio frequency identification (RFID) system. A first charge pump is coupled to the input and is configured to convert the high-frequency signal to a substantially direct current (DC) voltage. A data recovery circuit is coupled to the input and is capable of recovering data from the high-frequency signal. The data recovery circuit includes a second charge pump that is capable of operating on the high-frequency signal simultaneously with the first charge pump. In other words, the first charge pump can generate the supply voltage for the ID tag from the high-frequency signal while the second charge pump simultaneously retrieves the data from the same signal. A backscatter switch is coupled to the input and is capable of modifying the impedance of the input in response to a control signal. A state machine is disposed on the substrate and is responsive to the data recovered by the second charge pump; the state machine is capable of generating the control signal for the backscatter switch in response to that data. The DC voltage from the first charge pump is capable of providing a voltage supply for at least one of the data recovery circuit, the backscatter switch, and the state machine. The first charge pump also includes a means for limiting the amplitude of the DC voltage by reducing the charge pump efficiency once a threshold voltage is reached.
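The following behavioral, discrete-time Python sketch models two charge pumps operating on the same signal at once. It is not a circuit model. Each pump is treated as a first-order filter on the RF envelope, and the limiter simply drops the supply pump's efficiency once the threshold is reached. The ASK envelope, gains, rates, and thresholds are all hypothetical values chosen for illustration.

```python
def ask_envelope(bits, samples_per_bit=8, high=1.0, low=0.4):
    """Hypothetical ASK envelope: high amplitude for a 1 bit, low amplitude for a 0 bit."""
    return [high if bit else low for bit in bits for _ in range(samples_per_bit)]


def tag_front_end(envelope, samples_per_bit=8, pump_gain=3.0, v_limit=2.0,
                  supply_rate=0.05, data_rate=0.5, reduced_efficiency=0.1,
                  data_threshold=2.1):
    """Run both pumps on every envelope sample; return the supply trace and recovered bits."""
    v_dd = 0.0    # output of the first (supply) charge pump
    v_data = 0.0  # output of the second (data) charge pump
    supply, bits = [], []
    for n, amplitude in enumerate(envelope):
        # Supply pump: slow response; its efficiency drops once V_dd reaches the threshold.
        efficiency = 1.0 if v_dd < v_limit else reduced_efficiency
        v_dd += supply_rate * (efficiency * pump_gain * amplitude - v_dd)
        # Data pump: fast response, so its output follows the envelope bit by bit.
        v_data += data_rate * (pump_gain * amplitude - v_data)
        supply.append(v_dd)
        # Sample the data pump at the end of each bit period. With the default gain and
        # amplitudes it settles near 3.0 or 1.2, so the threshold sits midway between them.
        if n % samples_per_bit == samples_per_bit - 1:
            bits.append(1 if v_data > data_threshold else 0)
    return supply, bits
```

For example, `tag_front_end(ask_envelope([1, 0, 1, 1, 0, 0, 1, 0]))` recovers the transmitted bits from the data pump while the supply trace, driven by the same samples, never climbs far past `v_limit`.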
Abstract:
A system and methods for identifying the language of natural language text are presented. The system includes stored expected character counts and variances for a list of characters found in a natural language. Expected character counts and variances are stored for each of the multiple languages considered during language identification. At run time, one or more languages are identified for a text sample by comparing actual and expected character counts. The present methods can be combined with upstream analysis of the Unicode ranges of the characters in the text sample to limit the number of languages considered. Further, n-gram methods can be used in downstream processing to select the most probable language from among the languages identified by the present system and methods.
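The abstract does not say how actual and expected counts are compared. The sketch below assumes a variance-normalized squared-deviation score, similar in spirit to a chi-square statistic, with both the expected counts and the variances scaled linearly to the sample length. The two profiles are toy numbers, not real corpus statistics, and the optional `candidates` argument stands in for the upstream Unicode-range filter.

```python
from collections import Counter

# Toy profiles: (expected count, variance) per 1000 letters. Real values would be
# estimated from a training corpus for each language; these are invented.
PROFILES = {
    "en": {"e": (127.0, 90.0), "t": (91.0, 60.0), "h": (61.0, 40.0), "ä": (0.1, 0.1), "ß": (0.1, 0.1)},
    "de": {"e": (164.0, 110.0), "t": (61.0, 40.0), "h": (48.0, 30.0), "ä": (5.8, 4.0), "ß": (3.0, 2.0)},
}


def identify(text, profiles=PROFILES, candidates=None):
    """Rank languages by variance-normalized squared deviation (lower is a better fit)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return []
    counts = Counter(letters)
    scale = len(letters) / 1000.0  # profiles are stated per 1000 letters
    scores = []
    for lang, profile in profiles.items():
        if candidates is not None and lang not in candidates:
            continue  # excluded upstream, e.g. by a Unicode-range check
        score = 0.0
        for ch, (mean, var) in profile.items():
            # Assumption: expected count and variance both grow linearly with sample length.
            expected, variance = mean * scale, var * scale
            score += (counts[ch] - expected) ** 2 / variance
        scores.append((lang, score))
    return sorted(scores, key=lambda pair: pair[1])
```

The ranked list could be cut to the top few languages and handed to an n-gram model for the final choice, as the abstract suggests.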
Abstract:
A radio frequency identification (RFID) architecture is described. RFID tags are interrogated by a reader, which may be located in a network of readers. The reader transmits symbols to the tags, and the tags respond to the interrogations with symbols that each represent one or more bits of data. An RFID tag includes an antenna pad, a receiver, a state machine, and a modulator. The receiver is coupled to the antenna pad; it receives a symbol from the antenna pad and outputs a received signal. The state machine is configured to determine a response symbol from the received signal and the operating state of the tag. The modulator is also coupled to the antenna pad. It is configured to backscatter-modulate the received symbol with the response symbol and to output the backscatter-modulated symbol to the antenna pad.
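The following minimal Python sketch illustrates the tag-side logic. It assumes that the receiver has already turned each incoming symbol into a command label and that each response symbol carries one bit. The commands (`SELECT`, `CLOCK`), the states, and the reflection values are hypothetical and are not taken from any RFID standard.

```python
from dataclasses import dataclass

# Hypothetical reflection coefficients for the two antenna impedance states.
REFLECTION = {0: 0.2, 1: 1.0}


@dataclass
class RfidTag:
    id_bits: list
    state: str = "idle"  # hypothetical states: "idle" -> "replying" -> "done"
    sent: int = 0

    def respond(self, command):
        """State machine: choose a response bit from the received command and the current state.
        Returns None when the tag should stay silent."""
        if command == "SELECT":
            self.state, self.sent = "replying", 0
            return None
        if command == "CLOCK" and self.state == "replying":
            bit = self.id_bits[self.sent]
            self.sent += 1
            if self.sent == len(self.id_bits):
                self.state = "done"
            return bit
        return None  # any other command/state combination: no response

    def backscatter(self, carrier, response):
        """Modulator: scale the received symbol's samples by the reflection for the response bit."""
        if response is None:
            return None  # no backscatter modulation for this symbol
        return [sample * REFLECTION[response] for sample in carrier]
```

The same `CLOCK` symbol produces a bit only in the `replying` state and silence otherwise, which is the dependence on both the received signal and the operating state that the abstract describes.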
Abstract:
A method, system, and apparatus are described for communicating with a radio frequency identification (RFID) tag population that includes one or more tags. The tags are interrogated by a reader, which may be located in a network of readers. The reader interrogates the tags by transmitting data symbols to them, and the tags respond to the reader with backscatter symbols. Bit patterns, such as identification numbers stored in the tags, are collected from the tags without collisions. Collisions are avoided because the backscatter symbols transmitted by the tags use different characteristics to represent different data bits. For example, a tag uses a first backscatter symbol frequency to represent a “0” bit and a second backscatter symbol frequency to represent a “1” bit.
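The abstract does not describe how the reader walks the tag population. The sketch below shows one plausible approach, a depth-first tree walk. It assumes that the reader announces a bit prefix and that only matching tags answer with their next bit. Because 0 and 1 bits arrive on different frequencies, the reader learns exactly which bit values are present at each position and never has to resolve a collision. The frequency labels and function names are hypothetical.

```python
F0, F1 = "f0", "f1"  # hypothetical backscatter frequencies for a 0 bit and a 1 bit


def observe(tags, prefix):
    """Frequencies the reader hears when tags matching `prefix` send their next bit."""
    pos = len(prefix)
    heard = set()
    for tag in tags:
        if tag[:pos] == prefix:
            heard.add(F1 if tag[pos] == "1" else F0)
    return heard


def collect_ids(tags, width):
    """Depth-first walk over ID prefixes; both branches are explored when both frequencies appear."""
    found, stack = [], [""]
    while stack:
        prefix = stack.pop()
        if len(prefix) == width:
            found.append(prefix)  # a complete ID has been read
            continue
        heard = observe(tags, prefix)
        # Push the 1 branch first so that the 0 branch is explored first (ascending order).
        if F1 in heard:
            stack.append(prefix + "1")
        if F0 in heard:
            stack.append(prefix + "0")
    return found
```

For example, with tags `["0110", "0101", "1100"]`, the reader hears both frequencies at the prefix `"01"` and simply follows both branches, returning all three IDs in ascending order.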