摘要:
Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index, that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface.
摘要:
Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index, that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. This abstract is provided to comply with rules requiring an abstract, and it is submitted with the intention that it will not be used to interpret or limit the scope or meaning of the claims.
摘要:
Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests.
摘要:
Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests.
摘要:
Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
摘要:
Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
摘要:
Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.