Abstract:
A method of training language model parameters trains discriminative model parameters in the language model based on a performance measure having discrete values.
Abstract:
A pharmaceutical composition comprising a cyclodextrin/paclitaxel inclusion, which consists of paclitaxel, cyclodextrin and a pharmaceutically acceptable excipient, wherein the mass ratio of paclitaxel to cyclodextrin is 1:10-150; the cyclodextrin is hydroxypropyl-sulfobutyl-7-β-cyclodextrin, sulfobutylether-7-β-cyclodextrin, or a mixture thereof; and the stability constant of the cyclodextrin/paclitaxel inclusion is Ka = 5396 M⁻¹ to 1412 M⁻¹. The pharmaceutical composition is prepared as follows: (a) A solution of cyclodextrin is added dropwise to a solution of paclitaxel in ethanol. (b) Once dissolved, the resulting mixture is filtered through a microporous membrane of 0.2-0.4 μm. (c) Ethanol is removed under reduced pressure to give a liquid inclusion with an ethanol level of less than 2%; alternatively, water is also removed under reduced pressure and the resulting product is dried to give a solid inclusion.
Abstract:
The subject disclosure pertains to systems and methods for performing natural language processing in which tokens are mapped to task slots. The system includes a mapper component that generates a lattice representing possible interpretations of the tokens, a decoder component that creates a ranked list of paths traversing the lattice, a scorer component that generates scores used to rank paths and post-processing components that format the paths for use by other software. Each of these components may be independent, such that the component may be modified or replaced without affecting the remaining components. This allows a variety of different mathematical models and algorithms to be tested or deployed without requiring changes to the remainder of the system.
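The component separation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `lexicon` and `weights` dictionaries, the `"O"` label for unmapped tokens, and the brute-force path enumeration are all hypothetical and are not taken from the disclosure.

```python
from itertools import product

def mapper(tokens, lexicon):
    """Build a lattice: one list of candidate slot labels per token.
    Tokens missing from the (hypothetical) lexicon get the filler label "O"."""
    return [lexicon.get(tok, ["O"]) for tok in tokens]

def scorer(tokens, path, weights):
    """Score one path as the sum of per-(token, slot) weights (assumed model)."""
    return sum(weights.get((tok, slot), 0.0) for tok, slot in zip(tokens, path))

def decoder(tokens, lattice, score_fn, top_k=3):
    """Enumerate every path through the lattice and return the top_k by score.
    Exhaustive enumeration is used only to keep the sketch short."""
    paths = product(*lattice)
    ranked = sorted(paths, key=lambda p: score_fn(tokens, p), reverse=True)
    return ranked[:top_k]

def post_process(tokens, path):
    """Format a path as a slot -> token mapping for downstream software."""
    return {slot: tok for tok, slot in zip(tokens, path) if slot != "O"}

tokens = ["boston", "seattle"]
lexicon = {"boston": ["from_city", "to_city"], "seattle": ["from_city", "to_city"]}
weights = {("boston", "from_city"): 1.0, ("seattle", "to_city"): 1.0}

lattice = mapper(tokens, lexicon)
ranked = decoder(tokens, lattice, lambda t, p: scorer(t, p, weights))
best = post_process(tokens, ranked[0])
```

Because the decoder only receives a scoring callable, a different scoring model could be substituted without touching the mapper or the post-processing step.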
Abstract:
An ensemble of random feature clusters is built from training data using a clustering algorithm where some randomness has been introduced. For each clustered feature space, a classifier, such as a Naïve Bayesian Classifier, is trained, realizing a classifier ensemble. The final classification decision is made by the resulting classifier ensemble.
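A rough sketch of such an ensemble is given below. It makes several assumptions that the abstract does not state: a purely random assignment of features to clusters stands in for the randomized clustering algorithm, the classifier is a multinomial Naïve Bayes with add-one smoothing, and the final decision is a majority vote. All names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def random_feature_clusters(vocab, n_clusters, rng):
    """Stand-in for a randomized clustering step: random cluster id per feature."""
    return {feat: rng.randrange(n_clusters) for feat in vocab}

def project(doc, clusters):
    """Map a bag of features to counts over cluster ids; unknown features are dropped."""
    return Counter(clusters[f] for f in doc if f in clusters)

class NaiveBayes:
    """Multinomial Naive Bayes over cluster ids with add-one smoothing."""

    def fit(self, X, y, n_clusters):
        self.n_clusters = n_clusters
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        for x, label in zip(X, y):
            self.feat_counts[label].update(x)
        return self

    def predict(self, x):
        total = sum(self.class_counts.values())

        def log_posterior(c):
            denom = sum(self.feat_counts[c].values()) + self.n_clusters
            lp = math.log(self.class_counts[c] / total)
            for cid, cnt in x.items():
                lp += cnt * math.log((self.feat_counts[c][cid] + 1) / denom)
            return lp

        return max(self.class_counts, key=log_posterior)

def train_ensemble(docs, labels, n_clusters=3, n_members=5, seed=0):
    """Train one classifier per randomly clustered feature space."""
    rng = random.Random(seed)
    vocab = sorted({f for d in docs for f in d})
    members = []
    for _ in range(n_members):
        clusters = random_feature_clusters(vocab, n_clusters, rng)
        X = [project(d, clusters) for d in docs]
        members.append((clusters, NaiveBayes().fit(X, labels, n_clusters)))
    return members

def classify(members, doc):
    """Majority vote across ensemble members (an assumed combination rule)."""
    votes = Counter(nb.predict(project(doc, cl)) for cl, nb in members)
    return votes.most_common(1)[0][0]

docs = [["cheap", "pills"], ["meeting", "agenda"], ["cheap", "offer"], ["project", "agenda"]]
labels = ["spam", "ham", "spam", "ham"]
members = train_ensemble(docs, labels)
prediction = classify(members, ["cheap", "offer"])
```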
Abstract:
Statistical Machine Translation (SMT) based search query spelling correction techniques are described herein. In one or more implementations, search data regarding searches performed by clients may be logged. The logged data includes query correction pairs that may be used to ascertain error patterns indicating how misspelled substrings may be translated to corrected substrings. The error patterns may be used to determine suggestions for an input query and to develop query correction models used to translate the input query to a corrected query. In one or more implementations, probabilistic features from multiple query correction models are combined to score different correction candidates. One or more top scoring correction candidates may then be exposed as suggestions for selection by a user and/or provided to a search engine to conduct a corresponding search using the corrected query version(s).
Abstract:
There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
Abstract:
Log-based rankers and document-based rankers may be combined for searching. In an example embodiment, there is a method for combining rankers to perform a search operation. A count of query instances in log data is ascertained based on a query. A search for the query is performed to produce a set of search results. The set of search results is ranked by relevance score with a document-based ranker and a log-based ranker using a weighting factor that is adapted responsive to the count of the query instances in the log data.
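One way to picture the adaptive weighting is the sketch below. The linear interpolation, the saturating weight function, and the constant `k` are assumptions made for illustration; the abstract states only that the weighting factor adapts to the count of query instances in the log data.

```python
def log_weight(query_count, k=10.0):
    """Hypothetical weighting factor: approaches 1 as the query becomes
    more frequent in the log data, and is 0 for an unseen query."""
    return query_count / (query_count + k)

def combined_rank(results, doc_score, log_score, query_count, k=10.0):
    """Rank results by an interpolation of log-based and document-based scores."""
    w = log_weight(query_count, k)
    scored = [(w * log_score[r] + (1.0 - w) * doc_score[r], r) for r in results]
    return [r for _, r in sorted(scored, key=lambda pair: pair[0], reverse=True)]

results = ["a", "b"]
doc_score = {"a": 0.9, "b": 0.1}   # document-based ranker prefers "a"
log_score = {"a": 0.1, "b": 0.9}   # log-based ranker prefers "b"

rare = combined_rank(results, doc_score, log_score, query_count=0)
frequent = combined_rank(results, doc_score, log_score, query_count=1000)
```

With these toy scores, a query never seen in the log is ranked entirely by the document-based ranker, while a frequently logged query follows the log-based ranker.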
Abstract:
The universal text input technique described herein addresses the difficulties of typing text in various languages and scripts, and offers a unified solution that combines character conversion, next word prediction, spelling correction and automatic script switching to make it simple to type any language from any device. The technique provides a rich and seamless input experience in any language through a universal IME (input method editor). It allows a user to type in any script for any language using a regular QWERTY keyboard via phonetic input, and at the same time allows for auto-completion and spelling correction of words and phrases while typing. The technique also provides modeless input that automatically turns on and off an input mode that changes between different types of script.
Abstract:
Query-correction pairs can be extracted from search log data. Each query-correction pair can include an original query and a follow-up query, where the follow-up query meets one or more criteria for being identified as a correction of the original query, such as an indication of user input indicating the follow-up query is a correction for the original query. The query-correction pairs can be segmented to identify bi-phrases in the query-correction pairs. Probabilities of corrections between the bi-phrases can be estimated based on frequencies of matches in the query-correction pairs. Identifications of the bi-phrases and representations of the probabilities of those bi-phrases can be stored in a probabilistic model data structure.
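The frequency-based estimate can be sketched as a relative-frequency computation. The sketch assumes the query-correction pairs have already been segmented into aligned bi-phrases, and it uses a plain nested dictionary as a stand-in for the probabilistic model data structure; both choices, along with the function name, are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_biphrase_probs(biphrases):
    """Estimate P(correction phrase | original phrase) by relative frequency.

    `biphrases` is a list of (original_phrase, corrected_phrase) tuples,
    assumed to come from an earlier segmentation step."""
    pair_counts = Counter(biphrases)
    source_counts = Counter(src for src, _ in biphrases)
    model = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        model[src][tgt] = n / source_counts[src]
    return dict(model)

biphrases = [
    ("britny", "britney"),
    ("britny", "britney"),
    ("britny", "brittany"),
    ("spers", "spears"),
]
model = estimate_biphrase_probs(biphrases)
```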
Abstract:
Systems and methods for identifying translation pairs from web pages are provided. One disclosed method includes receiving monolingual web page data of a source language, and processing the web page data by detecting the occurrence of a predefined pattern in the web page data and extracting a plurality of translation pair candidates. Each of the translation pair candidates may include a source language string and a target language string. The method may further include determining whether each translation pair candidate is a valid transliteration. The method may also include, for each translation pair candidate that is determined not to be a valid transliteration, determining whether that candidate is a valid translation. The method may further include adding each translation pair candidate that is determined to be a valid translation or transliteration to a dictionary.