Abstract:
Systems, methods, and computer media for identifying related strings for search query rewriting are provided. Session data for a user search query session is identified in accessed click log data. It is determined whether a first additional search query in the session data is related to a first user search query based on at least one of: dwell time; the number of search result links clicked on; and similarity between web page titles or uniform resource locators (URLs). When the two are related, the first additional search query is incorporated into a list of strings related to the first user search query. One or more supplemental strings related to the first user search query are also identified and included in that list.
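The relatedness test described above can be sketched roughly as follows. This is an illustration only: the record types (`Click`, `QueryRecord`), the function names, the thresholds, and the use of token-level Jaccard similarity on page titles are all assumptions, since the abstract does not say how each signal is measured or combined beyond "at least one of".

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Click:
    url: str
    title: str
    dwell_seconds: float


@dataclass
class QueryRecord:
    query: str
    clicks: List[Click] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_related(first: QueryRecord, additional: QueryRecord,
               min_dwell: float = 30.0, min_clicks: int = 2,
               min_title_sim: float = 0.5) -> bool:
    """Return True if at least one signal passes its (hypothetical) threshold."""
    # Signal 1: the user stayed long on some result of the additional query.
    long_dwell = any(c.dwell_seconds >= min_dwell for c in additional.clicks)
    # Signal 2: the user clicked enough result links for the additional query.
    many_clicks = len(additional.clicks) >= min_clicks
    # Signal 3: pages clicked for the two queries have similar titles.
    similar_titles = any(
        jaccard(a.title, b.title) >= min_title_sim
        for a in first.clicks for b in additional.clicks
    )
    return long_dwell or many_clicks or similar_titles


def related_strings(session: Sequence[QueryRecord],
                    supplemental: Sequence[str] = ()) -> List[str]:
    """Collect later queries related to the first query, then add supplemental strings."""
    first, rest = session[0], session[1:]
    related = [r.query for r in rest if is_related(first, r)]
    related.extend(s for s in supplemental if s not in related)
    return related
```

Combining the signals with a plain `or` mirrors the "at least one of" wording; a real system would likely tune the thresholds or weight the signals.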
Abstract:
A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. For each token, scores are calculated for the correct class and for the competing classes, given the initial acoustic model. A sample-adaptive window bandwidth is also calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated using a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves the decision boundary such that token-to-boundary distances for correct tokens near the decision boundary are maximized. The margin can either be fixed or vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence criterion is met.
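As a rough illustration of how a margin and a per-token bandwidth can enter such a loss, the sketch below uses a sigmoid of a misclassification measure. The specific form (log-sum-exp of competitor scores minus the correct score, shifted by the margin and divided by the bandwidth) is an assumption borrowed from common minimum-classification-error formulations, not something the abstract states. The function names and the linear margin schedule are likewise hypothetical, and the bandwidth is simply passed in because the abstract does not say how it is computed.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def misclassification_measure(correct_score: float,
                              competitor_scores: Sequence[float]) -> float:
    """Log-sum-exp of competitor scores minus the correct-class score.

    Negative values mean the token is classified correctly.
    """
    m = max(competitor_scores)
    lse = m + math.log(sum(math.exp(s - m) for s in competitor_scores))
    return lse - correct_score


def margin_sigmoid_loss(correct_score: float,
                        competitor_scores: Sequence[float],
                        bandwidth: float,
                        margin: float = 0.0) -> float:
    """Assumed loss: sigmoid((d + margin) / bandwidth) for one token.

    A positive margin penalizes correct tokens that sit close to the
    decision boundary; the bandwidth controls how sharp the sigmoid is.
    """
    d = misclassification_measure(correct_score, competitor_scores)
    return _sigmoid((d + margin) / bandwidth)


def linear_margin(iteration: int, start: float = 0.0,
                  step: float = 0.1, cap: float = 1.0) -> float:
    """One hypothetical monotonic schedule: grow linearly up to a cap."""
    return min(cap, start + step * iteration)
```

With log-likelihood scores of -10 for the correct class and -12 for a single competitor, the measure is -2; a margin of 2 then places the token exactly on the shifted boundary, where the loss is 0.5.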
Abstract:
Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
Abstract:
An architecture provides the capability to subselect the most relevant data from an out-of-domain corpus, for use either in isolation or in combination with in-domain data. The architecture performs domain adaptation for machine translation by selecting the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The data selection methods include monolingual cross-entropy, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. Translation models are trained on the in-domain data and on the selected out-of-domain subset, and the models can be interpolated to boost performance on in-domain translation tasks.
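The monolingual cross-entropy difference criterion can be illustrated with a small sketch: each general-domain sentence is scored by its per-word cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and the lowest-scoring sentences are kept. The add-one-smoothed unigram models, the class and function names, and the use of the candidate pool itself to train the general-domain model are simplifying assumptions; the abstract does not specify the language models used.

```python
import math
from collections import Counter
from typing import List, Sequence


class UnigramLM:
    """Add-one-smoothed unigram language model (a stand-in for a real LM)."""

    def __init__(self, sentences: Sequence[str], vocab_size: int):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def cross_entropy(self, sentence: str) -> float:
        """Per-word cross-entropy of the sentence in bits."""
        toks = sentence.split()
        log_prob = sum(
            math.log2((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in toks
        )
        return -log_prob / len(toks)


def select_relevant(in_domain: Sequence[str], general: Sequence[str],
                    k: int) -> List[str]:
    """Keep the k general-domain sentences with the lowest H_in - H_general."""
    vocab = {t for s in list(in_domain) + list(general) for t in s.split()}
    lm_in = UnigramLM(in_domain, len(vocab))
    lm_gen = UnigramLM(general, len(vocab))
    candidates = [s for s in general if s.split()]  # skip empty sentences
    candidates.sort(
        key=lambda s: lm_in.cross_entropy(s) - lm_gen.cross_entropy(s)
    )
    return candidates[:k]
```

The bilingual variants named in the abstract would apply the same scoring to both the source and target sides of each sentence pair and combine the two scores; that extension is not shown here.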
Abstract:
A supervised technique uses relevance judgments to train a dependency parser such that it approximately optimizes Normalized Discounted Cumulative Gain (NDCG) in information retrieval. A weighted tree edit distance between the parse tree for a query and the parse tree for a document is added to a ranking function, where the edit distance weights are parameters from the parser. Using parser parameters in the ranking function enables approximate optimization of the parser's parameters for NDCG by adding some constraints to the objective function.
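For readers unfamiliar with the metric being optimized, NDCG can be computed as in the sketch below. It uses the common exponential-gain form, with gain 2^rel - 1 and a log2 position discount; this is one of two widely used variants, and the abstract does not say which one is meant. The function names are illustrative.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[int], k: Optional[int] = None) -> float:
    """Discounted cumulative gain over the top-k results, in ranked order."""
    rels = relevances[:k] if k is not None else relevances
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(relevances: Sequence[int], k: Optional[int] = None) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` holds the judged relevance grade of each returned document in ranked order, so a ranking that already places the most relevant documents first scores 1.0.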
Abstract:
Speech models are trained using one or more of three different training systems: competitive training, which reduces the distance between a recognized result and the true result; data boosting, which divides and weights the training data; and asymmetric training, which trains different model components differently.
Abstract:
A wavy-shaped electric straight comb includes a comb part and a handle. The comb part has a first comb with a plurality of first comb teeth, and a second comb with a plurality of second comb teeth and a plurality of through holes, each through hole formed between two adjacent second comb teeth. The first comb teeth pass through the respective through holes of the second comb, so that each first comb tooth is disposed between two corresponding adjacent second comb teeth, thereby assembling the first comb and the second comb together. Each of the first and second comb teeth has a wavy-shaped cross-section, and adjacent first and second comb teeth are spaced 0.25 mm to 1.5 mm apart and define a wavy-shaped hair accommodating space.