-
Publication number: US10540450B2
Publication date: 2020-01-21
Application number: US15696121
Application date: 2017-09-05
Applicant: Facebook, Inc.
Inventor: Kay Rottmann, Fei Huang, Ying Zhang
Abstract: Technology is disclosed for snippet pre-translation and dynamic selection of translation systems. Pre-translation uses snippet attributes such as characteristics of a snippet author, snippet topics, snippet context, expected snippet viewers, etc., to predict how many translation requests for the snippet are likely to be received. An appropriate translator can be dynamically selected to produce a translation of a snippet either as a result of the snippet being selected for pre-translation or from another trigger, such as a user requesting a translation of the snippet. Different translators can generate high quality translations after a period of time or other translators can generate lower quality translations earlier. Dynamic selection of translators involves dynamically selecting machine or human translation, e.g., based on a quality of translation that is desired. Translations can be improved over time by employing better machine or human translators, such as when a snippet is identified as being more popular.
-
Publication number: US20190013011A1
Publication date: 2019-01-10
Application number: US15866420
Application date: 2018-01-09
Applicant: Facebook, Inc.
Inventor: Fei Huang
CPC classification number: G10L15/063, G06F17/274, G06F17/275, G06F17/279, G06F17/28, G10L15/005, G10L15/26, G10L2015/0633, G10L2015/0636
Abstract: Technology is disclosed for creating and tuning classifiers for language dialects and for generating dialect-specific language modules. A computing device can receive an initial training data set as a current training data set. The selection process for the initial training data set can be achieved by receiving one or more initial content items, establishing dialect parameters of each of the initial content items, and sorting each of the initial content items into one or more dialect groups based on the established dialect parameters. The computing device can generate, based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified. The computing device can augment the current training data set with additional training data by applying the dialect classifier to candidate content items. The computing device can then update the dialect classifier based on the augmented current training data set.
-
Publication number: US10133738B2
Publication date: 2018-11-20
Application number: US14967897
Application date: 2015-12-14
Applicant: Facebook, Inc.
Inventor: Fei Huang
Abstract: A confidence scoring system can include a model trained using features extracted from translations that have received user translation ratings. The features can include, e.g. sentence length, an amount of out-of-vocabulary or rare words, language model probability scores of the source or translation, or a semantic similarity between the source and a translation. Parameters of the confidence model can then be adjusted based on a comparison of the confidence model output and user translation ratings, where the user translation ratings can be selected or weighted based on a determination of individual user fluentness. After the confidence model has been trained, it can produce confidence scores for new translations. If a confidence score is higher than a threshold, it can indicate the translation should be selected for automatic presentation to users. If the confidence score is below another threshold, it can indicate the translation should be updated.
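The two-threshold decision described at the end of this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the threshold values, and the middle "available_on_request" outcome are all assumptions introduced here, since the abstract only states what scores above one threshold and below another can indicate.

```python
def route_translation(confidence, present_threshold=0.8, update_threshold=0.4):
    """Map a confidence score to an action (hypothetical names and thresholds)."""
    if confidence > present_threshold:
        # High confidence: the translation can be presented automatically.
        return "present_automatically"
    if confidence < update_threshold:
        # Low confidence: the translation should be updated.
        return "update_translation"
    # Assumption: scores between the two thresholds trigger neither action.
    return "available_on_request"
```

For example, `route_translation(0.9)` falls above the presentation threshold, while `route_translation(0.2)` falls below the update threshold.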
-
Publication number: US09990361B2
Publication date: 2018-06-05
Application number: US14878794
Application date: 2015-10-08
Applicant: Facebook, Inc.
Inventor: Ying Zhang, Fei Huang, Xiaolong Wang
CPC classification number: G06F17/289, G06F17/271, G06F17/2785, G06F17/2809
Abstract: Snippets can be represented in a language-independent semantic manner. Each portion of a snippet can be represented by a combination of a semantic representation and a syntactic representation, each in its own dimensional space. A snippet can be divided into portions by constructing a dependency structure based on relationships between words and phrases. Leaf nodes of the dependency structure can be assigned: A) a semantic representation according to pre-defined word mappings and B) a syntactic representation according to the grammatical use of the word. A trained semantic model can assign to each non-leaf node of the dependency structure a semantic representation based on a combination of the semantic and syntactic representations of the corresponding lower-level nodes. A trained syntactic model can assign to each non-leaf node a syntactic representation based on a combination of the syntactic representations of the corresponding lower-level nodes and the semantic representation of that node.
-
Publication number: US20180004734A1
Publication date: 2018-01-04
Application number: US15696121
Application date: 2017-09-05
Applicant: Facebook, Inc.
Inventor: Kay Rottmann, Fei Huang, Ying Zhang
CPC classification number: G06F17/2854, G06F17/275, G06F17/2809, G06F17/289
Abstract: Technology is disclosed for snippet pre-translation and dynamic selection of translation systems. Pre-translation uses snippet attributes such as characteristics of a snippet author, snippet topics, snippet context, expected snippet viewers, etc., to predict how many translation requests for the snippet are likely to be received. An appropriate translator can be dynamically selected to produce a translation of a snippet either as a result of the snippet being selected for pre-translation or from another trigger, such as a user requesting a translation of the snippet. Different translators can generate high quality translations after a period of time or other translators can generate lower quality translations earlier. Dynamic selection of translators involves dynamically selecting machine or human translation, e.g., based on a quality of translation that is desired. Translations can be improved over time by employing better machine or human translators, such as when a snippet is identified as being more popular.
-
Publication number: US09830386B2
Publication date: 2017-11-28
Application number: US14586049
Application date: 2014-12-30
Applicant: Facebook, Inc.
Inventor: Fei Huang, Kay Rottmann, Ying Zhang, Matthias Gerhard Eck
CPC classification number: G06F17/30705, G06F17/2785, G06Q30/02, G06Q50/01, G06Q50/10
Abstract: Technology is discussed herein for identifying comparatively trending topics between groups of posts. Groups of posts can be selected based on parameters such as author age, location, gender, etc., or based on information about content items such as when they were posted or what keywords they contain. Topics, as one or more groups of words, can each be given a rank score for each group based on the topic's frequency within each group. A difference score for selected topics can be computed based on a difference between the rank score for the selected topic in each of the groups. When the difference score for a selected topic is above a specified threshold, that selected topic can be identified as a comparatively trending topic.
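The rank-score and difference-score steps in this abstract can be sketched as follows. This is a rough illustration under stated assumptions: the rank score is taken to be the fraction of posts in a group that mention the topic (the abstract only says it is based on frequency), topics are matched by simple substring search, and the function names are hypothetical.

```python
from collections import Counter


def rank_scores(posts, topics):
    """Score each topic by the fraction of posts in the group that mention it."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for topic in topics:
            if topic in text:
                counts[topic] += 1
    total = len(posts) or 1  # avoid division by zero for an empty group
    return {topic: counts[topic] / total for topic in topics}


def comparatively_trending(group_a, group_b, topics, threshold):
    """Return topics whose rank score in group_a exceeds group_b by more than threshold."""
    scores_a = rank_scores(group_a, topics)
    scores_b = rank_scores(group_b, topics)
    differences = {t: scores_a[t] - scores_b[t] for t in topics}
    return {t: d for t, d in differences.items() if d > threshold}
```

Calling `comparatively_trending(posts_from_group_a, posts_from_group_b, ["soccer", "pizza"], 0.25)` returns the topics that trend in the first group relative to the second, together with their difference scores.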
-
Publication number: US20160188661A1
Publication date: 2016-06-30
Application number: US14586074
Application date: 2014-12-30
Applicant: Facebook, Inc.
Inventor: Fei Huang, Kay Rottmann, Ying Zhang, Matthias Gerhard Eck
IPC: G06F17/30
CPC classification number: G06F17/30991, G06F17/30979
Abstract: Technology is discussed herein for identifying trending actions within a group of posts matching a query. A group of posts can be selected based on specified actions, action targets, or parameters such as author age, location, gender, when the posts were posted or what keywords they contain. Selected posts can be divided into sentences and a dependency structure can be created for each sentence classifying portions of the sentence as actions or action targets. Statistics can be generated for each sentence or post indicating whether it matches the actions, action targets, or other parameters specified in the query. Based on these statistics, additional information can be gathered to respond to questions posed in the query.
-
Publication number: US20160188576A1
Publication date: 2016-06-30
Application number: US14586022
Application date: 2014-12-30
Applicant: Facebook, Inc.
Inventor: Fei Huang
IPC: G06F17/28
CPC classification number: G06F17/289, G06F17/2854
Abstract: Technology is disclosed to select a preferred machine translation from multiple machine translations of a content item, each machine translation from the multiple machine translations created for the same target language. Each machine translation is assigned a score based on feedback from a user group that receives the machine translation. The machine translation with the highest score is identified as the preferred machine translation, and is provided in response to subsequent requests for translations of the content item. If there is no preferred translation, the several top scoring machine translations are provided to a larger group of users for further scoring. This process may be repeated until either a clearly preferred translation is identified, a maximum number of iterations is reached, or a maximum number of scoring users is reached.
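The iterative selection loop in this abstract can be sketched as follows. This is only an illustration under several assumptions: `get_ratings` is a hypothetical callback that returns numeric ratings from a user group of a given size, a translation counts as "clearly preferred" when its mean rating leads the runner-up by at least `margin`, and the user group doubles each round. None of these specifics come from the abstract.

```python
def select_preferred(candidates, get_ratings, margin=0.5, initial_users=10,
                     max_rounds=3, max_users=1000, top_k=2):
    """Return (best_candidate, is_clearly_preferred) after iterative scoring."""
    remaining = list(candidates)
    n_users = initial_users
    best = remaining[0]
    for _ in range(max_rounds):
        # Score each remaining candidate by its mean rating from the user group.
        scores = {}
        for cand in remaining:
            ratings = get_ratings(cand, n_users)
            scores[cand] = sum(ratings) / len(ratings)
        ranked = sorted(remaining, key=scores.get, reverse=True)
        best = ranked[0]
        if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= margin:
            return best, True  # a clearly preferred translation was found
        # No clear winner: keep the top scorers and ask a larger group.
        remaining = ranked[:top_k]
        n_users *= 2
        if n_users > max_users:
            break  # maximum number of scoring users reached
    return best, False
```

The second element of the result distinguishes a clear winner from a best-so-far candidate returned because an iteration or user limit was reached.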
-
Publication number: US20190180386A1
Publication date: 2019-06-13
Application number: US15838292
Application date: 2017-12-11
Applicant: Facebook, Inc.
Inventor: Sohang Chander Gandhi, Do Huy Hoang, Yaniv Shmueli, Fei Huang
Abstract: In one embodiment, a method includes accessing a place-entities graph comprising place-entity nodes, each place-entity node representing a place-entity corresponding to a particular geographic location; identifying a place-entity cluster within the place-entities graph, wherein the place-entity cluster includes place-entity nodes corresponding to respective place-entities each corresponding to the same geographic location; accessing embeddings representing the respective place-entities corresponding to the place-entity cluster; calculating, using a machine-learning model, a cluster-quality score of the place-entity cluster based on the embeddings representing the place-entities corresponding to the place-entity cluster, wherein the cluster-quality score represents a probability that the place-entities corresponding to the place-entity cluster correspond to a valid geographic location; and identifying the place-entities corresponding to the place-entity cluster as corresponding to an invalid geographic location based on a determining that the cluster-quality score is less than a threshold cluster-quality score.
-
Publication number: US10180935B2
Publication date: 2019-01-15
Application number: US15422463
Application date: 2017-02-02
Applicant: Facebook, Inc.
Inventor: Daniel Matthew Merl, Aditya Pal, Stanislav Funiak, Seyoung Park, Fei Huang, Amac Herdagdelen
Abstract: A system for identifying language(s) for content items is disclosed. The system can identify different languages for content item words segments by identifying segment languages that maximize a probability across the segments. The probability can be a combination of: an author's likelihood for the language identified for the first word; a combination of transition frequencies for selected languages identified for words, the transition frequencies indicating likelihoods that a transition occurred to the selected language from the previous word's language; and a combination of observation probabilities indicating, for a given word in the content item, a likelihood the given word is in the identified language. For an in-vocabulary word, the observation probabilities can be based on learned probability for that word. For an out-of-vocabulary word, the probability can be computed by breaking the word into overlapping n-grams and computing combined learned probabilities that each n-gram is in the given language.
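The probability maximized in this abstract has the shape of a hidden Markov model, so one plausible way to find the best per-word language assignment is Viterbi decoding. The sketch below assumes that approach; the abstract does not name the search method. It also assumes out-of-vocabulary words are scored by the average log probability of their character trigrams, and that unseen items get a small floor probability. All function names and table layouts are hypothetical.

```python
import math


def ngrams(word, n=3):
    """Overlapping character n-grams of a word padded with boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def observation_logprob(word, lang, word_probs, ngram_probs, floor=1e-6):
    """Log probability that `word` is in `lang`."""
    if any(word in probs for probs in word_probs.values()):
        # In-vocabulary word: use the learned per-word probability.
        return math.log(word_probs[lang].get(word, floor))
    # Out-of-vocabulary word: combine n-gram probabilities (assumed: mean log).
    grams = ngrams(word)
    return sum(math.log(ngram_probs[lang].get(g, floor)) for g in grams) / len(grams)


def identify_languages(words, langs, author_prior, transition, word_probs, ngram_probs):
    """Viterbi search for the most probable language of each word."""
    if not words:
        return []
    # best[lang] = (log score of the best path ending in lang, that path)
    best = {
        lang: (math.log(author_prior[lang])
               + observation_logprob(words[0], lang, word_probs, ngram_probs),
               [lang])
        for lang in langs
    }
    for word in words[1:]:
        new_best = {}
        for lang in langs:
            # Pick the previous language that maximizes score plus transition.
            prev = max(langs, key=lambda p: best[p][0] + math.log(transition[p][lang]))
            score = (best[prev][0] + math.log(transition[prev][lang])
                     + observation_logprob(word, lang, word_probs, ngram_probs))
            new_best[lang] = (score, best[prev][1] + [lang])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

Here `author_prior[lang]` plays the role of the author's likelihood for the first word's language, `transition[prev][lang]` the transition frequency between consecutive words, and `observation_logprob` the per-word observation probability.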