-
Publication No.: US08577670B2
Publication date: 2013-11-05
Application No.: US12684749
Filing date: 2010-01-08
Applicant(s): Kuansan Wang; Xiaolong Li; Jiangbo Miao; Frederic H. Behr, Jr.
Inventor(s): Kuansan Wang; Xiaolong Li; Jiangbo Miao; Frederic H. Behr, Jr.
IPC classification: G06F17/27
CPC classification: G06F17/2715, G06F17/277, G06F17/30864, G10L15/183
Abstract: A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities.
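The refinement loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how the weighting parameter is computed, so the rule used here, exp(-KL(new || old)), is an assumption chosen only because it gives the existing model a weight near 1 when it explains the new data well and a weight near 0 when it does not. The function names, the `floor` parameter, and the use of plain relative frequencies without smoothing are likewise hypothetical.

```python
import math
from collections import Counter


def ngram_probs(docs, n=1):
    """Relative-frequency N-gram probabilities over a list of tokenized documents."""
    counts = Counter()
    for tokens in docs:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def blend_weight(old_model, new_probs, floor=1e-6):
    """Weight for the existing model (ASSUMED rule, not taken from the patent).

    Computes exp(-KL(new || old)); N-grams the old model has never seen are
    given the small probability `floor` so the logarithm stays finite.
    """
    kl = sum(p * math.log(p / max(old_model.get(g, 0.0), floor))
             for g, p in new_probs.items())
    return math.exp(-max(kl, 0.0))


def update_slm(old_model, new_probs, weight_old):
    """New model as a weighted average of the old model and the new probabilities."""
    grams = set(old_model) | set(new_probs)
    return {g: weight_old * old_model.get(g, 0.0)
               + (1.0 - weight_old) * new_probs.get(g, 0.0)
            for g in grams}


# Usage: build a model from a first batch, then fold in a second batch.
first_batch = [["a", "b", "a"], ["a", "c"]]
second_batch = [["a", "d"]]
slm = ngram_probs(first_batch)
new = ngram_probs(second_batch)
slm = update_slm(slm, new, blend_weight(slm, new))
```

Because both inputs are probability distributions and the two weights sum to 1, the blended model is again a distribution. Repeating the last three lines for each new batch gives the iterative refinement.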
-
Publication No.: US20110172988A1
Publication date: 2011-07-14
Application No.: US12684749
Filing date: 2010-01-08
Applicant(s): Kuansan Wang; Xiaolong Li; Jiangbo Miao; Frederic H. Behr, Jr.
Inventor(s): Kuansan Wang; Xiaolong Li; Jiangbo Miao; Frederic H. Behr, Jr.
IPC classification: G06F17/27
CPC classification: G06F17/2715, G06F17/277, G06F17/30864, G10L15/183
Abstract: A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities.
-
Publication No.: US09449078B2
Publication date: 2016-09-20
Application No.: US12243937
Filing date: 2008-10-01
Applicant(s): Kuansan Wang; Toby H. Walker; Zijian Zheng; Frederic H. Behr, Jr.; Yu Chen; Robert C. Wang
Inventor(s): Kuansan Wang; Toby H. Walker; Zijian Zheng; Frederic H. Behr, Jr.; Yu Chen; Robert C. Wang
IPC classification: G06F17/30
CPC classification: G06F17/30648, G06F17/30722, G06F17/30864
Abstract: The ranking quality of a ranked list may be evaluated. In an example embodiment, a method is implemented by a system to access log data, ascertain which entries of a ranked list are skipped, and determine a ranking quality metric from the skipped entries. More specifically, log data that reflects user interactions with a ranked list having multiple entries is accessed. The user interactions include at least indications of which of the multiple entries are selected entries. It is ascertained which entries of the multiple entries of the ranked list are skipped entries based on the selected entries. The ranking quality metric for the ranked list is determined responsive to the skipped entries.
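One way to picture the skip-based evaluation in this abstract is the sketch below. It rests on two assumptions that the abstract does not state: an entry counts as skipped when it was not selected but ranks above the lowest-ranked selected entry, and the metric is simply the share of examined entries that were skipped. Both function names are hypothetical.

```python
def skipped_entries(num_entries, selected):
    """0-based positions that were not selected but rank above the last selected entry.

    ASSUMPTION: the user is taken to have examined every entry down to the
    lowest-ranked one they selected, so unselected entries above it were skipped.
    """
    chosen = {p for p in selected if 0 <= p < num_entries}
    if not chosen:
        return []
    last = max(chosen)
    return [p for p in range(last) if p not in chosen]


def skip_rate(num_entries, selected):
    """Hypothetical quality metric: skipped / examined entries (lower is better).

    Returns None when nothing was selected, since no skips can be inferred.
    """
    chosen = {p for p in selected if 0 <= p < num_entries}
    if not chosen:
        return None
    examined = max(chosen) + 1
    return len(skipped_entries(num_entries, selected)) / examined


# Usage: a 10-entry result list where the log shows selections at positions 2 and 4.
print(skipped_entries(10, [2, 4]))
print(skip_rate(10, [2, 4]))
```

A list whose top entry is the only one selected scores 0.0 under this metric, while a list that forces the user past many unselected entries scores closer to 1.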
-
Publication No.: US07937383B2
Publication date: 2011-05-03
Application No.: US12024989
Filing date: 2008-02-01
Applicant(s): Michael D. Hintze; Frederic H. Behr, Jr.; Randall F. Kern; Zijian Zheng; Kimberly J. Howell
Inventor(s): Michael D. Hintze; Frederic H. Behr, Jr.; Randall F. Kern; Zijian Zheng; Kimberly J. Howell
IPC classification: G06F17/30
CPC classification: G06F17/30
Abstract: Assigning session identifications to log entries and generating anonymous log entries are provided. In order to balance users' privacy concerns with the need for analysis of the log entries to provide high quality search results, non-user-specific data fields, such as a user's location (e.g., city, state, and latitude/longitude) and connection speed, are inserted into the log entries, and user-specific data fields, such as the IP address and cookie identifications, are deleted from the log entries. In addition or alternatively, prior to anonymization of the log entries, session identifications are assigned to identified groups of log entries. The groups are identified based on factors such as the user's identification, the IP address, the time of search, and differences between the search terms used in the search queries.
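The two steps in this abstract, session assignment followed by anonymization, might look like the sketch below. Everything specific in it is an assumption made for illustration: the field names (`user_id`, `ip`, `cookie_id`, `timestamp`), timestamps expressed in seconds, the 30-minute idle gap, and the `locate` callable that stands in for whatever location lookup a real system would use. The abstract also lists differences between search terms as a grouping factor; that factor is left out here.

```python
from itertools import count

# Hypothetical names for the user-specific fields that get removed.
USER_FIELDS = ("user_id", "ip", "cookie_id")


def assign_sessions(entries, max_gap_seconds=1800):
    """Return copies of the entries, in time order, each tagged with a session_id.

    ASSUMPTION: entries sharing (user_id, ip) belong to one session until that
    pair has been idle for more than max_gap_seconds.
    """
    next_id = count(1)
    last_seen = {}  # (user_id, ip) -> (timestamp, session_id)
    tagged = []
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        key = (entry["user_id"], entry["ip"])
        prev = last_seen.get(key)
        if prev is None or entry["timestamp"] - prev[0] > max_gap_seconds:
            session_id = next(next_id)
        else:
            session_id = prev[1]
        last_seen[key] = (entry["timestamp"], session_id)
        tagged.append({**entry, "session_id": session_id})
    return tagged


def anonymize(entry, locate):
    """Insert a coarse location, then drop the user-specific fields.

    `locate` is a hypothetical callable mapping an IP address to location data.
    """
    cleaned = {k: v for k, v in entry.items() if k not in USER_FIELDS}
    cleaned["location"] = locate(entry["ip"])
    return cleaned
```

Sessions are assigned first because the grouping relies on the very fields that anonymization removes; afterwards the `session_id` is the only remaining link between entries from the same visit.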