Method for retrieving answers from an information retrieval system
    1.
    Granted invention patent
    Status: In force

    Publication No.: US07269545B2

    Publication date: 2007-09-11

    Application No.: US09823052

    Filing date: 2001-03-30

    IPC classification: G06F17/30 G06F17/00

    Abstract: The invention is a method for retrieving answers to questions from an information retrieval system. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transforms on information retrieval systems. At run time, questions are transformed into a set of queries, and re-ranking is performed on the documents retrieved.
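
The run-time step described in this abstract (question to queries, then re-ranking) might look roughly like the sketch below. It is only an illustration: the `TRANSFORMS` table is hard-coded here, whereas the abstract says transforms are learned from question/answer pairs, and the word-overlap score in `rerank` is an assumed stand-in for whatever re-ranking the patent actually uses.

```python
import re

# Hypothetical transforms per question phrase. In the patent these are learned
# automatically; here they are hand-written purely for illustration.
TRANSFORMS = {
    "what is a": ['"{x} is a"', '"{x} refers to"', "{x}"],
    "who is": ['"{x} was born"', '"{x} is the"', "{x}"],
}

def classify(question):
    """Return the longest known question phrase that prefixes the question."""
    q = question.lower().strip()
    matches = [p for p in TRANSFORMS if q.startswith(p + " ")]
    return max(matches, key=len) if matches else None

def transform(question):
    """Rewrite a question into a list of queries for a search engine."""
    phrase = classify(question)
    q = question.strip().rstrip("?")
    if phrase is None:
        return [q]  # unknown question type: fall back to the raw text
    rest = q[len(phrase):].strip()
    return [t.format(x=rest) for t in TRANSFORMS[phrase]]

def rerank(docs, question):
    """Order retrieved documents by word overlap with the question (toy score)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def score(doc):
        return len(q_words & set(re.findall(r"\w+", doc.lower())))
    return sorted(docs, key=score, reverse=True)
```

For example, `transform("What is a hard disk?")` yields the three queries `"hard disk is a"`, `"hard disk refers to"` (both quoted phrases) and the bare `hard disk`.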

    Question answering over structured content on the web
    2.
    Invention patent application
    Status: Lapsed

    Publication No.: US20070094285A1

    Publication date: 2007-04-26

    Application No.: US11256503

    Filing date: 2005-10-21

    IPC classification: G06F7/00

    Abstract: Structured content and associated metadata from the Web are leveraged to provide specific answer string responses to user questions. The structured content can also be indexed at crawl-time to facilitate searching of the content at search-time. Ranking techniques can also be employed to help provide an optimum answer string and/or a top-K list of answer strings for a query. Ranking can be based on trainable algorithms that utilize feature vectors for candidate answer strings. In one instance, at crawl-time, structured content is indexed and automatically associated with metadata relating to the structured content and the source web page. At search-time, candidate indexed structured content is then utilized to extract an appropriate answer string in response to a user query.
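
A minimal sketch of the top-K ranking step over candidate answer strings, assuming each candidate already carries a feature vector. The feature names and the linear `WEIGHTS` are hypothetical; the abstract only says ranking can be based on trainable algorithms, so in practice the weights would come from training rather than being set by hand.

```python
import heapq

# Hypothetical feature weights (assumed, not taken from the patent).
WEIGHTS = {"query_overlap": 2.0, "metadata_match": 1.5, "source_quality": 0.5}

def score(features):
    """Linear score of one candidate's feature vector; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

def top_k(candidates, k):
    """candidates: list of (answer_string, feature_dict). Returns the k best, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: score(c[1]))
```

`heapq.nlargest` avoids sorting the full candidate list when only a short top-K list is needed.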

    Training a learning system with arbitrary cost functions
    3.
    Invention patent application
    Status: In force

    Publication No.: US20070094171A1

    Publication date: 2007-04-26

    Application No.: US11305395

    Filing date: 2005-12-16

    IPC classification: G06F15/18

    CPC classification: G06N3/08

    Abstract: The subject disclosure pertains to systems and methods for training machine learning systems. Many cost functions are not smooth or differentiable and cannot easily be used during training of a machine learning system. The machine learning system can include a set of estimated gradients based at least in part upon the ranked or sorted results generated by the learning system. The estimated gradients can be selected to reflect the requirements of a cost function and utilized instead of the cost function to determine or modify the parameters of the learning system during training of the learning system.
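
One way to picture "estimated gradients based on ranked results" is the pairwise sketch below. It is an assumption-laden illustration, not the patent's formula: the logistic `force` and the reciprocal-rank `weight` are choices made here for clarity. The point is only that each document receives a push derived from the current sorted order, so no differentiable cost function is needed.

```python
import math

def estimated_gradients(scores, labels):
    """Return one estimated gradient per document (positive means: raise its score)."""
    n = len(scores)
    # Rank position of each document under the current scores (0 = top).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rank = {doc: pos for pos, doc in enumerate(order)}
    grads = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # document i should rank above document j
                # Pairwise logistic force, larger when the pair is mis-ordered.
                force = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                # Assumed weighting: pairs touching top positions matter more.
                weight = abs(1.0 / (1 + rank[i]) - 1.0 / (1 + rank[j]))
                grads[i] += force * weight
                grads[j] -= force * weight
    return grads
```

During training, these values would be back-propagated through the scoring model in place of the derivative of the true cost.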

    Automatic client-side user-behavior analysis for inferring user intent
    4.
    Granted invention patent
    Status: In force

    Publication No.: US08606725B1

    Publication date: 2013-12-10

    Application No.: US12608965

    Filing date: 2009-10-29

    IPC classification: G06F15/18

    Abstract: User intent may be inferred from mouse movements made within a user interface. Client-side instrumentation may be provided that collects mouse movement data that is provided to a classification engine. The classification engine receives the mouse movement data and creates a mouse trajectory. The mouse trajectory may be split into segments, and features associated with each segment may be determined. Features representing the context of the search, that is, content of the search result page, previous queries submitted, and interaction features such as scrolling, may be included. By examining the features associated with the mouse trajectories within the context of a search session, the user intent may be classified into categories using machine learning classification techniques. By inferring user intent, Web search engines may be able to predict whether a user's intent is commercial and tailor advertising accordingly.
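
The trajectory-splitting and per-segment feature steps could be sketched as follows. The pause-based splitting rule, the `pause` threshold, and the three features are assumptions chosen for illustration; the abstract does not say how segments are delimited or which features are computed.

```python
import math

def split_segments(points, pause=0.5):
    """Split (t, x, y) samples into segments wherever the time gap exceeds `pause` seconds."""
    if not points:
        return []
    segments, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        if cur[0] - prev[0] > pause:
            segments.append(current)  # close the segment at the pause
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def segment_features(segment):
    """Distance travelled, duration, and average speed of one segment."""
    dist = sum(math.hypot(b[1] - a[1], b[2] - a[2])
               for a, b in zip(segment, segment[1:]))
    duration = segment[-1][0] - segment[0][0]
    speed = dist / duration if duration > 0 else 0.0
    return {"distance": dist, "duration": duration, "speed": speed}
```

The resulting feature dictionaries, together with context features such as page content and previous queries, would then be passed to whatever classifier is used to assign an intent category.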

    MINING WEB SEARCH USER BEHAVIOR TO ENHANCE WEB SEARCH RELEVANCE
    5.
    Invention patent application
    Status: Under examination (published)

    Publication No.: US20070208730A1

    Publication date: 2007-09-06

    Application No.: US11457733

    Filing date: 2006-07-14

    IPC classification: G06F17/30

    CPC classification: G06F16/337 G06F16/9535

    Abstract: Systems and methods that estimate user preference via automatic interpretation of user behavior. A user behavior component associated with a search engine can automatically interpret collective behavior of users (e.g., web search users). Such a feedback component can include user behavior features and predictive models (e.g., from a user behavior component) that are robust to noise, which can be present in observed user interactions with the search results (e.g., malicious and/or irrational user activity).

    Segmentation of strings into structured records
    6.
    Invention patent application
    Status: In force

    Publication No.: US20050234906A1

    Publication date: 2005-10-20

    Application No.: US10825488

    Filing date: 2004-04-14

    IPC classification: G06F7/00 G06F17/30

    Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
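
The "most probable segmentation" search can be illustrated with a heavily simplified sketch. Assumptions made here that are not in the abstract: each attribute model is just the set of tokens seen in the reference table (no beginning/middle/ending states), attributes appear in a fixed order, and the score is +1 for a known token and -1 for an unknown one. An empty span plays the role of the null token.

```python
def build_models(reference_rows, attributes):
    """Collect, per attribute, the set of tokens seen in the clean reference table."""
    models = {a: set() for a in attributes}
    for row in reference_rows:
        for a in attributes:
            models[a].update(row.get(a, "").lower().split())
    return models

def segment(record, attributes, models):
    """Assign contiguous token runs to attributes, in order, maximizing token matches."""
    tokens = record.lower().split()
    n, m = len(tokens), len(attributes)
    # best[j][i] = (score, spans) for the first i tokens using the first j attributes
    best = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = (0, [])
    for j in range(1, m + 1):
        model = models[attributes[j - 1]]
        for i in range(n + 1):
            for k in range(i + 1):  # attribute j takes tokens[k:i]; k == i means empty
                if best[j - 1][k] is None:
                    continue
                gain = sum(1 if t in model else -1 for t in tokens[k:i])
                cand = (best[j - 1][k][0] + gain, best[j - 1][k][1] + [(k, i)])
                if best[j][i] is None or cand[0] > best[j][i][0]:
                    best[j][i] = cand
    spans = best[m][n][1]
    return {a: " ".join(tokens[k:i]) for a, (k, i) in zip(attributes, spans)}
```

A fuller treatment would replace the token sets with per-attribute probabilistic state models and relax the fixed attribute order, which is what lets the described system tolerate insertions and misordering.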

    SYSTEMS AND METHODS FOR FINDING HIGH QUALITY CONTENT IN SOCIAL MEDIA
    7.
    Invention patent application
    Status: Under examination (published)

    Publication No.: US20100036784A1

    Publication date: 2010-02-11

    Application No.: US12187580

    Filing date: 2008-08-07

    IPC classification: G06N5/00

    Abstract: The present invention is directed towards systems and methods for identifying high quality content in a social media environment. The method according to one embodiment of the present invention comprises retrieving a content item and retrieving a plurality of quality features associated with said content item wherein said quality features comprise intrinsic, usage and relationship features. The method then performs an analysis of said content item against said quality features and generates a quality score based on said analysis.
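
As a rough sketch of turning the three feature groups into a single quality score, the snippet below applies a logistic function to a weighted sum. The individual feature names and weights are hypothetical, and the logistic form is an assumption; the abstract only states that an analysis over intrinsic, usage and relationship features produces a score.

```python
import math

# Hypothetical features and weights for each group named in the abstract.
WEIGHTS = {
    "intrinsic": {"length": 0.002, "punctuation_errors": -0.5},
    "usage": {"views": 0.001, "votes": 0.1},
    "relationship": {"author_reputation": 0.8},
}

def quality_score(features):
    """Map grouped feature values to a score in (0, 1); missing features count as 0."""
    z = sum(w * features.get(group, {}).get(name, 0.0)
            for group, names in WEIGHTS.items()
            for name, w in names.items())
    return 1.0 / (1.0 + math.exp(-z))
```

An item with no known features scores exactly 0.5, so scores above or below that indicate evidence for or against quality under these assumed weights.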

    Segmentation of strings into structured records
    8.
    Granted invention patent
    Status: In force

    Publication No.: US07627567B2

    Publication date: 2009-12-01

    Application No.: US10825488

    Filing date: 2004-04-14

    IPC classification: G06F17/30

    Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
