GENERATING ANONYMOUS LOG ENTRIES
    1.
    Patent application
    GENERATING ANONYMOUS LOG ENTRIES (granted)

    Publication number: US20090198746A1

    Publication date: 2008-08-06 is not stated; see below

    Abstract: Assigning session identifications to log entries and generating anonymous log entries are provided. In order to balance users' privacy concerns with the need for analysis of the log entries to provide high quality search results, non-user-specific data fields, such as a user's location (e.g., city, state, and latitude/longitude) and connection speed, are inserted into the log entries, and user-specific data fields, such as the IP address and cookie identifications, are deleted from the log entries. In addition or alternatively, prior to anonymization of the log entries, session identifications are assigned to identified groups of log entries. The groups are identified based on factors such as the user's identification, the IP address, the time of search, and differences between the search terms used in the search queries.
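
The anonymization step in this abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the field names (`ip`, `cookie_id`, `location`, `connection_speed`) and the two lookup tables are hypothetical stand-ins for whatever geolocation and connection-speed sources a real system would use.

```python
# Hypothetical lookup tables; a real system would query a geolocation service.
GEO_BY_IP = {
    "203.0.113.7": {"city": "Seattle", "state": "WA", "lat": 47.61, "lon": -122.33},
}
SPEED_BY_IP = {"203.0.113.7": "broadband"}

# Fields assumed to identify an individual user.
USER_SPECIFIC_FIELDS = ("ip", "cookie_id")


def anonymize(entry: dict) -> dict:
    """Return a copy of a log entry with coarse fields added and identifiers removed."""
    out = dict(entry)
    ip = entry.get("ip")
    # Insert non-user-specific fields derived from the IP address first,
    # because the IP is no longer available once it has been deleted.
    out["location"] = GEO_BY_IP.get(ip)
    out["connection_speed"] = SPEED_BY_IP.get(ip)
    # Delete the user-specific fields.
    for field in USER_SPECIFIC_FIELDS:
        out.pop(field, None)
    return out
```

The order matters in this sketch: the derived fields are computed before the identifiers are dropped, and the input entry is left unmodified.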


    GENERATING ANONYMOUS LOG ENTRIES
    2.
    Patent application
    GENERATING ANONYMOUS LOG ENTRIES (under examination, published)

    Publication number: US20110167043A1

    Publication date: 2011-07-07

    Application number: US13050706

    Filing date: 2011-03-17

    IPC classification: G06F17/30

    CPC classification: G06F16/00

    Abstract: Assigning session identifications to log entries and generating anonymous log entries are provided. In order to balance users' privacy concerns with the need for analysis of the log entries to provide high quality search results, non-user-specific data fields, such as a user's location (e.g., city, state, and latitude/longitude) and connection speed, are inserted into the log entries, and user-specific data fields, such as the IP address and cookie identifications, are deleted from the log entries. In addition or alternatively, prior to anonymization of the log entries, session identifications are assigned to identified groups of log entries. The groups are identified based on factors such as the user's identification, the IP address, the time of search, and differences between the search terms used in the search queries.
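
The session-identification step named in this abstract might look roughly like the sketch below. The abstract lists the grouping factors (user identification, IP address, time of search, differences between search terms) but not how they are combined, so the rule used here is an assumption: a new session starts when the user or IP changes, when the time gap exceeds `max_gap` seconds, or when the term overlap with the previous query falls below `min_overlap`. All names and thresholds are hypothetical.

```python
def term_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def assign_sessions(entries, max_gap=1800, min_overlap=0.2):
    """Return copies of the entries, ordered by user, IP and time, with a session_id added.

    Each entry is a dict with the keys user_id, ip, ts (seconds) and query.
    """
    ordered = sorted(entries, key=lambda e: (e["user_id"], e["ip"], e["ts"]))
    out = []
    session_id = 0
    prev = None
    for e in ordered:
        same_user = prev is not None and (
            (prev["user_id"], prev["ip"]) == (e["user_id"], e["ip"])
        )
        close_in_time = same_user and e["ts"] - prev["ts"] <= max_gap
        related = same_user and term_overlap(prev["query"], e["query"]) >= min_overlap
        # Start a new session unless the entry continues the previous one.
        if not (close_in_time and related):
            session_id += 1
        out.append({**e, "session_id": session_id})
        prev = e
    return out
```

Because the abstract places session assignment before anonymization, a pipeline built on this sketch would call `assign_sessions` while `user_id` and `ip` are still present and strip those fields afterwards.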


    LARGE SCALE SEARCH BOT DETECTION
    3.
    Patent application
    LARGE SCALE SEARCH BOT DETECTION (under examination, published)

    Publication number: US20110208714A1

    Publication date: 2011-08-25

    Application number: US12708541

    Filing date: 2010-02-19

    IPC classification: G06F17/30, G06F21/00

    Abstract: A framework may be used for identifying low-rate search bot traffic within query logs by capturing groups of distributed, coordinated search bots. Search log data may be input to a history-based anomaly detection engine to determine if query-click pairs associated with a query are suspicious in view of historical query-click pairs for the query. Users associated with suspicious query-click pairs may be input to a matrix-based bot detection engine to determine correlations between queries submitted by the users. Those users indicating strong correlations may be categorized as bots, whereas those who do not may be categorized as part of flash crowd traffic.
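
The two-stage flow in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the framework's actual engines: stage one flags a query-click pair when its share of clicks for a known query rises by at least `min_jump` over its historical share, and stage two replaces the matrix-based engine with a simple pairwise Jaccard similarity over each user's set of queries. The function names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def click_shares(rows):
    """Map (query, url) to that URL's share of clicks for the query."""
    per_query = defaultdict(Counter)
    for _user, query, url in rows:
        per_query[query][url] += 1
    return {
        (q, u): n / sum(counts.values())
        for q, counts in per_query.items()
        for u, n in counts.items()
    }


def suspicious_pairs(history, current, min_jump=0.3):
    """Stage 1: flag query-click pairs whose click share jumped versus history."""
    old, new = click_shares(history), click_shares(current)
    known_queries = {q for q, _ in old}  # only judge queries that have a history
    return {
        (q, u)
        for (q, u), share in new.items()
        if q in known_queries and share - old.get((q, u), 0.0) >= min_jump
    }


def classify_users(current, flagged, min_similarity=0.5):
    """Stage 2: label users behind flagged pairs as 'bot' or 'flash_crowd'."""
    queries = defaultdict(set)
    for user, query, _url in current:
        queries[user].add(query)
    suspects = {user for user, q, url in current if (q, url) in flagged}
    bots = set()
    for a, b in combinations(sorted(suspects), 2):
        similarity = len(queries[a] & queries[b]) / len(queries[a] | queries[b])
        if similarity >= min_similarity:
            bots.update((a, b))  # strongly correlated query sets
    return {u: ("bot" if u in bots else "flash_crowd") for u in suspects}
```

The intuition behind the second stage is that coordinated bots tend to submit nearly the same set of queries, while flash-crowd users share one trending query but otherwise search for unrelated things.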


    EVALUATING THE RANKING QUALITY OF A RANKED LIST
    4.
    Patent application
    EVALUATING THE RANKING QUALITY OF A RANKED LIST (granted)

    Publication number: US20100082566A1

    Publication date: 2010-04-01

    Application number: US12243937

    Filing date: 2008-10-01

    IPC classification: G06F17/30

    Abstract: The ranking quality of a ranked list may be evaluated. In an example embodiment, a method is implemented by a system to access log data, ascertain which entries of a ranked list are skipped, and determine a ranking quality metric from the skipped entries. More specifically, log data that reflects user interactions with a ranked list having multiple entries is accessed. The user interactions include at least indications of which of the multiple entries are selected entries. It is ascertained which entries of the multiple entries of the ranked list are skipped entries based on the selected entries. The ranking quality metric for the ranked list is determined responsive to the skipped entries.
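
One way to read the skipped-entry idea in this abstract is sketched below. The abstract does not define "skipped" or the metric itself, so both are assumptions here: an entry counts as skipped when it was not selected but ranks above the lowest selected entry, and the metric is the fraction of entries, down to the lowest selection, that were not skipped. Positions are 1-based, and the function names are hypothetical.

```python
def skipped_entries(num_entries, selected):
    """Return positions that were passed over: unselected, above the lowest selection."""
    if any(p < 1 or p > num_entries for p in selected):
        raise ValueError("selected positions must lie within the ranked list")
    if not selected:
        return []
    chosen = set(selected)
    lowest = max(chosen)  # deepest position the user selected
    return [p for p in range(1, lowest) if p not in chosen]


def ranking_quality(num_entries, selected):
    """Return a score in (0, 1]; 1.0 means nothing was skipped. None if no selections."""
    if not selected:
        return None  # no interaction, so no evidence either way
    skipped = skipped_entries(num_entries, selected)
    return 1.0 - len(skipped) / max(selected)
```

For a ten-entry list where the user selected positions 1 and 4, `skipped_entries(10, [1, 4])` yields positions 2 and 3, and the score is 0.5 under this definition.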
