Methods and apparatus for computing graph similarity via sequence similarity
    1.
    Granted invention patent
    Legal status: in force

    Publication number: US08417657B2

    Publication date: 2013-04-09

    Application number: US13099305

    Filing date: 2011-05-02

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06F17/30882

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to the shingles of other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
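
    The walk-and-shingle pipeline in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the graph is a hypothetical adjacency dict, a breadth-first walk with sorted neighbors stands in for the unspecified walk algorithm, each traversed edge becomes one token, and Jaccard similarity of k-token shingle sets stands in for the comparison step.

```python
from collections import deque


def walk_tokens(graph: dict[str, list[str]], root: str) -> list[str]:
    """Breadth-first walk from root, emitting one "u>v" token per traversed edge.

    Assumption: BFS with sorted neighbor lists stands in for the patent's
    walk algorithm; nodes unreachable from root are never visited.
    """
    seen = {root}
    queue = deque([root])
    tokens = []
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph.get(node, [])):
            tokens.append(f"{node}>{nbr}")
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return tokens


def shingles(tokens: list[str], k: int = 2) -> set[tuple[str, ...]]:
    """Set of all contiguous k-token windows (k-shingles)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union; 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    g1 = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    g2 = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}  # edge c>d missing
    s1 = shingles(walk_tokens(g1, "a"))
    s2 = shingles(walk_tokens(g2, "a"))
    print(jaccard(s1, s2))  # 2 shared shingles out of 3 -> 0.666...
```

    Because the tokens encode edges rather than bare nodes, removing a single edge lowers the similarity even though both walks visit the same set of nodes.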

    Methods and apparatus for computing graph similarity via sequence similarity
    2.
    Granted invention patent
    Legal status: in force

    Publication number: US07996349B2

    Publication date: 2011-08-09

    Application number: US11951146

    Filing date: 2007-12-05

    IPC classification: G06F17/00 G06N5/00

    CPC classification: G06F17/30882

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to the shingles of other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
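
    The fingerprinting step can be illustrated with MinHash, a standard way to estimate the Jaccard similarity of two shingle sets from fixed-size sketches. This is an assumption: the abstract does not name the fingerprinting scheme. The sketch hashes each k-token window with a seeded BLAKE2b digest; the seed-prefix scheme and the parameters `k` and `num_hashes` are illustrative choices.

```python
import hashlib


def fingerprint(window: tuple[str, ...], seed: int = 0) -> int:
    """64-bit fingerprint of one shingle; each seed acts as an independent hash."""
    data = f"{seed}\x1e" + "\x1f".join(window)  # separators avoid ambiguous joins
    return int.from_bytes(hashlib.blake2b(data.encode(), digest_size=8).digest(), "big")


def minhash_sketch(tokens: list[str], k: int = 3, num_hashes: int = 64) -> list[int]:
    """For each seeded hash, keep the minimum fingerprint over all k-shingles."""
    windows = {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}
    if not windows:
        raise ValueError("sequence is shorter than k tokens")
    return [min(fingerprint(w, seed) for w in windows) for seed in range(num_hashes)]


def estimated_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching sketch slots: an estimate of Jaccard similarity."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    seq = ["a>b", "a>c", "b>d", "c>d", "d>e"]
    print(estimated_similarity(minhash_sketch(seq), minhash_sketch(seq)))  # 1.0
```

    Each slot matches with probability equal to the true Jaccard similarity J, so with 64 hashes the estimate has a standard error of about sqrt(J(1 - J)/64); larger sketches trade space for accuracy.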

    OBJECT CLASSIFICATION USING TAXONOMIES
    3.
    Published invention patent application
    Legal status: in force

    Publication number: US20100185577A1

    Publication date: 2010-07-22

    Application number: US12414065

    Filing date: 2009-03-30

    IPC classification: G06N5/02

    CPC classification: G06N99/005

    Abstract: As provided herein, objects from a source catalog, such as a provider's catalog, can be added to a target catalog, such as an enterprise master catalog, in a scalable manner using catalog taxonomies. A baseline classifier determines the probabilities of source objects belonging to target catalog classes. Source objects can be assigned to those classes whose probabilities meet a desired threshold and a desired rate. For each unassigned source object, a classification cost can be determined for the target classes, which can comprise determining an assignment cost and a separation cost of the source object for each desired target class. The separation and assignment costs can be combined to determine the classification cost, and the unassigned source objects can be assigned to those classes having a desired classification cost.
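
    The first pass, assigning source objects whose baseline probability clears a threshold, can be sketched as below. The input format (a mapping from each object to per-class probabilities) is assumed, and reading the "desired rate" as a cap on the fraction of objects auto-assigned, most confident first, is one possible interpretation rather than the patent's definition.

```python
def baseline_assign(
    probs: dict[str, dict[str, float]],
    threshold: float = 0.9,
    rate: float = 1.0,
) -> tuple[dict[str, str], list[str]]:
    """Assign confident objects to their most probable class.

    Returns (assigned, unassigned). Assumption: `rate` caps the fraction of all
    objects that may be auto-assigned; the most confident candidates win.
    """
    # Most probable class and its probability for every object.
    best = {obj: max(p.items(), key=lambda kv: kv[1]) for obj, p in probs.items()}
    candidates = sorted(
        (obj for obj, (_, p) in best.items() if p >= threshold),
        key=lambda obj: best[obj][1],
        reverse=True,
    )
    chosen = candidates[: int(rate * len(probs))]
    assigned = {obj: best[obj][0] for obj in chosen}
    unassigned = [obj for obj in probs if obj not in assigned]
    return assigned, unassigned


if __name__ == "__main__":
    probs = {
        "o1": {"shoes": 0.95, "bags": 0.05},
        "o2": {"shoes": 0.4, "bags": 0.6},
        "o3": {"bags": 0.92, "shoes": 0.08},
    }
    print(baseline_assign(probs))  # ({'o1': 'shoes', 'o3': 'bags'}, ['o2'])
```

    Objects left in the unassigned list are the ones a later, cost-based pass would still need to place.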

    Method and system for generating web pages for topics unassociated with a dominant URL
    4.
    Granted invention patent
    Legal status: in force

    Publication number: US08799260B2

    Publication date: 2014-08-05

    Application number: US12971608

    Filing date: 2010-12-17

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for identifying topics that are unassociated with a dominant URL. A set of keywords associated with a topic is identified. A search log is scanned to identify search queries associated with the set of keywords. The identified search queries are grouped into clusters. Clusters associated with similar URLs are merged to generate an extended seed query string. The extended seed query string is analyzed to determine whether it relates to an existing dominant URL. If the extended seed query string is determined to be unassociated with an existing dominant URL, a web page associated with the topic may be generated.
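
    The clustering and merging steps can be sketched as follows. Everything here is an assumption for illustration: the search log is a list of `(query, clicked_url)` pairs, a query matches the topic if it contains any keyword as a whole word, queries are first clustered by exact clicked URL, and two URLs count as "similar" when they share a host and first path segment. The extended seed query string is formed from the sorted distinct terms of a merged cluster.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_key(url: str) -> tuple[str, str]:
    """Hypothetical similarity key: lowercased host plus first path segment."""
    parts = urlparse(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return parts.netloc.lower(), first_segment


def cluster_by_url(log: list[tuple[str, str]], keywords: list[str]) -> dict[str, set[str]]:
    """Group keyword-matching queries by the exact URL that was clicked."""
    kw = {k.lower() for k in keywords}
    clusters: dict[str, set[str]] = defaultdict(set)
    for query, url in log:
        if kw & set(query.lower().split()):
            clusters[url].add(query.lower())
    return clusters


def merge_similar(clusters: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Merge clusters whose URLs share the same similarity key."""
    merged: dict[tuple[str, str], set[str]] = defaultdict(set)
    for url, queries in clusters.items():
        merged[url_key(url)] |= queries
    return merged


def seed_string(queries: set[str]) -> str:
    """Extended seed query string: sorted distinct terms of the merged queries."""
    return " ".join(sorted({term for q in queries for term in q.split()}))


if __name__ == "__main__":
    log = [
        ("hiking boots", "https://shop.example/boots/a"),
        ("waterproof hiking boots", "https://shop.example/boots/b"),
        ("pizza near me", "https://food.example/"),
    ]
    for key, queries in merge_similar(cluster_by_url(log, ["boots"])).items():
        print(key, seed_string(queries))  # ('shop.example', 'boots') boots hiking waterproof
```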

    METHOD AND SYSTEM FOR GENERATING WEB PAGES FOR TOPICS UNASSOCIATED WITH A DOMINANT URL
    5.
    Published invention patent application
    Legal status: in force

    Publication number: US20120158693A1

    Publication date: 2012-06-21

    Application number: US12971608

    Filing date: 2010-12-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for identifying topics that are unassociated with a dominant URL. A set of keywords associated with a topic is identified. A search log is scanned to identify search queries associated with the set of keywords. The identified search queries are grouped into clusters. Clusters associated with similar URLs are merged to generate an extended seed query string. The extended seed query string is analyzed to determine whether it relates to an existing dominant URL. If the extended seed query string is determined to be unassociated with an existing dominant URL, a web page associated with the topic may be generated.
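
    Deciding whether an extended seed query string already relates to a dominant URL could be done from click data. This sketch assumes a URL is dominant when it receives at least a hypothetical `min_share` (here 50%) of the clicks on queries that share a term with the seed; if no URL qualifies, the topic is treated as unassociated and a new page would be generated.

```python
from collections import Counter
from typing import Optional


def dominant_url(clicks: list[tuple[str, str]], seed: str, min_share: float = 0.5) -> Optional[str]:
    """Return the URL holding at least min_share of seed-related clicks, else None."""
    terms = set(seed.lower().split())
    counts = Counter(url for query, url in clicks if terms & set(query.lower().split()))
    total = sum(counts.values())
    if total == 0:
        return None
    url, n = counts.most_common(1)[0]
    return url if n / total >= min_share else None


def needs_new_page(clicks: list[tuple[str, str]], seed: str, min_share: float = 0.5) -> bool:
    """True when no existing URL dominates the topic described by the seed."""
    return dominant_url(clicks, seed, min_share) is None


if __name__ == "__main__":
    concentrated = [("hiking boots", "https://a.example/")] * 3 + [("hiking boots", "https://b.example/")]
    spread = [("trail tips", f"https://{h}.example/") for h in "xyz"]
    print(dominant_url(concentrated, "boots hiking"))  # https://a.example/
    print(needs_new_page(spread, "tips trail"))  # True
```

    A click-share rule is only one plausible reading of "dominant"; a ranking-position or traffic-volume criterion would slot into `dominant_url` the same way.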

    Methods and apparatus for computing graph similarity via signature similarity
    6.
    Granted invention patent
    Legal status: in force

    Publication number: US08019708B2

    Publication date: 2011-09-13

    Application number: US11951172

    Filing date: 2007-12-05

    IPC classification: G06N5/00 G06F17/00

    CPC classification: G06F17/30864

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a set of weighted features. The set of weighted features is then transformed into a signature via a SimHash algorithm. The signature is compared to the signatures of one or more other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
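
    A SimHash signature can be computed from weighted features as sketched below. The feature set is an assumption made for illustration (each edge weighs 1.0, each node weighs one plus its out-degree), and BLAKE2b stands in for the unspecified per-feature hash. Each feature's hash votes +weight or -weight on every bit position, and the sign of each total gives the corresponding signature bit.

```python
import hashlib


def feature_hash(feature: str, bits: int = 64) -> int:
    """Hash a feature string to a `bits`-bit integer (multiple of 8, at most 512)."""
    digest = hashlib.blake2b(feature.encode(), digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def graph_features(graph: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical weighted features: edges weigh 1.0, nodes weigh 1 + out-degree."""
    features: dict[str, float] = {}
    for node, nbrs in graph.items():
        features[f"node:{node}"] = 1.0 + len(nbrs)
        for nbr in nbrs:
            features[f"edge:{node}>{nbr}"] = 1.0
    return features


def simhash(weighted: dict[str, float], bits: int = 64) -> int:
    """Sum signed weights per bit position; keep the bits whose total is positive."""
    totals = [0.0] * bits
    for feature, weight in weighted.items():
        h = feature_hash(feature, bits)
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(f"{simhash(graph_features(graph)):016x}")
```

    With a single feature the signature is exactly that feature's hash; as features are added, heavily weighted ones dominate the vote, so graphs sharing most of their weight end up with signatures differing in few bits.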

    METHODS AND APPARATUS FOR COMPUTING GRAPH SIMILARITY VIA SEQUENCE SIMILARITY
    7.
    Published invention patent application
    Legal status: in force

    Publication number: US20110302147A1

    Publication date: 2011-12-08

    Application number: US13099305

    Filing date: 2011-05-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30882

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to the shingles of other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
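
    Once shingle sets exist, one way to act on the similarity scores is to flag any graph whose closest match among the other graphs falls below a threshold. The sketch assumes that rule and a threshold of 0.5; both are illustrative, since the abstract does not state how anomalies are decided. Small integers stand in for shingle fingerprints in the example.

```python
def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union; 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_anomalies(shingle_sets: dict[str, set], threshold: float = 0.5) -> list[str]:
    """Names of graphs whose best similarity to any other graph is below threshold.

    Assumption: a graph that resembles none of its peers is treated as anomalous.
    """
    flagged = []
    for name, s in shingle_sets.items():
        best = max(
            (jaccard(s, t) for other, t in shingle_sets.items() if other != name),
            default=0.0,
        )
        if best < threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    builds = {
        "build_1": {1, 2, 3, 4},
        "build_2": {1, 2, 3, 5},  # Jaccard 3/5 = 0.6 with build_1
        "build_3": {9, 10},       # shares nothing with the others
    }
    print(flag_anomalies(builds))  # ['build_3']
```

    The pairwise loop is quadratic in the number of graphs, which is fine for comparing a handful of builds; large collections would need an index over fingerprints instead.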

    METHODS AND APPARATUS FOR COMPUTING GRAPH SIMILARITY VIA SIGNATURE SIMILARITY
    8.
    Published invention patent application
    Legal status: in force

    Publication number: US20090150371A1

    Publication date: 2009-06-11

    Application number: US11951172

    Filing date: 2007-12-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a set of weighted features. The set of weighted features is then transformed into a signature via a SimHash algorithm. The signature is compared to the signatures of one or more other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
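
    Comparing SimHash-style signatures reduces to counting differing bits. The sketch assumes 64-bit integer signatures and a hypothetical cut-off `max_distance` for calling two graphs near-duplicates; `bin(x).count("1")` is used so the code also runs on Python versions older than 3.10, which added `int.bit_count()`.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def signature_similarity(a: int, b: int, bits: int = 64) -> float:
    """Map Hamming distance onto [0, 1]; 1.0 means identical signatures."""
    return 1.0 - hamming(a, b) / bits


def near_duplicate_pairs(signatures: dict[str, int], max_distance: int = 3) -> list[tuple[str, str]]:
    """All name pairs whose signatures differ in at most max_distance bits."""
    names = sorted(signatures)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if hamming(signatures[x], signatures[y]) <= max_distance
    ]


if __name__ == "__main__":
    print(near_duplicate_pairs({"a": 0b0000, "b": 0b0001, "c": 0xFF}))  # [('a', 'b')]
```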

    Methods and apparatus for computing graph similarity via signature similarity
    9.
    Granted invention patent
    Legal status: in force

    Publication number: US08880449B2

    Publication date: 2014-11-04

    Application number: US12236682

    Filing date: 2008-09-24

    IPC classification: G06N5/00 G06F1/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a set of weighted features. The set of weighted features is then transformed into a signature via a SimHash algorithm. The signature is compared to the signatures of one or more other web graphs in order to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood of anomalous web graphs being built in the future.
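
    The corrective step can be pictured as a gate on each newly built web graph: accept it if its signature is close to the last accepted build, otherwise set it aside as anomalous. This is a hypothetical policy for illustration; `BuildGate`, its `max_distance` default, and comparing only against the most recent accepted build are assumptions, and adjusting the web-mapping parameters is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class BuildGate:
    """Hypothetical accept/reject gate for successive web-graph builds."""

    max_distance: int = 10  # assumed tolerance, in differing signature bits
    accepted: list[int] = field(default_factory=list)
    rejected: list[int] = field(default_factory=list)

    def review(self, signature: int) -> bool:
        """Accept if within max_distance bits of the last accepted build."""
        if self.accepted:
            distance = bin(signature ^ self.accepted[-1]).count("1")
            if distance > self.max_distance:
                self.rejected.append(signature)  # anomalous: withhold this build
                return False
        self.accepted.append(signature)  # the first build is accepted unconditionally
        return True


if __name__ == "__main__":
    gate = BuildGate(max_distance=2)
    print(gate.review(0b0000), gate.review(0b0011), gate.review(0xFFFF))  # True True False
```

    Comparing only against the latest accepted build tolerates slow drift while catching abrupt changes; comparing against several past builds would be stricter.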

    Object classification using taxonomies
    10.
    Granted invention patent
    Legal status: in force

    Publication number: US08275726B2

    Publication date: 2012-09-25

    Application number: US12414065

    Filing date: 2009-03-30

    CPC classification: G06N99/005

    Abstract: As provided herein, objects from a source catalog, such as a provider's catalog, can be added to a target catalog, such as an enterprise master catalog, in a scalable manner using catalog taxonomies. A baseline classifier determines the probabilities of source objects belonging to target catalog classes. Source objects can be assigned to those classes whose probabilities meet a desired threshold and a desired rate. For each unassigned source object, a classification cost can be determined for the target classes, which can comprise determining an assignment cost and a separation cost of the source object for each desired target class. The separation and assignment costs can be combined to determine the classification cost, and the unassigned source objects can be assigned to those classes having a desired classification cost.
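
    Unassigned objects can then be placed by minimizing a combined cost. In this sketch the assignment cost is one minus the baseline probability, and the separation cost is the summed similarity to already-assigned neighboring objects that sit in a different class, weighted by a hypothetical `lam`. Both definitions and the greedy one-object-at-a-time order are illustrative assumptions; the abstract only states that the two costs are combined.

```python
def classification_cost(
    obj: str,
    cls: str,
    probs: dict[str, dict[str, float]],
    neighbors: dict[str, dict[str, float]],
    assigned: dict[str, str],
    lam: float = 0.5,
) -> float:
    """Assignment cost plus lam times separation cost for putting obj in cls."""
    assignment = 1.0 - probs[obj].get(cls, 0.0)
    separation = sum(
        weight
        for nbr, weight in neighbors.get(obj, {}).items()
        if nbr in assigned and assigned[nbr] != cls
    )
    return assignment + lam * separation


def assign_remaining(unassigned, classes, probs, neighbors, assigned, lam=0.5) -> dict[str, str]:
    """Greedily give each unassigned object its lowest-cost class."""
    result = dict(assigned)  # copy so the caller's mapping is not mutated
    for obj in unassigned:
        result[obj] = min(
            classes,
            key=lambda cls: classification_cost(obj, cls, probs, neighbors, result, lam),
        )
    return result


if __name__ == "__main__":
    probs = {"o2": {"shoes": 0.4, "bags": 0.6}}
    neighbors = {"o2": {"o1": 0.9}}  # o2 closely resembles o1
    print(assign_remaining(["o2"], ["shoes", "bags"], probs, neighbors, {"o1": "shoes"}))
    # {'o1': 'shoes', 'o2': 'shoes'}
```

    On probability alone o2 would go to "bags" (cost 0.4 versus 0.6), but separating it from the very similar o1 adds 0.5 * 0.9 = 0.45 to "bags", so "shoes" becomes the cheaper class.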