System and method for adaptive content processing and classification in a high-availability environment
    1.
    Granted invention patent
    Status: Lapsed

    Publication (announcement) number: US07966270B2

    Publication (announcement) date: 2011-06-21

    Application number: US11678075

    Application date: 2007-02-23

    IPC classification: G06F15/18 G06E3/00

    CPC classification: G06F17/30067

    Abstract: The embodiments of the invention provide systems, methods, etc. for adaptive content processing and classification in a high-availability environment. More specifically, a system is provided having a plurality of processing engines and at least one server that classifies data objects on the computer system. The classification includes analyzing the data objects for the presence of a type of content. This can include assigning a score corresponding to the amount of the type of content in each of the data objects. Moreover, the server can remove a data object from the computer system based on the results of the analyzing. The results of the analyzing are stored and the computer system is updated with feedback information. This can include allowing a user to review the results of the analyzing and aggregating reviews of the user into the feedback information.
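
The score, remove, store, and feedback loop described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the keyword-weight scorer, the `Classifier` class, the 0.5 removal threshold, and the 0.25 feedback step are all assumptions chosen only to make the loop concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Classifier:
    """Toy keyword scorer; weights, threshold, and step size are illustrative."""
    weights: dict[str, float]
    remove_threshold: float = 0.5
    results: dict[str, float] = field(default_factory=dict)

    def score(self, text: str) -> float:
        # Score = average per-word weight, a stand-in for "amount of the type of content".
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(self.weights.get(w, 0.0) for w in words) / len(words)

    def classify(self, store: dict[str, str]) -> list[str]:
        """Score every object, store the result, and remove objects at or over the threshold."""
        removed = []
        for object_id, text in list(store.items()):
            s = self.score(text)
            self.results[object_id] = s
            if s >= self.remove_threshold:
                del store[object_id]
                removed.append(object_id)
        return removed

    def apply_feedback(self, reviews: list[tuple[str, bool]], step: float = 0.25) -> None:
        """Aggregate reviewer verdicts of the form (term, is_target_content) into the weights."""
        for term, is_target in reviews:
            delta = step if is_target else -step
            self.weights[term] = max(0.0, self.weights.get(term, 0.0) + delta)
```

A reviewer who disagrees with a result would call `apply_feedback`, so that the next `classify` pass uses the adjusted weights.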

    Content monitoring in a high volume on-line community application
    2.
    Granted invention patent
    Status: Lapsed

    Publication (announcement) number: US07523138B2

    Publication (announcement) date: 2009-04-21

    Application number: US11622112

    Application date: 2007-01-11

    IPC classification: G06F17/00

    Abstract: Disclosed are embodiments of a system and method for managing an on-line community. Electronic postings are pre-screened based on one or more metrics to determine a risk value indicative of the likelihood that an individual posting contains objectionable content. These metrics are based on the profile of a poster, including various parameters of the poster and/or the poster's record of objectionable content postings. These metrics can also be based on the social network profile of a poster, including the average of various parameters of other users in the poster's social network and/or a compiled record of objectionable content postings of other users in the poster's social network. If the risk value is relatively low, the posting can be displayed to the on-line community immediately. If the risk value is relatively high, display of the posting can be delayed until further content analysis is completed. Finally, if the risk value is above a predetermined high risk threshold value, the posting can be removed automatically.
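
The three-way routing described in this abstract (display, delay, remove) can be sketched as follows. The abstract does not say how the metrics are combined, so the `Poster` record, the 0.7 weighting between the poster's own record and the network average, and both thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Poster:
    posts: int
    flagged: int                                           # poster's own objectionable postings
    friends: list["Poster"] = field(default_factory=list)  # poster's social network

    def flag_rate(self) -> float:
        return self.flagged / self.posts if self.posts else 0.0


def risk_value(poster: Poster, own_weight: float = 0.7) -> float:
    """Blend the poster's own record with the average record of the social network."""
    own = poster.flag_rate()
    network = mean(f.flag_rate() for f in poster.friends) if poster.friends else 0.0
    return own_weight * own + (1 - own_weight) * network


def route(risk: float, review_threshold: float = 0.2, remove_threshold: float = 0.6) -> str:
    """Map a risk value to one of the three outcomes named in the abstract."""
    if risk > remove_threshold:
        return "remove"    # above the predetermined high-risk threshold
    if risk > review_threshold:
        return "delay"     # hold until further content analysis completes
    return "display"       # low risk: show to the community immediately
```

Pre-screening on profile data alone is cheap, which is why only the postings in the middle band need the slower content analysis.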

    SYSTEM AND METHOD FOR ADAPTIVE CONTENT PROCESSING AND CLASSIFICATION IN A HIGH-AVAILABILITY ENVIRONMENT
    3.
    Invention application
    Status: Lapsed

    Publication (announcement) number: US20080208893A1

    Publication (announcement) date: 2008-08-28

    Application number: US11678075

    Application date: 2007-02-23

    IPC classification: G06F7/00

    CPC classification: G06F17/30067

    Abstract: The embodiments of the invention provide systems, methods, etc. for adaptive content processing and classification in a high-availability environment. More specifically, a system is provided having a plurality of processing engines and at least one server that classifies data objects on the computer system. The classification includes analyzing the data objects for the presence of a type of content. This can include assigning a score corresponding to the amount of the type of content in each of the data objects. Moreover, the server can remove a data object from the computer system based on the results of the analyzing. The results of the analyzing are stored and the computer system is updated with feedback information. This can include allowing a user to review the results of the analyzing and aggregating reviews of the user into the feedback information.

    CONTENT MONITORING IN A HIGH VOLUME ON-LINE COMMUNITY APPLICATION
    4.
    Invention application

    Publication (announcement) number: US20080172412A1

    Publication (announcement) date: 2008-07-17

    Application number: US11622112

    Application date: 2007-01-11

    IPC classification: G06F7/00 G06F3/00

    Abstract: Disclosed are embodiments of a system and method for managing an on-line community. Electronic postings are pre-screened based on one or more metrics to determine a risk value indicative of the likelihood that an individual posting contains objectionable content. These metrics are based on the profile of a poster, including various parameters of the poster and/or the poster's record of objectionable content postings. These metrics can also be based on the social network profile of a poster, including the average of various parameters of other users in the poster's social network and/or a compiled record of objectionable content postings of other users in the poster's social network. If the risk value is relatively low, the posting can be displayed to the on-line community immediately. If the risk value is relatively high, display of the posting can be delayed until further content analysis is completed. Finally, if the risk value is above a predetermined high risk threshold value, the posting can be removed automatically.

    CONTENT MONITORING IN A HIGH VOLUME ON-LINE COMMUNITY APPLICATION
    5.
    Invention application
    Status: Under examination (published)

    Publication (announcement) number: US20080177834A1

    Publication (announcement) date: 2008-07-24

    Application number: US12055618

    Application date: 2008-03-26

    IPC classification: G06F15/16

    Abstract: Disclosed are embodiments of a system and method for managing an on-line community. Electronic postings are pre-screened based on one or more metrics to determine a risk value indicative of the likelihood that an individual posting contains objectionable content. These metrics are based on the profile of a poster, including various parameters of the poster and/or the poster's record of objectionable content postings. These metrics can also be based on the social network profile of a poster, including the average of various parameters of other users in the poster's social network and/or a compiled record of objectionable content postings of other users in the poster's social network. If the risk value is relatively low, the posting can be displayed to the on-line community immediately. If the risk value is relatively high, display of the posting can be delayed until further content analysis is completed. Finally, if the risk value is above a predetermined high risk threshold value, the posting can be removed automatically.

    System and method for bulk processing of semi-structured result streams from multiple resources
    6.
    Granted invention patent
    Status: In force

    Publication (announcement) number: US07877484B2

    Publication (announcement) date: 2011-01-25

    Application number: US10830839

    Application date: 2004-04-23

    IPC classification: G06F15/16 H04J12/28 H04J3/26

    CPC classification: G06F17/30929

    Abstract: A system and associated method for bulk processing of semi-structured result streams from many different resources ingest bytes, parse as many bytes as practical, and return to process additional bytes. The system processes network packets as they arrive from a computing resource, creating intermediate results. The intermediate results are held in a stack until sufficient information is accumulated. The system then merges the intermediate results to form a single document model. As network packets at one connection are consumed by the system, the system can select another connection at which packets are waiting for processing. The processing of a result at a connection can be interrupted while the system processes the results at another connection. In this manner, the system is able to utilize one thread to process many incoming results in parallel.
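
The idea of one thread interleaving partial parses from several connections can be sketched with an incremental parser that keeps its intermediate results on a stack. Everything concrete here is an assumption: the parenthesized toy format, the `IncrementalParser` class, and the use of in-memory chunk lists in place of real network sockets.

```python
from collections import deque


class IncrementalParser:
    """Parses a nested parenthesized format, e.g. b"(a (b c))", one chunk at a time."""

    def __init__(self) -> None:
        self.stack: list[list] = [[]]  # stack[0] collects finished top-level documents
        self.token = b""               # partial token carried across chunk boundaries

    def _flush(self) -> None:
        if self.token:
            self.stack[-1].append(self.token.decode())
            self.token = b""

    def feed(self, chunk: bytes) -> None:
        """Consume whatever bytes are available, then return so another stream can run."""
        for i in range(len(chunk)):
            ch = chunk[i:i + 1]
            if ch == b"(":
                self._flush()
                self.stack.append([])          # open a new intermediate result
            elif ch == b")":
                self._flush()
                if len(self.stack) == 1:
                    raise ValueError("unbalanced ')'")
                done = self.stack.pop()
                self.stack[-1].append(done)    # merge it into its parent
            elif ch.isspace():
                self._flush()
            else:
                self.token += ch


def process(connections: dict[str, list[bytes]]) -> dict[str, list]:
    """Round-robin over connections on a single thread, one waiting chunk each turn."""
    parsers = {name: IncrementalParser() for name in connections}
    pending = {name: deque(chunks) for name, chunks in connections.items()}
    while any(pending.values()):
        for name, queue in pending.items():
            if queue:
                parsers[name].feed(queue.popleft())
    return {name: parser.stack[0] for name, parser in parsers.items()}
```

Because each parser holds its own stack, a parse interrupted in the middle of a nested element resumes correctly when the next chunk for that connection arrives.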

    DATA INGEST OPTIMIZATION
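    7.
    Invention application
    Status: Under examination (published)

    Publication (announcement) number: US20120330972A1

    Publication (announcement) date: 2012-12-27

    Application number: US13604096

    Application date: 2012-09-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30899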

    Abstract: Methods and systems for optimizing the retrieval of data from multiple sources are described. A slot map including slots for the storage of data elements can be obtained. The data elements associated with the slots can be prioritized by weighting values with costs of retrieving the data elements from respective data sources. Each value can be associated with a different data element and can indicate a respective degree of importance of the associated data element. Further, the systems and methods can direct the retrieval of data elements from the respective data sources in an order in accordance with the priority of the data elements to optimize the quality of data obtainable within a critical time constraint. In addition, the retrieved data elements can be stored in corresponding slots on a storage medium.
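
One way to read "weighting values with costs" is a value-per-cost ranking followed by a greedy fill within the time budget. The abstract does not give the formula, so the ratio used below, the `Slot` record, and the `fetch` callback are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Slot:
    name: str
    value: float  # degree of importance of the data element
    cost: float   # estimated retrieval time from its source, in seconds (must be > 0)


def plan_ingest(slots: list[Slot], time_budget: float) -> list[str]:
    """Order slots by value per unit cost and keep those that fit in the budget."""
    ordered = sorted(slots, key=lambda s: s.value / s.cost, reverse=True)
    plan, spent = [], 0.0
    for slot in ordered:
        if spent + slot.cost <= time_budget:
            plan.append(slot.name)
            spent += slot.cost
    return plan


def fill_slots(slots: list[Slot], time_budget: float,
               fetch: Callable[[str], Any]) -> dict[str, Any]:
    """Retrieve elements in priority order and store each one in its slot."""
    slot_map: dict[str, Any] = {s.name: None for s in slots}
    for name in plan_ingest(slots, time_budget):
        slot_map[name] = fetch(name)
    return slot_map
```

Slots that do not fit in the budget stay empty (`None`), which matches the goal of maximizing the quality of the data obtainable before the deadline instead of waiting for everything.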

    DATA DEDUPLICATION FOR STREAMING SEQUENTIAL DATA STORAGE APPLICATIONS
    8.
    Invention application
    Status: In force

    Publication (announcement) number: US20110185149A1

    Publication (announcement) date: 2011-07-28

    Application number: US12695127

    Application date: 2010-01-27

    IPC classification: G06F12/10 G06F12/00

    Abstract: Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.
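
The split, hash, look up, and encode steps listed in this abstract can be sketched as below. The fixed 4-byte block size, SHA-256 as the hash, and the `("new", block)` / `("ref", index)` record format are assumptions for illustration; the abstract does not specify them.

```python
import hashlib


def dedup_compress(data: bytes, block_size: int = 4) -> list[tuple]:
    """Emit one sequential stream of ("new", block) and ("ref", index) records."""
    seen: dict[bytes, int] = {}   # in-memory lookup table: hash -> block index
    stream: list[tuple] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            stream.append(("ref", seen[digest]))   # duplicate: store a reference only
        else:
            seen[digest] = len(seen)
            stream.append(("new", block))          # first occurrence: store the block
    return stream


def dedup_extract(stream: list[tuple]) -> bytes:
    """Rebuild the data in one forward pass, as a sequential medium requires."""
    blocks: list[bytes] = []
    out: list[bytes] = []
    for kind, payload in stream:
        if kind == "new":
            blocks.append(payload)
            out.append(payload)
        else:
            out.append(blocks[payload])
    return b"".join(out)
```

Every reference points backward to a block already written, so extraction never has to seek ahead in the stream.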

    Anonymization of Unstructured Data
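    9.
    Invention application
    Status: Under examination (published)

    Publication (announcement) number: US20110113049A1

    Publication (announcement) date: 2011-05-12

    Application number: US12614554

    Application date: 2009-11-09

    IPC classification: G06F17/30

    CPC classification: G06F21/6254 G16H10/60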

    Abstract: A method for anonymization of unstructured data comprises determining structured references in the unstructured data; populating a table with the structured references; anonymizing the structured references in the table using ontological analysis; and rewriting the structured references in the unstructured data with the anonymized structured references from the table to produce anonymized data. A system for anonymizing unstructured data comprises an entity spotting module configured to determine structured references in the unstructured data and populate a table with the determined structured references; an anonymization module configured to anonymize the structured references in the table using ontological analysis; and a replacement module configured to rewrite the structured references in the unstructured data with the anonymized structured references from the table to produce anonymized data.
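
The three modules named in this abstract (entity spotting, anonymization, replacement) can be sketched as a small pipeline. Two substitutions are made here and should be read as assumptions: regular expressions stand in for entity spotting, and numbered category placeholders stand in for the ontological analysis, which the abstract does not detail.

```python
import re

# Hypothetical spotting rules; a real system would use a proper entity recognizer.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def spot_entities(text: str) -> dict[str, str]:
    """Entity spotting: build a table mapping each structured reference to its kind."""
    table: dict[str, str] = {}
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            table.setdefault(match, kind)
    return table


def anonymize_table(table: dict[str, str]) -> dict[str, str]:
    """Anonymization: map each reference to a numbered placeholder of its kind."""
    counters: dict[str, int] = {}
    anonymized: dict[str, str] = {}
    for ref, kind in table.items():
        counters[kind] = counters.get(kind, 0) + 1
        anonymized[ref] = f"<{kind}_{counters[kind]}>"
    return anonymized


def rewrite(text: str, anonymized: dict[str, str]) -> str:
    """Replacement: substitute every reference in the text, longest first."""
    for ref in sorted(anonymized, key=len, reverse=True):
        text = text.replace(ref, anonymized[ref])
    return text
```

Keeping the table between the spotting and rewriting steps means a reference that appears several times receives the same replacement everywhere in the text.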

    Data ingest optimization
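    10.
    Granted invention patent
    Status: In force

    Publication (announcement) number: US09589065B2

    Publication (announcement) date: 2017-03-07

    Application number: US13604096

    Application date: 2012-09-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30899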

    Abstract: Methods and systems for optimizing the retrieval of data from multiple sources are described. A slot map including slots for the storage of data elements can be obtained. The data elements associated with the slots can be prioritized by weighting values with costs of retrieving the data elements from respective data sources. Each value can be associated with a different data element and can indicate a respective degree of importance of the associated data element. Further, the systems and methods can direct the retrieval of data elements from the respective data sources in an order in accordance with the priority of the data elements to optimize the quality of data obtainable within a critical time constraint. In addition, the retrieved data elements can be stored in corresponding slots on a storage medium.
