SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA
    1.
    Invention application
    SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA (status: in force)

    Publication number: US20120330908A1

    Publication date: 2012-12-27

    Application number: US13167680

    Filing date: 2011-06-23

    IPC classification: G06F17/30

    Abstract: A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
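
The pipeline the abstract describes (parse, index, compress into blocks, store under keys, return search hits in their original form) can be illustrated with a toy sketch. This is not the patented implementation: the class `TinyRepository`, the `block:<n>` key scheme, the whitespace tokenizer, the `BLOCK_SIZE` value, and the use of an in-memory dict with zlib are all assumptions made for illustration, and records are assumed to be single lines.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 3  # hypothetical: number of records per compressed block


class TinyRepository:
    """Toy in-memory stand-in for a key-value data repository."""

    def __init__(self):
        self.kv = {}                   # key -> zlib-compressed block of records
        self.index = defaultdict(set)  # token -> {(block_id, position in block)}
        self._pending = []             # records waiting to be packed into a block
        self._next_block = 0

    def ingest(self, line):
        """Parse one raw record, index its tokens, and buffer it for a block."""
        position = len(self._pending)
        for token in line.lower().split():
            self.index[token].add((self._next_block, position))
        self._pending.append(line)
        if len(self._pending) == BLOCK_SIZE:
            self.flush()

    def flush(self):
        """Compress the buffered records into one block and store it."""
        if not self._pending:
            return
        payload = "\n".join(self._pending).encode("utf-8")
        self.kv[f"block:{self._next_block}"] = zlib.compress(payload)
        self._pending = []
        self._next_block += 1

    def search(self, token):
        """Return the matching records exactly as they were ingested."""
        self.flush()  # make sure buffered records are searchable
        hits = []
        for block_id, position in sorted(self.index.get(token.lower(), ())):
            block = zlib.decompress(self.kv[f"block:{block_id}"]).decode("utf-8")
            hits.append(block.split("\n")[position])
        return hits
```

A caller would feed raw access-log lines to `ingest` and then call `search("10.0.0.1")` to get back the original lines mentioning that address. A real deployment would replace the dict with a distributed key-value store and would avoid decompressing the same block once per hit.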


    System and method for investigating large amounts of data
    2.
    Granted invention patent
    System and method for investigating large amounts of data (status: in force)

    Publication number: US08799240B2

    Publication date: 2014-08-05

    Application number: US13167680

    Filing date: 2011-06-23

    IPC classification: G06F17/00

    Abstract: A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
