-
Publication No.: US11546364B2
Publication Date: 2023-01-03
Application No.: US17003398
Filing Date: 2020-08-26
Inventors: David Cohen, Jason Ma, Bing Jie Fu, Ilya Nepomnyashchiy, Steven Berler, Alex Smaliy, Jack Grossman, James Thompson, Julia Boortz, Matthew Sprague, Parvathy Menon, Michael Kross, Michael Harris, Adam Borochoff
Abstract: Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analyses (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.
-
Publication No.: US20220239672A1
Publication Date: 2022-07-28
Application No.: US17658893
Filing Date: 2022-04-12
Inventors: Harkirat Singh, Geoffrey Stowe, Brendan Weickert, Matthew Sprague, Michael Kross, Adam Borochoff, Parvathy Menon, Michael Harris
IPC Classification: H04L9/40, G06Q40/00, G06F16/2457, G06F16/23, G06F16/242, G06F16/28, G06F16/9535, G06Q10/10, G06Q40/02, G06F16/335, G06F16/35, G06F16/26, G06F16/2458, G06Q20/40, G06Q30/00, G06Q20/38
Abstract: In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
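The scoring and ranking steps described in this abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical chosen for illustration and is not taken from the patent: the `Cluster` class, the two scorers (`size`, `total_amount`), the weights, and the choice of a weighted sum as the metascore.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    seed: str
    members: list = field(default_factory=list)  # related records (plain dicts here)
    scores: dict = field(default_factory=dict)   # score name -> value
    metascore: float = 0.0


def score_cluster(cluster, scorers, weights):
    """Compute each cluster score from member attributes, then combine them.

    The metascore is assumed to be a weighted sum of the individual scores;
    the abstract does not say how scores are combined.
    """
    cluster.scores = {name: fn(cluster.members) for name, fn in scorers.items()}
    cluster.metascore = sum(weights[name] * value
                            for name, value in cluster.scores.items())
    return cluster


def rank_clusters(clusters):
    """Order clusters from highest to lowest metascore."""
    return sorted(clusters, key=lambda c: c.metascore, reverse=True)


# Hypothetical scorers based on attributes of the data in a cluster.
scorers = {
    "size": lambda members: float(len(members)),
    "total_amount": lambda members: float(sum(m["amount"] for m in members)),
}
weights = {"size": 1.0, "total_amount": 0.01}

a = score_cluster(Cluster("acct-1", [{"amount": 100}, {"amount": 300}]), scorers, weights)
b = score_cluster(Cluster("acct-2", [{"amount": 50}]), scorers, weights)
ranked = rank_clusters([a, b])
print([c.seed for c in ranked])
```

With these sample values the larger cluster ranks first, which is the ordering an analyst would see in a ranked list.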
-
Publication No.: US10264014B2
Publication Date: 2019-04-16
Application No.: US14928512
Filing Date: 2015-10-30
Inventors: Geoff Stowe, Harkirat Singh, Stefan Bach, Matthew Sprague, Michael Kross, Adam Borochoff, Parvathy Menon, Michael Harris
Abstract: In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
-
Publication No.: US09946738B2
Publication Date: 2018-04-17
Application No.: US15287715
Filing Date: 2016-10-06
Inventors: Jacob Meacham, Michael Harris, Gustav Brodman, Lynn Cuthriell, Hannah Korus, Brian Toth, Jonathan Hsiao, Mark Elliot, Brian Schimpf, Michael Garland, Evelyn Nguyen
IPC Classification: G06F17/30
CPC Classification: G06F17/30309, G06F11/1451, G06F17/30227, G06F17/3023, G06F17/30292, G06F17/30371, G06F17/3038, G06F17/30563
Abstract: A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
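As a rough illustration of why immutable, versioned datasets allow point-in-time lookups, the sketch below keeps an append-only list of frozen versions. The `VersionedDataset` class and its `commit` and `as_of` methods are hypothetical names for this example only; the patent's actual storage design is not described in the abstract.

```python
from datetime import datetime, timezone


class VersionedDataset:
    """Append-only dataset: each commit stores a new immutable version."""

    def __init__(self):
        self._versions = []  # list of (timestamp, tuple_of_rows), in time order

    def commit(self, rows, at=None):
        """Record a new version; earlier versions are never modified."""
        at = at or datetime.now(timezone.utc)
        if self._versions and at < self._versions[-1][0]:
            raise ValueError("versions must be committed in time order")
        self._versions.append((at, tuple(rows)))  # tuple: rows cannot be edited later
        return len(self._versions) - 1            # version number

    def as_of(self, when):
        """Return the rows of the latest version committed at or before `when`."""
        result = None
        for ts, rows in self._versions:
            if ts <= when:
                result = rows
            else:
                break
        return result


ds = VersionedDataset()
t1 = datetime(2016, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2016, 6, 1, tzinfo=timezone.utc)
ds.commit(["row-a", "row-b"], at=t1)
ds.commit(["row-b"], at=t2)  # "row-a" is gone from the current version

# A query for a date between the two commits still sees "row-a".
print(ds.as_of(datetime(2016, 3, 1, tzinfo=timezone.utc)))
```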
-
Publication No.: US20160006749A1
Publication Date: 2016-01-07
Application No.: US14486991
Filing Date: 2014-09-15
Inventors: David Cohen, Jason Ma, Bing Jie Fu, Ilya Nepomnyashchiy, Steven Berler, Alex Smaliy, Jack Grossman, James Thompson, Julia Boortz, Matthew Sprague, Parvathy Menon, Michael Kross, Michael Harris, Adam Borochoff
CPC Classification: H04L63/1425, G06F17/30598, G06Q40/12, H04L63/1408, H04L63/145
Abstract: Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analyses (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.
-
Publication No.: US20150261817A1
Publication Date: 2015-09-17
Application No.: US14726211
Filing Date: 2015-05-29
Inventors: Michael Harris, John Carrino, Eric Wong
IPC Classification: G06F17/30
CPC Classification: G06F17/30451, G06F17/30442, G06F17/30463, G06F17/30477, G06F17/30864
Abstract: A fair scheduling system with methodology for fairly scheduling queries for execution by a database management system. The system obtains query jobs for execution by the database management system and cost estimates to execute the query jobs. The cost estimate can be a number of results the query is expected to return. Based on the cost estimates, the system causes the database management system to execute the query jobs as separate sub-query tasks in a round-robin fashion. By doing so, the execution latency of “low cost” query jobs that return few results is reduced when the query jobs are concurrently executed with “high cost” query jobs that return a large number of results.
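The round-robin idea in this abstract can be sketched as follows. This is an assumption-laden illustration, not the patented method: the function name `schedule_round_robin`, the `page_size` parameter, and the representation of a sub-query task as an `(job_id, offset, limit)` tuple are all hypothetical. The cost estimate is taken to be the expected number of results, as the abstract suggests.

```python
from collections import deque


def schedule_round_robin(jobs, page_size):
    """Split each query job into sub-query tasks and interleave them.

    jobs: dict mapping job_id -> estimated number of results (the cost estimate).
    Returns the execution order as a list of (job_id, offset, limit) tasks.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    # Each queue entry is (job_id, next_offset, remaining_results).
    queue = deque((job_id, 0, estimate) for job_id, estimate in jobs.items())
    order = []
    while queue:
        job_id, offset, remaining = queue.popleft()
        limit = min(page_size, remaining)
        order.append((job_id, offset, limit))
        if remaining - limit > 0:
            # Unfinished jobs go to the back, so other jobs get a turn first.
            queue.append((job_id, offset + limit, remaining - limit))
    return order


order = schedule_round_robin({"big": 250, "small": 10}, page_size=100)
for task in order:
    print(task)
```

In this run the low-cost job finishes after the first sub-query task of the high-cost job, instead of waiting for all 250 results of the larger query.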
-
Publication No.: US08788405B1
Publication Date: 2014-07-22
Application No.: US13968265
Filing Date: 2013-08-15
IPC Classification: G06Q40/00
CPC Classification: G06F17/3053, G06F17/30345, G06F17/30412, G06F17/30539, G06F17/30572, G06F17/30598, G06F17/30601, G06F17/30604, G06F17/30699, G06F17/30705, G06F17/3071, G06F17/30867, G06Q10/10, G06Q20/4016, G06Q30/0185, G06Q40/00, G06Q40/02, G06Q40/025, G06Q40/10, G06Q40/123
Abstract: Techniques are disclosed for generating a collection of clusters of related data from a seed. Doing so may generally include retrieving a seed and adding the seed to a first cluster and include retrieving a cluster strategy referencing one or more data bindings. Each data binding specifies a search protocol for retrieving data. For each of the one or more data bindings, data parameters input to the search protocol are identified, the search protocol is performed using the identified data parameters, and data returned by the search protocol is evaluated for inclusion in the first cluster.
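The seed-and-data-binding loop in this abstract can be sketched roughly as below. The representation is hypothetical: a cluster strategy is modeled as a list of bindings, each a dict with an `inputs` list and a `search` callable, and the input parameters are assumed to come from the seed record. The in-memory `RECORDS` list and the `by_phone` search stand in for a real data source.

```python
RECORDS = [
    {"id": "p1", "phone": "555"},
    {"id": "p2", "phone": "555"},
    {"id": "p3", "phone": "777"},
]


def by_phone(phone):
    """Hypothetical search protocol: find records sharing a phone number."""
    return [r for r in RECORDS if r["phone"] == phone]


def build_cluster(seed, strategy, accept):
    """Grow a cluster from a seed using the bindings in a cluster strategy."""
    cluster = [seed]  # the seed is added to the cluster first
    for binding in strategy:
        # Identify the data parameters this binding's search protocol needs.
        if not all(key in seed for key in binding["inputs"]):
            continue  # the seed lacks a required parameter; skip this binding
        params = {key: seed[key] for key in binding["inputs"]}
        # Perform the search, then evaluate each result for inclusion.
        for record in binding["search"](**params):
            if record not in cluster and accept(record):
                cluster.append(record)
    return cluster


strategy = [{"inputs": ["phone"], "search": by_phone}]
cluster = build_cluster(RECORDS[0], strategy, accept=lambda record: True)
print([r["id"] for r in cluster])
```

The `accept` callback is where inclusion rules would go; here it admits every returned record.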
-
Publication No.: US20240338348A1
Publication Date: 2024-10-10
Application No.: US18745838
Filing Date: 2024-06-17
Inventors: Allen Chang, Christopher Male, David Cohen, Dragos-Florian Ristache, Danielle Kramer, John Garrod, Michael Harris, Ryan Zheng, Stephen Freiberg
CPC Classification: G06F16/214, G06F16/254, G06F16/258
Abstract: Systems and methods including a framework for migration of live data. The method may comprise, by one or more hardware processors executing program instructions, receiving, at a migration proxy of the framework, code for reading data and writing data compatible with each of a plurality of states of a migration of data in a data store, wherein a service is at least intermittently reading data from and writing data to the data store; determining, by a migration runner of the framework, to perform the migration of the data; initiating, by the migration runner, the migration of the data, wherein the migration comprises a plurality of stages; storing, as the migration progresses through the plurality of stages, and at a migration data store of the framework, a current stage of the migration; and during the migration, using the migration proxy to read data from and write data to the data store.
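One way to picture the proxy, runner, and stage store named in this abstract is the sketch below. The four stage names, the dual-write and backfill approach, and all class names (`MigrationStateStore`, `MigrationProxy`, `MigrationRunner`) are assumptions made for illustration; the abstract does not specify what the stages are or how reads and writes behave in each one. Plain dicts stand in for the old and new data layouts.

```python
STAGES = ["not_started", "dual_write", "read_new", "finished"]  # assumed stages


class MigrationStateStore:
    """Holds the current stage of the migration."""

    def __init__(self):
        self.stage = STAGES[0]


class MigrationProxy:
    """Routes the service's reads and writes according to the current stage."""

    def __init__(self, old, new, state, transform):
        self.old, self.new, self.state, self.transform = old, new, state, transform

    def write(self, key, value):
        stage = self.state.stage
        if stage != "finished":
            self.old[key] = value                  # keep old layout current
        if stage != "not_started":
            self.new[key] = self.transform(value)  # also write the new layout

    def read(self, key):
        if self.state.stage in ("read_new", "finished"):
            return self.new[key]
        return self.transform(self.old[key])       # same shape in every stage


class MigrationRunner:
    """Advances the migration stage by stage, recording each stage."""

    def __init__(self, proxy):
        self.proxy = proxy

    def run(self):
        state = self.proxy.state
        state.stage = "dual_write"
        # Backfill rows written before dual writes began, without
        # overwriting anything the service has already written.
        for key, value in list(self.proxy.old.items()):
            if key not in self.proxy.new:
                self.proxy.new[key] = self.proxy.transform(value)
        state.stage = "read_new"
        state.stage = "finished"


old_store, new_store = {"a": "x"}, {}
state = MigrationStateStore()
proxy = MigrationProxy(old_store, new_store, state, transform=str.upper)
before = proxy.read("a")       # served from the old layout, transformed
MigrationRunner(proxy).run()
proxy.write("b", "y")          # after the migration only the new layout is written
print(before, proxy.read("a"), proxy.read("b"), state.stage)
```

Because the service only ever talks to the proxy, it sees the same data shape before, during, and after the migration.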
-
Publication No.: US12050567B2
Publication Date: 2024-07-30
Application No.: US17818272
Filing Date: 2022-08-08
Inventors: Allen Chang, Christopher Male, David Cohen, Dragos-Florian Ristache, Danielle Kramer, John Garrod, Michael Harris, Ryan Zheng, Stephen Freiberg
CPC Classification: G06F16/214, G06F16/254, G06F16/258
Abstract: Systems and methods including a framework for migration of live data. The method may comprise, by one or more hardware processors executing program instructions, receiving, at a migration proxy of the framework, code for reading data and writing data compatible with each of a plurality of states of a migration of data in a data store, wherein a service is at least intermittently reading data from and writing data to the data store; determining, by a migration runner of the framework, to perform the migration of the data; initiating, by the migration runner, the migration of the data, wherein the migration comprises a plurality of stages; storing, as the migration progresses through the plurality of stages, and at a migration data store of the framework, a current stage of the migration; and during the migration, using the migration proxy to read data from and write data to the data store.
-
Publication No.: US11895137B2
Publication Date: 2024-02-06
Application No.: US18061195
Filing Date: 2022-12-02
Inventors: David Cohen, Jason Ma, Bing Jie Fu, Ilya Nepomnyashchiy, Steven Berler, Alex Smaliy, Jack Grossman, James Thompson, Julia Boortz, Matthew Sprague, Parvathy Menon, Michael Kross, Michael Harris, Adam Borochoff
CPC Classification: H04L63/1425, G06F16/285, G06Q40/12, H04L63/145, H04L63/1408
Abstract: Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analyses (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.