- Patent title: RECORD PROFILING FOR DATASET SAMPLING
- Application number: US15338161
- Filing date: 2016-10-28
- Publication number: US20180121525A1
- Publication date: 2018-05-03
- Inventors: Daniel G. Simmons, Kevin David James Grealish, Sumit Gulwani, Ranvijay Kumar, Kevin Michael Ellis, Saswat Padhi
- Applicant: Microsoft Technology Licensing, LLC
- Applicant address: Redmond, WA, US
- Patentee: Microsoft Technology Licensing, LLC
- Current patentee: Microsoft Technology Licensing, LLC
- Current patentee address: Redmond, WA, US
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A method for generating a smaller dataset from a larger dataset, each dataset holding a plurality of records, includes profiling the larger dataset to identify a plurality of patterns, each of which is descriptive of one or more records held in the larger dataset. A plurality of slots of the smaller dataset is filled with records held in the larger dataset. Multiple records held in the larger dataset are individually retrieved, and for each retrieved record it is determined whether to place the retrieved record into a slot of the smaller dataset and evict a record already occupying that slot, or not place the retrieved record into the smaller dataset. This determination is based on a pattern of the retrieved record and a representation status of the pattern in the smaller dataset.
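The place-or-evict decision described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the names `pattern_of` and `sample_by_pattern` are hypothetical, the profiler simply maps each record to a coarse character-class signature, and the eviction policy (admit a record only if its pattern is absent from the sample, and evict from the most-represented pattern only if that pattern holds at least two slots) is one plausible reading of "representation status".

```python
from collections import Counter
from typing import Iterable, List


def pattern_of(record: str) -> str:
    """Hypothetical profiler: map a record to a coarse shape signature."""
    shape: List[str] = []
    for ch in record:
        if ch.isdigit():
            sym = "d"
        elif ch.isalpha():
            sym = "a"
        else:
            sym = ch  # keep punctuation and whitespace literally
        # Collapse runs so "2016" and "16" both become "d".
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def sample_by_pattern(records: Iterable[str], num_slots: int) -> List[str]:
    """Build a smaller dataset of at most num_slots records (assumed policy)."""
    if num_slots <= 0:
        return []
    slots: List[str] = []
    counts: Counter = Counter()  # pattern -> number of slots holding it

    for record in records:
        pat = pattern_of(record)
        # Phase 1: fill empty slots with the first records retrieved.
        if len(slots) < num_slots:
            slots.append(record)
            counts[pat] += 1
            continue
        # Phase 2: a pattern already present in the sample is not placed.
        if counts[pat] > 0:
            continue
        # Evict only from a pattern that would still be represented afterwards.
        victim_pat, victim_count = counts.most_common(1)[0]
        if victim_count < 2:
            continue
        idx = next(i for i, r in enumerate(slots) if pattern_of(r) == victim_pat)
        slots[idx] = record
        counts[victim_pat] -= 1
        counts[pat] += 1
    return slots
```

Under these assumptions, calling `sample_by_pattern(["2016-10-28", "2018-05-03", "1999-01-01", "Redmond", "US15338161", "n/a"], 3)` first fills all three slots with dates, then replaces two of them with `"Redmond"` and `"US15338161"`, and finally skips `"n/a"` because every remaining slot holds a distinct pattern. A real implementation would likely learn its patterns from the larger dataset rather than hard-code character classes.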