Distributed algorithm to find reliable, significant and relevant patterns in large data sets

    Publication number: US20170344890A1

    Publication date: 2017-11-30

    Application number: US15166233

    Filing date: 2016-05-26

    IPC classification: G06N5/04 G06N7/00 G06F17/30

    Abstract: The system pre-processes the data and, using compute buckets, computes the class distribution of the decision attribute and the statistics needed to discretize continuous attributes. The system computes the variability of each attribute and considers only attributes with non-zero variability. The system computes the discernibility strength of each attribute. The software system generates size-1 patterns using compute buckets and determines whether each size-1 pattern is a reliable pattern for any class. The system then determines whether each reliable size-1 pattern is a significant pattern for any class. The system generates size-k patterns from size-(k−1) patterns, checking the size-k patterns for significance and refinability. The system readjusts the pattern statistics of only the significant size-(k−1) patterns. The system computes the cumulative coverage of the sorted relevant patterns of up to size k by taking the union of the records of that particular class.
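The abstract outlines a level-wise search (size-1 patterns, then size-k patterns from size-(k−1) patterns) followed by a cumulative-coverage step, but it does not define its tests. The Python below is a minimal single-machine sketch of those steps under stated assumptions, not the patented method: attributes are assumed already categorical (discretization done), "reliable" is assumed to mean at least MIN_CONFIDENCE of a pattern's records share one class, "significant" is assumed to additionally require at least MIN_SUPPORT such records, and "refinable" is assumed to mean a pattern that is not yet reliable but still has enough support to extend. The constants, the function names class_hits, mine_patterns and cumulative_coverage, the sort order, and the toy data are all hypothetical. Compute buckets, distribution across machines, discernibility strength, and the statistics readjustment are not modeled.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds: the abstract does not define "reliable" or
# "significant", so both rules below are assumptions for illustration.
MIN_CONFIDENCE = 0.9  # reliable: >= 90% of a pattern's records share one class
MIN_SUPPORT = 2       # significant: reliable AND covers >= 2 records of that class


def class_hits(rows, labels):
    """Return the classes for which a pattern covering `rows` is reliable and significant."""
    counts = Counter(labels[i] for i in rows)
    return {c for c, n in counts.items()
            if n / len(rows) >= MIN_CONFIDENCE and n >= MIN_SUPPORT}


def mine_patterns(records, labels, max_size=2):
    """Level-wise search: size-1 patterns first, then size-k from size-(k-1)."""
    # Keep only attributes with non-zero variability (more than one distinct value).
    attrs = [a for a in records[0] if len({r[a] for r in records}) > 1]
    # Size-1 patterns: each (attribute, value) item -> set of row indices it covers.
    level = defaultdict(set)
    for i, r in enumerate(records):
        for a in attrs:
            level[frozenset([(a, r[a])])].add(i)

    significant = {}  # pattern -> (covered rows, classes it is significant for)
    for k in range(1, max_size + 1):
        refinable = {}
        for pat, rows in level.items():
            hits = class_hits(rows, labels)
            if hits:
                significant[pat] = (rows, hits)
            elif len(rows) >= MIN_SUPPORT:
                # Assumed meaning of "refinable": not yet reliable, but enough support to extend.
                refinable[pat] = rows
        # Size-(k+1) candidates: unions of two refinable size-k patterns that
        # add exactly one item on a new attribute; rows are the intersection.
        level = {}
        pats = list(refinable)
        for x in range(len(pats)):
            for y in range(x + 1, len(pats)):
                union = pats[x] | pats[y]
                if len(union) == k + 1 and len({a for a, _ in union}) == k + 1:
                    rows = refinable[pats[x]] & refinable[pats[y]]
                    if rows:
                        level[union] = rows
    return significant


def cumulative_coverage(significant, labels, cls):
    """Running share of class `cls` records covered by its sorted significant patterns."""
    class_rows = {i for i, c in enumerate(labels) if c == cls}
    pats = [(p, rows & class_rows)
            for p, (rows, hits) in significant.items() if cls in hits]
    # Assumed sort order: most class records covered first, then shorter patterns.
    pats.sort(key=lambda t: (-len(t[1]), len(t[0]), sorted(t[0])))
    covered, result = set(), []
    for p, rows in pats:
        covered |= rows  # union of the class records covered so far
        result.append((p, len(covered) / len(class_rows)))
    return result


# Hypothetical toy data; "site" is constant, so the variability filter drops it.
records = [
    {"outlook": "sunny", "wind": "weak", "site": "A"},
    {"outlook": "sunny", "wind": "strong", "site": "A"},
    {"outlook": "rain", "wind": "weak", "site": "A"},
    {"outlook": "rain", "wind": "strong", "site": "A"},
    {"outlook": "overcast", "wind": "weak", "site": "A"},
    {"outlook": "overcast", "wind": "strong", "site": "A"},
    {"outlook": "sunny", "wind": "strong", "site": "A"},
    {"outlook": "rain", "wind": "strong", "site": "A"},
]
labels = ["yes", "no", "yes", "no", "yes", "yes", "no", "no"]

if __name__ == "__main__":
    sig = mine_patterns(records, labels, max_size=2)
    for cls in ("yes", "no"):
        for pat, share in cumulative_coverage(sig, labels, cls):
            print(cls, sorted(pat), round(share, 2))
```

On this toy data the constant `site` attribute is removed by the variability filter, and the two size-2 patterns for class `no` only appear because their size-1 parents (`outlook=sunny`, `outlook=rain`, `wind=strong`) were refinable rather than reliable. For class `yes`, the cumulative coverage reaches 0.75 after `wind=weak` and 1.0 after adding `outlook=overcast`.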