-
31.
Publication (announcement) number: US20070078808A1
Publication (announcement) date: 2007-04-05
Application number: US11239044
Application date: 2005-09-30
Applicant: Peter Haas , Volker Markl , Nimrod Megiddo , Utkarsh Srivastava
Inventor: Peter Haas , Volker Markl , Nimrod Megiddo , Utkarsh Srivastava
IPC: G06F17/30
CPC classification number: G06F17/30536
Abstract: A method collects optimizer statistics for database query optimization by gathering feedback from the query execution engine about the observed cardinality of predicates, and by constructing and maintaining multidimensional histograms from that feedback. The approach exploits the correlation between data columns without requiring an inefficient data scan. The maximum entropy principle is used to approximate the true data distribution by a histogram distribution that is as “simple” as possible while remaining consistent with the observed predicate cardinalities. The method adapts readily to changes in the underlying data by automatically and efficiently detecting and eliminating inconsistent feedback information. The size of the histogram is controlled by retaining only the most “important” feedback.
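Illustrative sketch: the abstract does not spell out how the maximum entropy histogram is computed, so the following is only a minimal illustration of the general idea, not the patented procedure. It assumes the histogram is a flat list of buckets and that each piece of feedback is a pair of (bucket indices covered by a predicate, observed fraction of rows). It uses iterative proportional fitting, a standard technique for finding the maximum entropy distribution under such constraints when the feedback is mutually consistent. The function name `max_entropy_histogram` and its parameters are hypothetical, and the sketch does not cover detecting inconsistent feedback or pruning less important feedback.

```python
def max_entropy_histogram(num_buckets, feedback, iterations=200):
    """Fit bucket probabilities to observed predicate selectivities.

    feedback: list of (bucket_indices, observed_fraction) pairs, where
    bucket_indices are the buckets a predicate covers (an assumed encoding).
    """
    # Start from the uniform distribution, which has maximum entropy
    # when no feedback is available.
    probs = [1.0 / num_buckets] * num_buckets
    for _ in range(iterations):
        for buckets, observed in feedback:
            inside = set(buckets)
            mass_in = sum(probs[i] for i in inside)
            mass_out = sum(p for i, p in enumerate(probs) if i not in inside)
            for i in range(num_buckets):
                if i in inside:
                    # Rescale covered buckets to match the observed fraction.
                    if mass_in > 0:
                        probs[i] *= observed / mass_in
                elif mass_out > 0:
                    # Rescale the rest so the total mass stays 1.
                    probs[i] *= (1.0 - observed) / mass_out
    return probs


# Two columns A and B with two values each give four buckets:
# 0: (A=0, B=0), 1: (A=0, B=1), 2: (A=1, B=0), 3: (A=1, B=1).
# Assumed feedback: 30% of rows satisfy A=0 and 60% satisfy B=0.
hist = max_entropy_histogram(4, [([0, 1], 0.3), ([0, 2], 0.6)])
print([round(p, 4) for p in hist])
```

With only single-column feedback, the fitted histogram treats the columns as independent, which is the "simplest" distribution consistent with the observations. Adding feedback on a conjunctive predicate such as A=0 AND B=0 (bucket 0) would let the histogram capture correlation between the columns.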