Distributed reservoir sampling for web applications
    83.
    Granted invention patent; legal status: in force

    Publication number: US07308447B2

    Publication date: 2007-12-11

    Application number: US11212301

    Filing date: 2005-08-26

    IPC classification: G06F17/30

    Abstract: Random samples without replacement are extracted from a distributed set of items by aggregating sampled subsets of the distributed set. This yields a uniform random sample without replacement that is representative of the distributed set, allowing statistical information to be gleaned from extremely large sets of distributed information. Subset random samples without replacement are first extracted from independent subsets of the distributed set of items. These subset samples are then aggregated into a fixed-size uniform random sample without replacement that is representative of a distributed set of items of unknown size. In one instance, a multivariate hypergeometric distribution is sampled by breaking it up into a set of univariate hypergeometric distributions. Individual items of the uniform random sample without replacement are then determined using a normal approximation of the univariate hypergeometric distributions together with a finite population correction factor.
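
    The following is a minimal Python sketch of the idea, for illustration only. Each subset is sampled independently with classic reservoir sampling (Algorithm R). The per-subset samples are then merged by drawing the multivariate hypergeometric split of k as a chain of univariate hypergeometric draws. Each univariate draw is approximated by a normal distribution whose variance carries the finite population correction. The function names (`reservoir_sample`, `hypergeom_normal`, `merge_samples`) and the rounding and clamping of the normal draw are assumptions of this sketch, not details taken from the patent.

```python
import math
import random
from typing import Sequence, Tuple


def reservoir_sample(stream, k: int, rng: random.Random) -> Tuple[list, int]:
    """Algorithm R: keep a uniform sample of up to k items, without
    replacement, from a stream of unknown length. Also returns how many
    items were seen, which the merge step needs."""
    reservoir: list = []
    seen = 0
    for item in stream:
        seen += 1
        if len(reservoir) < k:
            reservoir.append(item)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen - 1]
            if j < k:
                reservoir[j] = item
    return reservoir, seen


def hypergeom_normal(draws: int, successes: int, population: int,
                     rng: random.Random) -> int:
    """Approximate one univariate hypergeometric draw: how many of `draws`
    items taken without replacement fall in a group of `successes` out of
    `population`. Uses a normal distribution whose variance includes the
    finite population correction (N - n) / (N - 1)."""
    lo = max(0, draws - (population - successes))  # smallest feasible count
    hi = min(draws, successes)                      # largest feasible count
    if population <= 1 or lo == hi:
        return lo
    p = successes / population
    mean = draws * p
    fpc = (population - draws) / (population - 1)
    sd = math.sqrt(draws * p * (1 - p) * fpc)
    # Round the continuous draw and clamp it into the feasible range.
    return min(max(round(rng.gauss(mean, sd)), lo), hi)


def merge_samples(parts: Sequence[Tuple[list, int]], k: int,
                  rng: random.Random) -> list:
    """Combine per-subset (sample, subset_size) pairs into one sample of
    min(k, N) items. The multivariate hypergeometric split of k across the
    subsets is drawn as a chain of univariate draws: this subset versus
    all the subsets that remain after it."""
    remaining_pop = sum(size for _, size in parts)
    remaining_draws = min(k, remaining_pop)
    merged: list = []
    for sample, size in parts:
        take = hypergeom_normal(remaining_draws, size, remaining_pop, rng)
        # A uniform subset of a uniform sample is a uniform sample of the subset.
        merged.extend(rng.sample(sample, take))
        remaining_draws -= take
        remaining_pop -= size
    return merged
```

    In a distributed setting, each `reservoir_sample` call would presumably run where its subset is stored. Only the (sample, count) pairs would then need to reach the merging step. This is an assumption about deployment, not something the abstract states.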

    Automatic data perspective generation for a target variable
    84.
    Granted invention patent; legal status: in force

    Publication number: US07225200B2

    Publication date: 2007-05-29

    Application number: US10824108

    Filing date: 2004-04-14

    IPC classification: G06F17/00 G06F7/00

    Abstract: The present invention uses machine learning techniques to automatically generate conditioning variables for constructing a data perspective for a given target variable. It determines and analyzes the best predictors of the given target variable and uses them to convey information about the target variable to a user. It automatically discretizes the continuous and discrete variables used as target variable predictors to establish their granularity. In other instances of the invention, a complexity and/or utility parameter can be specified, and the data perspective is generated by weighing the best target variable predictor against the complexity of the conditioning variable(s) and/or the utility. The invention can also adjust the conditioning variables (i.e., the target variable predictors) of the data perspective to provide an optimum view, and/or accept control inputs from a user to guide or control the generation of the data perspective.
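
    The abstract does not name the method it uses to score predictors or to discretize variables. The hypothetical sketch below uses stand-ins for both. Each candidate conditioning variable is scored by its empirical mutual information with the target. Numeric variables with many distinct values are first discretized into equal-frequency bins. `max_predictors` serves as a crude complexity budget. All names here are illustrative.

```python
import math
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence


def equal_frequency_bins(values: Sequence[float], num_bins: int) -> List[int]:
    """Discretize a continuous variable into roughly equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(i * n) // num_bins] for i in range(1, num_bins)]
    return [bisect_right(cuts, v) for v in values]


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def best_predictors(columns: Dict[str, list], target: list,
                    max_predictors: int, num_bins: int = 4) -> List[str]:
    """Rank candidate conditioning variables by how much they reveal about
    the target and keep at most `max_predictors` of them."""
    scored = []
    for name, values in columns.items():
        numeric = all(isinstance(v, (int, float)) for v in values)
        if numeric and len(set(values)) > num_bins:
            # Coarsen a many-valued numeric column before scoring it.
            values = equal_frequency_bins(values, num_bins)
        scored.append((mutual_information(values, target), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_predictors]]
```

    With a target such as "x > 5", a column `x` outranks a column whose values are independent of the target, because the independent column's mutual information is zero.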

    Staged mixture modeling
    85.
    Granted invention patent; legal status: in force

    Publication number: US07133811B2

    Publication date: 2006-11-07

    Application number: US10270914

    Filing date: 2002-10-15

    IPC classification: G06F17/10

    Abstract: A system and method for generating staged mixture model(s) is provided. The staged mixture model includes a plurality of mixture components, each having an associated mixture weight, and an added mixture component having an initial structure, parameters and associated mixture weight. The added mixture component is modified based, at least in part, on a case that is poorly addressed by the existing mixture components. A structural expectation maximization (SEM) algorithm is used to modify the structure, parameters and/or associated mixture weight of the added mixture component. The staged mixture model employs a data-driven staged mixture modeling technique, for example for building density, regression and classification model(s). The basic approach is to add mixture components (e.g., sequentially) to the staged mixture model using an SEM algorithm.
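
    The following is a hedged sketch of staged mixture growth, assuming one-dimensional Gaussian components. Real SEM also searches over component structure (for example, Bayesian-network structure). This sketch only refits weights, means and variances with ordinary EM. Each new component is seeded at the data point the current mixture explains worst. That is one possible reading of "a case that is poorly addressed", chosen here for illustration. `staged_mixture`, `new_weight` and the other names are hypothetical.

```python
import math
from typing import List

Component = List[float]  # [weight, mean, variance]


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_density(x: float, comps: List[Component]) -> float:
    return sum(w * normal_pdf(x, m, v) for w, m, v in comps)


def log_likelihood(data: List[float], comps: List[Component]) -> float:
    return sum(math.log(mixture_density(x, comps) + 1e-300) for x in data)


def em_step(data: List[float], comps: List[Component]) -> List[Component]:
    """One EM iteration for a 1-D Gaussian mixture (no structure search)."""
    resp = []
    for x in data:  # E-step: responsibility of each component for each point
        ps = [w * normal_pdf(x, m, v) for w, m, v in comps]
        total = sum(ps) or 1e-300
        resp.append([p / total for p in ps])
    updated = []
    for j in range(len(comps)):  # M-step: re-estimate weight, mean, variance
        rj = [r[j] for r in resp]
        nj = sum(rj) + 1e-12
        mean = sum(r * x for r, x in zip(rj, data)) / nj
        var = sum(r * (x - mean) ** 2 for r, x in zip(rj, data)) / nj + 1e-6
        updated.append([nj / len(data), mean, var])
    return updated


def staged_mixture(data: List[float], num_components: int,
                   iters: int = 100, new_weight: float = 0.1) -> List[Component]:
    """Grow a mixture one component at a time, refitting with EM after each addition."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data) + 1e-6
    comps: List[Component] = [[1.0, mean, var]]
    for _ in range(1, num_components):
        # Seed the new component at the point the current mixture explains worst.
        worst = min(data, key=lambda x: mixture_density(x, comps))
        comps = [[w * (1 - new_weight), m, v] for w, m, v in comps]
        comps.append([new_weight, worst, var])
        for _ in range(iters):
            comps = em_step(data, comps)
    return comps
```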

    Cluster-based visualization of user traffic on an internet site
    86.
    Granted invention patent; legal status: lapsed

    Publication number: US06771289B1

    Publication date: 2004-08-03

    Application number: US09517462

    Filing date: 2000-03-02

    IPC classification: G06F3/14

    Abstract: Visualization of Internet web traffic is disclosed. In one embodiment, a number of windows are displayed, one for each of the clusters into which users have been partitioned based on similar web browsing behavior. The windows are ordered from the cluster with the greatest number of users to the cluster with the fewest users. Each window has one or more rows, and each row corresponds to a user within the cluster. Each row contains an ordered sequence of visible units, such as blocks, where each block corresponds to a web page visited by the user. The blocks can be color-coded by the type of web page they represent. In one embodiment, the cluster models corresponding to the clusters are alternatively displayed in the windows.
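
    The window layout can be approximated in plain text. In this hypothetical sketch, each cluster is a list of users, and each user is the ordered list of page types that user visited. A single letter per page type stands in for the color coding. Clusters are printed largest first, as the abstract describes. The input format and `render_clusters` are assumptions.

```python
from typing import Dict, List


def render_clusters(clusters: Dict[str, List[List[str]]],
                    glyphs: Dict[str, str]) -> str:
    """Text stand-in for the cluster windows: one section per cluster,
    largest cluster first; one row per user; one glyph per page visit."""
    ordered = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    lines: List[str] = []
    for name, users in ordered:
        lines.append(f"[{name}] {len(users)} users")
        for visits in users:
            # The glyph encodes the page type the way a block color would;
            # unknown page types are shown as "?".
            lines.append("".join(glyphs.get(page_type, "?") for page_type in visits))
    return "\n".join(lines)
```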

    Visualization of high-dimensional data
    87.
    Granted invention patent; legal status: in force

    Publication number: US06519599B1

    Publication date: 2003-02-11

    Application number: US09517138

    Filing date: 2000-03-02

    IPC classification: G06F17/30

    Abstract: Visualization of high-dimensional data sets is disclosed, particularly the display of a network model for a data set. The network, such as a dependency network or a Bayesian network, has a number of nodes with dependencies among them. The network can be displayed as items and connections, corresponding to nodes and dependencies, respectively. In one embodiment, selecting a particular item results in the display of the local distribution associated with that item's node. In one embodiment, only a predetermined number of the items are shown, such as only the items representing the most popular nodes. Furthermore, in one embodiment, a subset of the connections proportional to a received user input is displayed in response to that input. In another embodiment, a particular item is displayed in an emphasized manner. The connections representing dependencies that include that item's node, together with the items representing the other nodes in those dependencies, are also displayed in the emphasized manner. Furthermore, in one embodiment, only an indicated subset of the items is displayed.
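
    Three of these display rules can be sketched as plain filters over a node and edge list. The edge `strength` score is an assumption. It decides which connections survive the user-controlled proportion, and the abstract says only that the number shown is proportional to the user input. The function names are also assumptions.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (parent, child, strength); strength is hypothetical


def most_popular(popularity: Dict[str, int], n: int) -> List[str]:
    """Keep only the n most popular nodes."""
    return sorted(popularity, key=popularity.get, reverse=True)[:n]


def visible_edges(edges: List[Edge], fraction: float) -> List[Edge]:
    """Show a share of the connections proportional to a user input in [0, 1],
    strongest first."""
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    return ranked[:round(fraction * len(ranked))]


def emphasized(edges: List[Edge], node: str) -> Tuple[List[Edge], Set[str]]:
    """Connections touching `node`, plus every node that shares one of them."""
    hits = [e for e in edges if node in (e[0], e[1])]
    nodes = {node} | {e[0] for e in hits} | {e[1] for e in hits}
    return hits, nodes
```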

    Speech recognition with mixtures of bayesian networks
    88.
    Granted invention patent; legal status: in force

    Publication number: US06336108B1

    Publication date: 2002-01-01

    Application number: US09220197

    Filing date: 1998-12-23

    IPC classification: G06F15/18

    Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable. Each HSBN models the world under the hypothesis that the common external hidden variable is in the corresponding one of those states. In accordance with the invention, each MBN encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech. Each HSBN encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given that the common hidden variable is in a particular state. Each HSBN has nodes corresponding to the elements of the acoustic observations. These nodes store probability parameters corresponding to those probabilities, with causal links representing dependencies between the nodes.
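
    The following is a simplified scoring sketch. It assumes every HSBN is an edgeless network of independent Gaussian nodes, one per acoustic feature, rather than a general Bayesian network. Each part of speech has its own MBN. Its likelihood is a sum over the states of the common hidden variable, weighted by that variable's prior. Recognition picks the highest-scoring MBN. The parameter layout and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[float, float]   # (mean, variance) of one feature node
HSBN = List[Gaussian]            # edgeless network: one node per acoustic feature
MBN = List[Tuple[float, HSBN]]   # (prior of hidden state, HSBN for that state)


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def hsbn_loglik(obs: Sequence[float], hsbn: HSBN) -> float:
    """log p(obs | part of speech, hidden state), assuming independent nodes."""
    return sum(log_gauss(x, m, v) for x, (m, v) in zip(obs, hsbn))


def mbn_loglik(obs: Sequence[float], mbn: MBN) -> float:
    """log sum_c p(c) * p(obs | part of speech, c), computed via log-sum-exp."""
    terms = [math.log(w) + hsbn_loglik(obs, hsbn) for w, hsbn in mbn]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def recognize(obs: Sequence[float], models: Dict[str, MBN]) -> str:
    """Pick the part of speech whose MBN gives the observations the highest likelihood."""
    return max(models, key=lambda label: mbn_loglik(obs, models[label]))
```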

    Method and system for visually indicating a selection query
    89.
    Granted invention patent; legal status: in force

    Publication number: US6111574A

    Publication date: 2000-08-29

    Application number: US258002

    Filing date: 1999-02-25

    Abstract: A method and system for specifying a selection query for a collection of data items. The system allows a user to define various conditions (e.g., "Supervisor=Smith") that relate to the collection. A unique icon is then assigned to represent each condition. These icons can be assigned either automatically by the system or by a user. When a selection query is to be specified, the system displays a selection query grid. The selection query grid contains a row for each possible combination of the defined conditions. Each combination is represented by displaying, in its row, the icons for the conditions in that combination. A user then selects which combinations should form the selection query by selecting rows of the selection query grid. Each selected combination is the logical AND of its conditions, each taken either as stated or logically inverted. The selection query is the logical OR of all the selected combinations. The system then uses this selection query to retrieve data items from the collection.
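
    The query semantics amount to a disjunctive normal form. Each grid row is a conjunction in which every condition appears either as stated or negated, and the query ORs the selected rows together. The following is a minimal sketch with hypothetical names and dictionary-shaped data items.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Item = Dict[str, str]
Condition = Callable[[Item], bool]
Row = Tuple[bool, ...]  # True: condition as stated; False: condition inverted


def query_grid(num_conditions: int) -> List[Row]:
    """One row per possible combination of the defined conditions."""
    return list(product([True, False], repeat=num_conditions))


def row_matches(item: Item, conditions: Sequence[Condition], row: Row) -> bool:
    # Logical AND across conditions, each taken as stated or inverted.
    return all(cond(item) == wanted for cond, wanted in zip(conditions, row))


def select(items: Sequence[Item], conditions: Sequence[Condition],
           selected_rows: Sequence[Row]) -> List[Item]:
    # Logical OR across the selected rows.
    return [item for item in items
            if any(row_matches(item, conditions, r) for r in selected_rows)]
```

    With n conditions the grid has 2^n rows, so the number of conditions shown at once has to stay small for the grid to remain usable.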
