专利检索 ap:("Oracle International Corporation") AND inv:"Jean-Baptiste Tristan" 第 1 页

1.

发明申请
STREAMING LATENT DIRICHLET ALLOCATION 审中-公开

公开(公告)号：US20190114319A1

公开(公告)日：2019-04-18

申请号：US15934262

申请日：2018-03-23

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Michael Wick , Stephen Green

IPC分类号： G06F17/27 , G06N5/04 , G06F17/22 , G06F17/30

摘要： Embodiments make novel use of random data structures to facilitate streaming inference for a Latent Dirichlet Allocation (LDA) model. Utilizing random data structures facilitates streaming inference by entirely avoiding the need for pre-computation, which is generally an obstacle to many current “streaming” variants of LDA as described above. Specifically, streaming inference—based on an inference algorithm such as Stochastic Cellular Automata (SCA), Gibbs sampling, and/or Stochastic Expectation Maximization (SEM)—is implemented using a count-min sketch to track sufficient statistics for the inference procedure. Use of a count-min sketch avoids the need to know the vocabulary size V a priori. Also, use of a count-min sketch directly enables feature hashing, which addresses the problem of effectively encoding words into indices without the need of pre-computation. Approximate counters are also used within the count-min sketch to avoid bit overflow issues with the counts in the sketch.

2.

发明申请
DATA-PARALLEL PARAMETER ESTIMATION OF THE LATENT DIRICHLET ALLOCATION MODEL BY GREEDY GIBBS SAMPLING 审中-公开
标题翻译：通过GREEDY GIBBS采样的数据并行参数估计最小二分配分配模型

公开(公告)号：US20160210718A1

公开(公告)日：2016-07-21

申请号：US14599272

申请日：2015-01-16

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Guy Steele

IPC分类号： G06T1/20

CPC分类号： G06K9/00 , G06F16/93 , G06K9/00442 , G06K9/00986

摘要： A novel data-parallel algorithm is presented for topic modeling on a highly-parallel hardware architectures. The algorithm is a Markov-Chain Monte Carlo algorithm used to estimate the parameters of the LDA topic model. This algorithm is based on a highly parallel partially-collapsed Gibbs sampler, but replaces a stochastic step that draws from a distribution with an optimization step that computes the mean of the distribution directly and deterministically. This algorithm is correct, it is statistically performant, and it is faster than state-of-the art algorithms because it can exploit the massive amounts of parallelism by processing the algorithm on a highly-parallel architecture, such as a GPU. Furthermore, the partially-collapsed Gibbs sampler converges about as fast as the collapsed Gibbs sampler and identifies solutions that are as good, or even better, as the collapsed Gibbs sampler.

摘要翻译： 提出了一种新颖的数据并行算法，用于高并行硬件架构上的主题建模。该算法是用于估计LDA主题模型参数的马尔可夫链蒙特卡罗算法。该算法基于高度并行的部分折叠Gibbs采样器，但是替代了从分布中抽取的随机步骤，其中优化步骤直接和确定地计算分布的平均值。该算法是正确的，它是统计学上的，它比现有技术的算法更快，因为它可以通过在诸如GPU的高度并行架构上处理算法来利用大量的并行性。此外，部分折叠的吉布斯采样器收敛的速度与收缩的吉布斯取样器一样快，并识别与折叠的吉布斯取样器一样好甚至更好的解决方案。

3.

发明授权
Data-parallel parameter estimation of the Latent Dirichlet allocation model by greedy Gibbs sampling 有权

公开(公告)号：US10860829B2

公开(公告)日：2020-12-08

申请号：US14599272

申请日：2015-01-16

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Guy Steele

IPC分类号： G06K9/00 , G06F16/93

摘要： A novel data-parallel algorithm is presented for topic modeling on a highly-parallel hardware architectures. The algorithm is a Markov-Chain Monte Carlo algorithm used to estimate the parameters of the LDA topic model. This algorithm is based on a highly parallel partially-collapsed Gibbs sampler, but replaces a stochastic step that draws from a distribution with an optimization step that computes the mean of the distribution directly and deterministically. This algorithm is correct, it is statistically performant, and it is faster than state-of-the art algorithms because it can exploit the massive amounts of parallelism by processing the algorithm on a highly-parallel architecture, such as a GPU. Furthermore, the partially-collapsed Gibbs sampler converges about as fast as the collapsed Gibbs sampler and identifies solutions that are as good, or even better, as the collapsed Gibbs sampler.

4.

发明申请
Ensembled Decision Systems Using Feature Hashing Models 审中-公开

公开(公告)号：US20190095805A1

公开(公告)日：2019-03-28

申请号：US15717830

申请日：2017-09-27

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Adam Pocock , Michael Wick , Guy Steele

IPC分类号： G06N5/04 , G06N99/00 , G06F17/30

摘要： Systems and methods are disclosed to build and execute a decision system based on multiple machine learned decision models. In embodiments, the decision system performs a hashing technique to reduce relevant features of the input data into a feature vector for each decision model. The feature vector reduces the dimensionality of the feature universe of the input data, and its use allows the decision models to be trained and executed using less computing resources. In embodiments, the decision system implements an ensembled decision model that makes decisions based on a combination function that combines the decision results of the individual models in the ensemble. The decision models employ different hashing techniques to hash the input features differently, so that errors caused by the feature hashing of individual models are reduced in the aggregate.

5.

发明申请
DATA-PARALLEL PROBABILISTIC INFERENCE 审中-公开
标题翻译：数据并行概率论

公开(公告)号：US20150095277A1

公开(公告)日：2015-04-02

申请号：US14316186

申请日：2014-06-26

申请人： ORACLE INTERNATIONAL CORPORATION

发明人： Jean-Baptiste Tristan , Guy L. Steele, JR. , Daniel E. Huang , Joseph Tassarotti

IPC分类号： G06N7/00 , G06N5/04

CPC分类号： G06N7/005

摘要： The present invention relates to a probabilistic programming compiler that (a) generates data-parallel inference code to sample from probability distributions in models provided to the compiler; and (b) utilizes a modular framework to allow addition and removal of inference algorithm information based on which the compiler generates the inference code. For a given model, the described compiler can generate inference code that implements any one or more of the inference algorithms that are available to the compiler. The modular compiler framework utilizes an intermediate representation (IR) that symbolically represents features of probability distributions. The compiler then uses the IR as a basis for emitting inference code to sample from the one or more probability distributions represented in the IR. Further, the compiler produces parallelized inference code that facilitates efficient parallel processing of inference computations in order to take advantage of highly data-parallel architectures, such as GPUs.

摘要翻译： 本发明涉及一种概率编程编译器，其（a）生成数据并行推理码以从提供给编译器的模型中的概率分布中抽样; 和（b）利用模块化框架来允许根据编译器生成推理码的推理算法信息的添加和移除。对于给定的模型，所描述的编译器可以生成实现编译器可用的任何一个或多个推理算法的推理代码。模块化编译器框架利用了符号表示概率分布特征的中间表示（IR）。然后，编译器使用IR作为从IR中表示的一个或多个概率分布发射推理代码进行采样的基础。此外，编译器产生并行化推理代码，有助于推理计算的有效并行处理，以便利用高度数据并行架构，如GPU。

6.

发明授权
Differentiable set to increase the memory capacity of recurrent neural net works 有权

公开(公告)号：US11636308B2

公开(公告)日：2023-04-25

申请号：US15339303

申请日：2016-10-31

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Michael Wick , Manzil Zaheer

IPC分类号： G06N3/04 , G06N3/08 , G06N3/044 , G06N3/084

摘要： According to embodiments, a recurrent neural network (RNN) is equipped with a set data structure whose operations are differentiable, which data structure can be used to store information for a long period of time. This differentiable set data structure can “remember” an event in the sequence of sequential data that may impact another event much later in the sequence, thereby allowing the RNN to classify the sequence based on many kinds of long dependencies. An RNN that is equipped with the differentiable set data structure can be properly trained with backpropagation and gradient descent optimizations. According to embodiments, a differentiable set data structure can be used to store and retrieve information with a simple set-like interface. According to further embodiments, the RNN can be extended to support several add operations, which can make the differentiable set data structure behave like a Bloom filter.

7.

发明授权
Learning topics by simulation of a stochastic cellular automaton 有权

公开(公告)号：US10394872B2

公开(公告)日：2019-08-27

申请号：US14932825

申请日：2015-11-04

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Stephen J. Green , Guy L. Steele, Jr. , Manzil Zaheer

IPC分类号： G06F16/35 , G06F16/951 , G06F16/33

摘要： Herein is described an unsupervised learning method to discover topics and reduce the dimensionality of documents by designing and simulating a stochastic cellular automaton. A key formula that appears in many inference methods for LDA is used as the local update rule of the cellular automaton. Approximate counters may be used to represent counter values being tracked by the inference algorithms. Also, sparsity may be used to reduce the amount of computation needed for sampling a topic for particular words in the corpus being analyzed.

8.

发明申请
SPARSE AND DATA-PARALLEL INFERENCE METHOD AND SYSTEM FOR THE LATENT DIRICHLET ALLOCATION MODEL 有权
标题翻译：最小二乘法分配模型的稀疏和数据并行推理方法与系统

公开(公告)号：US20160224544A1

公开(公告)日：2016-08-04

申请号：US14755312

申请日：2015-06-30

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Guy L. Steele, JR. , Joseph Tassarotti

IPC分类号： G06F17/28 , G06F9/50 , G06F17/27

CPC分类号： G06N7/005 , G06F9/5066 , G06F17/277 , G06F17/2785 , G06F17/3071

摘要： Herein is described a data-parallel and sparse algorithm for topic modeling. This algorithm is based on a highly parallel algorithm for a Greedy Gibbs sampler. The Greedy Gibbs sampler is a Markov-Chain Monte Carlo algorithm that estimates topics, in an unsupervised fashion, by estimating the parameters of the topic model Latent Dirichlet Allocation (LDA). The Greedy Gibbs sampler is a data-parallel algorithm for topic modeling, and is configured to be implemented on a highly-parallel architecture, such as a GPU. The Greedy Gibbs sampler is modified to take advantage of data sparsity while maintaining the parallelism. Furthermore, in an embodiment, implementation of the Greedy Gibbs sampler uses both densely-represented and sparsely-represented matrices to reduce the amount of computation while maintaining fast accesses to memory for implementation on a GPU.

摘要翻译： 这里描述了用于主题建模的数据并行和稀疏算法。该算法基于Greedy Gibbs采样器的高度并行算法。贪婪吉布斯取样器是一种马尔可夫链蒙特卡罗算法，通过估计主题模型潜在狄更斯特分配（LDA）的参数，以无监督的方式估计主题。 Greedy Gibbs采样器是用于主题建模的数据并行算法，并被配置为在诸如GPU的高度并行架构上实现。贪婪吉布斯取样器被修改为利用数据稀疏性，同时保持并行性。此外，在一个实施例中，Greedy Gibbs采样器的实现使用密集表示和稀疏表示的矩阵来减少计算量，同时保持对存储器的快速访问以在GPU上实现。

9.

发明授权
Ensembled decision systems using feature hashing models 有权

公开(公告)号：US11263541B2

公开(公告)日：2022-03-01

申请号：US15717830

申请日：2017-09-27

申请人： Oracle International Corporation

发明人： Jean-Baptiste Tristan , Adam Pocock , Michael Wick , Guy Steele

IPC分类号： G06N5/04 , G06N20/00 , G06F16/901

摘要： Systems and methods are disclosed to build and execute a decision system based on multiple machine learned decision models. In embodiments, the decision system performs a hashing technique to reduce relevant features of the input data into a feature vector for each decision model. The feature vector reduces the dimensionality of the feature universe of the input data, and its use allows the decision models to be trained and executed using less computing resources. In embodiments, the decision system implements an ensembled decision model that makes decisions based on a combination function that combines the decision results of the individual models in the ensemble. The decision models employ different hashing techniques to hash the input features differently, so that errors caused by the feature hashing of individual models are reduced in the aggregate.

10.

发明授权
Parallel Gibbs sampler using butterfly-patterned partial sums 有权

公开(公告)号：US10157346B2

公开(公告)日：2018-12-18

申请号：US14713205

申请日：2015-05-15

申请人： Oracle International Corporation

发明人： Guy L. Steele, Jr. , Jean-Baptiste Tristan

IPC分类号： G06N7/00

摘要： An efficient parallel Gibbs sampler using butterfly-patterned partial sums is provided. Instead of building and searching a complete prefix sums table, an alternative “butterfly patterned partial sums table” is described that integrates a lightweight transposition and partial sums operation. Accordingly, the usual full matrix transposition and full prefix sums table building operations can be omitted in favor of building the butterfly-patterned partial sums table, which requires less computational and communication effort. This butterfly-patterned partial sums table is used by a modified binary search phase that calculates the needed prefix-sum table values on-the-fly using the butterfly-patterned partial sums table. Transposed memory access is also provided while avoiding the full matrix transform, providing significant performance benefits for highly parallel architectures, such as graphics processing units (GPUs) where 1-stride or sequential memory accesses are important for optimization.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类