Unfolded convolution for fast feature extraction
    1.
    Granted invention patent
    Status: in force

    Publication No.: US07634137B2

    Publication date: 2009-12-15

    Application No.: US11250819

    Filing date: 2005-10-14

    IPC classification: G06K9/46

    CPC classification: G06K9/4628 G06K2209/01

    Abstract: Systems and methods are described that facilitate performing feature extraction across multiple received input features to reduce computational overhead associated with feature processing related to, for instance, optical character recognition. Input feature information can be unfolded and concatenated to generate an aggregated input matrix, which can be convolved with a kernel matrix to produce output feature information for multiple output features concurrently.
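The unfold-and-concatenate idea in this abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the function names, the stride-1 "valid" patch layout, and the absence of a kernel flip (cross-correlation, as is common in neural networks) are all assumptions made for illustration.

```python
def unfold(feature, k):
    """Return one row per k x k patch of a 2-D feature map (stride 1, no padding)."""
    h, w = len(feature), len(feature[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([feature[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows


def unfolded_convolution(features, kernels, k):
    """Compute all output features with a single matrix product.

    features: list of equally sized 2-D input feature maps.
    kernels[o][f]: k x k kernel linking input feature f to output feature o.
    Returns a matrix with one row per patch position and one column per output feature.
    """
    # Unfold every input feature, then concatenate the patch rows side by side.
    unfolded = [unfold(f, k) for f in features]
    aggregated = [sum((u[p] for u in unfolded), []) for p in range(len(unfolded[0]))]
    # Kernel matrix: one column per output feature, laid out like the aggregated rows.
    kernel_cols = [[v for kf in ko for row in kf for v in row] for ko in kernels]
    # A single product of the aggregated input matrix and the kernel matrix.
    return [[sum(a * b for a, b in zip(row, col)) for col in kernel_cols]
            for row in aggregated]
```

With two 2 x 2 input maps and k = 2 there is a single patch position, so the result is one row holding one value per output feature.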

    Ink warping for normalization and beautification / ink beautification
    2.
    Granted invention patent
    Status: lapsed

    Publication No.: US07593574B2

    Publication date: 2009-09-22

    Application No.: US11173243

    Filing date: 2005-07-01

    IPC classification: G06K9/18

    CPC classification: G06K9/00416

    Abstract: Systems and methods are disclosed that facilitate normalizing and beautifying digitally generated handwriting, such as can be generated on a tablet PC or via scanning a handwritten document. A classifier can identify extrema in the digital handwriting and label such extrema according to predefined categories (e.g., bottom, baseline, midline, top, other, . . . ). Multi-linear regression, polynomial regression, etc., can be performed to align labeled extrema to respective and corresponding desired points as indicated by the labels. Additionally, displacement techniques can be applied to the regressed handwriting to optimize legibility for reading by a human viewer and/or for character recognition by a handwriting recognition application. The displacement techniques can comprise a “rubber sheet” displacement algorithm in conjunction with a “rubber rod” displacement algorithm, which can collectively preserve spatial features of the handwriting during warping thereof.

    Elastic distortions for automatic generation of labeled data
    3.
    Granted invention patent
    Status: in force

    Publication No.: US07418128B2

    Publication date: 2008-08-26

    Application No.: US10631511

    Filing date: 2003-07-31

    IPC classification: G06K9/62

    CPC classification: G06K9/6256

    Abstract: A system that facilitates generation of data that can be employed in connection with training a classifier. The system comprises a component that receives a data set that is employed in connection with training the classifier, and an expansion component that applies elastic distortion algorithm(s) to a subset of the data set to generate additional labeled training data.
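The abstract does not say how the distortion is computed, so the following is only a rough sketch of one common approach: a random displacement field is smoothed, scaled by a strength `alpha`, and used to resample each image. The box-blur smoothing, nearest-neighbour resampling, and every name and parameter here (`smooth`, `elastic_distort`, `expand`, `fraction`, `alpha`, `seed`) are assumptions, not details taken from the patent.

```python
import random


def smooth(field, passes=2):
    """Box-blur a 2-D field a few times (a cheap stand-in for Gaussian smoothing)."""
    h, w = len(field), len(field[0])
    for _ in range(passes):
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                vals = [field[yy][xx]
                        for yy in range(max(0, y - 1), min(h, y + 2))
                        for xx in range(max(0, x - 1), min(w, x + 2))]
                out[y][x] = sum(vals) / len(vals)
        field = out
    return field


def elastic_distort(image, alpha, rng):
    """Resample a 2-D image through a smoothed random displacement field."""
    h, w = len(image), len(image[0])
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)])
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Nearest-neighbour lookup at the displaced position, clamped to the image.
            sx = min(w - 1, max(0, round(x + alpha * dx[y][x])))
            sy = min(h - 1, max(0, round(y + alpha * dy[y][x])))
            out[y][x] = image[sy][sx]
    return out


def expand(dataset, fraction, alpha, seed=0):
    """Return the dataset plus distorted copies of a random subset, labels unchanged."""
    rng = random.Random(seed)
    subset = rng.sample(dataset, int(len(dataset) * fraction))
    return dataset + [(elastic_distort(img, alpha, rng), label) for img, label in subset]
```

Because each distorted copy keeps the label of its source image, the added examples are labeled without any manual annotation.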

    System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit
    4.
    Granted invention patent
    Status: in force

    Publication No.: US07219085B2

    Publication date: 2007-05-15

    Application No.: US10837382

    Filing date: 2004-04-30

    IPC classification: G06F15/80

    CPC classification: G06K9/00986 G06N3/08

    Abstract: A system and method for processing machine learning techniques (such as neural networks) and other non-graphics applications using a graphics processing unit (GPU) to accelerate and optimize the processing. The system and method transfers an architecture that can be used for a wide variety of machine learning techniques from the CPU to the GPU. The transfer of processing to the GPU is accomplished using several novel techniques that overcome the limitations and work well within the framework of the GPU architecture. With these limitations overcome, machine learning techniques are particularly well suited for processing on the GPU because the GPU is typically much more powerful than the typical CPU. Moreover, similar to graphics processing, processing of machine learning techniques involves problems with solving non-trivial solutions and large amounts of data.

    Processing machine learning techniques using a graphics processing unit
    5.
    Granted invention patent
    Status: in force

    Publication No.: US07548892B2

    Publication date: 2009-06-16

    Application No.: US11748474

    Filing date: 2007-05-14

    IPC classification: G06F15/18 G06K9/62

    CPC classification: G06N99/005 G06N3/08

    Abstract: A system and method for processing machine learning techniques (such as neural networks) and other non-graphics applications using a graphics processing unit (GPU) to accelerate and optimize the processing. The system and method transfers an architecture that can be used for a wide variety of machine learning techniques from the CPU to the GPU. The transfer of processing to the GPU is accomplished using several novel techniques that overcome the limitations and work well within the framework of the GPU architecture. With these limitations overcome, machine learning techniques are particularly well suited for processing on the GPU because the GPU is typically much more powerful than the typical CPU. Moreover, similar to graphics processing, processing of machine learning techniques involves problems with solving non-trivial solutions and large amounts of data.

    Optimizing performance of a graphics processing unit for efficient execution of general matrix operations
    6.
    Granted invention patent
    Status: in force

    Publication No.: US07567252B2

    Publication date: 2009-07-28

    Application No.: US10877730

    Filing date: 2004-06-25

    IPC classification: G06F15/00

    Abstract: A system and method for optimizing the performance of a graphics processing unit (GPU) for processing and execution of general matrix operations such that the operations are accelerated and optimized. The system and method describes the layouts of operands and results in graphics memory, as well as partitioning the processes into a sequence of passes through a macro step. Specifically, operands are placed in memory in a pattern, results are written into memory in a pattern appropriate for use as operands in a later pass, data sets are partitioned to insure that each pass fits into fixed sized memory, and the execution model incorporates generally reusable macro steps for use in multiple passes. These features enable greater efficiency and speed in processing and executing general matrix operations.
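As a CPU-side analogy only, the idea of splitting a matrix operation into passes through a reusable macro step can be sketched as follows. This is not GPU code and not the patented memory layout: the names `macro_step` and `partitioned_matmul`, and the choice to partition along the shared dimension with a `tile` size standing in for a fixed memory budget, are assumptions made for illustration.

```python
def macro_step(a_tile, b_tile, acc):
    """Reusable step run once per pass: accumulate a_tile @ b_tile into acc in place."""
    for i, row in enumerate(a_tile):
        for j in range(len(b_tile[0])):
            acc[i][j] += sum(row[k] * b_tile[k][j] for k in range(len(b_tile)))


def partitioned_matmul(a, b, tile):
    """Multiply a (n x m) by b (m x p), handling at most `tile` shared columns per pass."""
    n, m, p = len(a), len(b), len(b[0])
    result = [[0] * p for _ in range(n)]
    for k0 in range(0, m, tile):
        # Each pass touches only a slice of the operands, so it fits a fixed-size budget.
        a_tile = [row[k0:k0 + tile] for row in a]
        b_tile = b[k0:k0 + tile]
        # The accumulated result of earlier passes is an operand of this pass.
        macro_step(a_tile, b_tile, result)
    return result
```

The result does not depend on the tile size; only the number of passes changes.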

    Method and system for performing phrase/word clustering and cluster merging
    7.

    Publication No.: US06578032B1

    Publication date: 2003-06-10

    Application No.: US09605404

    Filing date: 2000-06-28

    IPC classification: G06F17/30

    Abstract: Text classification has become an important aspect of information technology. Present text classification techniques range from simple text matching to more complex clustering methods. Clustering describes a process of discovering structure in a collection of characters. The invention automatically analyzes a text string and either updates an existing cluster or creates a new cluster. To that end, the invention may use a character n-gram matching process in addition to other heuristic-based clustering techniques. In the character n-gram matching process, each text string is first normalized using several heuristics. It is then divided into a set of overlapping character n-grams, where n is the number of adjacent characters. If the commonality between the text string and the existing cluster members satisfies a pre-defined threshold, the text string is added to the cluster. If, on the other hand, the commonality does not satisfy the pre-defined threshold, a new cluster may be created. Each cluster may have a selected topic name. The topic name allows whole clusters to be compared in a similar way to the individual clusters, and merged when a predetermined level of commonality exists between the subject clusters. The topic name also may be used as a suggested alternative to the text string. In this instance, the topic name of the cluster to which the text string was added may be outputted as an alternative to the text string.
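The n-gram matching step described in this abstract can be sketched roughly as below. The abstract names neither the normalization heuristics nor the commonality measure, so lowercasing with whitespace collapsing, Jaccard overlap, the 0.4 threshold, and the function names are all assumptions chosen for illustration.

```python
def ngrams(text, n=3):
    """Normalize a string (assumed: lowercase, collapse whitespace) and return its n-gram set."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def commonality(a, b):
    """Jaccard overlap of two n-gram sets (an assumed commonality measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(text, clusters, threshold=0.4, n=3):
    """Add text to the best-matching cluster, or start a new cluster if none qualifies."""
    grams = ngrams(text, n)
    best, best_score = None, 0.0
    for cluster in clusters:
        # Score a cluster by its most similar existing member.
        score = max(commonality(grams, ngrams(member, n)) for member in cluster)
        if score > best_score:
            best, best_score = cluster, score
    if best is not None and best_score >= threshold:
        best.append(text)
    else:
        clusters.append([text])
    return clusters
```

For example, assigning "new york city", then "New York  City hotels", then "banana bread" to an initially empty list leaves two clusters: the first two strings share most of their trigrams after normalization, while the third shares none with either.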

    Method and system for performing phrase/word clustering and cluster merging
    8.
    Granted invention patent
    Status: in force

    Publication No.: US07519590B2

    Publication date: 2009-04-14

    Application No.: US10457686

    Filing date: 2003-06-09

    IPC classification: G06F17/30

    Abstract: Text classification has become an important aspect of information technology. Present text classification techniques range from simple text matching to more complex clustering methods. Clustering describes a process of discovering structure in a collection of characters. The invention automatically analyzes a text string and either updates an existing cluster or creates a new cluster. To that end, the invention may use a character n-gram matching process in addition to other heuristic-based clustering techniques. In the character n-gram matching process, each text string is first normalized using several heuristics. It is then divided into a set of overlapping character n-grams, where n is the number of adjacent characters. If the commonality between the text string and the existing cluster members satisfies a pre-defined threshold, the text string is added to the cluster. If, on the other hand, the commonality does not satisfy the pre-defined threshold, a new cluster may be created. Each cluster may have a selected topic name. The topic name allows whole clusters to be compared in a similar way to the individual clusters, and merged when a predetermined level of commonality exists between the subject clusters. The topic name also may be used as a suggested alternative to the text string. In this instance, the topic name of the cluster to which the text string was added may be outputted as an alternative to the text string.

    Method and apparatus for concept searching using a Boolean or keyword search engine
    9.
    Granted invention patent
    Status: in force

    Publication No.: US06363373B1

    Publication date: 2002-03-26

    Application No.: US09164284

    Filing date: 1998-10-01

    IPC classification: G06F17/30

    Abstract: Concept searching using a Boolean or keyword search engine. Documents are preprocessed before being passed to a search engine by identifying, on a word-by-word basis, the “word tokens” contained in the document. Once the word tokens have been extracted, each word token is referenced in a concept database that maps word tokens to concept identifiers. The concept identifiers associated with the word tokens are converted into unique non-word concept tokens and arranged into a list. The list is then inserted into the document as invisible but searchable text. The document is then transferred to the server monitored by the search engine. Search queries are preprocessed before being passed to the search engine in the same manner. The query is first broken into word tokens and the word tokens are then referenced in the concept database. All associated concept identifiers are retrieved and converted to unique concept tokens. The concept tokens are then combined into a string and sent to the search engine as an ordinary query.
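The preprocessing flow in this abstract can be sketched as follows. The concept database contents, the `zqc` token prefix, and the function names are hypothetical, and the sketch simply appends the concept tokens as an extra line rather than hiding them, since how text is made invisible depends on the document format.

```python
import re

# Hypothetical concept database mapping word tokens to concept identifiers.
CONCEPT_DB = {"car": [101], "automobile": [101], "bank": [202, 203], "river": [203]}


def concept_tokens(text):
    """Map each word token to unique non-word concept tokens, keeping first-seen order."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for concept_id in CONCEPT_DB.get(word, []):
            token = f"zqc{concept_id}"  # assumed prefix, unlikely to occur as a real word
            if token not in tokens:
                tokens.append(token)
    return tokens


def preprocess_document(text):
    """Append the concept-token list to the document before it is indexed."""
    return text + "\n" + " ".join(concept_tokens(text))


def preprocess_query(query):
    """Turn a user query into a string of concept tokens for an ordinary keyword search."""
    return " ".join(concept_tokens(query))
```

Under these assumptions a query for "automobile" becomes `zqc101`, which a plain keyword engine would match against a document that mentions only "car", because both words map to the same concept identifier.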