Patent search ap:"Kenneth Heafield" Page 1

1.

Granted invention patent
Systems and methods for identifying similar documents (in force)

Publication (announcement) number: US07958136B1

Publication (announcement) date: 2011-06-07

Application number: US12050626

Application date: 2008-03-18

Applicant: Taylor Curtis, Kenneth Heafield

Inventor: Taylor Curtis, Kenneth Heafield

IPC: G06F17/30

CPC classification number: G06F17/30616

Abstract: The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.
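
The five steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the prominence score, the descriptiveness score, or the comparison metric, so everything concrete here is an assumption made for illustration: prominence is taken to be a term's share of the document, descriptiveness a smoothed inverse document frequency, and the comparison metric a cosine similarity between weighted term vectors. The function names and the `threshold` parameter are hypothetical and are not taken from the patent.

```python
import math
import re
from collections import Counter


def terms(text):
    """Return the words and the pairs of consecutive words in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def prominence(text):
    """Assumed prominence: a term's share of all terms in the document."""
    counts = Counter(terms(text))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def descriptiveness(term, corpus_terms):
    """Assumed descriptiveness: smoothed inverse document frequency."""
    df = sum(1 for doc in corpus_terms if term in doc)
    return math.log((1 + len(corpus_terms)) / (1 + df)) + 1.0


def comparison_vector(text, corpus_terms):
    """Assumed comparison metric input: prominence times descriptiveness."""
    return {t: p * descriptiveness(t, corpus_terms)
            for t, p in prominence(text).items()}


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_documents(current, corpus, threshold=0.1):
    """Return (index, score) pairs for corpus documents similar to current."""
    corpus_terms = [set(terms(doc)) for doc in corpus]
    current_vec = comparison_vector(current, corpus_terms)
    results = []
    for i, doc in enumerate(corpus):
        # Step 4: a potential document must share at least one term.
        if not set(current_vec) & corpus_terms[i]:
            continue
        # Step 5: analyze the candidate against the current document.
        score = cosine(current_vec, comparison_vector(doc, corpus_terms))
        if score >= threshold:
            results.append((i, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


corpus = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "quantum physics lecture notes",
]
ranked = similar_documents("the cat sat on the mat", corpus)
```

In this toy corpus the third document shares no term with the current one, so it is never scored; the first two are ranked by their cosine score.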

2.

Granted invention patent
Systems and methods for identifying similar documents (in force)

Publication (announcement) number: US08713034B1

Publication (announcement) date: 2014-04-29

Application number: US13153319

Application date: 2011-06-03

Applicant: Taylor Curtis, Kenneth Heafield

Inventor: Taylor Curtis, Kenneth Heafield

IPC: G06F17/30

CPC classification number: G06F17/30616

Abstract: The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.

3.

Invention application
IDENTIFICATION OF TOPICS IN SOURCE CODE (in force)

Publication (announcement) number: US20090254884A1

Publication (announcement) date: 2009-10-08

Application number: US12212534

Application date: 2008-09-17

Applicant: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar

Inventor: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar

IPC: G06F9/44

CPC classification number: G06F8/75

Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.
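
The keyword-matrix step named in the abstract (weighted sums of occurrences of domain specific keywords) can be sketched in Python as follows. The abstract does not say how keywords are identified or how occurrences are weighted, so the details below are assumptions: identifiers are split on camelCase and underscores, a small stop-word list stands in for domain-specific keyword selection, and the hypothetical `LOCATION_WEIGHTS` table weights an occurrence by where it appears. The LDA step itself is omitted; the resulting matrix would be handed to an existing LDA implementation.

```python
import re
from collections import defaultdict

# Hypothetical weights: an occurrence counts more in a name than in a comment.
LOCATION_WEIGHTS = {"name": 3.0, "body": 2.0, "comment": 1.0}

# Assumed stand-in for filtering out words that are not domain specific.
STOP_WORDS = {"get", "set", "the", "and", "for", "return", "int", "void"}


def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)
    return [p.lower() for p in parts]


def keyword_matrix(files):
    """Build a file-by-keyword matrix of weighted occurrence sums.

    files maps a file name to a list of (location, text) segments, where
    location is a key of LOCATION_WEIGHTS.
    Returns (keywords, matrix) with one matrix row per file.
    """
    rows = {}
    for filename, segments in files.items():
        row = defaultdict(float)
        for location, text in segments:
            for identifier in re.findall(r"[A-Za-z_]+", text):
                for word in split_identifier(identifier):
                    if len(word) > 2 and word not in STOP_WORDS:
                        row[word] += LOCATION_WEIGHTS[location]
        rows[filename] = row
    keywords = sorted({k for row in rows.values() for k in row})
    matrix = [[rows[f].get(k, 0.0) for k in keywords] for f in files]
    return keywords, matrix


files = {
    "Account.java": [
        ("name", "getAccountBalance"),
        ("comment", "returns the account balance"),
    ],
}
keywords, matrix = keyword_matrix(files)
```

Here `account` and `balance` each receive 3.0 from the method name plus 1.0 from the comment, while `returns` receives only the comment weight.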

4.

Granted invention patent
Identification of topics in source code (in force)

Publication (announcement) number: US08209665B2

Publication (announcement) date: 2012-06-26

Application number: US12212534

Application date: 2008-09-17

Applicant: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar

Inventor: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar

IPC: G06F9/44

CPC classification number: G06F8/75

Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.
