Patent search ap:("Microsoft Technology Licensing, LLC") AND inv:"Nathan Roy Evans", page 1

1.

Patent application
Revealing Content Reuse Using Fine Analysis (status: in force)

Publication number: US20210004582A1

Publication date: 2021-01-07

Application number: US16460967

Filing date: 2019-07-02

Applicant: Microsoft Technology Licensing, LLC

Inventors: Nathan Roy Evans, Christopher Miles White, Jonathan Karl Larson, Darren Keith Edge

IPC classification: G06K9/00, G06F17/27, G06F16/901, G06F16/906

Abstract: Systems and methods for managing content provenance are provided. A network system accesses a document of a plurality of documents to be analyzed. The network system extracts text fragments from the document, including a first fragment and a second fragment. A determination is made whether each of the text fragments matches an entry in a hash table. Based on the first fragment not matching any entries in the hash table, the network system creates a new entry in the hash table, whereby the first fragment is used to generate a key in the hash table. Based on the second fragment matching an entry of the hash table, the network system associates the document with a key of the matching entry in the hash table, wherein the associating comprises updating the hash table with an identifier of the document.
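The fragment-matching step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the sentence-based fragment extraction, the SHA-1 key derivation, and the names `extract_fragments`, `fragment_key`, and `index_document` are all assumptions made for illustration.

```python
import hashlib
import re


def extract_fragments(text):
    """Split text into sentence-like fragments (an illustrative choice)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fragment_key(fragment):
    """Derive a hash-table key from a fragment after light normalization."""
    normalized = " ".join(fragment.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def index_document(table, doc_id, text):
    """Add one document's fragments to the table.

    A fragment with no matching entry creates a new entry keyed by its hash.
    A fragment that matches an existing entry gets doc_id added to that entry.
    """
    for fragment in extract_fragments(text):
        key = fragment_key(fragment)
        if key not in table:
            table[key] = [doc_id]          # new entry for an unseen fragment
        elif doc_id not in table[key]:
            table[key].append(doc_id)      # associate document with existing key


table = {}
index_document(table, "doc-a", "Reuse is common. This line is original.")
index_document(table, "doc-b", "Reuse is common. Another unique sentence.")

# Entries listing more than one document identifier point at reused fragments.
shared = {key: ids for key, ids in table.items() if len(ids) > 1}
```

Here `shared` holds a single entry, for the fragment that appears in both documents, and its value lists both document identifiers in the order they were indexed.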

2.

Granted patent
Revealing content reuse using coarse analysis (status: in force)

Publication number: US11710330B2

Publication date: 2023-07-25

Application number: US16460980

Filing date: 2019-07-02

Applicant: Microsoft Technology Licensing, LLC

Inventors: Nathan Roy Evans, Christopher Miles White, Jonathan Karl Larson, Darren Keith Edge

IPC classification: G06F16/906, G06F16/901, G06F40/216, G06V30/414, G06V30/416

CPC classification: G06V30/414, G06F16/906, G06F16/9014, G06F16/9024, G06F40/216, G06V30/416

Abstract: Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
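To make the MinHash and Jaccard steps in this abstract concrete, here is a minimal sketch using only the Python standard library. The three-word shingling, the choice of 128 seeded BLAKE2b hashes, and every function name below are assumptions for illustration; the abstract does not specify how content features are derived.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed feature choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _seeded_hash(seed, token):
    """64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all features."""
    return [min(_seeded_hash(seed, f) for f in features)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates the Jaccard score."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard score: intersection size over union size."""
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"
doc_c = "an entirely different sentence about patents and hashing"

feat_a, feat_b, feat_c = shingles(doc_a), shingles(doc_b), shingles(doc_c)
sig_a, sig_b, sig_c = (minhash_signature(f) for f in (feat_a, feat_b, feat_c))

exact_ab = exact_jaccard(feat_a, feat_b)   # 4 shared shingles out of 10 distinct
est_ab = estimated_jaccard(sig_a, sig_b)   # approximates exact_ab
est_ac = estimated_jaccard(sig_a, sig_c)   # no shingles are shared
```

The signature comparison touches a fixed number of values per document pair regardless of document length, which is the usual reason for estimating the Jaccard score from MinHash signatures rather than computing it from the full feature sets.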

3.

Granted patent
Revealing content reuse using fine analysis (status: in force)

Publication number: US11341761B2

Publication date: 2022-05-24

Application number: US16460967

Filing date: 2019-07-02

Applicant: Microsoft Technology Licensing, LLC

Inventors: Nathan Roy Evans, Christopher Miles White, Jonathan Karl Larson, Darren Keith Edge

IPC classification: G06F16/906, G06F16/901, G06V30/416, G06V30/414, G06F40/216

Abstract: Systems and methods for managing content provenance are provided. A network system accesses a document of a plurality of documents to be analyzed. The network system extracts text fragments from the document, including a first fragment and a second fragment. A determination is made whether each of the text fragments matches an entry in a hash table. Based on the first fragment not matching any entries in the hash table, the network system creates a new entry in the hash table, whereby the first fragment is used to generate a key in the hash table. Based on the second fragment matching an entry of the hash table, the network system associates the document with a key of the matching entry in the hash table, wherein the associating comprises updating the hash table with an identifier of the document.

4.

Patent application
Revealing Content Reuse Using Coarse Analysis (status: in force)

Publication number: US20210004583A1

Publication date: 2021-01-07

Application number: US16460980

Filing date: 2019-07-02

Applicant: Microsoft Technology Licensing, LLC

Inventors: Nathan Roy Evans, Christopher Miles White, Jonathan Karl Larson, Darren Keith Edge

IPC classification: G06K9/00, G06F17/27, G06F16/901, G06F16/906

Abstract: Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
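The clustering step with decreasing epsilon values can be sketched as follows. This is a small, self-contained DBSCAN written for illustration, run over Jaccard distances (one minus the Jaccard score) between toy feature sets. The feature sets, the `min_pts` value, and the epsilon schedule are assumptions, and the abstract does not say how the results of successive passes are combined, so the sketch only records the labels from each pass.

```python
def jaccard_distance(a, b):
    """Distance derived from the Jaccard score: 1 - |a & b| / |a | b|."""
    return 1.0 - len(a & b) / len(a | b)


def dbscan(items, eps, min_pts, dist):
    """Minimal DBSCAN; returns one label per item, with -1 meaning noise."""
    n = len(items)
    labels = [None] * n
    # Each item's neighborhood includes the item itself (distance 0).
    neighbors = [[j for j in range(n) if dist(items[i], items[j]) <= eps]
                 for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                  # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand from core points only
    return labels


# Hypothetical content features per document (for example, hashed shingles).
docs = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 5},
    "c": {1, 2, 6, 7},
    "d": {8, 9, 10, 11},
}
names = list(docs)
features = [docs[name] for name in names]

results = {}
for eps in (0.8, 0.5, 0.2):                 # decreasing epsilon values
    labels = dbscan(features, eps, min_pts=2, dist=jaccard_distance)
    results[eps] = dict(zip(names, labels))
```

With these toy sets, the pass at epsilon 0.8 groups a, b, and c together, the pass at 0.5 keeps only the closer pair a and b, and the pass at 0.2 leaves every document as noise, so comparing passes shows which documents are related only loosely.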