Method and apparatus for processing dataset

Invention Grant

US11663258B2 Method and apparatus for processing dataset (Status: in force)


Patent Title: Method and apparatus for processing dataset
Application No.: US17133869

Application Date: 2020-12-24
Publication No.: US11663258B2

Publication Date: 2023-05-30
Inventor: Zhe Hu, Cheng Peng, Xuefeng Luo
Applicant: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO LTD
Applicant Address: CN Beijing
Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Current Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Current Assignee Address: CN Beijing
Agency: Lippes Mathias LLP
Priority: CN 2010430339.0 2020.05.20
Main IPC: G06F16/35
IPC: G06F16/35; G06F16/242; G06F16/22; G06F16/2455; G06V30/414; G06F18/214

Abstract:

The present disclosure discloses a method and apparatus for processing a dataset. The method includes: obtaining, from multiple text blocks provided by a target user, a first text set whose texts meet a preset similarity matching condition with a target text; obtaining a second text set from the first text set, in which no text belongs to the same text block as the target text; generating a negative sample set of the target text based on the content of the candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on the content of the target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
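The five steps in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the token-set Jaccard similarity, the 0.3 threshold, the `Sample` pair format, the helper names `jaccard` and `build_dataset`, and the rule that every text in a candidate block becomes a negative pair are all hypothetical, because the abstract specifies neither the similarity measure nor how block content is turned into samples. Training the matching model is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical (target, candidate) pair with a binary match label."""
    target: str
    candidate: str
    label: int  # 1 = positive (same block as target), 0 = negative


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; an assumed stand-in for the unspecified measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_dataset(blocks: dict[str, list[str]], target_block: str, target: str,
                  threshold: float = 0.3) -> list[Sample]:
    # Step 1: first text set, i.e. texts meeting the (assumed) similarity condition.
    first_set = [
        (block_id, text)
        for block_id, texts in blocks.items()
        for text in texts
        if text != target and jaccard(text, target) >= threshold
    ]
    # Step 2: second text set, keeping only texts outside the target's block.
    second_set = [(b, t) for b, t in first_set if b != target_block]
    # Step 3: negatives built from the content of each candidate block.
    candidate_blocks = sorted({b for b, _ in second_set})
    negatives = [Sample(target, t, 0) for b in candidate_blocks for t in blocks[b]]
    # Step 4: positives built from the other texts of the target block.
    positives = [Sample(target, t, 1) for t in blocks[target_block] if t != target]
    # Step 5: the user's dataset combines both sets.
    return positives + negatives


# Hypothetical text blocks supplied by one user (e.g. FAQ groups).
blocks = {
    "refund": ["how do I get a refund", "refund my order please", "where is my refund"],
    "shipping": ["how do I track my order", "where is my package"],
    "account": ["reset my password", "change account email"],
}
dataset = build_dataset(blocks, "refund", "how do I get a refund")
for sample in dataset:
    print(sample.label, sample.candidate)
```

Here "how do I track my order" passes the similarity filter but lies in another block, so the whole "shipping" block supplies negatives. The design choice this illustrates is that negatives come from blocks that look similar to the target, which tends to yield harder negatives than random sampling.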

Related publication:

US20210365444A1 METHOD AND APPARATUS FOR PROCESSING DATASET, published 2021-11-25

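IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	. of unstructured textual data (document management systems, see G06F 16/93)
G06F16/35	.. Clustering; classification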