Method and apparatus for processing dataset
Abstract:
The present disclosure discloses a method and apparatus for processing a dataset. The method includes: obtaining a first text set meeting a preset similarity matching condition with a target text from multiple text blocks provided by a target user; obtaining a second text set from the first text set, in which each text in the second text set does not belong to a same text block as the target text; generating a negative sample set of the target text based on content of a candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on content of a target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
Public/Granted literature
Information query
Patent Agency Ranking
0/0