Semi-supervised hybrid clustering/classification system

    公开(公告)号:US11023710B2

    公开(公告)日:2021-06-01

    申请号:US16280760

    申请日:2019-02-20

    摘要: System and method for classifying data objects occurring in an unstructured dataset, comprising: extracting feature vectors from the unstructured dataset, each feature vector representing an occurrence of a data object in the unstructured dataset; classifying the feature vectors into feature vector sets that each correspond to a respective object class from a plurality of object classes; for each feature vector set: performing multiple iterations of a clustering operation, each iteration including clustering feature vectors from the feature vector set into clusters of similar feature vectors and identifying outlier feature vectors, wherein for at least one iteration after a first iteration of the clustering operation, outlier feature vectors identified in a previous iteration are excluded from the clustering operation; and outputting a key cluster for the feature vector set from a final iteration of the multiple iterations, the key cluster including a greater number of similar feature vectors than any of the other clusters of the final iteration; and assembling a dataset that includes at least the feature vectors from the key clusters of the feature vector sets.

    SEMI-SUPERVISED HYBRID CLUSTERING/CLASSIFICATION SYSTEM

    公开(公告)号:US20200265218A1

    公开(公告)日:2020-08-20

    申请号:US16280760

    申请日:2019-02-20

    摘要: System and method for classifying data objects occurring in an unstructured dataset, comprising: extracting feature vectors from the unstructured dataset, each feature vector representing an occurrence of a data object in the unstructured dataset; classifying the feature vectors into feature vector sets that each correspond to a respective object class from a plurality of object classes; for each feature vector set: performing multiple iterations of a clustering operation, each iteration including clustering feature vectors from the feature vector set into clusters of similar feature vectors and identifying outlier feature vectors, wherein for at least one iteration after a first iteration of the clustering operation, outlier feature vectors identified in a previous iteration are excluded from the clustering operation; and outputting a key cluster for the feature vector set from a final iteration of the multiple iterations, the key cluster including a greater number of similar feature vectors than any of the other clusters of the final iteration; and assembling a dataset that includes at least the feature vectors from the key clusters of the feature vector sets.

    Method and system for video segmentation

    公开(公告)号:US10963702B1

    公开(公告)日:2021-03-30

    申请号:US16566179

    申请日:2019-09-10

    摘要: Methods and systems for video segmentation and scene recognition are described. A video having a plurality of frames and a subtitle file associated with the video are received. Segmentation is performed on the video to generate a first set video frames comprising one or more video frames based on a frame-by-frame comparison of features in the frames of the video. Each video frame in the first includes a frame indicator which indicates at least a first start frame of the video frame. The subtitle file associated with the video is parsed to generate one or more subtitle segments based on a start and an end time of each dialogue in the subtitle file. A second set of video frames comprising one or more second video frames are generated based on the video frames of the first set of video frames and the e or more subtitle segments.