SYSTEM, METHOD AND APPARATUS FOR AUTOMATIC TOPIC RELEVANT CONTENT FILTERING FROM SOCIAL MEDIA TEXT STREAMS USING WEAK SUPERVISION
    Invention application
    Status: Pending (published)

    Publication No.: US20160117400A1

    Publication date: 2016-04-28

    Application No.: US14877970

    Filing date: 2015-10-08

    Abstract: Presented are a system, method, and apparatus for automatic topic relevant content filtering from social media text streams using weak supervision. A computing device utilizes heuristic rules allowing topic filtering and a data stream data chunk identifier. A plurality of messages are transmitted as streaming message data from a social media network in real time. The messages are split into a plurality of data stream data chunks according to the data stream data chunk identifier. A rule-based labeled data set L0 is built from one or more data instances in the first data stream data chunk. An initial classifier is built based upon features of L0. The initial classifier is applied to a next data stream data chunk to build a labeled data set L1. A subset of representative instances S1 is selected from labeled data set L1. A first representative classifier C1 is constructed from the representative instances S1.
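
The steps named in the abstract (rule-based labels L0, an initial classifier, labels L1 on the next chunk, a representative subset S1, and a classifier C1) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the chunk identifier is treated as a fixed message count, the heuristic rule is a keyword match, the classifier is a small naive Bayes model over whitespace tokens, and "representative" is interpreted as the most confidently classified instances per class. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def chunk_stream(messages, chunk_size):
    """Split messages into chunks (assumption: chunk identifier = message count)."""
    return [messages[i:i + chunk_size] for i in range(0, len(messages), chunk_size)]


def rule_label(text, keywords):
    """Heuristic rule (assumed): 1 if any topic keyword occurs in the text, else 0."""
    return 1 if set(text.lower().split()) & keywords else 0


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def log_odds(self, text):
        """Return log P(relevant | text) - log P(irrelevant | text)."""
        total = sum(self.class_counts.values())
        scores = {}
        for c in (0, 1):
            score = math.log((self.class_counts[c] + 1) / (total + 2))
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in text.lower().split():
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return scores[1] - scores[0]

    def predict(self, text):
        return 1 if self.log_odds(text) > 0 else 0


def select_representatives(clf, texts, labels, per_class):
    """Assumed criterion: keep the most confidently labeled instances of each class."""
    chosen = []
    for c in (0, 1):
        ranked = sorted(
            ((abs(clf.log_odds(t)), t) for t, y in zip(texts, labels) if y == c),
            reverse=True,
        )
        chosen += [(t, c) for _, t in ranked[:per_class]]
    return chosen


# Toy stream standing in for real-time social media messages.
stream = [
    "flood warning issued downtown",
    "river flood damage reported",
    "great pizza tonight",
    "pizza party tonight friends",
    "river damage reported nearby",
    "pizza friends tonight",
    "flood downtown again",
    "great party",
]
chunks = chunk_stream(stream, chunk_size=4)

# L0: rule-based labels on the first chunk; initial classifier built from L0.
l0 = [rule_label(t, {"flood"}) for t in chunks[0]]
initial = NaiveBayes().fit(chunks[0], l0)

# L1: labels assigned to the next chunk by the initial classifier.
l1 = [initial.predict(t) for t in chunks[1]]

# S1: representative subset of L1; C1: classifier built from S1.
s1 = select_representatives(initial, chunks[1], l1, per_class=1)
c1 = NaiveBayes().fit([t for t, _ in s1], [y for _, y in s1])
```

In this sketch the first message of the second chunk contains no keyword, yet the initial classifier can still mark it relevant because it shares tokens with rule-labeled messages. That is the sense in which the rules act only as weak supervision here.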

