Granted invention patent
- Patent title: Augmenting a training set for document categorization
- Patent title (Chinese): 增加文件分类培训
- Application number: US12254798
- Filing date: 2008-10-20
- Publication number: US09058382B2
- Publication date: 2015-06-16
- Inventors: Tie-Yan Liu, Wei-Ying Ma
- Applicants: Tie-Yan Liu, Wei-Ying Ma
- Applicant address: US WA Redmond
- Patentee: Microsoft Technology Licensing, LLC
- Current patentee: Microsoft Technology Licensing, LLC
- Current patentee address: US WA Redmond
- Agents: Sandy Swain; Judy Yee; Micky Minhas
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.
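
The augmentation step described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Document` class, the `aggregate_feature` and `augment_training_set` functions, and the toy feature vectors are not taken from the patent. The abstract also does not say how the aggregate feature is computed, so the sketch assumes a simple element-wise mean over all documents in a hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """A node in a document hierarchy (hypothetical structure)."""
    features: list[float]
    children: list[Document] = field(default_factory=list)


def iter_hierarchy(root: Document):
    """Yield the root document and every descendant."""
    stack = [root]
    while stack:
        doc = stack.pop()
        yield doc
        stack.extend(doc.children)


def aggregate_feature(root: Document) -> list[float]:
    """Summarize a whole hierarchy as one feature vector.

    Assumption: the aggregate is the element-wise mean of all
    document feature vectors; the abstract does not specify this.
    """
    docs = list(iter_hierarchy(root))
    dim = len(root.features)
    return [sum(d.features[i] for d in docs) / len(docs) for i in range(dim)]


def augment_training_set(initial, labeled_roots):
    """Return the initial examples plus one aggregate example per hierarchy."""
    extra = [(aggregate_feature(root), label) for root, label in labeled_roots]
    return list(initial) + extra


# A toy hierarchy: a root document with two children and one grandchild.
site = Document(
    [1.0, 0.0],
    [Document([0.0, 1.0]), Document([2.0, 2.0], [Document([1.0, 1.0])])],
)
labeled_roots = [(site, "sports")]

# Initial training set: derived from the root documents only.
initial = [(root.features, label) for root, label in labeled_roots]

# Augmented training set: root-only examples plus hierarchy-level examples.
augmented = augment_training_set(initial, labeled_roots)
```

In this sketch the initial set holds one example built from the root alone, and the augmented set adds a second example with the same label whose features reflect the entire hierarchy. Other aggregation choices (sums, weighted averages by depth) would fit the same structure.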
Published/granted documents
- US20090043764A1 AUGMENTING A TRAINING SET FOR DOCUMENT CATEGORIZATION Publication/grant date: 2009-02-12