METHOD FOR AUTOMATIC THEMATIC CLASSIFICATION OF A DIGITAL TEXT FILE
    1.
    Invention application
    METHOD FOR AUTOMATIC THEMATIC CLASSIFICATION OF A DIGITAL TEXT FILE (under examination, published)

    Publication No.: US20160140220A1

    Publication date: 2016-05-19

    Application No.: US14898141

    Filing date: 2014-06-04

    Applicant: PROXEM

    IPC classification: G06F17/30

    Abstract: A method for thematic classification of a digital text file based on an encyclopedic database comprising a category graph. A thematic classification model is developed during a learning phase. For each category node, all articles directly linked to the category node are grouped to obtain a “bag of words” for that node, and a term-frequency vector characteristic of the category node is determined. At each category node, the term-frequency vector directly associated with the node is combined with the term-frequency vectors of more specific nodes. During the production phase, the term-frequency vector of the digital text file is calculated. The N category nodes in the thematic classification model whose term-frequency vectors are closest to the term-frequency vector of the digital text file are selected.
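The learning and production phases described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy category graph, the function names (`build_model`, `classify`), the `decay` weight applied to more specific nodes, and the use of cosine similarity as the closeness measure are all assumptions made for illustration. The abstract only says the vectors are "combined" and that the "closest" ones are selected. The sketch also assumes the category graph has no cycles.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a text (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_model(articles, children, decay=0.5):
    """Learning phase (sketch).

    articles: category node -> texts of the articles directly linked to it
    children: category node -> more specific category nodes
    decay:    hypothetical weight for vectors coming from more specific nodes
    """
    # One "bag of words" per node, built from its directly linked articles.
    own = {node: tf_vector(" ".join(texts)) for node, texts in articles.items()}
    model = {}

    def combined(node):
        if node in model:
            return model[node]
        vec = Counter(own.get(node, Counter()))
        # Fold in the (already combined) vectors of more specific nodes.
        for child in children.get(node, []):
            for term, weight in combined(child).items():
                vec[term] += decay * weight
        model[node] = vec
        return vec

    for node in set(articles) | set(children):
        combined(node)
    return model


def classify(text, model, n=3):
    """Production phase (sketch): the n nodes closest to the text's vector."""
    doc = tf_vector(text)
    ranked = sorted(model, key=lambda node: cosine(doc, model[node]), reverse=True)
    return ranked[:n]


# Toy category graph: "Physics" is a more specific node under "Science".
articles = {
    "Science": ["science studies nature"],
    "Physics": ["quantum mechanics energy particle", "energy force motion"],
    "Music": ["guitar melody rhythm song"],
}
children = {"Science": ["Physics"]}

model = build_model(articles, children)
top = classify("particle energy in quantum physics", model, n=2)
print(top)  # ['Physics', 'Science']
```

In this toy run, "Science" ranks second only because it inherits part of the "Physics" vocabulary, which is the effect the combination step is meant to produce.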


    Method for automatically constructing inter-language queries for a search engine

    Publication No.: US11055370B2

    Publication date: 2021-07-06

    Application No.: US15757649

    Filing date: 2016-09-06

    Applicant: PROXEM

    Abstract: A method for automatically constructing inter-language queries to be performed by a search engine, from a text file containing a learning corpus. The learning corpus includes a set of phrases expressed in a corresponding manner in at least two languages. Each word of each of the at least two languages is associated with a target vector. The target vectors of the words of the learning corpus in the at least two languages are aligned. The N words in each of the at least two languages whose target vectors are closest to the target vector associated with a query word are retrieved. The queries to be performed by the search engine are established from the N words thus retrieved in the at least two languages.
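The retrieval and query-construction steps of this abstract can be sketched as follows. The sketch assumes the target vectors have already been aligned into one shared space; the alignment from the learning corpus is not shown. The two-dimensional vectors are hand-written toy values, the function names (`nearest_words`, `build_queries`) are hypothetical, cosine similarity stands in for the unspecified closeness measure, and the `OR` query syntax is an assumption about the search engine.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_words(target, vocab, n):
    """The n words of one language whose vectors are closest to `target`."""
    ranked = sorted(vocab, key=lambda w: cosine(target, vocab[w]), reverse=True)
    return ranked[:n]


def build_queries(query_word, query_lang, embeddings, n=2):
    """One query string per language, built from the n nearest words.

    embeddings: language code -> {word: vector}, assumed already aligned.
    """
    target = embeddings[query_lang][query_word]
    return {
        lang: " OR ".join(nearest_words(target, vocab, n))
        for lang, vocab in embeddings.items()
    }


# Toy aligned vectors (illustrative values, not learned from a corpus).
embeddings = {
    "en": {"car": (0.9, 0.1), "automobile": (0.85, 0.15), "banana": (0.1, 0.9)},
    "fr": {"voiture": (0.88, 0.12), "auto": (0.8, 0.2), "banane": (0.1, 0.9)},
}

queries = build_queries("car", "en", embeddings, n=2)
print(queries)  # {'en': 'car OR automobile', 'fr': 'voiture OR auto'}
```

Because both vocabularies live in the same vector space, a single query word yields related terms in every language without a dictionary lookup.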

    METHOD FOR AUTOMATICALLY CONSTRUCTING INTER-LANGUAGE QUERIES FOR A SEARCH ENGINE

    Publication No.: US20190026371A1

    Publication date: 2019-01-24

    Application No.: US15757649

    Filing date: 2016-09-06

    Applicant: PROXEM

    IPC classification: G06F17/30 G06F17/27

    Abstract: A method for automatically constructing inter-language queries to be performed by a search engine, from a text file containing a learning corpus. The learning corpus includes a set of phrases expressed in a corresponding manner in at least two languages. Each word of each of the at least two languages is associated with a target vector. The target vectors of the words of the learning corpus in the at least two languages are aligned. The N words in each of the at least two languages whose target vectors are closest to the target vector associated with a query word are retrieved. The queries to be performed by the search engine are established from the N words thus retrieved in the at least two languages.