METHOD, DEVICE, EQUIPMENT, AND STORAGE MEDIUM FOR MINING TOPIC CONCEPT

    Publication No.: EP3896594A1

    Publication date: 2021-10-20

    Application No.: EP20201845.3

    Filing date: 2020-10-14

    IPC classification: G06F40/279 G06F40/30

    Abstract: The present disclosure provides a method, a device, equipment, and a storage medium for mining topic concepts. The method includes: acquiring a plurality of candidate topic concepts based on a query; performing word segmentation on the candidate topic concepts and part-of-speech tagging on the resulting words, to obtain a part-of-speech sequence for each candidate topic concept; and filtering the candidate topic concepts based on the part-of-speech sequences, to filter out any candidate topic concept corresponding to a target part-of-speech sequence, in which a proportion of accurate topic concepts in the target part-of-speech sequence is lower than or equal to a first preset threshold, or a proportion of inaccurate topic concepts in the target part-of-speech sequence is higher than or equal to a second preset threshold. The present disclosure can reduce the labor cost required for mining topic concepts.
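
    The filtering step can be illustrated with a minimal Python sketch. It assumes the candidates have already been segmented and tagged, and that a small hand-labeled sample is available for estimating the proportion of accurate concepts per part-of-speech sequence. The function names, the tag labels ("n", "v", "d"), and the 0.2 threshold are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict


def find_target_sequences(labeled_samples, first_threshold=0.2):
    """Return the part-of-speech sequences whose proportion of accurate
    topic concepts is lower than or equal to first_threshold.

    labeled_samples: iterable of (pos_sequence, is_accurate) pairs, where
    pos_sequence is a tuple of tags. The labeled sample is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # sequence -> [accurate, total]
    for sequence, is_accurate in labeled_samples:
        counts[sequence][0] += int(is_accurate)
        counts[sequence][1] += 1
    return {
        sequence
        for sequence, (accurate, total) in counts.items()
        if accurate / total <= first_threshold
    }


def filter_candidates(tagged_candidates, target_sequences):
    """Drop every candidate whose part-of-speech sequence is a target sequence.

    tagged_candidates: iterable of (concept, [(word, tag), ...]) pairs.
    """
    kept = []
    for concept, tagged_words in tagged_candidates:
        sequence = tuple(tag for _, tag in tagged_words)
        if sequence not in target_sequences:
            kept.append(concept)
    return kept


# Hypothetical data: noun-noun concepts were judged accurate, verb-adverb ones were not.
labeled = [
    (("n", "n"), True),
    (("n", "n"), True),
    (("v", "d"), False),
    (("v", "d"), False),
]
candidates = [
    ("space travel", [("space", "n"), ("travel", "n")]),
    ("run quickly", [("run", "v"), ("quickly", "d")]),
]
targets = find_target_sequences(labeled)
print(filter_candidates(candidates, targets))
```

    The second condition in the abstract (proportion of inaccurate concepts at or above a second threshold) could be handled the same way by counting inaccurate labels instead.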

    THEME CLASSIFICATION METHOD AND APPARATUS BASED ON MULTIMODALITY, AND STORAGE MEDIUM

    Publication No.: EP3866026A1

    Publication date: 2021-08-18

    Application No.: EP20202345.3

    Filing date: 2020-10-16

    IPC classification: G06F16/45 G06F16/75

    Abstract: Embodiments of the present disclosure relate to a multimodality-based theme classification method, a device, and a storage medium, and more particularly to the field of knowledge graphs. The method includes obtaining text information and non-text information of an object to be classified, the non-text information including at least one of visual information and audio information. The method also includes determining an entity set of the text information based on a pre-established knowledge base, and then extracting a text feature of the object based on the text information and the entity set. The method further includes determining a theme classification of the object based on the text feature and a non-text feature of the object.

    TEXT PROCESSING METHOD AND DEVICE BASED ON AMBIGUOUS ENTITY WORDS

    Publication No.: EP3514702A1

    Publication date: 2019-07-24

    Application No.: EP18215238.9

    Filing date: 2018-12-21

    IPC classification: G06F17/27 G06N5/02 G06N7/00

    Abstract: The present disclosure provides a text processing method and device based on ambiguous entity words. The method includes: obtaining (101) a context of a text to be disambiguated and at least two candidate entities represented by the text to be disambiguated; generating (102) a semantic vector of the context based on a trained word vector model; generating (103) a first entity vector of each of the at least two candidate entities based on a trained unsupervised neural network model; determining (104) a similarity between the context and each candidate entity; and determining (105) a target entity represented by the text to be disambiguated in the context. With the present disclosure, the entity information of the text to be disambiguated is fully depicted, and the accuracy of disambiguating the text is improved.
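
    Steps (104) and (105) can be sketched in Python as follows. The sketch assumes the context vector and the candidate entity vectors have already been produced by the trained models, which are not reproduced here. Cosine similarity is an assumed choice, since the abstract does not name a similarity measure, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_target_entity(context_vector, entity_vectors):
    """Return the candidate entity whose vector is most similar to the context.

    entity_vectors: dict mapping a candidate entity name to its vector.
    """
    return max(
        entity_vectors,
        key=lambda name: cosine_similarity(context_vector, entity_vectors[name]),
    )


# Hypothetical two-dimensional vectors for the ambiguous mention "Apple".
context = [1.0, 0.0]
candidates = {
    "Apple (fruit)": [0.9, 0.1],
    "Apple (company)": [0.1, 0.9],
}
print(pick_target_entity(context, candidates))
```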

    METHOD AND APPARATUS FOR ACQUIRING PRE-TRAINED MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: EP4123516A1

    Publication date: 2023-01-25

    Application No.: EP22184865.8

    Filing date: 2022-07-14

    IPC classification: G06N3/08

    Abstract: The present disclosure provides a method and apparatus for acquiring a pre-trained model, an electronic device, and a storage medium, and relates to fields such as deep learning, natural language processing, knowledge graphs, and intelligent voice. The method may include: acquiring a pre-training task set composed of M pre-training tasks, M being a positive integer greater than 1, the pre-training tasks including N question-answering tasks corresponding to different question-answering forms, N being a positive integer greater than 1 and less than or equal to M; and jointly pre-training the pre-trained model according to the M pre-training tasks. With the solutions of the present disclosure, resource consumption and time costs may be reduced.
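
    The constraints on M and N can be made concrete with a small Python sketch. It only checks the task-set conditions stated in the abstract and builds a training order. The round-robin order is an assumption, since the abstract does not say how the tasks are interleaved during joint pre-training, and the task names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass(frozen=True)
class PretrainingTask:
    name: str
    is_question_answering: bool


def validate_task_set(tasks):
    """Check M > 1 and 1 < N <= M, where N counts the question-answering tasks."""
    m = len(tasks)
    n = sum(task.is_question_answering for task in tasks)
    if m <= 1:
        raise ValueError("M must be greater than 1")
    if n <= 1:
        raise ValueError("N must be greater than 1")
    return m, n  # n <= m holds by construction


def joint_schedule(tasks, steps):
    """Return the task name used at each training step (assumed round-robin order)."""
    validate_task_set(tasks)
    return [task.name for task in islice(cycle(tasks), steps)]


# Hypothetical task set: two question-answering forms plus one other task.
tasks = [
    PretrainingTask("extractive_qa", True),
    PretrainingTask("multiple_choice_qa", True),
    PretrainingTask("other_task", False),
]
print(validate_task_set(tasks))
print(joint_schedule(tasks, 4))
```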

    METHOD AND APPARATUS FOR OUTPUTTING INFORMATION

    Publication No.: EP3779730A1

    Publication date: 2021-02-17

    Application No.: EP20160791.8

    Filing date: 2020-03-03

    Abstract: A method and an apparatus for outputting information are provided according to embodiments of the disclosure. A specific embodiment of the method includes: recognizing a target video to identify at least one entity and obtain a confidence degree of each entity, the entity including a main entity and related entities; matching the at least one entity with a pre-stored knowledge base to determine at least one candidate entity; obtaining at least one main entity by expanding the related entities of the at least one candidate entity based on the knowledge base, and obtaining a confidence degree of the obtained main entity; and calculating a confidence level of the obtained main entity based on the confidence degree of each of the related entities of the at least one candidate entity and the confidence degree of the obtained main entity, and outputting the confidence level of the obtained main entity. This embodiment integrates single-modal entity annotation results and performs multi-modal fusion to break through the bottleneck of single-modal entity annotation, thereby achieving expansion based on the knowledge base, as well as inference and expansion of fine-grained content on the existing entity annotation results based on the knowledge base.
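
    The final calculation step can be illustrated with a short Python sketch. The abstract does not give the formula that combines the confidence degrees, so the weighted average below is purely an assumption for illustration. The function name, the weight parameter, and the sample values are hypothetical.

```python
def main_entity_confidence_level(main_confidence, related_confidences, main_weight=0.5):
    """Combine the main entity's confidence degree with those of its related entities.

    Assumed formula (not from the disclosure):
        level = main_weight * main_confidence
                + (1 - main_weight) * mean(related_confidences)
    """
    if not 0.0 <= main_weight <= 1.0:
        raise ValueError("main_weight must be between 0 and 1")
    if not related_confidences:
        # With no related entities, fall back to the main entity's own confidence.
        return main_confidence
    related_mean = sum(related_confidences) / len(related_confidences)
    return main_weight * main_confidence + (1.0 - main_weight) * related_mean


# Hypothetical values: a main entity inferred from two recognized related entities.
level = main_entity_confidence_level(0.6, [0.8, 1.0])
print(round(level, 2))
```

    Any monotone combination would serve the same illustrative purpose; the weighted average is used here only because it keeps the result in the same 0 to 1 range as the inputs.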

    METHOD AND APPARATUS FOR GENERATING TEXT TOPICS, AND ELECTRONIC DEVICE

    Publication No.: EP3851975A1

    Publication date: 2021-07-21

    Application No.: EP21151156.3

    Filing date: 2021-01-12

    IPC classification: G06F16/35 G06F16/36

    Abstract: A method and an apparatus for generating a text topic, an electronic device, a storage medium, and a computer program product are disclosed. The method includes: obtaining entities included in a text to be processed by mining the entities in the text; determining, through entity linking, each candidate entity in a knowledge graph corresponding to each entity included in the text; determining a set of core entities corresponding to the text by clustering the candidate entities; determining each candidate topic included in the text based on a matching degree between each keyword in the text and each reference topic in a preset topic graph; and obtaining the topic of the text from the set of core entities and the candidate topics, based on the association between each core entity in the set and the text as well as the association between each candidate topic and the text.