-
1.
Publication No.: EP4369245A1
Publication date: 2024-05-15
Application No.: EP23201621.2
Application date: 2023-10-04
IPC classification: G06F40/295, G06F40/205
CPC classification: G06F40/295, G06F40/205
Abstract: Pre-trained models for Named Entity Recognition (NER) come with a limited number of static NE classes that remain the same regardless of the domain of the input text, so domain-specific training is required. Embodiments of the present disclosure provide a method and system for enhanced NER using a custom-built REGEX matcher and a heuristic entity ruler. The invention discovers the NEs in a given text through a pipeline-based approach that combines NLP transformer models, a custom-built REGEX matcher, and heuristic entity rules. The method automatically handles class resolution based on the heuristic entity ruler. It enables a user to customize or add new heuristic rules for the entity ruler, or custom REGEX patterns, as a knowledge base for training the model with automatic relearning and unlearning. The extracted NEs are provided in a structured format for further processing or masking.
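To make the pipeline idea concrete, here is a minimal Python sketch of combining model output, a REGEX matcher, and a heuristic entity ruler with priority-based class resolution. It is an illustration under stated assumptions, not the patented method: the rule dictionaries `REGEX_RULES` and `HEURISTIC_RULES`, the `PRIORITY` policy (heuristic rules beat REGEX, which beats the model, on overlapping spans), and all function names are hypothetical, and the transformer model is stubbed out as a list of pre-computed `Entity` objects.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    text: str
    label: str
    source: str  # pipeline stage that produced the entity


# Hypothetical custom REGEX knowledge base (label -> pattern); users can add entries.
REGEX_RULES = {
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "PHONE": r"\+?\d[\d -]{8,}\d",
}

# Hypothetical heuristic entity rules (lower-case phrase -> label).
HEURISTIC_RULES = {
    "acme corp": "ORG",
}

# Assumed class-resolution policy: on overlapping spans, the higher priority wins.
PRIORITY = {"heuristic": 3, "regex": 2, "model": 1}


def regex_matcher(text):
    """Yield an Entity for every REGEX_RULES match in text."""
    for label, pattern in REGEX_RULES.items():
        for m in re.finditer(pattern, text):
            yield Entity(m.start(), m.end(), m.group(), label, "regex")


def heuristic_ruler(text):
    """Yield an Entity for every case-insensitive occurrence of a rule phrase."""
    lower = text.lower()  # assumes lower() keeps offsets aligned (true for ASCII)
    for phrase, label in HEURISTIC_RULES.items():
        start = lower.find(phrase)
        while start != -1:
            end = start + len(phrase)
            yield Entity(start, end, text[start:end], label, "heuristic")
            start = lower.find(phrase, end)


def resolve(entities):
    """Keep the highest-priority entity among overlapping spans."""
    ranked = sorted(entities, key=lambda e: (-PRIORITY[e.source], e.start))
    kept = []
    for e in ranked:
        if all(e.end <= k.start or e.start >= k.end for k in kept):
            kept.append(e)
    return sorted(kept, key=lambda e: e.start)


def extract(text, model_entities=()):
    """Run all stages and return resolved entities as plain dicts."""
    # model_entities stands in for the output of a transformer NER model.
    found = [*model_entities, *regex_matcher(text), *heuristic_ruler(text)]
    return [
        {"text": e.text, "label": e.label, "start": e.start, "end": e.end}
        for e in resolve(found)
    ]


if __name__ == "__main__":
    # The heuristic ORG rule overrides the stubbed model's PERSON label.
    print(extract("Contact Acme Corp at info@acme.com.",
                  [Entity(8, 17, "Acme Corp", "PERSON", "model")]))
```

Adding a rule is just a new dictionary entry, which mirrors the abstract's point that users can extend the knowledge base; the structured list-of-dicts output is one plausible form for downstream processing or masking.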
-
Publication No.: EP4250131A1
Publication date: 2023-09-27
Application No.: EP23160085.9
Application date: 2023-03-06
Inventors: TAKAWANE, ADESH RAMDAS; PATWARDHAN, NIKHIL GIRISH; KANDASAMY, KALAISELVAN; ROY, ASHIM; KULKARNI, RUPALI KEDAR
IPC classification: G06F16/28
Abstract: Unique key fields are a critical characteristic of structured data and play a significant role in data management. Profiling a high volume of data to discover all possible unique keys with high accuracy is costly and time-consuming. A method and system for finding one or more unique entities in data are provided. The unique entities obtained by this approach are complete, and the response time is short. The method scales with increasing data volume and number of fields. The system performs the analysis in multiple phases, starting with a small volume of data and increasing it gradually, so that the results obtained in the lighter-volume phases reduce the load on later phases. The method also includes a time-check mechanism after various stages, for when a user wants to limit discovery to a fixed time.
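One way to read the multi-phase idea is as progressive pruning: a column combination that already has duplicates in a small prefix of the data cannot be unique in the full data, so only the survivors are re-checked on larger volumes. The Python sketch below implements that reading under stated assumptions; the function names, the `phase_sizes` schedule, the `max_size` limit on combination width, and the minimality rule (skip supersets of keys already found) are all hypothetical and are not taken from the patent.

```python
import time
from itertools import combinations


def is_unique(rows, cols):
    """Return True if the projection of rows onto cols has no duplicates."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def find_unique_keys(rows, fields, max_size=2, phase_sizes=(100, 1000), time_limit=None):
    """Return (keys, complete); complete is False if the time budget ran out."""
    start = time.monotonic()
    # Light phases on growing prefixes, then a final phase on the full data.
    phases = [n for n in phase_sizes if n < len(rows)] + [len(rows)]
    keys = []
    for size in range(1, max_size + 1):
        # Skip supersets of keys already found (they are trivially unique).
        candidates = [c for c in combinations(fields, size)
                      if not any(set(k) <= set(c) for k in keys)]
        for n in phases:
            # Time check between stages: return the keys confirmed so far.
            if time_limit is not None and time.monotonic() - start > time_limit:
                return keys, False
            sample = rows[:n]
            # Duplicates in a prefix imply duplicates in the full data.
            candidates = [c for c in candidates if is_unique(sample, c)]
            if not candidates:
                break
        keys.extend(candidates)  # survivors passed the full-data phase
    return keys, True


if __name__ == "__main__":
    rows = [{"id": i, "dept": i % 3, "seq": i // 3} for i in range(9)]
    print(find_unique_keys(rows, ["id", "dept", "seq"], phase_sizes=(2, 5)))
    # → ([('id',), ('dept', 'seq')], True)
```

In this sketch, most non-unique candidates are discarded in the cheap early phases, so the expensive full-data phase only processes a few survivors, which is one interpretation of how lighter phases reduce the load on later ones.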
-