-
Publication number: US11966699B2
Publication date: 2024-04-23
Application number: US17350116
Application date: 2021-06-17
Applicant: International Business Machines Corporation
Inventor: Abhishek Shah, Ladislav Kunc, Haode Qi, Lin Pan, Saloni Potdar
IPC: G06F40/30, G06F16/33, G06F16/35, G06F40/284, G06N5/04, G06N20/00, G10L15/18, G06F40/263, G06F40/279, G06F40/295, G06F40/53
CPC classification number: G06F40/284, G06F16/3344, G06F16/355, G06N5/04, G06N20/00, G10L15/1822, G06F40/263, G06F40/279, G06F40/295, G06F40/53
Abstract: A system for classifying a language sample intent by receiving a language sample including a set of features, identifying language sample features, determining a tokenization score for the language sample according to the language sample features, eliminating duplicate features according to the tokenization score, determining a term frequency (tf) according to the identified features and the tokenization score, determining an inverse document frequency (idf) according to the identified features and the tokenization score, and generating a term frequency-inverse document frequency (tf-idf) matrix for the identified features.
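For orientation, the tf-idf step named in this abstract can be sketched in plain Python. This is a minimal illustration of conventional tf-idf only: the abstract does not say how the tokenization score is computed or applied, so that weighting is left out, and the function name `tfidf_matrix` and the sample utterances are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(samples):
    """Return (vocabulary, matrix) for a list of tokenized language samples.

    Conventional tf-idf only; the patent's tokenization score is NOT modeled.
    """
    # Building the vocabulary from a set keeps each distinct feature once.
    vocab = sorted({tok for sample in samples for tok in sample})
    n_docs = len(samples)
    # Document frequency: number of samples that contain each term.
    df = Counter(tok for sample in samples for tok in set(sample))
    matrix = []
    for sample in samples:
        counts = Counter(sample)
        total = len(sample) or 1
        # tf = relative frequency in the sample, idf = log(N / df).
        row = [(counts[term] / total) * math.log(n_docs / df[term]) for term in vocab]
        matrix.append(row)
    return vocab, matrix

# Hypothetical utterances, used only to exercise the sketch.
samples = [
    "reset my password".split(),
    "change my password please".split(),
    "cancel my order".split(),
]
vocab, matrix = tfidf_matrix(samples)
```

A term that occurs in every sample (here "my") gets an idf of zero and so carries no weight in any row.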
-
Publication number: US11106875B2
Publication date: 2021-08-31
Application number: US16417459
Application date: 2019-05-20
Applicant: International Business Machines Corporation
Inventor: Tin Kam Ho, Abhishek Shah, Neil Mallinar, Rajendra G. Ugrani, Ayush Gupta
Abstract: Evaluating intent authoring processes, by a processor in a computing environment. Results are received of a simulated intent labeling effort of a dataset comprising utterances of interactive dialog sessions between agents and clients for a given product or service. Figures of merit for respective algorithms used to perform the simulated intent labeling effort are computed. Each of the respective algorithms is evaluated according to the computed figures of merit, and one of the respective algorithms is implemented for labeling intents of a remaining corpus of the synthesized dataset according to parameters evaluated in the computed figures of merit.
-
Publication number: US11144727B2
Publication date: 2021-10-12
Application number: US16417444
Application date: 2019-05-20
Applicant: International Business Machines Corporation
Inventor: Tin Kam Ho, Abhishek Shah, Neil Mallinar, Rajendra G. Ugrani, Ayush Gupta
IPC: G06F40/30, G06F16/61, G10L15/08, G06F16/332, G10L19/04
Abstract: Evaluating intent authoring processes, by a processor in a computing environment. A dataset comprising utterances of interactive dialog sessions between agents and clients for a given product or service is received. A classification of at least a portion of the utterances is performed for a target intent according to at least one of a plurality of recommendation algorithms, where the classification is performed by an automatic driver invoking the recommendation algorithm and simulating a manual confirmation of the algorithm's decision by a user. A classifier trained with the utterances recommended and confirmed by the automatic driver is automatically evaluated according to at least one of the plurality of evaluation criteria. A report tracking the evaluation results is generated.
-
Publication number: US11748393B2
Publication date: 2023-09-05
Application number: US16203000
Application date: 2018-11-28
Applicant: International Business Machines Corporation
Inventor: Abhishek Shah, Tin Kam Ho
IPC: G06K9/62, G06F16/35, G06F18/214, G06F18/21, G06F18/23213, G06V30/262, G06N20/00
CPC classification number: G06F16/35, G06F18/217, G06F18/2148, G06F18/23213, G06V30/274, G06N20/00
Abstract: Embodiments for creating compact example subsets for intent classification in a conversational system are provided. A set of content used for training an intent classifier is received from a conversational corpus. Entries within the set of content are separated into a first subset and a second subset, and a cross-validation operation is performed on the first and second subsets to identify a correctly labeled portion and an incorrectly labeled portion of the set of content. A reduced content used for performing a final training of the intent classifier is formed by combining a first number of the entries from the correctly labeled portion and a second number of the entries from the incorrectly labeled portion of the set of content.
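One way to picture the subset-reduction idea in this abstract is the sketch below. It rests on several assumptions: a toy token-overlap classifier stands in for the real intent classifier, "correctly labeled" is read as "the classifier trained on the other subset predicts the given label", and the names `train`, `predict`, and `compact_subset` and the sample data are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def train(examples):
    """Toy stand-in for an intent classifier: per-intent token counts."""
    model = defaultdict(Counter)
    for text, intent in examples:
        model[intent].update(text.split())
    return model

def predict(model, text):
    tokens = text.split()
    # Choose the intent whose training tokens overlap most with the utterance.
    return max(sorted(model), key=lambda intent: sum(model[intent][t] for t in tokens))

def compact_subset(examples, n_correct, n_incorrect, seed=0):
    """Two-fold cross-validation, then keep a few entries from each outcome."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    correct, incorrect = [], []
    # Train on one subset and check the labels of the other, in both directions.
    for train_part, test_part in ((first, second), (second, first)):
        model = train(train_part)
        for text, intent in test_part:
            bucket = correct if predict(model, text) == intent else incorrect
            bucket.append((text, intent))
    return correct[:n_correct] + incorrect[:n_incorrect]

# Hypothetical training content, used only to exercise the sketch.
examples = [
    ("reset my password", "account"),
    ("forgot my password", "account"),
    ("change account email", "account"),
    ("update my password", "account"),
    ("cancel my order", "orders"),
    ("track my order", "orders"),
    ("where is my order", "orders"),
    ("return an order item", "orders"),
]
reduced = compact_subset(examples, n_correct=2, n_incorrect=2)
```

The reduced list would then be used for the final training pass; how many entries to draw from each portion is a tuning choice the abstract leaves open.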
-
Publication number: US11568856B2
Publication date: 2023-01-31
Application number: US16949232
Application date: 2020-10-21
Applicant: International Business Machines Corporation
Inventor: Tin Kam Ho, Robert L. Yates, Blake McGregor, Rajendra G. Ugrani, Neil R. Mallinar, Abhishek Shah, Ayush Gupta
Abstract: A combination of propagation operations and learning algorithms is applied, using a selected set of labeled conversational logs retrieved from a subset of a plurality of conversational logs, to a remaining corpus of the plurality of conversational logs to train an automated response system according to an intent associated with each of the conversational logs. The combination of propagation operations and learning algorithms may include defining the labels by a user for the selected set of the subset of the plurality of conversational logs; training a probabilistic classifier using the defined labels of features of the selected set, wherein the probabilistic classifier produces labeling decisions for the subset of conversational logs; weighting the features of the selected set in a model optimization process; and/or training an additional classifier using the weighted features of the selected set and applying the additional classifier to the remaining corpus.
-
Publication number: US20220405472A1
Publication date: 2022-12-22
Application number: US17350116
Application date: 2021-06-17
Applicant: International Business Machines Corporation
Inventor: Abhishek Shah, Ladislav Kunc, Haode Qi, Lin Pan, Saloni Potdar
IPC: G06F40/284, G06F16/33, G06F16/35, G06N20/00, G06N5/04
Abstract: A system for classifying a language sample intent by receiving a language sample including a set of features, identifying language sample features, determining a tokenization score for the language sample according to the language sample features, eliminating duplicate features according to the tokenization score, determining a term frequency (tf) according to the identified features and the tokenization score, determining an inverse document frequency (idf) according to the identified features and the tokenization score, and generating a term frequency-inverse document frequency (tf-idf) matrix for the identified features.
-
Publication number: US10977443B2
Publication date: 2021-04-13
Application number: US16180902
Application date: 2018-11-05
Applicant: International Business Machines Corporation
Inventor: Abhishek Shah, Tin Kam Ho
Abstract: Embodiments provide for class balancing for intent authoring using search via: receiving a positive example of an utterance associated with an intent, building an in-intent pool of utterances from a conversation log using the positive example in a first search query of the conversation log; adding the in-intent pool of utterances as a positive class to a training dataset; applying Boolean operators to negate the positive example to form a complement example; building an out-intent pool of utterances from the conversation log using the complement example in a first search query of the conversation log; and adding the out-intent pool of utterances as a complement class to the training dataset. The training dataset may be balanced to include a predefined ratio of positive and complement examples. The training dataset may be used to train or retrain an intent classifier.
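The search-based class balancing described in this abstract can be approximated with a short sketch. It assumes a simple keyword match in place of a real search engine and treats the Boolean negation as "utterances that match none of the keywords"; the function name `build_balanced_dataset`, the `ratio` parameter, and the sample log are hypothetical.

```python
def build_balanced_dataset(log, keywords, ratio=1.0):
    """Label keyword hits as positive (1) and a capped sample of non-hits as complement (0).

    `ratio` is the number of complement examples allowed per positive example.
    """
    keywords = {k.lower() for k in keywords}

    def matches(utterance):
        return bool(keywords & set(utterance.lower().split()))

    # Positive query: utterances containing any keyword from the positive example.
    in_pool = [u for u in log if matches(u)]
    # Negated query: utterances containing none of those keywords.
    out_pool = [u for u in log if not matches(u)]
    # Cap the complement class so the classes stay at the chosen ratio.
    n_out = min(len(out_pool), int(len(in_pool) * ratio))
    return [(u, 1) for u in in_pool] + [(u, 0) for u in out_pool[:n_out]]

# Hypothetical conversation log, used only to exercise the sketch.
log = [
    "I want to reset my password",
    "password reset is not working",
    "where is my order",
    "cancel my subscription",
    "what are your opening hours",
]
dataset = build_balanced_dataset(log, ["password"], ratio=1.0)
```

With a ratio of 1.0, the two positive utterances are matched by two complement utterances, and the third non-matching utterance is left out.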
-
Publication number: US11853712B2
Publication date: 2023-12-26
Application number: US17303728
Application date: 2021-06-07
Applicant: International Business Machines Corporation
Inventor: Haode Qi, Lin Pan, Abhishek Shah, Ladislav Kunc, Saloni Potdar
Abstract: A method, computer system, and computer program product for multi-lingual chatlog training are provided. The embodiment may include receiving, by a processor, a plurality of data related to conversational data in multiple languages. The embodiment may also include assigning an intent label to each conversational data. The embodiment may further include assigning a language label to each conversational data. The embodiment may also include pairing the plurality of the data related to the conversational data according to the intent label and the language label. The embodiment may further include training a machine learning model using a multi-lingual and multi-intent conversational data pairing. The embodiment may also include training the machine learning model using a single language and multi-intent conversational data pairing.
-
Publication number: US20220391600A1
Publication date: 2022-12-08
Application number: US17303728
Application date: 2021-06-07
Applicant: International Business Machines Corporation
Inventor: Haode Qi, Lin Pan, Abhishek Shah, Ladislav Kunc, Saloni Potdar
Abstract: A method, computer system, and computer program product for multi-lingual chatlog training are provided. The embodiment may include receiving, by a processor, a plurality of data related to conversational data in multiple languages. The embodiment may also include assigning an intent label to each conversational data. The embodiment may further include assigning a language label to each conversational data. The embodiment may also include pairing the plurality of the data related to the conversational data according to the intent label and the language label. The embodiment may further include training a machine learning model using a multi-lingual and multi-intent conversational data pairing. The embodiment may also include training the machine learning model using a single language and multi-intent conversational data pairing.
-
Publication number: US11494802B2
Publication date: 2022-11-08
Application number: US16742819
Application date: 2020-01-14
Applicant: International Business Machines Corporation
Inventor: Abhishek Shah, Ananya Aniruddha Poddar, Inkit Padhi, Nishtha Madaan, Sameep Mehta, Kuntal Dey
Abstract: A service receives a persuasion-based input comprising a text and one or more marketing objectives to persuade a desired response. The service evaluates persuasion values of text segments of the text and persuasion transition values consecutively between respective persuasion values of the persuasion values across the text segments. The service generates a desired curve of persuasion factors across the text segments according to the one or more marketing objectives. The service recommends one or more replacement words to replace one or more selected words in text to move a deviation between the persuasion values and transition values in comparison to the desired curve of persuasion factors.