-
Publication No.: US20220004715A1
Publication date: 2022-01-06
Application No.: US16918048
Filing date: 2020-07-01
IPC classification: G06F40/295, G06N3/04, H04L12/58
Abstract: Knowledge gaps in a chatbot are identified with reference to a domain-specific document and a set of QA pairs of the chatbot. Entities and/or entity values associated with the document are compared to the entities and/or entity values of the QA pairs. Entities of the document not associated with the QA pairs are identified as knowledge gaps. The QA pairs and knowledge gaps are ranked by relevance to the domain.
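The comparison step in this abstract can be read as a set difference. The following is a minimal sketch under that reading, not the patented method: it assumes entity extraction has already been done, and the function name `find_knowledge_gaps` and the dictionary layout of a QA pair are hypothetical. Ranking by domain relevance is not shown.

```python
def find_knowledge_gaps(document_entities, qa_pairs):
    """Return document entities that no QA pair is associated with.

    Assumption: each QA pair is a dict with an "entities" list that an
    upstream extraction step has already filled in.
    """
    covered = set()
    for pair in qa_pairs:
        covered.update(pair["entities"])
    # Entities found in the document but absent from every QA pair.
    return sorted(set(document_entities) - covered)


# Hypothetical example data.
document_entities = ["refund", "shipping", "warranty"]
qa_pairs = [
    {"question": "How do I get a refund?", "entities": ["refund"]},
    {"question": "How long does shipping take?", "entities": ["shipping"]},
]
gaps = find_knowledge_gaps(document_entities, qa_pairs)
print(gaps)
```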
-
Publication No.: US20240202573A1
Publication date: 2024-06-20
Application No.: US18067871
Filing date: 2022-12-19
Inventors: Nagarjuna Surabathina, Nitin Gupta, Shramona Chakraborty, Hima Patel, Sameep Mehta, Ramkumar Ramalingam, Matu Agarwal
Abstract: A method, computer program product, and computer system for transforming sets of source data having different formats into respective sets of target data having a same format. N source patterns are determined and respectively describe N different formats in which N sets of source data items are formatted, where N≥1. A target format pattern is determined and describes a target format in which target data items are formatted. N graphs are generated and respectively describe transformations of the N source patterns to the target pattern. Each graph includes multiple transformation paths. Each transformation path transforms the source pattern to the target pattern in a manner that maps source strings in the source pattern to each target string in the target pattern. A single transformation path is selected from the multiple transformation paths resulting in N single transformation paths having been selected.
-
Publication No.: US11836219B2
Publication date: 2023-12-05
Application No.: US17518156
Filing date: 2021-11-03
IPC classification: G06F18/2115, G06F18/211, G06N20/00, G06N3/08, G06F18/214, G06F18/21, G06F18/2431
CPC classification: G06F18/214, G06F18/211, G06F18/2115, G06F18/2178, G06F18/2431, G06N3/08, G06N20/00
Abstract: One embodiment provides a method, including: receiving a sample set for training a machine-learning model, wherein the sample set includes a plurality of classes, wherein classes within the plurality of classes have an imbalance in a number of samples; creating an enlarged minority class by generating new samples from the samples within the minority class and adding the new samples to the minority class; selecting subset samples from both the samples within the enlarged minority class and the majority class; weighting each of the subset samples based upon user input defining goals for attributes of a training sample set to be used in training the machine-learning model; and generating, using the neural network, the training sample set by re-running the selecting in view of the weighting.
-
Publication No.: US11966453B2
Publication date: 2024-04-23
Application No.: US17175896
Filing date: 2021-02-15
Inventors: Naveen Panwar, Anush Sankaran, Kuntal Dey, Hima Patel, Sameep Mehta
IPC classification: G06K9/00, G06F18/2113, G06F18/214, G06N20/00
CPC classification: G06F18/2148, G06F18/2113, G06F18/2155, G06N20/00
Abstract: Embodiments are disclosed for a method. The method includes receiving an annotation set for a machine learning model. The annotation set includes multiple data points relevant to a task for the machine learning model. The method also includes determining total weights corresponding to the data points. The total weights are determined based on multiple ordering constraints indicating multiple data classes and corresponding weights. The corresponding weights represent a relative priority of the data classes with respect to each other. The method further includes generating an ordered annotation set from the annotation set. The ordered annotation set includes the data points in a sequence based on the determined total weights.
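One simple way to picture the ordering step is to sum the weights of the classes each data point belongs to and sort on that sum. The sketch below is an illustration under that assumption only; the abstract does not say how total weights are combined, and `order_annotation_set`, the `classes` field, and the example weights are all hypothetical.

```python
def order_annotation_set(data_points, class_weights):
    """Order data points by total weight, highest priority first.

    Assumptions: a data point's total weight is the sum of the weights of
    its data classes, and classes without a weight contribute 0.
    """
    def total_weight(point):
        return sum(class_weights.get(c, 0.0) for c in point["classes"])

    # sorted() is stable, so points with equal totals keep their input order.
    return sorted(data_points, key=total_weight, reverse=True)


# Hypothetical ordering constraints: a larger weight means higher priority.
class_weights = {"rare": 3.0, "ambiguous": 2.0, "common": 1.0}
annotation_set = [
    {"id": "a", "classes": ["common"]},
    {"id": "b", "classes": ["rare", "ambiguous"]},
    {"id": "c", "classes": ["ambiguous"]},
]
ordered_ids = [p["id"] for p in order_annotation_set(annotation_set, class_weights)]
print(ordered_ids)
```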
-
Publication No.: US20230177113A1
Publication date: 2023-06-08
Application No.: US17540660
Filing date: 2021-12-02
CPC classification: G06K9/6256, G06K9/628, G06K9/6215, G06N20/00
Abstract: Methods, systems, and computer program products for privacy-preserving class label standardization in federated learning settings are provided herein. A computer-implemented method includes determining, using one or more data privacy-preserving techniques, a signature for each of one or more classes of data for each of multiple client devices within a federated learning environment; identifying one or more signature matches across at least a portion of the multiple client devices; generating one or more class labels for the one or more classes of data associated with the one or more signature matches; labeling, across the at least a portion of the multiple client devices, the one or more classes of data associated with the one or more signature matches with the one or more generated class labels; and performing one or more automated actions based at least in part on the one or more labeled classes of data.
-
Publication No.: US20220405631A1
Publication date: 2022-12-22
Application No.: US17353978
Filing date: 2021-06-22
Abstract: Techniques for qualitatively assessing unlabeled data in an unsupervised machine learning environment are disclosed. In one example, a method comprises the following steps. A dataset of unlabeled data points is converted into a graph structure. Nodes of the graph structure represent the unlabeled data points in the dataset and weighted edges between at least a portion of the nodes represent similarity between the unlabeled data points represented by the nodes. A metric is computed for each node of the graph structure. A value generated by the metric for a given node represents a measure of dissimilarity between the corresponding unlabeled data point of the given node and one or more other unlabeled data points of one or more other nodes. A subset of the dataset is generated by removing one or more unlabeled data points from the dataset based on one or more values of the computed metric.
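The graph-then-metric-then-filter flow can be sketched in a few lines. Everything specific here is an assumption made for illustration and is not taken from the patent: the similarity function (inverse Euclidean distance), the edge threshold, the metric (one minus the mean edge weight), and the choice to drop the most dissimilar points.

```python
import math


def similarity(a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance), in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def build_graph(points, min_similarity):
    """Adjacency map: node index -> {neighbor index: similarity weight}."""
    edges = {i: {} for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            s = similarity(points[i], points[j])
            if s >= min_similarity:  # only sufficiently similar nodes get an edge
                edges[i][j] = s
                edges[j][i] = s
    return edges


def dissimilarity(edges, node):
    """Assumed metric: 1 - mean edge weight; isolated nodes score 1.0."""
    weights = list(edges[node].values())
    if not weights:
        return 1.0
    return 1.0 - sum(weights) / len(weights)


def prune(points, max_dissimilarity=0.6, min_similarity=0.2):
    """Keep only points whose dissimilarity does not exceed the threshold."""
    edges = build_graph(points, min_similarity)
    return [p for i, p in enumerate(points)
            if dissimilarity(edges, i) <= max_dissimilarity]


# Hypothetical data: three nearby points and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
subset = prune(points)
print(subset)
```

With these thresholds the far-away point has no edges, receives the maximum dissimilarity, and is removed from the subset.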
-
Publication No.: US20220164698A1
Publication date: 2022-05-26
Application No.: US17104642
Filing date: 2020-11-25
Inventors: Arunima Chaudhary, Dakuo Wang, Abel Valente, Carolina Maria Spina, Hima Patel, Nitin Gupta, Gregory Bramble, Horst Cornelius Samulowitz, Sameep Mehta, Theodoros Salonidis, Daniel M. Gruen, Chaung Gan
Abstract: A method to automatically assess data quality of data input into a machine learning model and remediate the data includes receiving input data for an automated machine learning model. Selections for multiple data quality metrics are displayed. A selection for data quality metrics is received. The data quality metrics are determined according to the selection. Selections for data remediation strategies based on the selection of the data quality metrics are displayed. A selection for remediation recommendation strategies is received. The selected data remediation strategies are performed on the input data. Learning from the selection of the data quality metrics and the selection for the remediation strategies is performed. A new customized machine learning model is generated based on the learning.
-
Publication No.: US11928126B1
Publication date: 2024-03-12
Application No.: US17821309
Filing date: 2022-08-22
CPC classification: G06F16/258, G06F16/86
Abstract: A computer implemented method transforms data. Responsive to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens in the mappable tokens. The set of initial mappings maps the set of common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string using a user input to the set of initial mappings. The computer system generates program code that transforms input strings to output strings using the set of user mappings that maps the mappable tokens from the input string to the output string, wherein the program code is used to transform input strings to output strings.
-
Publication No.: US20230274160A1
Publication date: 2023-08-31
Application No.: US17681984
Filing date: 2022-02-28
Inventors: Shashank Mujumdar, Hima Patel, Sambaran Bandyopadhyay, Pooja Aggarwal, Anbang Xu, Hau-Wen Chang, Harshit Kumar, Katherine Guo, Rama Kalyani T. Akkiraju, Gargi B. Dasgupta
IPC classification: G06N5/02
CPC classification: G06N5/022
Abstract: Methods, systems, and computer program products for automatically detecting periods of normal activity by analyzing observability data in IT operations environments are provided herein. A computer-implemented method includes obtaining multiple types of data related to one or more artificial intelligence-related information technology operations; modelling at least a portion of the obtained data as time series data; automatically identifying, from the time series data, one or more time periods associated with one or more given levels of data activity; and performing one or more automated actions, in at least one artificial intelligence-related information technology operations environment, based at least in part on the data corresponding to the one or more identified time periods.
-
Publication No.: US20230169070A1
Publication date: 2023-06-01
Application No.: US17456854
Filing date: 2021-11-29
Inventors: Ramkumar Ramalingam, Nagarjuna Surabathina, Thanmayi Mruthyunjaya, Nitin Gupta, Pranay Kumar Lohia, Shanmukha Chaitanya Guttula, Hima Patel, Sameep Mehta, Matu Agarwal, Mudit Mehrotra
IPC classification: G06F16/242, G06F16/25
CPC classification: G06F16/242, G06F16/258
Abstract: A computer implemented method, computer system, and computer program product for transforming mapped data fields of enterprise applications. A set of processor units receives a matching from a source data field to a target data field. The set of processor units receives a number of annotated examples of transformations from a source format to a target format. Based on the annotated examples, the set of processor units autogenerates a query language expression for transforming data items from the source format to the target format.