DOC4CODE - AN AI-DRIVEN DOCUMENTATION RECOMMENDER SYSTEM TO AID PROGRAMMERS

    Publication Number: US20240345811A1

    Publication Date: 2024-10-17

    Application Number: US18202756

    Application Date: 2023-05-26

    CPC classification number: G06F8/36 G06F16/955 G06F40/40

    Abstract: Herein, for each source logic in a corpus, a computer stores an identifier of the source logic and operates a logic encoder that infers a distinct fixed-size encoded logic representing the variable-size source logic. At build time, a multidimensional index is generated and populated based on the encoded logics that represent the source logics in the corpus. At runtime, a user may edit and select a new source logic, such as in a text editor or an integrated development environment (IDE). The logic encoder infers a new encoded logic that represents the new source logic. The multidimensional index accepts the new encoded logic as a lookup key and automatically selects and returns a result subset of encoded logics that represent similar source logics in the corpus. For display, the multidimensional index may select and return only the encoded logics that are the few nearest neighbors of the new encoded logic.
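
    The abstract describes an encode-then-index workflow: every corpus entry is encoded once at build time, and nearest-neighbor lookups are answered at runtime. Below is a minimal Python sketch of that flow, assuming a hash-based stand-in for the learned logic encoder and a brute-force NumPy index; the 64-dimension encoding size, the toy corpus, and all identifiers are illustrative, not the patent's implementation.

        import numpy as np

        DIM = 64  # assumed fixed size of every encoded logic

        def encode_logic(source_logic: str) -> np.ndarray:
            # Toy stand-in for the learned logic encoder: maps variable-size
            # source text to a distinct fixed-size unit vector.
            rng = np.random.default_rng(abs(hash(source_logic)) % (2**32))
            vec = rng.standard_normal(DIM)
            return vec / np.linalg.norm(vec)

        # Build time: encode every source logic in the corpus, keeping its identifier.
        corpus = {"util/parse.py": "def parse(s): ...",
                  "db/query.py": "def run_query(sql): ...",
                  "net/client.py": "def fetch(url): ..."}
        ids = list(corpus)
        index = np.stack([encode_logic(corpus[i]) for i in ids])  # multidimensional index

        # Runtime: encode the newly edited source logic, use it as the lookup key,
        # and return the identifiers of the few nearest neighbors in the corpus.
        def lookup(new_source_logic: str, k: int = 2) -> list[str]:
            query = encode_logic(new_source_logic)
            distances = np.linalg.norm(index - query, axis=1)
            return [ids[j] for j in np.argsort(distances)[:k]]

        print(lookup("def parse_args(argv): ..."))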

    TRAINING SYNTAX-AWARE LANGUAGE MODELS WITH AST PATH PREDICTION

    Publication Number: US20240345815A1

    Publication Date: 2024-10-17

    Application Number: US18202564

    Application Date: 2023-05-26

    CPC classification number: G06F8/427

    Abstract: In an embodiment, a computer stores and operates a logic encoder that is an artificial neural network that infers a fixed-size encoded logic from textual or tokenized source logic. Without machine learning, a special parser generates a parse tree that represents the source logic and a fixed-size correctly encoded tree that represents the parse tree. For finetuning the logic encoder, an encoded tree generator is an artificial neural network that accepts the fixed-size encoded logic as input and responsively infers a fixed-size incorrectly encoded tree that represents the parse tree. The neural weights of the logic encoder (and optionally of the encoded tree generator) are adjusted based on backpropagation of error (i.e., loss), measured numerically as the difference between the fixed-size incorrectly encoded tree and the fixed-size correctly encoded tree.
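
    As a rough illustration of the finetuning signal described above, the PyTorch sketch below pairs a stand-in logic encoder with a stand-in encoded tree generator and backpropagates the mean-squared difference between the generated tree encoding and the parser-produced correct one. The layer shapes, the MSE loss, and the optimizer are assumptions chosen for brevity, not the patent's architecture.

        import torch
        import torch.nn as nn

        EMB, TREE = 128, 64  # assumed sizes of the encoded logic and encoded tree

        logic_encoder = nn.Sequential(nn.Linear(300, EMB), nn.ReLU(), nn.Linear(EMB, EMB))
        tree_generator = nn.Linear(EMB, TREE)  # infers an encoded tree from an encoded logic
        optimizer = torch.optim.Adam(
            list(logic_encoder.parameters()) + list(tree_generator.parameters()), lr=1e-4)

        def finetune_step(token_features, correct_tree_encoding):
            # token_features: fixed-size representation of the tokenized source logic
            encoded_logic = logic_encoder(token_features)
            predicted_tree = tree_generator(encoded_logic)   # the "incorrectly" encoded tree
            loss = nn.functional.mse_loss(predicted_tree, correct_tree_encoding)
            optimizer.zero_grad()
            loss.backward()   # backpropagate the error through both networks
            optimizer.step()
            return loss.item()

        # Dummy batch: 8 source logics and the parser-produced correct tree encodings.
        print(finetune_step(torch.randn(8, 300), torch.randn(8, TREE)))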

    PARTIAL GRAPH PATH PREDICTION AND NEXT TOKEN PREDICTION JOINT TRAINING ALGORITHM FOR GENERATIVE LANGUAGE MODELS

    Publication Number: US20250165852A1

    Publication Date: 2025-05-22

    Application Number: US18514391

    Application Date: 2023-11-20

    Abstract: During pretraining, a computer generates three untrained machine learning models that are a token sequence encoder, a token predictor, and a decoder that infers a frequency distribution of graph traversal paths. A sequence of lexical tokens is generated that represents a lexical text in a training corpus. A graph is generated that represents the lexical text. In the graph, multiple traversal paths are selected that collectively represent a sliding subsequence of the sequence of lexical tokens. From the subsequence, the token sequence encoder infers an encoded sequence that represents the subsequence of the sequence of lexical tokens. The decoder and token predictor accept the encoded sequence as input for respective inferencing for which respective training losses are measured. Both training losses are combined into a combined loss that is used to increase the accuracy of the three machine learning models by, for example, backpropagation of the combined loss.
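
    The joint objective described above can be sketched as a shared sequence encoder feeding two heads whose losses are summed before a single backward pass. In the PyTorch sketch below, the vocabulary size, number of traversal paths, window length, and the choice of cross-entropy and KL-divergence losses are illustrative assumptions.

        import torch
        import torch.nn as nn

        VOCAB, PATHS, EMB, WINDOW = 1000, 50, 128, 16  # assumed sizes

        encoder = nn.Sequential(nn.Embedding(VOCAB, EMB), nn.Flatten(),
                                nn.Linear(WINDOW * EMB, EMB))   # token sequence encoder
        token_predictor = nn.Linear(EMB, VOCAB)   # predicts the next lexical token
        path_decoder = nn.Linear(EMB, PATHS)      # infers a distribution over traversal paths
        params = (list(encoder.parameters()) + list(token_predictor.parameters())
                  + list(path_decoder.parameters()))
        optimizer = torch.optim.Adam(params, lr=1e-4)

        def joint_step(token_window, next_token, path_distribution):
            encoded = encoder(token_window)   # encoded sequence for the sliding subsequence
            token_loss = nn.functional.cross_entropy(token_predictor(encoded), next_token)
            path_loss = nn.functional.kl_div(
                nn.functional.log_softmax(path_decoder(encoded), dim=-1),
                path_distribution, reduction="batchmean")
            combined = token_loss + path_loss   # combined loss for all three models
            optimizer.zero_grad()
            combined.backward()                 # one backpropagation of the combined loss
            optimizer.step()
            return combined.item()

        # Dummy batch: 4 sliding windows of WINDOW tokens each.
        windows = torch.randint(0, VOCAB, (4, WINDOW))
        targets = torch.randint(0, VOCAB, (4,))
        paths = torch.softmax(torch.randn(4, PATHS), dim=-1)
        print(joint_step(windows, targets, paths))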

    CONTEXTUAL RE-RANKING BASED ON CURSOR POSITION FOR DOCUMENTATION RECOMMENDER SYSTEMS

    Publication Number: US20250110961A1

    Publication Date: 2025-04-03

    Application Number: US18374209

    Application Date: 2023-09-28

    Abstract: Herein is dynamic and contextual ranking of reference documentation based on an interactively selected position in new source logic. A computer receives a vocabulary of lexical tokens, a sequence of references that contains a first reference to a first reference document before a second reference to a second reference document, respective subsets of the vocabulary that occur in the first and second reference documents, a new source logic that contains a sequence of lexical tokens, respective measurements of semantic distance between the new source logic and the first and second reference documents, and a selected position in the sequence of lexical tokens. Based on the selected position, the measurements of semantic distance are selectively increased. Based on that increase in the measurements of semantic distance, the relative ordering of the first and second references is reversed to generate and display a reordered sequence of references.
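
    A minimal sketch of the cursor-aware re-ranking follows. The specific rule used here, penalizing the semantic distance of any reference document whose vocabulary subset shares no token with a small window around the cursor, is one assumed way of "selectively increasing" the distances; the patent does not prescribe it.

        def rerank(references, doc_vocab, distances, tokens, cursor, window=5, penalty=2.0):
            # references: ordered sequence of reference document identifiers
            # doc_vocab:  document id -> subset of the vocabulary occurring in that document
            # distances:  document id -> semantic distance to the new source logic
            # tokens:     the new source logic as a sequence of lexical tokens
            # cursor:     selected position within that token sequence
            nearby = set(tokens[max(0, cursor - window): cursor + window])
            adjusted = {}
            for ref in references:
                d = distances[ref]
                if not (doc_vocab[ref] & nearby):  # no token near the cursor occurs in the doc
                    d *= penalty                   # selectively increase the distance
                adjusted[ref] = d
            # Smaller adjusted distance means more relevant at this cursor position.
            return sorted(references, key=lambda ref: adjusted[ref])

        refs = ["io_docs", "regex_docs"]
        vocab = {"io_docs": {"open", "read"}, "regex_docs": {"match", "compile"}}
        dist = {"io_docs": 0.30, "regex_docs": 0.35}
        code_tokens = ["import", "re", "pattern", "=", "re", ".", "compile", "(", "expr", ")"]
        # With the cursor on "compile", the original ordering of the two references is reversed.
        print(rerank(refs, vocab, dist, code_tokens, cursor=6))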

    APPROXIMATE CONFUSION MATRIX FOR MULTI-LABEL CLASSIFICATION

    Publication Number: US20250036934A1

    Publication Date: 2025-01-30

    Application Number: US18227758

    Application Date: 2023-07-28

    Abstract: Herein is validation of a trained classifier based on novel and accelerated estimation of a confusion matrix. In an embodiment, a computer hosts a trained classifier that infers, from many objects, an inferred frequency of each class. An upscaled magnitude of each class is generated from the inferred frequency of the class. An integer of each class is generated from the upscaled magnitude of the class. Based on those integers of the classes and a target integer for each class, counts are generated of the objects that are true positives, false positives, and false negatives of each class. Based on those counts, estimated totals of true positives, false positives, and false negatives are generated that characterize the fitness of the trained classifier. In an embodiment, those counts and totals are downscaled to fractions between zero and one.
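
    The sketch below follows one plausible reading of the estimation above: each inferred class frequency is upscaled by the object count, rounded to an integer, and compared against a per-class target integer to count true positives, false positives, and false negatives, with the totals downscaled to fractions. The min/max counting rule and the example numbers are assumptions for illustration.

        def approximate_confusion(inferred_freq, target_counts, num_objects):
            # inferred_freq: class -> inferred fraction of objects assigned to that class
            # target_counts: class -> target integer count for that class
            # num_objects:   total number of classified objects
            per_class = {}
            totals = {"tp": 0, "fp": 0, "fn": 0}
            for cls, freq in inferred_freq.items():
                upscaled = freq * num_objects        # upscaled magnitude of the class
                inferred = round(upscaled)           # integer of the class
                target = target_counts[cls]
                tp = min(inferred, target)           # estimated true positives
                fp = max(inferred - target, 0)       # estimated false positives
                fn = max(target - inferred, 0)       # estimated false negatives
                per_class[cls] = (tp, fp, fn)
                totals["tp"] += tp
                totals["fp"] += fp
                totals["fn"] += fn
            # Downscale the totals to fractions between zero and one.
            scaled = {name: count / num_objects for name, count in totals.items()}
            return per_class, scaled

        freq = {"bug": 0.42, "feature": 0.33, "question": 0.25}
        target = {"bug": 40, "feature": 35, "question": 25}
        print(approximate_confusion(freq, target, num_objects=100))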

    Semi-supervised framework for purpose-oriented anomaly detection

    Publication Number: US12143408B2

    Publication Date: 2024-11-12

    Application Number: US17739968

    Application Date: 2022-05-09

    Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is inputted into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is inputted into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
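
    The two-stage flow above can be sketched as an unsupervised detector that screens for anomalies, followed by a supervised classifier that flags unknown anomalies and is updated with training instances generated from them. The z-score detector, nearest-centroid classifier, thresholds, and labels below are stand-ins chosen to keep the example self-contained, not the framework's actual models.

        import numpy as np

        class ZScoreDetector:  # stand-in unsupervised anomaly detection model
            def fit(self, items):
                self.mean, self.std = items.mean(0), items.std(0) + 1e-9
                return self
            def is_anomaly(self, item, threshold=3.0):
                return bool((np.abs((item - self.mean) / self.std) > threshold).any())

        class CentroidClassifier:  # stand-in supervised classification model
            def __init__(self):
                self.instances = {}  # label -> list of training instances
            def add_instance(self, item, label):  # update the model with a new instance
                self.instances.setdefault(label, []).append(item)
            def classify(self, item, max_dist=2.0):
                dist, label = min(((np.linalg.norm(item - np.mean(v, 0)), k)
                                   for k, v in self.instances.items()),
                                  default=(np.inf, None))
                return ("unknown", None) if dist > max_dist else ("known", label)

        detector = ZScoreDetector().fit(np.random.default_rng(0).normal(size=(500, 4)))
        classifier = CentroidClassifier()
        classifier.add_instance(np.array([8.0, 8.0, 8.0, 8.0]), "known_spike")

        def process(item):
            if not detector.is_anomaly(item):          # first output: not an anomaly
                return "normal"
            status, label = classifier.classify(item)  # second output: known or unknown
            if status == "unknown":
                # Generate a training instance from the item and update the classifier.
                classifier.add_instance(item, "new_anomaly_type")
                return "unknown anomaly (classifier updated)"
            return "known anomaly: " + label

        print(process(np.array([0.1, -0.2, 0.0, 0.3])))       # normal
        print(process(np.array([9.0, 9.0, 9.0, 9.0])))        # known anomaly
        print(process(np.array([-9.0, -9.0, -9.0, -9.0])))    # unknown, classifier updated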
