METHOD, APPARATUS, DEVICE AND MEDIUM FOR THE IDENTIFICATION OF CANDIDATE GENES THAT REGULATE THE SHAPE OF BACTERIA

    Publication No.: US20240301511A1

    Publication date: 2024-09-12

    Application No.: US18220800

    Filing date: 2023-07-11

    Abstract: The present application relates to a method, apparatus, device and medium for identifying candidate genes that regulate the shape of bacteria. The method includes: obtaining reference genome data of bacteria and performing protein domain analysis on the reference genome data; determining a feature value dataset for each bacterium based on the domains of all proteins obtained from the analysis; obtaining shape information of each bacterium; training a bacterial shape prediction model based on the shape information and the feature value dataset of each bacterium, and determining the weight of each protein domain in influencing bacterial shape according to the bacterial shape prediction model; and determining candidate genes that regulate the shape of bacteria based on the weights. The method can be used to rapidly screen candidate genes that regulate the shape of bacteria, and establishes a new approach for mining biofunctional genes.
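The steps in this abstract can be illustrated with a minimal sketch. The abstract does not name a model type, so logistic regression is an assumption made here only because its coefficients can be read directly as per-domain weights. The domain names, the presence/absence matrix, and the shape labels are hypothetical toy data, not values from the application.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of shape class 1
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical feature value dataset: one row per bacterium,
# one column per protein domain (1 = domain present in the genome).
domains = ["domain_A", "domain_B", "domain_C"]
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
# Hypothetical shape labels: 1 = rod-shaped, 0 = spherical.
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)

# Rank domains by the magnitude of their learned weight; genes encoding
# proteins that carry the top-ranked domains would be the candidates.
ranked = sorted(zip(domains, w), key=lambda t: abs(t[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

In this toy data `domain_A` is present in every rod-shaped genome and absent from every spherical one, so it receives the largest weight. A real pipeline would need many genomes, regularization, and a mapping from domains back to the genes that encode them.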

    METHOD FOR TRAINING VECTOR MODEL AND GENERATING NEGATIVE SAMPLE

    Publication No.: US20240265993A1

    Publication date: 2024-08-08

    Application No.: US18018858

    Filing date: 2022-01-04

    Inventor: Zhenzhong ZHANG

    IPC classification: G16B5/00 G06N3/08 G16B40/20

    CPC classification: G16B5/00 G06N3/08 G16B40/20

    Abstract: A method for training a vector model, including: obtaining more than one RNA sequence and more than one protein sequence; obtaining more than one first RNA vector by vectorizing the more than one RNA sequence; obtaining more than one first protein vector by vectorizing the more than one protein sequence; determining an interaction between the RNA sequence and the protein sequence according to the first RNA vector and the first protein vector; obtaining a similarity of more than one RNA-RNA pair by calculating a distance between any two RNA sequences; obtaining a similarity of more than one protein-protein pair by calculating a distance between any two protein sequences; and training the vector model according to the interaction between the RNA sequence and the protein sequence, the similarity of the RNA-RNA pair, and the similarity of the protein-protein pair.
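One way to picture how the three training signals in this abstract could be combined is a single loss function. The sketch below is an assumption throughout: the abstract does not specify the distance measure, the interaction score, or the form of the objective. Here sequence similarity is taken as one minus the Jaccard distance between k-mer sets, the interaction score is a sigmoid of the dot product of the two vectors, and the sequences, vectors, and labels are hypothetical toy values.

```python
import math

def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_similarity(a, b, k=3):
    """Similarity = 1 - Jaccard distance between the k-mer sets of two sequences."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def combined_loss(rna_vecs, prot_vecs, interactions, rna_sims, prot_sims, lam=1.0):
    """Cross-entropy on RNA-protein interactions plus similarity-matching terms."""
    eps = 1e-12
    loss = 0.0
    for (r, p), label in interactions.items():
        prob = 1.0 / (1.0 + math.exp(-dot(rna_vecs[r], prot_vecs[p])))
        loss -= label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps)
    # Push vector similarity toward sequence similarity for each pair.
    for (i, j), s in rna_sims.items():
        loss += lam * (cosine(rna_vecs[i], rna_vecs[j]) - s) ** 2
    for (i, j), s in prot_sims.items():
        loss += lam * (cosine(prot_vecs[i], prot_vecs[j]) - s) ** 2
    return loss

# Hypothetical toy sequences and 2-dimensional vectors.
rnas = {"r1": "AUGGCUAC", "r2": "AUGGCUUC"}
prots = {"p1": "MKTAYIAK", "p2": "MKTAYLAK"}
rna_vecs = {"r1": [0.9, 0.1], "r2": [0.8, 0.3]}
prot_vecs = {"p1": [1.0, 0.2], "p2": [-0.5, 0.7]}
interactions = {("r1", "p1"): 1, ("r2", "p2"): 0}  # made-up labels

rna_sims = {("r1", "r2"): jaccard_similarity(rnas["r1"], rnas["r2"])}
prot_sims = {("p1", "p2"): jaccard_similarity(prots["p1"], prots["p2"])}

loss = combined_loss(rna_vecs, prot_vecs, interactions, rna_sims, prot_sims)
print(f"combined loss: {loss:.4f}")
```

A training loop would adjust the vectors (or the model that produces them) to reduce this loss. The weighting factor `lam` is a hypothetical parameter that balances the interaction term against the two similarity terms.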