Outlier detection for streaming data using locality sensitive hashing

    Publication number: US10778707B1

    Publication date: 2020-09-15

    Application number: US15153712

    Application date: 2016-05-12

    Abstract: A matching record set with respect to a particular data record of a stream is identified based on output values produced by a particular band of locality sensitive hash functions. Using respective matching record sets corresponding to the particular data record and one or more other bands of locality sensitive hash functions, an estimate of a count of data records of the stream which meet a particular inter-record distance criterion is obtained. A determination as to whether the particular data record is to be designated as an outlier with respect to previously-observed records of the data stream is made using the estimated count.
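The banded matching step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes random-hyperplane hashing (which approximates angular distance), uses the size of the union of per-band matching sets as the count estimate, and introduces hypothetical names (`BandedLshOutlierDetector`, `min_neighbors`, `hashes_per_band`) that do not come from the source.

```python
import random


class BandedLshOutlierDetector:
    """Streaming outlier check using bands of sign-based LSH functions (illustrative)."""

    def __init__(self, dim, num_bands=8, hashes_per_band=4, min_neighbors=3, seed=0):
        rng = random.Random(seed)
        # planes[band][h] is one random hyperplane; its sign bit is one hash output.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hashes_per_band)]
            for _ in range(num_bands)
        ]
        # One bucket table per band: band key -> ids of records seen with that key.
        self.buckets = [dict() for _ in range(num_bands)]
        self.min_neighbors = min_neighbors  # assumed threshold on the estimated count
        self.next_id = 0

    def _band_key(self, band, record):
        # Concatenate the sign bits of every hash function in this band.
        return tuple(
            sum(p * x for p, x in zip(plane, record)) >= 0
            for plane in self.planes[band]
        )

    def observe(self, record):
        """Return (is_outlier, estimated_neighbor_count), then index the record."""
        keys = [self._band_key(b, record) for b in range(len(self.planes))]
        matches = set()
        for band, key in enumerate(keys):
            # Matching record set for this band: earlier records in the same bucket.
            matches.update(self.buckets[band].get(key, ()))
        estimated = len(matches)  # crude estimate: union over all bands
        is_outlier = estimated < self.min_neighbors
        for band, key in enumerate(keys):
            self.buckets[band].setdefault(key, []).append(self.next_id)
        self.next_id += 1
        return is_outlier, estimated
```

Calling `observe` on each incoming record yields a flag relative to previously observed records only. More bands raise the chance that a true neighbor collides in at least one band, while more hashes per band make each collision stricter.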

    Scalable product influence prediction using feature smoothing

    Publication number: US11556945B1

    Publication date: 2023-01-17

    Application number: US15714816

    Application date: 2017-09-25

    Abstract: Systems and methods are disclosed to implement an item metric prediction system that predicts a metric for an item using a feature-based model built using other similar items. In embodiments, the system is used to predict item influence values (IIVs) of items indicating an expected amount of subsequent transactions that is caused by an initial transaction of the items. In embodiments, a sample of item transaction data is distributed to a plurality of task nodes, which execute in parallel to determine the items' observed IIVs from the transaction data. Subsequently, a new IIV is determined for an item whose observed IIV has a low confidence level. A set of similar items is selected, and a set of parameters of a feature-based model are tuned to fit the model to the observed IIVs of the similar items. A new IIV having a high confidence level is then obtained using the model.

    Model-based artificial intelligence data mining system for dimension estimation

    Publication number: US10963812B1

    Publication date: 2021-03-30

    Application number: US15462556

    Application date: 2017-03-17

    Abstract: Some aspects of the present disclosure relate to computer processes for generating and training a generative machine learning model that estimates the true sizes of items and users of an electronic catalog and is subsequently applied to determine fit recommendations, as well as confidence values for those recommendations, for how a particular item may fit a particular user. During training, the disclosed generative model can implement Bayesian statistical inference to calculate estimated true sizes of both items and users of an electronic catalog using both (1) a prior distribution of sizes for items and users and (2) a distribution based on obtained evidence regarding how items actually fit users. The resulting posterior distribution can be approximated using a proposal distribution, which is used to generate the fit recommendations and associated confidence values.
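To make the prior-plus-evidence idea concrete, here is a deliberately simplified sketch. It swaps in a conjugate normal-normal update, which has a closed-form posterior, rather than the proposal-distribution approximation the abstract mentions. The function name `normal_posterior` and the parameters `noise_var` and `observations` are assumptions made for illustration only.

```python
def normal_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior (mean, variance) of a true size under a Gaussian prior.

    observations: sizes implied by fit feedback, each assumed to equal the
    true size plus Gaussian noise of variance noise_var (an assumption).
    """
    n = len(observations)
    if n == 0:
        # No evidence yet: the posterior is just the prior.
        return prior_mean, prior_var
    # Precisions (inverse variances) add under the conjugate update.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var
```

With a prior of mean 10 and variance 1, two observations of 12 with unit noise move the estimate to about 11.33 and shrink the variance to 1/3. That shrinking variance is the kind of quantity a confidence value could be derived from.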

    Persona based data mining system
    Granted invention patent

    Publication number: US10157351B1

    Publication date: 2018-12-18

    Application number: US14918444

    Application date: 2015-10-20

    Abstract: Data mining systems and methods are disclosed for associating users with items based on underlying personas. The system associates each user account with one or more underlying personas that contribute to the user's interactions with different items, predicts an active persona for a user based on the user's recent interactions with items, and makes item-related recommendations that are oriented to the active persona. Thus, for example, even though multiple individuals may share a computer and/or account, the content (e.g., item recommendations) presented during a browsing session may be based primarily or exclusively on the past browsing behaviors of the particular individual conducting the browsing session.

    Normalizing text attributes for machine learning models

    Publication number: US11915104B2

    Publication date: 2024-02-27

    Application number: US16672243

    Application date: 2019-11-01

    CPC classification number: G06N20/00 G06F16/35 G06N5/04

    Abstract: Respective correlation metrics between token groups of a particular text attribute of a data set and a prediction target attribute are computed. Based on the correlation metrics, a predictive token group list is created. For various observation records of the data set, values of a derived categorical attribute corresponding to the particular text attribute are determined based on matches between the particular text attribute value and the predictive token group list. A measure of the predictive utility of the particular text attribute is obtained using correlations between the categorical attribute and the prediction target attribute.
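The first steps of this abstract (scoring token groups against the target, building a predictive list, and deriving a categorical attribute) might look like the sketch below. It makes several assumptions that are not in the source: token groups are single lowercase whitespace-separated words, the correlation metric is the absolute Pearson correlation between token presence and a numeric target, and the helper names (`build_predictive_tokens`, `derive_category`, the `"OTHER"` fallback) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def build_predictive_tokens(texts, targets, top_k=2):
    """Rank tokens by |correlation| between presence and target; keep the top_k."""
    token_sets = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*token_sets))
    scored = []
    for tok in vocab:
        presence = [1 if tok in s else 0 for s in token_sets]
        scored.append((abs(pearson(presence, targets)), tok))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tok for score, tok in scored[:top_k] if score > 0]


def derive_category(text, predictive_tokens):
    """Map a raw text value to the first predictive token it contains, else 'OTHER'."""
    tokens = set(text.lower().split())
    for tok in predictive_tokens:
        if tok in tokens:
            return tok
    return "OTHER"
```

The derived category can then be correlated with the target in the same way to gauge how useful the text attribute is overall; that final step is left out of this sketch.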

    Normalizing text attributes for machine learning models

    Publication number: US10467547B1

    Publication date: 2019-11-05

    Application number: US14935426

    Application date: 2015-11-08

    Abstract: Respective correlation metrics between token groups of a particular text attribute of a data set and a prediction target attribute are computed. Based on the correlation metrics, a predictive token group list is created. For various observation records of the data set, values of a derived categorical attribute corresponding to the particular text attribute are determined based on matches between the particular text attribute value and the predictive token group list. A measure of the predictive utility of the particular text attribute is obtained using correlations between the categorical attribute and the prediction target attribute.

    Platform services to enable one-click execution of the end-to-end sequence of modeling steps

    Publication number: US10380498B1

    Publication date: 2019-08-13

    Application number: US14720474

    Application date: 2015-05-22

    Abstract: This disclosure is directed to the automated generation of Machine Learning (ML) models. The system receives a user directive containing one or more requirements for building the ML model. The system further identifies common requirements between the user directive and one or more prior user directives and associates characteristics of the prior user directive, or model generated therefrom, with the user directive. The system further associates performance values generated by continuous monitoring of deployed ML models to individual characteristics of the user directive used to generate each of the deployed ML models. The system continuously improves model generation efficiency, model performance, and first run performance of individual ML models by learning from the improvements made to one or more prior ML models having similar characteristics.

    Normalizing text attributes for machine learning models

    Publication number: US20240185130A1

    Publication date: 2024-06-06

    Application number: US18416755

    Application date: 2024-01-18

    CPC classification number: G06N20/00 G06F16/35 G06N5/04

    Abstract: Respective correlation metrics between token groups of a particular text attribute of a data set and a prediction target attribute are computed. Based on the correlation metrics, a predictive token group list is created. For various observation records of the data set, values of a derived categorical attribute corresponding to the particular text attribute are determined based on matches between the particular text attribute value and the predictive token group list. A measure of the predictive utility of the particular text attribute is obtained using correlations between the categorical attribute and the prediction target attribute.
