-
Publication No.: US11798029B2
Publication Date: 2023-10-24
Application No.: US16657533
Filing Date: 2019-10-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Miroslav Dudik, Akshay Krishnamurthy, Maria Dimakopoulou, Yi Su
IPC: G06Q30/02, G06Q30/0251, G06N3/088, G06F17/18, G06Q30/0242, G06F18/21, G06F18/214
CPC classification number: G06Q30/0254, G06F17/18, G06F18/2148, G06F18/2193, G06N3/088, G06Q30/0243
Abstract: Off-policy evaluation of a new “target” policy is performed using historical data gathered based on a previous “logging” policy to estimate the performance of the target policy. An estimator may be used, wherein either a quality-based estimator or a quality-agnostic estimator is used to weight the difference between an observed reward in the historical data and an estimated reward generated by the target policy. A quality-agnostic estimator may be used to evaluate an importance weight according to a threshold. In such examples, when the importance weight exceeds the threshold, the quality-agnostic estimator clips the importance weight at the threshold, thereby providing a fixed upper bound irrespective of the quality of the reward predictor. In other examples, a quality-based estimator is used, in which an upper bound incorporates the quality of the reward predictor in order to modify an importance weight used by the estimator.
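The quality-agnostic clipping described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent. It assumes a doubly robust style estimator in which a model-based term is corrected by the weighted difference between the observed and predicted reward, and the names `clipped_dr_estimate`, `target_prob`, `reward_model`, and `threshold`, as well as the record layout, are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (context, action, observed_reward, logging_probability)
LoggedRecord = Tuple[float, int, float, float]


def clipped_dr_estimate(
    logs: Sequence[LoggedRecord],
    actions: Sequence[int],
    target_prob: Callable[[float, int], float],
    reward_model: Callable[[float, int], float],
    threshold: float,
) -> float:
    """Estimate the target policy's value from logged data.

    Assumed form (not stated in the abstract): a doubly robust style
    estimate whose importance weight is capped at `threshold`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Model-based term: predicted reward averaged over the target policy.
        direct = sum(
            target_prob(context, a) * reward_model(context, a) for a in actions
        )
        # Importance weight: target probability over logging probability.
        weight = target_prob(context, action) / logging_prob
        # Quality-agnostic clipping: the cap ignores how good reward_model is.
        clipped = min(weight, threshold)
        # Weighted difference between observed and predicted reward.
        total += direct + clipped * (reward - reward_model(context, action))
    return total / len(logs)
```

Passing `float("inf")` as `threshold` disables clipping and yields the unclipped form of the same estimate, which makes it easy to compare the two on the same logged data.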
-
Publication No.: US20170308535A1
Publication Date: 2017-10-26
Application No.: US15136688
Filing Date: 2016-04-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Alekh Agarwal, Miroslav Dudik, Akshay Krishnamurthy, John Langford, Adith Swaminathan
CPC classification number: G06F16/24578, G06F16/248, G06F16/3326, G06F16/337, G06F16/9535, G06N7/005
Abstract: A computing device can determine a decomposition of data of actions of a first session based at least in part on a first computational model associating the actions of the first session with corresponding state values of the first session. The computing device can determine a second computational model based at least in part on the decomposition and an operation template. The computing device can receive a query via the communications interface, the query associated with the second session. The computing device can determine a state value of the second session based at least in part on the query. The computing device can operate the second computational model to determine at least one response associated with the query based at least in part on the state value of the second session. The computing device can provide an indication of the at least one response via the communications interface.
-