We estimate nonparametric learning rules using data from dynamic two-armed bandit (probabilistic reversal learning) experiments, supplemented with auxiliary eye-movement measures of subjects’ beliefs. To do so, we apply recent econometric developments in the estimation of dynamic models. This direct estimation of learning rules differs from the usual modus operandi of the experimental literature. The estimated choice probabilities and learning rules from our nonparametric models have some distinctive features: notably, subjects tend to update in a non-smooth manner following positive ‘exploitative’ choices (those made in accordance with current beliefs). Simulation results show that the estimated nonparametric learning rules fit aspects of subjects’ observed choice sequences better than alternative parameterized learning rules from Bayesian and reinforcement learning models.