Dynamic treatment regimes are treatment allocations tailored to heterogeneous individuals. The optimal dynamic treatment regime is a regime that maximizes counterfactual welfare. We introduce a framework in which we can partially learn the optimal dynamic regime from observational data, relaxing the sequential randomization assumption commonly employed in the literature and instead relying on (binary) instrumental variables. We propose the notion of a sharp partial ordering of counterfactual welfares with respect to dynamic regimes and establish a mapping from data to the partial ordering via a set of linear programs. We then characterize the identified set of the optimal regime as the set of maximal elements associated with the partial ordering. One main contribution of this paper is that we develop simple analytical conditions to establish the ordering, which bypass solving a large number of large-scale linear programs and thus facilitate estimation and inference. This paper’s analytical framework has broader applicability beyond the current context, e.g., in establishing signs of various treatment effects and rankings of policies across different counterfactual scenarios.
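
To illustrate the characterization of the identified set as the set of maximal elements of a partial ordering, the following is a minimal sketch, not the paper's procedure. It assumes the strict partial ordering has already been established and is supplied as hypothetical `(better, worse)` pairs; the regime labels `d1` to `d4` and the function name `maximal_elements` are illustrative assumptions.

```python
def maximal_elements(regimes, dominated_pairs):
    """Return the regimes that no other regime is known to dominate.

    `dominated_pairs` holds (better, worse) tuples, meaning the welfare of
    `better` has been established to exceed the welfare of `worse`.
    Pairs that are absent are simply not ordered (the ordering is partial).
    """
    # A regime is excluded as soon as some other regime is ranked above it.
    dominated = {worse for _, worse in dominated_pairs}
    return [r for r in regimes if r not in dominated]


# Hypothetical example: four regimes, only some welfare comparisons are signed.
regimes = ["d1", "d2", "d3", "d4"]
dominated_pairs = {("d1", "d2"), ("d1", "d3"), ("d4", "d3")}

identified_set = maximal_elements(regimes, dominated_pairs)
print(identified_set)  # ['d1', 'd4']
```

In this toy case `d1` and `d4` are not ranked against each other, so both remain in the identified set; with no established comparisons at all, every regime would remain.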