Reinforcement learning shows promise for guiding sequential clinical decisions, yet common data preprocessing practices introduce temporal misalignment that violates causal assumptions. Using sepsis management as a case study, we demonstrate that such misalignment produces inappropriate treatment recommendations in nearly half of patient states. This methodological flaw affects over 80% of the literature, yet it is obscured by inflated performance metrics. We propose a simple fix and advocate decision-centric problem formulations.