arXiv:2306.08762v4 Announce Type: replace-cross
Abstract: Partially observable Markov decision processes (POMDPs) are a general framework for sequential decision-making under latent state uncertainty, yet learning in POMDPs is intractable in the worst case. Motivated by sensing and probing constraints in practice, we study how much online state information (OSI) is sufficient to enable efficient learning guarantees. We formalize a model in which the learner can query only partial OSI (POSI) during interaction. We first prove an information-theoretic hardness result showing that, for general POMDPs, achieving an $\epsilon$-optimal policy can require sample complexity that is exponential unless full OSI is available. We then identify two structured subclasses that remain learnable under POSI and propose corresponding algorithms with provably efficient performance guarantees. In particular, we establish regret upper bounds with $\tilde{O}(\sqrt{K})$ dependence on the number of episodes $K$, together with complementary lower bounds, thereby delineating when POSI suffices for efficient reinforcement learning. Our results highlight a principled separation between intractable and tractable regimes under incomplete online state access and provide new tools for jointly optimizing POSI queries and learning control actions.
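For reference, a minimal sketch of the standard episodic-regret quantity that the $\tilde{O}(\sqrt{K})$ statement presumably refers to; the POMDP-specific value functions and the exact parameter dependence hidden inside $\tilde{O}(\cdot)$ are not stated in the abstract and are assumptions here:
$$\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \bigl( V^{\star} - V^{\pi_k} \bigr) \;=\; \tilde{O}\bigl(\sqrt{K}\bigr),$$
where $\pi_k$ is the policy (including its POSI queries) deployed in episode $k$ and $V^{\star}$ is the expected return of an optimal policy under the same query budget.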
FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning
arXiv:2601.21682v1 Announce Type: cross
Abstract: Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing

