arXiv:2604.22109v2 Announce Type: replace-cross Abstract: Large language models (LLMs) possess strong persuasive capabilities that outperform humans in head-to-head comparisons. Users report consulting LLMs to inform major life decisions in relationships, medical settings, and when seeking professional advice. Prior work measures persuasion as intentional attempts at producing the most effective argument or convincing statement. This fails […]
Unconstrained Multi-view Human Pose Estimation with Algebraic Priors
arXiv:2604.24312v1 Announce Type: cross Abstract: Recovering 3D human pose from multi-view imagery typically relies on precise camera calibration, which is often unavailable in real-world scenarios, thereby severely limiting the applicability of existing methods. To overcome this challenge, we propose an unconstrained framework that synergizes deep neural networks, algebraic priors, and temporal dynamics for uncalibrated multi-view […]
CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
arXiv:2510.22609v2 Announce Type: replace Abstract: Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, […]
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
arXiv:2604.12290v2 Announce Type: replace Abstract: Current LLM agent benchmarks, which predominantly focus on binary pass/fail tasks such as code generation or search-based question answering, often neglect the value of real-world engineering, which is typically captured through the iterative optimization of feasible designs. To this end, we introduce Frontier-Eng, a human-verified benchmark for generative optimization — […]
Green Prompting: Characterizing Prompt-driven Energy Costs of LLM Inference
arXiv:2503.10666v4 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have become widely used across various domains spanning search engines, code generation, and text creation. However, a major concern associated with their adoption is the high cost of inference, impacting both their sustainability and financial feasibility. In this study, we empirically study how different prompt and […]
InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
arXiv:2509.21275v4 Announce Type: replace-cross Abstract: Long-context training is crucial for extending LLMs' context windows. Existing schemes, such as sequence parallelism, incur substantial communication overhead. Pipeline parallelism (PP) reduces this cost, but its effectiveness hinges on partitioning granularity. Batch-level PP employing sequence packing exhibits high memory consumption in long-context scenarios, whereas token-level PP splitting sequences into […]
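The packing granularity the abstract discusses can be illustrated with a standard first-fit-decreasing packer. This is a generic baseline sketch, not InfiniPipe's actual scheduler; the heuristic, the sequence lengths, and the 8192-token capacity are all assumptions for illustration.

```python
# Generic first-fit-decreasing sequence packing (a common baseline, not
# InfiniPipe's scheme): pack variable-length sequences into fixed-capacity
# micro-batches, as in batch-level pipeline parallelism.
def pack_sequences(lengths, capacity):
    """Pack sequence lengths into bins holding at most `capacity` tokens."""
    bins = []  # each bin: [remaining_capacity, [packed lengths]]
    for n in sorted(lengths, reverse=True):
        if n > capacity:
            raise ValueError(f"sequence of length {n} exceeds capacity")
        for b in bins:
            if b[0] >= n:            # first existing bin with enough room
                b[0] -= n
                b[1].append(n)
                break
        else:                        # no bin fits: open a new one
            bins.append([capacity - n, [n]])
    return [b[1] for b in bins]

# Hypothetical token counts for five variable-length training sequences.
batches = pack_sequences([7000, 3000, 2500, 1500, 900], capacity=8192)
print(batches)
```

Tighter packing means fewer, fuller micro-batches, which is exactly where the abstract's memory-consumption concern arises: each packed batch must hold activations for up to `capacity` tokens at once.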
Generating Verifiable Chain of Thoughts from Execution Traces
arXiv:2512.00127v3 Announce Type: replace-cross Abstract: Getting language models to reason correctly about code requires training on data where each reasoning step can be checked. Current synthetic Chain-of-Thought (CoT) training data often consists of plausible-sounding explanations generated by teacher models rather than verifiable accounts of actual program behavior. Models trained on such data learn logically flawed […]
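The contrast between plausible-sounding and verifiable reasoning steps can be made concrete with a small trace-grounded sketch. This illustrates the general idea only, not the paper's pipeline: every "step" statement below is derived from the interpreter's actual execution trace, so each claimed intermediate value can be checked mechanically.

```python
# Illustrative sketch (not the paper's method): record a function's
# execution trace, then emit a chain of thought whose every claim quotes
# a concrete program state the interpreter actually reached.
import sys

def traced_run(fn, *args):
    """Run fn(*args), recording (line_no, locals snapshot) per executed line."""
    steps = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, steps

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, steps = traced_run(gcd, 48, 18)

# Each sentence is grounded in a recorded state, hence verifiable.
cot = [f"At this step a={s['a']}, b={s['b']}" for _, s in steps if "b" in s]
print(result)   # 6
print(cot)
```

A checker can replay the function and reject any CoT sentence whose quoted state never occurred, which is the kind of verifiability the abstract argues teacher-generated explanations lack.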
Reinforcement Learning with Backtracking Feedback
arXiv:2602.08377v2 Announce Type: replace-cross Abstract: Addressing the critical need for robust safety in Large Language Models (LLMs), particularly against adversarial attacks and in-distribution errors, we introduce Reinforcement Learning with Backtracking Feedback (RLBF). This framework advances upon prior methods, such as BSAFE, by primarily leveraging a Reinforcement Learning (RL) stage where models learn to dynamically correct […]
Security Considerations for Multi-agent Systems
arXiv:2603.09002v2 Announce Type: replace-cross Abstract: Multi-agent artificial intelligence systems (MAS) are systems of autonomous agents that exercise delegated tool authority, share persistent memory, and coordinate via inter-agent communication. MAS introduce security vulnerabilities qualitatively distinct from those documented for single AI models. Existing security and governance frameworks were not designed for these emerging attack surfaces. […]
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
arXiv:2604.17457v3 Announce Type: replace-cross Abstract: Q-value iteration (Q-VI) is usually analyzed through the γ-contraction of the Bellman operator. This argument proves convergence to Q*, but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set […]
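The gap the abstract highlights between value convergence and greedy-policy optimality can be seen on a toy problem. A minimal sketch, assuming a made-up 2-state, 2-action MDP (the transition and reward numbers are illustrative, not from the paper):

```python
# Q-value iteration on a toy MDP: the greedy policy typically becomes
# optimal many iterations before the Q-values themselves converge.
import numpy as np

gamma = 0.9
# P[a, s, s']: transition probabilities; R[s, a]: rewards (hypothetical).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.1, 0.9], [0.7, 0.3]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def bellman(Q):
    V = Q.max(axis=1)                      # greedy value per state
    return R + gamma * np.einsum('ast,t->sa', P, V)

# Obtain a near-exact Q* by iterating the gamma-contraction to convergence.
Q_star = np.zeros((2, 2))
for _ in range(2000):
    Q_star = bellman(Q_star)
pi_star = Q_star.argmax(axis=1)

# Re-run, recording when the greedy policy first matches pi* versus when
# the sup-norm value error first drops below 1e-6.
Q = np.zeros((2, 2))
policy_hit, value_hit = None, None
for k in range(2000):
    Q = bellman(Q)
    if policy_hit is None and (Q.argmax(axis=1) == pi_star).all():
        policy_hit = k
    if value_hit is None and np.abs(Q - Q_star).max() < 1e-6:
        value_hit = k
print(policy_hit, value_hit)  # policy locks in far earlier than the values
```

The contraction bound alone predicts only the second event; characterizing when the first one happens is the finer question the paper studies.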
TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training
arXiv:2604.24088v1 Announce Type: cross Abstract: Handling communication overhead in large-scale tensor-parallel training remains a critical challenge due to the dense, near-zero distributions of intermediate tensors, which exacerbate errors under frequent communication and introduce significant computational overhead during compression. To this end, we propose TACO (Tensor-parallel Adaptive COmmunication compression), a robust FP8-based framework for compressing TP […]
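For intuition about why dense, near-zero tensors stress low-bit compression of communicated activations, here is a per-tensor scaled int8 quantizer. This is a stand-in illustration under stated assumptions: TACO itself is FP8-based, and the Gaussian activation distribution here is synthetic.

```python
# Generic per-tensor scaled 8-bit quantization (an illustration, not
# TACO's codec): compress a near-zero-dense tensor and measure the
# reconstruction error incurred before communication.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # near-zero dense

def quantize_int8(t):
    """Map t to int8 with a single per-tensor scale factor."""
    scale = float(np.abs(t).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

q, scale = quantize_int8(x)
x_hat = q.astype(np.float32) * scale       # decompress on the receiver side
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(rel_err)   # small but nonzero; errors compound over frequent exchanges
```

Because the quantization step is fixed by the tensor's largest magnitude, values clustered near zero land on only a few levels, which is one way the "exacerbated errors under frequent communication" in the abstract can arise.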
CMGL: Confidence-guided Multi-omics Graph Learning for Cancer Subtype Classification
arXiv:2604.24201v1 Announce Type: cross Abstract: Motivation: Multi-omics integration can improve cancer subtyping, but modality informativeness and noise vary across cancer types and patients. Existing graph-based methods optimize modality weights jointly with the classification objective and therefore lack independent reliability estimates, so low-quality omics distort patient similarity graphs and amplify noise through message passing. Results: We […]
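The idea of reliability estimates computed independently of the classification objective can be sketched as confidence-weighted fusion of per-modality similarity matrices. This is a hypothetical illustration only: the matrices are random and the confidence values are stubbed, not CMGL's model.

```python
# Hypothetical sketch (not CMGL): fuse per-modality patient similarity
# matrices with modality weights estimated independently of the classifier,
# so a low-quality omics modality contributes less to the fused graph.
import numpy as np

rng = np.random.default_rng(1)
n = 6  # patients
# Two made-up modality similarity matrices, symmetrized, values in [0, 1].
S = [rng.uniform(0, 1, (n, n)) for _ in range(2)]
S = [(m + m.T) / 2 for m in S]

# Independent reliability proxy: in practice this could be, e.g., the mean
# prediction confidence of a per-modality classifier; stubbed here.
conf = np.array([0.9, 0.3])
w = conf / conf.sum()                        # normalized modality weights

fused = sum(wi * Si for wi, Si in zip(w, S))  # confidence-weighted fusion
print(fused.shape)
```

Downweighting the unreliable modality before graph construction is the point: message passing on `fused` then amplifies signal from the trusted omics rather than noise from the weak one.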