Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data

arXiv:2511.17276v1 Announce Type: cross Abstract: This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain, as generated by visual sensors, simulations or even generative neural networks. Well-known inverse kinematics (IK) techniques can provide mathematically exact solutions (when they exist) for […]

Sionna RT: Technical Report

arXiv:2504.21719v2 Announce Type: replace-cross Abstract: Sionna is an open-source, GPU-accelerated library that, as of version 0.14, incorporates a ray tracer, Sionna RT, for simulating radio wave propagation. A unique feature of Sionna RT is differentiability, enabling the calculation of gradients for the channel impulse responses (CIRs), radio maps, and other related metrics with respect to […]

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding

arXiv:2510.09110v3 Announce Type: replace-cross Abstract: Visual grouping — operationalized through tasks such as instance segmentation, visual grounding, and object detection — enables applications ranging from robotic perception to photo editing. These fundamental problems in computer vision are powered by large-scale, painstakingly annotated datasets. Despite their impact, these datasets are costly to build, biased in coverage, […]

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

arXiv:2506.08708v2 Announce Type: replace-cross Abstract: While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic […]

Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing

arXiv:2511.17269v1 Announce Type: cross Abstract: Training autonomous driving and navigation systems requires large and diverse point cloud datasets that capture complex edge case scenarios from various dynamic urban settings. Acquiring such diverse scenarios from real-world point cloud data, especially for critical edge cases, is challenging, which restricts system generalization and robustness. Current methods rely on […]

AgriChrono: A Multi-modal Dataset Capturing Crop Growth and Lighting Variability with a Field Robot

arXiv:2508.18694v2 Announce Type: replace-cross Abstract: Advances in AI and Robotics have accelerated significant initiatives in agriculture, particularly in the areas of robot navigation and 3D digital twin creation. A significant bottleneck impeding this progress is the critical lack of “in-the-wild” datasets that capture the full complexities of real farmland, including non-rigid motion from wind, drastic […]

A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture

arXiv:2509.24068v2 Announce Type: replace-cross Abstract: Strategy Choice Theory (SCT; Siegler and Shrager, 1984; Siegler, 2000) explains important aspects of children’s arithmetic learning based upon principles including learning from developmentally naturalistic data, probabilistic representation, confidence-based retrieval, and the phase-like importance of scaffolding strategies, such as finger-counting. Here we recast SCT as a “Small Math Model” (SMM), […]

Evaluating AI-Driven Automated Map Digitization in QGIS

arXiv:2504.18777v3 Announce Type: replace Abstract: Map digitization is an important process that converts maps into digital formats that can be used for further analysis. This process typically requires deep human involvement because of the need for interpretation and decision-making when translating complex features. With the advancement of artificial intelligence, there is an alternative to […]

Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration

arXiv:2511.00794v2 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has improved the reasoning ability of large language models, yet training remains costly because many rollouts contribute little to optimization relative to the computation they require. This study investigates how simply leveraging intrinsic data properties, an almost free benefit during training, can improve data efficiency […]

MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment

arXiv:2509.14001v4 Announce Type: replace-cross Abstract: Personalized object detection aims to adapt a general-purpose detector to recognize user-specific instances from only a few examples. Lightweight models often struggle in this setting due to their weak semantic priors, while large vision-language models (VLMs) offer strong object-level understanding but are too computationally demanding for real-time or on-device applications. […]

Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards

arXiv:2511.17473v1 Announce Type: cross Abstract: Test-time scaling has been shown to substantially improve large language models’ (LLMs) mathematical reasoning. However, for a large portion of mathematical corpora, especially theorem proving, RLVR’s scalability is limited: intermediate reasoning is crucial, while final answers are difficult to directly and reliably verify. Meanwhile, token-level SFT often degenerates into rote […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.