Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning

arXiv:2603.20181v1 Announce Type: cross Abstract: The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performance in production. The root cause often lies in ML algorithms learning superficial patterns (shortcuts) rather than underlying cybersecurity concepts. We investigate contrastive multi-modal learning as a […]

Mapping Connectomic Structure to Function(s) in Cerebellar-like Networks using Kernel Regression

arXiv:2601.09320v2 Announce Type: replace Abstract: Cerebellar-like networks, in which input activity patterns are separated by projection to a much higher-dimensional space before classification, are a recurring neurobiological motif, present in the cerebellum, dentate gyrus, insect olfactory system, and electrosensory system of the electric fish. Their relatively well-understood design presents a promising test-case for probing principles […]

RAM: Recover Any 3D Human Motion in-the-Wild

arXiv:2603.19929v1 Announce Type: cross Abstract: RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further enhances human motion reconstruction by injecting spatio-temporal priors for consistent and smooth motion estimation. Moreover, a lightweight Predictor module forecasts future poses to […]

An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models

arXiv:2603.20100v1 Announce Type: cross Abstract: Direct Preference Optimization (DPO) is widely used after supervised fine-tuning (SFT) to align language models, yet empirical behavior under small backbones and modest data is under-specified. We systematically compare SFT-only, DPO-only, and staged SFT-to-DPO training alongside full fine-tuning (FFT) versus LoRA on a GPT-2-scale decoder, evaluating paraphrase detection and Shakespearean […]

FORWARD: Dataset of a forwarder operating in rough terrain

arXiv:2511.17318v3 Announce Type: replace-cross Abstract: We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in the middle part of Sweden. The forwarder is a large Komatsu model equipped with vehicle telematics sensors, including global positioning via satellite navigation, movement sensors, accelerometers, and engine sensors. The […]

An SO(3)-equivariant reciprocal-space neural potential for long-range interactions

arXiv:2603.18389v2 Announce Type: replace-cross Abstract: Long-range electrostatic and polarization interactions play a central role in molecular and condensed-phase systems, yet remain fundamentally incompatible with locality-based machine-learning interatomic potentials. Although modern SO(3)-equivariant neural potentials achieve high accuracy for short-range chemistry, they cannot represent the anisotropic, slowly decaying multipolar correlations governing realistic materials, while existing long-range extensions […]

On Sample-Efficient Generalized Planning via Learned Transition Models

arXiv:2602.23148v3 Announce Type: replace Abstract: Generalized planning studies the construction of solution strategies that generalize across families of planning problems sharing a common domain model, formally defined by a transition function $\gamma : S \times A \rightarrow S$. Classical approaches achieve such generalization through symbolic abstractions and explicit reasoning over $\gamma$. In contrast, recent Transformer-based […]

MOSS-TTS Technical Report

arXiv:2603.18090v2 Announce Type: replace-cross Abstract: This technical report presents MOSS-TTS, a speech generation foundation model built on a scalable recipe: discrete audio tokens, autoregressive modeling, and large-scale pretraining. Built on MOSS-Audio-Tokenizer, a causal Transformer tokenizer that compresses 24 kHz audio to 12.5 fps with variable-bitrate RVQ and unified semantic-acoustic representations, we release two complementary generators: […]

PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents

arXiv:2603.18377v2 Announce Type: replace-cross Abstract: Cloud-hosted large language models (LLMs) have become the de facto planners in agentic systems, coordinating tools and guiding execution over local environments. In many deployments, however, the environment being planned over is private, containing source code, files, credentials, and metadata that cannot be exposed to the cloud. Existing solutions address […]

A proxy-based approach for unmeasured confounding in electronic health records research

arXiv:2506.12177v2 Announce Type: replace-cross Abstract: Electronic health records (EHR) are widely used to study clinical decisions, yet unmeasured confounding remains a persistent challenge. Proxy variables offer a potential solution. In EHR data, clinicians already record many such measurements (e.g., vitals), each revealing something about a patient’s underlying health. Despite this, proxy-based methods are rarely used […]

Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?

arXiv:2603.19831v1 Announce Type: cross Abstract: Human communication seamlessly integrates speech and bodily motion, where hand gestures naturally complement vocal prosody to express intent, emotion, and emphasis. While recent text-to-speech (TTS) systems have begun incorporating multimodal cues such as facial expressions or lip movements, the role of hand gestures in shaping prosody remains largely underexplored. We […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844