arXiv:2604.19098v2 Announce Type: replace-cross
Abstract: English financial NLP has advanced rapidly through benchmarks targeting earnings analysis, market sentiment, tabular reasoning, and financial question answering. Arabic financial NLP, by contrast, remains virtually nonexistent, despite 422 million speakers, $4.9 trillion in Gulf sovereign wealth, and a $4-5 trillion Islamic finance industry that requires specialized Shari’ah compliance for instruments such as sukuk, murabaha, and takaful. We introduce Sahm, the first Arabic financial benchmark, spanning seven tasks: AAOIFI standards QA, fatwa-based QA/MCQ, accounting and business exams, financial sentiment analysis, extractive summarization, and event-cause reasoning. It comprises 14,380 expert-verified instances drawn from authentic regulatory, juristic, and corporate sources. Evaluating 20 LLMs, we find that Arabic fluency does not imply financial reasoning: models achieving 91% on recognition tasks drop sharply on generation, and event-cause reasoning exposes the widest performance gap (scores ranging from 1.89 to 9.84 out of 10). We release the benchmark and dataset to support trustworthy Arabic financial assistants.
Disclosure in the era of generative artificial intelligence
Generative artificial intelligence (AI) has rapidly become embedded in academic writing, assisting with tasks ranging from language editing to drafting text and producing evidence. Despite



