Large language models (LLMs) are increasingly used in medicine, but their decision-making in cardiovascular risk attribution remains underexplored. This pilot study examined how an LLM apportioned relative cardiovascular risk across different demographic and clinical domains. A structured prompt set was developed across six domains: general cardiovascular risk, body mass index (BMI), diabetes, depression, smoking, and hyperlipidaemia. Each prompt was submitted in triplicate to ChatGPT 4.0 mini. For each domain, a neutral prompt assessed the LLM’s risk attribution, while paired comparative prompts examined whether including the domain changed the LLM’s judgement of which demographic group was at higher risk. The LLM attributed higher cardiovascular risk to men than to women, and to Black rather than white patients, across most neutral prompts. In comparative prompts, the sex-based judgement changed in two of six domains: when depression was included, risk attribution was equal between men and women; for smoking, females were judged to be at higher risk than males in scenarios without smoking, but males were judged to be at higher risk when smoking was present. In contrast, race-based judgements of relative risk were stable across domains, as the LLM consistently judged Black patients to be at higher risk. Agreement across repeated runs was strong (ICC = 0.949, 95% CI 0.819–0.992, p < 0.001). The LLM exhibited bias and variability across cardiovascular risk domains. Although sex-based judgements sometimes changed when comorbidities were included, race-based judgements remained unchanged. This pilot study suggests that careful evaluation of LLM clinical decision-making is needed to avoid reinforcing inequities.
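The abstract reports inter-run agreement as an intraclass correlation (ICC) over triplicate model runs. As a minimal sketch of how such an agreement analysis could be reproduced, assuming each run’s output is scored as a numeric risk rating per prompt (the pingouin library, the column names, and the example values below are assumptions for illustration, not the study’s actual data or code):

```python
# Minimal sketch of an inter-run agreement analysis, assuming triplicate
# LLM runs are stored as numeric risk ratings per prompt. Data are
# hypothetical placeholders, not the study's results.
import pandas as pd
import pingouin as pg

# Long-format data: each prompt (target) rated by three repeated runs (raters).
df = pd.DataFrame({
    "prompt": ["p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3", "p3"],
    "run":    [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "risk":   [7.0, 7.5, 7.0, 3.0, 3.5, 3.0, 9.0, 8.5, 9.0],
})

# pingouin computes several ICC forms (ICC1..ICC3k), each with a 95%
# confidence interval and p-value, from the long-format table.
icc = pg.intraclass_corr(data=df, targets="prompt", raters="run", ratings="risk")
print(icc[["Type", "ICC", "CI95%", "pval"]])
```

In pingouin’s output, the ICC2/ICC2k rows correspond to the two-way random-effects forms commonly reported for absolute agreement across repeated raters; the abstract does not state which form the authors used.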
Artificial intelligence in epidemic watch: revolutionizing infectious diseases surveillance
Artificial intelligence is rapidly emerging, and its various technological manifestations are widely and deeply embedded in our communities. That is what obliges its mindful



