AIPOCH Launches MedSkillAudit, an AI Audit Framework to Evaluate Medical AI Agent Skills Before Deployment
SINGAPORE, June 29, 2026 (GLOBE NEWSWIRE) -- AIPOCH, in collaboration with the Department of Pathology at Zhongshan Hospital, Fudan University, today unveiled MedSkillAudit, a pre-deployment domain-specific audit framework designed to identify scientifically unreliable AI agent skills before they are used in medical research. The research behind it was published as a preprint on arXiv (arXiv:2604.20441) in April 2026.

Medical research agents are increasingly built from modular skills that perform tasks such as literature screening, statistical analysis, protocol design, and manuscript drafting. Yet existing quality-control methods often fail to detect scientific errors, fabricated citations, flawed reasoning, or unsafe outputs before these capabilities reach researchers.
MedSkillAudit introduces a two-layer "veto gate" review process. The first layer evaluates operational stability, structural consistency, result determinism, and system security. The second assesses four scientific dimensions: scientific integrity (no fabricated citations, DOIs, sample sizes, or p-values), practice boundaries (no direct diagnostic conclusions without proper medical disclaimers), methodological baseline (no logical fallacies such as conflating correlation with causation), and code usability (no syntax errors or missing core dependencies in generated code). Skills that fail any critical requirement are blocked from deployment.
Beyond the two-layer veto gate, MedSkillAudit uses a two-stage evaluation methodology. Static evaluation covers design quality and accounts for 40% of the final score. Dynamic evaluation covers runtime performance and accounts for 60%. The framework combines a review of the skill's design and source code with execution-based testing in simulated research scenarios. Based on the final score, each skill is assigned one of four readiness levels: “Production Ready”, “Limited Release”, “Beta Only”, or “Rejected”.
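The veto-then-score logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AIPOCH's implementation. The announcement publishes only the two veto layers, the 40/60 static/dynamic weighting, and the four tier names. The 0-100 scoring scale, the cut-offs in `TIER_CUTOFFS`, the treatment of every listed check as critical, and the names `SkillAudit`, `final_score`, and `classify` are all hypothetical.

```python
from dataclasses import dataclass

# HYPOTHETICAL tier cut-offs on an assumed 0-100 scale; the announcement
# does not publish MedSkillAudit's actual thresholds.
TIER_CUTOFFS = [
    (85.0, "Production Ready"),
    (70.0, "Limited Release"),
    (50.0, "Beta Only"),
]

STATIC_WEIGHT = 0.4   # design quality (static evaluation), per the release
DYNAMIC_WEIGHT = 0.6  # runtime performance (dynamic evaluation), per the release


@dataclass
class SkillAudit:
    """Hypothetical container for one skill's audit results."""
    operational_checks: dict[str, bool]  # veto layer 1, e.g. stability, security
    scientific_checks: dict[str, bool]   # veto layer 2, e.g. no fabricated citations
    static_score: float                  # assumed 0-100 scale
    dynamic_score: float                 # assumed 0-100 scale


def final_score(audit: SkillAudit) -> float:
    """Weighted combination of static (40%) and dynamic (60%) scores."""
    for s in (audit.static_score, audit.dynamic_score):
        if not 0.0 <= s <= 100.0:
            raise ValueError(f"score outside assumed 0-100 range: {s}")
    return STATIC_WEIGHT * audit.static_score + DYNAMIC_WEIGHT * audit.dynamic_score


def classify(audit: SkillAudit) -> str:
    """Apply both veto gates first, then map the weighted score to a tier."""
    # Assumption: every listed check is critical, so a single failure in
    # either layer blocks deployment regardless of the weighted score.
    if not all(audit.operational_checks.values()):
        return "Rejected"
    if not all(audit.scientific_checks.values()):
        return "Rejected"
    score = final_score(audit)
    for cutoff, tier in TIER_CUTOFFS:  # ordered from highest cut-off down
        if score >= cutoff:
            return tier
    return "Rejected"  # below the lowest assumed cut-off


if __name__ == "__main__":
    audit = SkillAudit(
        operational_checks={"stability": True, "structure": True,
                            "determinism": True, "security": True},
        scientific_checks={"scientific_integrity": True, "practice_boundaries": True,
                           "methodological_baseline": True, "code_usability": True},
        static_score=80.0,
        dynamic_score=75.0,
    )
    print(round(final_score(audit), 2), classify(audit))
```

With the sample values, the weighted score is 0.4 × 80 + 0.6 × 75 = 77, which the assumed cut-offs place in “Limited Release”. The veto checks run before any scoring, mirroring the release's statement that a skill failing a critical requirement is blocked no matter how well it scores otherwise.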
In a validation study of 75 skills across five medical research categories (evidence insight, protocol design, data analysis, academic writing, and a catch-all “other” category), 57.3% of skills (43 of 75) fell below the Limited Release threshold. The findings highlight the urgent need for such gatekeeping.
The study also found that MedSkillAudit's evaluations aligned closely with those of expert reviewers and produced consistent results across different assessments.
"AI agents are becoming part of the scientific workflow, yet there is still no equivalent of a quality-control checkpoint for the skills they rely on," said Huimei Wang, CEO at AIPOCH. "MedSkillAudit was developed to help researchers identify scientific, methodological, and ethical risks before these capabilities are deployed. We believe domain-specific auditing frameworks will become an essential complement to traditional AI model evaluation."
A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/26e4fb94-a744-4125-b842-726178db5b86

Contact: Dada Cheung, PR Head, INSIDERS PR STUDIO, Shenzhen, China
Email: dada@insiderspr.com
Phone: +86-18588409987