Framework Coverage

What the audit actually tests.

Every Allymet engagement runs 2,000+ automated probes across 17 test categories, mapped to the controls and requirements of 6 industry frameworks. This page shows exactly what is in scope, what is not, and why.

17 test categories

What we probe for.

Security and Attack Resistance

Jailbreak Resistance

Encoding bypasses, persona attacks, DAN variants, and other guardrail circumvention techniques.

Prompt Injection Defense

Direct injection, instruction override, delimiter attacks, and payload smuggling.

Multi-Turn Attack Resistance

Conversational goal hijacking across multiple turns of dialogue.

System Prompt Protection

Attempts to extract, override, or leak the system prompt.

Supply Chain Security

Dependency vulnerability scanning and malicious serialized object detection.

Output Sanitization

Cross-site scripting, code injection, and unsafe markup in model outputs.

Inter-Agent Security

Communication integrity between agents. Assessed when a multi-agent architecture is in scope.

Privacy and Data Protection

PII/PHI Protection

Personal and health data disclosure, extraction attempts, and refusal quality.

RAG Data Security

Retrieval-augmented generation faithfulness and context leakage.

Copyrighted content reproduction and intellectual property handling.

Safety and Quality

Factual Accuracy

Hallucination detection and factual grounding.

Bias and Fairness

Demographic bias, stereotyping, and discriminatory output patterns.

Harmful Content Prevention

Weapons, controlled substances, exploitation, violence, harassment, and misinformation.

Human Override Respect

System purpose adherence and resistance to task hijacking.

Agentic and Operational

Tool Use Safety

Function call abuse and tool misuse. Assessed when the target has tool-calling enabled.

Cascading Failure Resilience

Failure propagation in multi-agent systems. Assessed when applicable.

Cost and Resource Protection

Token abuse, resource exhaustion, and unbounded consumption patterns.

6 compliance frameworks

Which controls your client can point to.

Best fit for most engagements

OWASP LLM Top 10 (2025)

The most directly testable framework for LLM deployments. Six of ten risks have strong automated coverage; four have partial coverage requiring additional architectural review.

Controls tested

LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM05 Improper Output Handling
LLM07 System Prompt Leakage
LLM09 Misinformation
LLM04, 06, 08, 10 (partial)

Deployment-environment controls (logging, monitoring, sandboxing) require separate review.

Agentic systems

OWASP Agentic AI (2026)

Tests the cognitive layer of agentic systems: goal stability, injection resistance, and memory integrity. Infrastructure-level risks (tool sandboxing, IAM, multi-agent orchestration) require architectural review.

Controls tested

ASI01 Goal and Instruction Hijacking
ASI04 Malicious Supply Chain
ASI05 Code and Command Execution
ASI06 Memory and Context Poisoning
ASI09 Trust Boundary Exploitation

ASI02, ASI03, ASI07, ASI08, ASI10 require tool sandbox or multi-agent architecture access.

Adversarial threat landscape

MITRE ATLAS

Comprehensive coverage of LLM-specific adversarial techniques. ATLAS catalogs 40+ techniques; our testing covers the subset applicable to API-based LLM assessment.

Techniques tested

T0051 Prompt Injection (direct and indirect)
T0054 LLM Jailbreak
T0056 Meta Prompt Extraction
T0057 LLM Data Leakage
T0043 Craft Adversarial Data
T0048 External Harms

Traditional ML techniques (model theft, training poisoning, evasion) require model-level access beyond API testing.

AI governance standard

ISO/IEC 42001:2023

Technical validation of main-body clause requirements. ISO 42001 is primarily an organizational governance standard; our testing provides the technical evidence layer that supports a broader AI Management System assessment.

Clauses tested

6.1.2 AI risk assessment
6.1.3 AI risk treatment
7.2 Data quality
7.3 Bias prevention
8.1 Operational controls
8.2 Privacy protection

Full ISO 42001 certification requires organizational AI Management System assessment by a qualified auditor.

US federal framework

NIST AI RMF 1.0

Strongest in the MEASURE and MANAGE functions, where automated testing directly validates risk metrics. The GOVERN function is organizational and outside technical testing scope.

Functions tested

MEASURE: bias, accuracy, robustness testing
MANAGE: output controls, content safety, supply chain
MAP: attack surface identification (partial)
GOVERN: human oversight indicators (partial)

GOVERN function requires organizational governance assessment. Full RMF adoption is an organizational implementation.

EU regulation

EU AI Act

Mapped to GPAI and transparency obligations. For systems classified as high-risk under Annex III, additional article mappings (Art. 9-15) are available on request.

Articles tested

Art. 50(1) Transparency and AI disclosure
Art. 50(2) Marking of synthetic content
Art. 53(1)(a) GPAI technical documentation
Art. 53(1)(c) GPAI copyright policy
Art. 55(1)(a) GPAI systemic risk, adversarial testing

Does not constitute a conformity assessment. Provider documentation, QMS, and post-market monitoring require separate evaluation.

Cross-reference

How test categories map to frameworks.

Each test category produces evidence that maps to one or more framework controls. A single scan covers all six frameworks simultaneously.

Test Category	ISO 42001	OWASP LLM	Agentic	NIST RMF	EU AI Act	ATLAS
Jailbreak Resistance	6.1.2	LLM01	ASI01	MANAGE	Art. 55	T0054
Prompt Injection	6.1.2	LLM01	ASI01	MANAGE	Art. 55	T0051
Multi-Turn Attacks	6.1.2	—	ASI01	MAP	—	—
System Prompt	6.1.3	LLM07	—	MAP	Art. 50	T0056
Supply Chain	6.1.3	LLM03	ASI04	MANAGE	—	—
Output Sanitization	6.1.3	LLM05	—	MANAGE	—	T0048
PII/PHI Protection	8.2	LLM02	—	MANAGE	Art. 53	T0057
RAG Data Security	7.2	LLM04, 08	ASI06	MEASURE	—	—
Copyright Protection	8.2	—	—	—	Art. 53	—
Factual Accuracy	7.2	LLM09	—	MEASURE	Art. 50, 53	—
Bias and Fairness	7.3	—	—	MEASURE	Art. 53	—
Harmful Content	8.1	—	—	MANAGE	Art. 55	T0048
Human Override	8.1	LLM06	—	GOVERN	Art. 50	—
Cost/Resource	8.1	LLM10	—	GOVERN	—	—

What this audit is and is not.

An Allymet assessment is a point-in-time technical evaluation of an AI system's behavior via API-based probing. It provides evidence of how the system responds to adversarial inputs, mapped to the controls of each framework.

It is not a certification, a conformity assessment, or a substitute for organizational governance reviews. Framework mappings are technical indicators. Regulatory approval, policy documentation, and management system assessments require separate evaluation by qualified auditors.