AI Assessment Scoping
AI and Large Language Model (LLM) assessments are conducted to evaluate the security posture of the AI model, its hosting environment, and the application layer that wraps the model. The assessment aims to identify vulnerabilities that could lead to prompt injection, data leakage, model theft, or the generation of harmful content.
Complete the form below or click ‘Download’ to save a copy and fill it in at your convenience. Once completed, please send it to sales@cyberalchemy.co.uk.
AI Assessment Scoping Methodology
Approach
AI and Large Language Model (LLM) assessments are conducted to evaluate the security posture of the AI model, its hosting environment, and the application layer that wraps the model. The assessment aims to identify vulnerabilities that could lead to prompt injection, data leakage, model theft, or the generation of harmful content. The consultants will use a blend of traditional penetration testing and specialised adversarial machine learning techniques (“Red Teaming”) to stress-test the system. The application and model are viewed and manipulated from several perspectives, including external attackers (no knowledge), authenticated users (partial knowledge), and privileged developers with access to system prompts (full knowledge).
Cyber Alchemy’s AI testing methodology covers the OWASP Top Ten for Large Language Models, representing the industry consensus on the most critical security risks to AI applications. The OWASP Top Ten for LLMs is as follows:
- LLM01:2025 Prompt Injection
- LLM02:2025 Sensitive Information Disclosure
- LLM03:2025 Supply Chain
- LLM04: Data and Model Poisoning
- LLM05:2025 Improper Output Handling
- LLM06:2025 Excessive Agency
- LLM07:2025 System Prompt Leakage
- LLM08:2025 Vector and Embedding Weaknesses
- LLM09:2025 Misinformation
- LLM10:2025 Unbounded Consumption
Methodology
The first step of the engagement is to define the scope. This is performed through the completion of a scoping document and a scoping call (if required). Once the context is set, Cyber Alchemy will begin the assessment using the following categories derived from the OWASP AI Security Testing Guide.
Information Gathering & Model Reconnaissance
- Fingerprinting the underlying model family (e.g., GPT-4, Llama 3, Mistral) and version.
- Attempting to trick the model into revealing its own governing instructions, ethical guidelines, and developer comments.
- Identifying if the model is hosted via third-party API or self-hosted, and mapping the flow of data between the user, the model, and backend databases.
- Enumerating what external tools (calculators, web browsers, API hooks) the AI has access to.
Prompt Injection & Jailbreaking Testing
- Testing for “Jailbreaks” (e.g., DAN mode, role-playing attacks) to bypass safety guardrails and ethical filters.
- Attempting to compromise the AI by feeding it malicious external content (e.g., a website or document containing hidden commands that the AI reads and executes).
- Using techniques to extract the intellectual property (IP) contained within the system prompts.
- Using encoding (Base64, Morse code, translation) to bypass input filters and execute blocked queries.
Data Privacy & Information Integrity Testing
- Attempting to force the model to regurgitate PII (Personally Identifiable Information) or sensitive data from its training set.
- Testing if an attacker can deduce private details about the data used to fine-tune the model.
- If the model uses Retrieval-Augmented Generation, testing for “Poisoned Context” and injecting malicious data into the knowledge base to alter the AI’s answers.
- Stress-testing the model to see if it can be forced to generate convincing but false information that could damage the company’s reputation.
Model & Application Security Testing
- Testing for context-window exhaustion and resource consumption attacks that degrade the model’s performance or increase API costs.
- Identifying vulnerable dependencies, outdated model versions, or insecure model serialisation formats (e.g., Pickle vulnerabilities in PyTorch models).
Output Handling & Integration Testing
- Ensuring the application sanitises the AI’s output to prevent Cross-Site Scripting (XSS) or SQL Injection generated by the AI.
- Testing “Excessive Agency” by attempting to force the AI to perform unauthorised actions via its connected APIs (e.g., sending emails, deleting database records).
- Testing if the AI can be instructed to query internal IP addresses or restricted endpoints.
Logical & Business Risk Testing
- Assessing how the application handles incorrect AI outputs and if appropriate warnings/human-in-the-loop checks are in place.
- Testing the robustness of the moderation layer (e.g., Azure Content Safety or NeMo Guardrails) against creative circumvention.
- Testing for the absence of limits on prompt length or frequency, which could lead to financial resource exhaustion (Denial of Service).
API & Infrastructure Testing (AI Specific)
- Testing the security of the vector database (e.g., Pinecone, Milvus) used for memory/RAG, ensuring proper access controls and encryption.
- Applying standard OWASP API security tests to the endpoints that serve the model (Authentication, Authorisation, Throttling).

Got a question?
Speak to an expert about AI Assessment Scoping.