What's included
- Prompt Injection: Detects instructions that override AI system context before skills are installed or executed.
- Jailbreak & Exfiltration: Catches content removing safety constraints and references to SSH keys, cloud credentials, and wallets.
- Token Smuggling: Flags LLM control tokens injected into skill content that can manipulate model behavior.
- Obfuscation Detection: Surfaces zero-width characters, homoglyphs, and base64 blobs used to hide malicious payloads.
Architecture
How Skill Warden analyzes AI skills for prompt injection and unsafe behavior.
Detection categories
Detection Severity Type
Prompt Injection Instructions that override AI system context
Critical Hard fail Jailbreak Attempt Content removing AI safety constraints
Critical Hard fail Token Smuggling LLM control tokens injected into skill content
High Hard fail Secret Grabbing References to SSH keys, cloud credentials, wallets
High Advisory External Fetch Coercion Instructions pushing the AI to install or download packages
Medium Advisory Content Obfuscation Zero-width chars, homoglyphs, base64 blobs
Medium Advisory Protect your skill repo in CI
Add the Skill Warden GitHub Action to your skill repository, it fails CI on any hard violation and uploads findings to the GitHub Security tab via SARIF 2.1.0.
Also available as a CLI: pip install skill-warden
Get it on GitHub Marketplace →# .github/workflows/skill-warden.yml
name: Warden Security Scan
on:
push:
branches: [main]
pull_request:
jobs:
scan:
runs-on: ubuntu-latest
permissions:
security-events: write
contents: read
steps:
- uses: actions/checkout@v4
- uses: W3OSC/skill-warden-action@v1