ai-pentesting

Run autonomous AI-driven penetration tests on web applications using tools like Shannon, PentAGI, and similar frameworks. Use when tasks involve setting up automated penetration testing pipelines, combining AI agents with security tools (nmap, subfinder, nuclei, sqlmap), building autonomous exploit chains, generating pentest reports with proof-of-concept exploits, or integrating AI pentesting into CI/CD pipelines. Covers the full pentest lifecycle from reconnaissance to reporting using AI orchestration.

⚡ Recommended: one-line install (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and places it in the skills folder automatically.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o ai-pentesting.zip https://jpskill.com/download/14613.zip && unzip -o ai-pentesting.zip && rm ai-pentesting.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/14613.zip -OutFile "$d\ai-pentesting.zip"; Expand-Archive "$d\ai-pentesting.zip" -DestinationPath $d -Force; ri "$d\ai-pentesting.zip"

When it finishes, restart Claude Code. The Skill then activates automatically when you make a related request in plain language, for example asking Claude to set up an automated penetration test of a staging application.

💾 Manual download (if you prefer not to use the command line)
  1. Use the download button on the page to get ai-pentesting.zip.
  2. Double-click the ZIP file to extract it; this creates an ai-pentesting folder.
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac).
  4. Restart Claude Code.

⚠️ Download and use at your own risk. The site accepts no responsibility for the content, behavior, or safety of the Skill.

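🎯 What this Skill does

The description below explains what this Skill does for you. It activates automatically when you ask Claude for a task in this area.

📦 Installation (3 steps)

  1. Use the "Download" button above to get the .skill file.
  2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically).
  3. Place the extracted folder in .claude/skills/ under your home folder:
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.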

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

AI Pentesting

Overview

Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.

Instructions

Methodology

AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:

Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
    AI decides: which tools to run, in what order, based on findings

Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
    AI decides: which findings are likely exploitable

Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
    AI decides: exploitation order, payload selection, chaining

Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
    AI generates: structured, evidence-based report
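
The outline says the AI decides which tools to run in each phase. One way to keep that decision bounded is to check every proposed tool against a per-phase allowlist before anything is executed. The sketch below is illustrative only: the phase keys, the `PHASE_TOOLS` table, and `validate_plan` are hypothetical names, and the tool sets simply mirror the outline above (with `zap` standing in for OWASP ZAP).

```python
# Hypothetical allowlist; phase names and tool sets mirror the outline above.
PHASE_TOOLS: dict[str, frozenset[str]] = {
    "reconnaissance": frozenset({"subfinder", "whatweb", "wappalyzer", "nmap"}),
    "vulnerability_analysis": frozenset(
        {"nuclei", "zap", "nikto", "schemathesis", "semgrep", "codeql"}
    ),
    "exploitation": frozenset({"sqlmap", "playwright", "puppeteer"}),
}


def validate_plan(phase: str, proposed: list[str]) -> list[str]:
    """Filter an LLM-proposed tool list down to tools allowed in this phase.

    Order is preserved, names are normalized to lowercase, and duplicates
    or unknown tools are dropped. Nothing is executed here.
    """
    allowed = PHASE_TOOLS.get(phase, frozenset())
    seen: set[str] = set()
    plan: list[str] = []
    for tool in proposed:
        name = tool.strip().lower()
        if name in allowed and name not in seen:
            seen.add(name)
            plan.append(name)
    return plan
```

Running the model's plan through a filter like this means an unexpected suggestion (for instance an exploitation tool proposed during reconnaissance) is discarded rather than run.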

Setting Up Shannon

Shannon is an open-source AI pentester that automates the full lifecycle:

# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo

# Monitor progress
./shannon logs

# View results in Temporal UI
# (`open` is the macOS launcher; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:8233

Shannon's architecture:

  • Reconnaissance agent: Maps attack surface using nmap, subfinder, whatweb
  • Vulnerability agents: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
  • Exploitation agent: Uses browser automation to prove vulnerabilities with real exploits
  • Reporting agent: Generates findings with copy-paste PoC commands

Building a Custom AI Pentest Pipeline

For cases where Shannon doesn't fit, build a custom pipeline:

# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools

import asyncio
import json
import subprocess
from urllib.parse import urlparse

from openai import AsyncOpenAI

# Async client, so the LLM calls below can be awaited inside coroutines.
client = AsyncOpenAI()

class AIPentester:
    """Autonomous AI penetration tester.

    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """

    def __init__(self, target_url: str, scope: list[str] | None = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}

    def _get_domain(self) -> str:
        """Return the hostname of the target URL."""
        return urlparse(self.target).hostname or self.target

    @staticmethod
    async def _run(argv: list[str], timeout: int) -> str:
        """Run a tool in a worker thread so the event loop is not blocked.

        Returns stdout, or an empty string if the tool is missing or times out.
        """
        try:
            result = await asyncio.to_thread(
                subprocess.run, argv,
                capture_output=True, text=True, timeout=timeout
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            return ""
        return result.stdout

    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.

        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()

        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)

        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)

        # Phase 4: Generate report
        report = await self._generate_report()
        return report

    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        recon = {}
        domain = self._get_domain()

        # Subdomain enumeration (one hostname per line; skip blank lines)
        out = await self._run(['subfinder', '-d', domain, '-silent'], 120)
        recon['subdomains'] = [s for s in out.splitlines() if s.strip()]

        # Technology fingerprinting
        out = await self._run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3'], 60
        )
        try:
            recon['technologies'] = json.loads(out) if out.strip() else {}
        except json.JSONDecodeError:
            recon['technologies'] = {'raw': out}  # keep unparsed output

        # Port scanning (nmap has no JSON output mode; -oX - writes XML to stdout)
        recon['ports'] = await self._run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', domain], 300
        )

        # Nuclei scan for known CVEs (JSON Lines, one finding per line;
        # recent nuclei releases use -jsonl, older ones used -json)
        out = await self._run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'], 300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in out.splitlines() if line.strip()
        ]

        return recon

    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "You are an expert penetration tester. Analyze the "
                 "reconnaissance data and identify the most promising "
                 "attack vectors. Return a JSON object with a 'targets' array."},
                {"role": "user", "content":
                 f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                 "Identify attack targets with: endpoint, vulnerability_type, "
                 "technique, priority (1-5), reasoning."}
            ],
            response_format={"type": "json_object"}
        )
        content = response.choices[0].message.content or "{}"
        return json.loads(content).get("targets", [])

    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
        # The _test_* handlers are not shown in this document; they are looked
        # up by name so that a missing handler is skipped instead of raising.
        handlers = {
            'injection': '_test_injection',
            'xss': '_test_xss',
            'ssrf': '_test_ssrf',
            'auth': '_test_auth_bypass',
        }
        for key, method_name in handlers.items():
            if key in vuln_type:
                handler = getattr(self, method_name, None)
                if handler is None:
                    return None
                return await handler(target)
        return None

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "Generate a professional penetration test report with "
                 "executive summary, findings with CVSS scores, PoC steps, "
                 "and remediation recommendations."},
                {"role": "user", "content":
                 f"Target: {self.target}\n"
                 f"Findings: {json.dumps(self.findings, indent=2)}\n"
                 f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
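
The class above stores a `scope` list, but nothing in the code shown checks LLM-proposed targets against it. A minimal sketch of such a check follows. The helper names `host_of`, `in_scope`, and `filter_targets` are hypothetical, and treating subdomains of a scoped host as in scope is an assumption that a real engagement's authorization letter may not share.

```python
from urllib.parse import urlparse


def host_of(value: str) -> str:
    """Extract the hostname from a URL or a bare host[:port] string."""
    parsed = urlparse(value if "://" in value else f"//{value}")
    if not parsed.hostname:
        raise ValueError(f"cannot extract a hostname from {value!r}")
    return parsed.hostname


def in_scope(candidate: str, scope: list[str]) -> bool:
    """True if the candidate's host equals, or is a subdomain of, a scoped host.

    ASSUMPTION: subdomains count as in scope. Tighten this if the written
    authorization lists exact hosts only.
    """
    host = host_of(candidate)
    for entry in scope:
        allowed = host_of(entry)
        if host == allowed or host.endswith("." + allowed):
            return True
    return False


def filter_targets(targets: list[dict], scope: list[str]) -> list[dict]:
    """Keep only targets whose 'endpoint' field resolves to an in-scope host."""
    kept = []
    for target in targets:
        endpoint = target.get("endpoint", "")
        try:
            if in_scope(endpoint, scope):
                kept.append(target)
        except ValueError:
            continue  # unparseable endpoint: drop it rather than guess
    return kept
```

Applied between the analysis and exploitation phases, a filter like this drops any target the model invents outside the authorized hosts before a tool is pointed at it.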

CI/CD Integration

Run AI pentests on every push to main and on a weekly schedule, against a copy of the app started inside the CI job:

# .github/workflows/pentest.yml
name: AI Penetration Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly, Monday 02:00 UTC

jobs:
  pentest:
    runs-on: ubuntu-latest
    services:
      app:
        image: your-app:${{ github.sha }}
        ports:
          - 8080:8080

    steps:
      - uses: actions/checkout@v4

      - name: Run Shannon Pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git
          cd shannon
          ./shannon start \
            URL=http://localhost:8080 \
            REPO=../ \
            MAX_CONCURRENT=3

          # Wait for completion and extract report
          ./shannon wait
          cp workspace/report.md $GITHUB_WORKSPACE/pentest-report.md

      - name: Upload Report
        if: always()  # upload the report even if the pentest step failed
        uses: actions/upload-artifact@v4
        with:
          name: pentest-report
          path: pentest-report.md

      - name: Fail on Critical Findings
        run: |
          if grep -q "CRITICAL" pentest-report.md; then
            echo "::error::Critical vulnerabilities found!"
            exit 1
          fi
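
Grepping the report for the word "CRITICAL" also matches sentences such as "No CRITICAL findings", and it ignores high-severity issues. If findings are available in a structured form, a gate can count severities instead. The sketch below assumes a hypothetical `findings.json` containing a JSON array of objects with a `severity` field; the source does not document Shannon's output format, so both the file name and the field are assumptions.

```python
import json
from collections import Counter

# Severities that should fail the build (assumption: critical and high).
BLOCKING = frozenset({"critical", "high"})


def count_by_severity(findings: list[dict]) -> Counter:
    """Count findings per lowercase severity label; missing labels become 'unknown'."""
    return Counter(str(f.get("severity", "unknown")).lower() for f in findings)


def gate(findings: list[dict], blocking: frozenset = BLOCKING) -> int:
    """Return a process exit code: 1 if any blocking finding exists, else 0."""
    counts = count_by_severity(findings)
    return 1 if any(counts[level] > 0 for level in blocking) else 0


def gate_file(path: str) -> int:
    """Load a findings file (hypothetical format, see above) and apply the gate."""
    with open(path, encoding="utf-8") as handle:
        return gate(json.load(handle))
```

In a workflow step, a short wrapper script would pass the return value of `gate_file("findings.json")` to `sys.exit`, so the job fails only when a blocking finding is actually present.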

Report Structure

A professional AI-generated pentest report should include:

  • Executive summary: scope, duration, methodology, overall risk, and findings count by severity
  • Individual findings: each with CVSS score, affected endpoint/parameter, evidence with reproducible curl commands, impact description, and specific remediation guidance
  • Remediation priority list: ordered by severity, with recommended fix timelines
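
The report elements described here can be held in a small data model before any text is generated. The sketch below is one possible shape, not a format prescribed by the source: the `Finding` fields and function names are hypothetical, while the score bands in `cvss_rating` follow the CVSS v3.x qualitative severity scale.

```python
from collections import Counter
from dataclasses import dataclass


def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    """One report entry; the field names are illustrative."""
    title: str
    endpoint: str
    cvss: float
    evidence: str      # e.g. a reproducible curl command
    impact: str
    remediation: str


def severity_counts(findings: list[Finding]) -> Counter:
    """Findings count by severity, for the executive summary."""
    return Counter(cvss_rating(f.cvss) for f in findings)


def remediation_order(findings: list[Finding]) -> list[Finding]:
    """Order findings from highest to lowest CVSS score."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)
```

Keeping the score numeric and deriving the label from it avoids reports in which the stated severity and the CVSS score disagree.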

Examples

Run an autonomous pentest on a web application

Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.

Build a custom AI pentest pipeline

Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.

Integrate AI pentesting into CI/CD

Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.

Guidelines

  • Only run penetration tests against systems you have explicit written authorization to test — unauthorized testing is illegal
  • AI pentesters can cause real damage (data modification, service disruption) — always test against staging environments, never production
  • Review AI-generated exploitation attempts before running them — LLMs can hallucinate or generate overly aggressive payloads
  • Treat pentest reports as confidential — they contain vulnerability details and proof-of-concept exploits
  • Set time limits and scope boundaries for autonomous testing to prevent runaway scans
  • Validate AI findings manually — false positives in automated reports erode trust with stakeholders
  • Store API keys and credentials used for pentesting securely — never hardcode them in CI configurations
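
The guideline on time limits can be enforced in code by giving the whole run a single time budget, so that each tool call gets at most whatever time is left. The sketch below is illustrative: `Budget` and `run_tool` are hypothetical names, and the default of 120 seconds per tool is an arbitrary assumption.

```python
import subprocess
import time
from typing import Optional


class Budget:
    """Wall-clock budget shared by every tool call in one test run."""

    def __init__(self, total_seconds: float):
        self.deadline = time.monotonic() + total_seconds

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self.deadline - time.monotonic())


def run_tool(argv: list[str], budget: Budget,
             per_tool_timeout: float = 120.0) -> Optional[str]:
    """Run one command, capped by both the per-tool limit and the remaining budget.

    Returns stdout, or None if the budget is exhausted, the tool times out,
    or the executable is not installed.
    """
    timeout = min(per_tool_timeout, budget.remaining())
    if timeout <= 0:
        return None  # budget spent: refuse to start another tool
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    return result.stdout
```

Because the deadline is fixed when the `Budget` is created, a slow early scan shortens the time available to later ones instead of extending the run.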