data-context-extractor

Generate or improve a company-specific data analysis skill by extracting tribal knowledge from analysts.

BOOTSTRAP MODE - Triggers: "Create a data context skill", "Set up data analysis for our warehouse", "Help me create a skill for our database", "Generate a data skill for [company]" → Discovers schemas, asks key questions, generates initial skill with reference files

ITERATION MODE - Triggers: "Add context about [domain]", "The skill needs more info about [topic]", "Update the data skill with [metrics/tables/terminology]", "Improve the [domain] reference" → Loads existing skill, asks targeted questions, appends/updates reference files

Use when data analysts want Claude to understand their company's specific data warehouse, terminology, metrics definitions, and common query patterns.

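⚡ Recommended: install with a one-line command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, unzips it, and places the skill in your skills directory automatically.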

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o data-context-extractor.zip https://jpskill.com/download/22585.zip && unzip -o data-context-extractor.zip && rm data-context-extractor.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/22585.zip -OutFile "$d\data-context-extractor.zip"; Expand-Archive "$d\data-context-extractor.zip" -DestinationPath $d -Force; ri "$d\data-context-extractor.zip"

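When it finishes, restart Claude Code. The skill then activates automatically when you make a related request in plain language, for example "Create a data context skill".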

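💾 Manual download (for those who find the command line difficult)

1. Press the blue button below to download data-context-extractor.zip
2. Double-click the ZIP file to extract it; this creates a data-context-extractor folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the skill.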

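📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.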

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 6

📜 Original SKILL.md (the English/Chinese source text that Claude reads)

Data Context Extractor

A meta-skill that extracts company-specific data knowledge from analysts and generates tailored data analysis skills.

How It Works

This skill has two modes:

Bootstrap Mode: Create a new data analysis skill from scratch
Iteration Mode: Improve an existing skill by adding domain-specific reference files

Bootstrap Mode

Use when: User wants to create a new data context skill for their warehouse.

Phase 1: Database Connection & Discovery

Step 1: Identify the database type

Ask: "What data warehouse are you using?"

Common options:

BigQuery
Snowflake
PostgreSQL/Redshift
Databricks

Use ~~data warehouse tools (query and schema) to connect. If unclear, check available MCP tools in the current session.

Step 2: Explore the schema

Use ~~data warehouse schema tools to:

List available datasets/schemas
Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
Pull schema details for those key tables

Sample exploration queries by dialect:

-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema
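
The sample queries above can be kept in a small lookup so the right one is picked per dialect. This is a minimal sketch, not part of the skill: the query strings are the ones listed above, while `EXPLORATION_QUERIES`, `exploration_query`, and the placeholder names (`project`, `dataset`, `database`, `schema`) are hypothetical. Identifiers are substituted as plain text, so the sketch assumes they come from a trusted source.

```python
# Hypothetical lookup of the exploration queries listed above.
# The dictionary layout and function name are illustrative assumptions.
EXPLORATION_QUERIES = {
    "bigquery": {
        "list_schemas": "SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA",
        "list_tables": "SELECT table_name FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES`",
    },
    "snowflake": {
        "list_schemas": "SHOW SCHEMAS IN DATABASE {database}",
        "list_tables": "SHOW TABLES IN SCHEMA {schema}",
    },
}


def exploration_query(dialect: str, action: str, **names: str) -> str:
    """Return the exploration query for a dialect with placeholders filled in."""
    try:
        template = EXPLORATION_QUERIES[dialect.lower()][action]
    except KeyError:
        raise ValueError(f"no {action!r} query known for dialect {dialect!r}") from None
    # Identifiers cannot be bound as query parameters, so they are
    # formatted into the text; only pass names you trust.
    return template.format(**names)
```

For example, `exploration_query("snowflake", "list_tables", schema="my_schema")` returns the last query shown above. Dialects that are not in the lookup raise `ValueError` rather than guessing at syntax.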

Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

Entity Disambiguation (Critical)

"When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:

Multiple entity types (user vs account vs organization)
Relationships between them (1:1, 1:many, many:many)
Which ID fields link them together

Primary Identifiers

"What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:

Primary keys vs business keys
UUID vs integer IDs
Legacy ID systems

Key Metrics

"What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:

Exact formulas (ARR = monthly_revenue × 12)
Which tables/columns feed each metric
Time period conventions (trailing 7 days, calendar month, etc.)
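
As a worked instance of the example formula above (ARR = monthly_revenue × 12), the following sketch computes ARR from a monthly figure. The function name and the revenue amount are hypothetical; a real company may define ARR differently, which is exactly what this question is meant to uncover.

```python
from decimal import Decimal


def annual_recurring_revenue(monthly_revenue: Decimal) -> Decimal:
    """ARR as given in the example formula: monthly_revenue x 12."""
    return monthly_revenue * 12


# Hypothetical monthly figure, for illustration only.
arr = annual_recurring_revenue(Decimal("12500.00"))
print(arr)  # 150000.00
```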

Data Hygiene

"What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:

Standard WHERE clauses to always include
Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
Specific values to exclude (status = 'deleted')
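
Once the analyst has named the flag columns and excluded values, they can be turned into one reusable filter. The sketch below is an assumption-laden illustration: `standard_filters` is a hypothetical helper, flag columns are assumed to be booleans, and column names are assumed to be trusted identifiers.

```python
def standard_filters(flag_columns: list[str], excluded_values: dict[str, list[str]]) -> str:
    """Build a WHERE clause that drops flagged rows and excluded values."""
    # A NULL flag is treated as "not flagged" (assumes boolean flag columns).
    conditions = [f"COALESCE({col}, FALSE) = FALSE" for col in flag_columns]
    for column, values in excluded_values.items():
        # Escape single quotes inside the literal values.
        quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
        # Note: NOT IN also drops rows where the column itself is NULL.
        conditions.append(f"{column} NOT IN ({quoted})")
    return "WHERE " + "\n  AND ".join(conditions) if conditions else ""
```

Calling `standard_filters(["is_test", "is_internal"], {"status": ["deleted"]})` produces a three-condition clause that could be documented in the generated skill as the standard exclusion. The NULL behavior noted in the comments is the kind of quirk worth confirming with the analyst.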

Common Gotchas

"What mistakes do new analysts typically make with this data?"

Listen for:

Confusing column names
Timezone issues
NULL handling quirks
Historical vs current state tables

Phase 3: Generate the Skill

Create a skill with this structure:

[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md          # Entity definitions and relationships
    ├── metrics.md           # KPI calculations
    ├── tables/              # One file per domain
    │   ├── [domain1].md
    │   └── [domain2].md
    └── dashboards.json      # Optional: existing dashboards catalog

SKILL.md Template: See references/skill-template.md

SQL Dialect Section: See references/sql-dialects.md and include the appropriate dialect notes.

Reference File Template: See references/domain-template.md
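
The directory layout above can be scaffolded with a few lines of Python. This is only a sketch: `scaffold_skill` is a hypothetical helper, it creates empty files to be filled from the templates, and it skips the optional dashboards.json.

```python
from pathlib import Path


def scaffold_skill(root: Path, company: str, domains: list[str]) -> Path:
    """Create the empty [company]-data-analyst/ tree shown above."""
    skill_dir = root / f"{company}-data-analyst"
    references = skill_dir / "references"
    tables_dir = references / "tables"
    tables_dir.mkdir(parents=True, exist_ok=True)

    files = [
        skill_dir / "SKILL.md",
        references / "entities.md",
        references / "metrics.md",
        # One file per domain under references/tables/.
        *(tables_dir / f"{domain}.md" for domain in domains),
    ]
    for path in files:
        path.touch(exist_ok=True)  # leaves existing content untouched
    return skill_dir
```

For instance, `scaffold_skill(Path("."), "acme", ["sales", "marketing"])` would create `acme-data-analyst/` in the current directory; the company and domain names here are placeholders.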

Phase 4: Package and Deliver

Create all files in the skill directory
Package as a zip file
Present to user with summary of what was captured
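
The packaging step can be sketched with the standard library. The bundle ships scripts/package_data_skill.py, whose contents are not shown here; the code below is not that script, just one plausible way to zip a skill directory. `package_skill` is a hypothetical name, and the sketch assumes `output_dir` already exists and lies outside the skill directory.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: Path, output_dir: Path) -> Path:
    """Zip skill_dir so the archive holds a single top-level skill folder."""
    skill_dir = skill_dir.resolve()
    archive = output_dir / f"{skill_dir.name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(skill_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the folder name is kept.
                arcname = path.relative_to(skill_dir.parent).as_posix()
                zf.write(path, arcname)
    return archive
```

Keeping the folder name inside the archive means unzipping into a skills directory yields `[company]-data-analyst/SKILL.md`, which matches how the install steps for this skill place an extracted folder.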

Iteration Mode

Use when: User has an existing skill but needs to add more context.

Step 1: Load Existing Skill

Ask user to upload their existing skill (zip or folder), or locate it if already in the session.

Read the current SKILL.md and reference files to understand what's already documented.

Step 2: Identify the Gap

Ask: "What domain or topic needs more context? What queries are failing or producing wrong results?"

Common gaps:

A new data domain (marketing, finance, product, etc.)
Missing metric definitions
Undocumented table relationships
New terminology

Step 3: Targeted Discovery

For the identified domain:

Explore relevant tables: Use ~~data warehouse schema tools to find tables in that domain
Ask domain-specific questions:
- "What tables are used for [domain] analysis?"
- "What are the key metrics for [domain]?"
- "Any special filters or gotchas for [domain] data?"
Generate new reference file: Create references/[domain].md using the domain template

Step 4: Update and Repackage

Add the new reference file
Update SKILL.md's "Knowledge Base Navigation" section to include the new domain
Repackage the skill
Present the updated skill to user
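
Updating the navigation section can be sketched as a small text edit. The SKILL.md template is not shown here, so this rests on assumptions: the section is taken to be a Markdown heading reading `## Knowledge Base Navigation`, entries are taken to be bullet links, and the new file is taken to live at references/[domain].md as in Step 3. `add_domain_link` is a hypothetical helper.

```python
def add_domain_link(skill_md: str, domain: str,
                    heading: str = "## Knowledge Base Navigation") -> str:
    """Append a link to references/<domain>.md at the end of the navigation section."""
    lines = skill_md.splitlines()
    if heading not in lines:
        raise ValueError(f"heading {heading!r} not found in SKILL.md")
    entry = f"- [{domain}](references/{domain}.md)"
    if entry in lines:
        return skill_md  # already linked, nothing to do

    start = lines.index(heading) + 1
    # The section ends at the next heading line, or at end of file.
    end = next((i for i in range(start, len(lines)) if lines[i].startswith("#")),
               len(lines))
    # Step back over trailing blank lines so the entry joins the existing list.
    while end > start and not lines[end - 1].strip():
        end -= 1
    lines.insert(end, entry)
    return "\n".join(lines) + "\n"
```

The function is idempotent, so repackaging twice does not duplicate the link. If the real template uses a different heading or link style, the `heading` argument and the `entry` format would need to change accordingly.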

Reference File Standards

Each reference file should include:

For Table Documentation

Location: Full table path
Description: What this table contains, when to use it
Primary Key: How to uniquely identify rows
Update Frequency: How often data refreshes
Key Columns: Table with column name, type, description, notes
Relationships: How this table joins to others
Sample Queries: 2-3 common query patterns

For Metrics Documentation

Metric Name: Human-readable name
Definition: Plain English explanation
Formula: Exact calculation with column references
Source Table(s): Where the data comes from
Caveats: Edge cases, exclusions, gotchas
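
The five metric fields above map naturally onto a small record type. The sketch below is illustrative only: `MetricDoc` and its Markdown layout are assumptions, since references/domain-template.md is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDoc:
    """One metric entry with the fields listed above."""
    name: str
    definition: str
    formula: str
    source_tables: list[str]
    caveats: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Layout is an assumption; adapt it to the actual domain template.
        lines = [
            f"### {self.name}",
            f"- Definition: {self.definition}",
            f"- Formula: `{self.formula}`",
            f"- Source Table(s): {', '.join(self.source_tables)}",
        ]
        if self.caveats:
            lines.append("- Caveats:")
            lines.extend(f"  - {caveat}" for caveat in self.caveats)
        return "\n".join(lines) + "\n"
```

Because every field except caveats is required by the constructor, an entry with a missing formula or source table fails at creation time instead of producing an incomplete reference file.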

For Entity Documentation

Entity Name: What it's called
Definition: What it represents in the business
Primary Table: Where to find this entity
ID Field(s): How to identify it
Relationships: How it relates to other entities
Common Filters: Standard exclusions (internal, test, etc.)

Quality Checklist

Before delivering a generated skill, verify:

[ ] SKILL.md has complete frontmatter (name, description)
[ ] Entity disambiguation section is clear
[ ] Key terminology is defined
[ ] Standard filters/exclusions are documented
[ ] At least 2-3 sample queries per domain
[ ] SQL uses correct dialect syntax
[ ] Reference files are linked from SKILL.md navigation section
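
The first checklist item can be checked mechanically. This sketch assumes the frontmatter is a block delimited by `---` lines at the top of SKILL.md with simple top-level `key: value` pairs; it deliberately avoids a YAML parser, so nested or multi-line values are only recognized by their top-level key. Both function names are hypothetical.

```python
def frontmatter_keys(skill_md: str) -> set[str]:
    """Return the top-level keys of a '---' delimited frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented "key: value" lines count as top-level keys.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing delimiter: treat frontmatter as missing


def missing_frontmatter(skill_md: str, required=("name", "description")) -> list[str]:
    """List the required frontmatter keys that SKILL.md lacks."""
    present = frontmatter_keys(skill_md)
    return [key for key in required if key not in present]
```

An empty list from `missing_frontmatter` means the name and description are both present; it says nothing about whether the description is good, so the remaining checklist items still need a human read.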

Bundled files

※ List of the files included in the ZIP. In addition to `SKILL.md` itself, it may contain reference material, samples, and scripts.

📄 SKILL.md (7,233 bytes)
📎 references/domain-template.md (3,490 bytes)
📎 references/example-output.md (6,247 bytes)
📎 references/skill-template.md (3,820 bytes)
📎 references/sql-dialects.md (4,557 bytes)
📎 scripts/package_data_skill.py (3,765 bytes)