jpskill.com
🛠️ 開発・MCP コミュニティ

Causal Inference

Determine cause-and-effect relationships using propensity scoring, instrumental variables, and causal graphs for policy evaluation and treatment effects

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o causal-inference.zip https://jpskill.com/download/21357.zip && unzip -o causal-inference.zip && rm causal-inference.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/21357.zip -OutFile "$d\causal-inference.zip"; Expand-Archive "$d\causal-inference.zip" -DestinationPath $d -Force; ri "$d\causal-inference.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)
  1. 1. 下の青いボタンを押して causal-inference.zip をダウンロード
  2. 2. ZIPファイルをダブルクリックで解凍 → causal-inference フォルダができる
  3. 3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
  4. 4. Claude Code を再起動

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

  1. 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
  2. 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
  3. 3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
    • · macOS / Linux: ~/.claude/skills/
    • · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →
最終更新
2026-05-18
取得日時
2026-05-18
同梱ファイル
2

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

因果推論

概要

因果推論は、相関関係を超えて何が何を引き起こすのかを理解し、原因と結果の関係を特定し、介入効果を推定します。

使用場面

  • 政策介入やビジネス上の意思決定の影響を評価する場合
  • ランダム化実験が実施できない場合に介入効果を推定する場合
  • 観察データにおける交絡変数を制御する場合
  • マーケティングキャンペーンや製品変更が結果を引き起こしたかどうかを判断する場合
  • 異なるユーザーセグメント間での異質な介入効果を分析する場合
  • プロペンシティスコアや操作変数を用いて非実験データから因果関係を主張する場合

主要な概念

  • 介入 (Treatment): 介入または曝露
  • 結果 (Outcome): 結果または帰結
  • 交絡 (Confounding): 介入と結果の両方に影響を与える変数
  • 因果グラフ (Causal Graph): 関係の視覚的表現
  • 介入効果 (Treatment Effect): 介入の影響
  • 選択バイアス (Selection Bias): 非ランダムな介入割り当て

因果推論の手法

  • ランダム化比較試験 (Randomized Controlled Trials, RCT): 黄金標準
  • プロペンシティスコアマッチング (Propensity Score Matching): 介入群と対照群のバランスを取る
  • 差の差分法 (Difference-in-Differences): 前後比較
  • 操作変数法 (Instrumental Variables): 内生性に対処する
  • 因果フォレスト (Causal Forests): 異質な介入効果

Pythonによる実装

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from scipy import stats

# Generate observational data with confounding
np.random.seed(42)

n = 1000

# Confounder: Age (affects both treatment and outcome)
age = np.random.uniform(25, 75, n)

# Treatment: Training program (more likely for younger people)
treatment_prob = 0.3 + 0.3 * (75 - age) / 50  # Inverse relationship with age
treatment = (np.random.uniform(0, 1, n) < treatment_prob).astype(int)

# Outcome: Salary (affected by both treatment and age)
# True causal effect of treatment: +$5000
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)

df = pd.DataFrame({
    'age': age,
    'treatment': treatment,
    'salary': salary,
})

print("Observational Data Summary:")
print(df.describe())
print(f"\nTreatment Rate: {df['treatment'].mean():.1%}")
print(f"Average Salary (Control): ${df[df['treatment']==0]['salary'].mean():.0f}")
print(f"Average Salary (Treatment): ${df[df['treatment']==1]['salary'].mean():.0f}")

# 1. Naive Comparison (BIASED - ignores confounding)
naive_effect = df[df['treatment']==1]['salary'].mean() - df[df['treatment']==0]['salary'].mean()
print(f"\n1. Naive Comparison: ${naive_effect:.0f} (BIASED)")

# 2. Regression Adjustment (Covariate Adjustment)
X = df[['treatment', 'age']]
y = df['salary']
model = LinearRegression()
model.fit(X, y)
regression_effect = model.coef_[0]

print(f"\n2. Regression Adjustment: ${regression_effect:.0f}")

# 3. Propensity Score Matching
# Estimate probability of treatment given covariates
ps_model = LogisticRegression()
ps_model.fit(df[['age']], df['treatment'])
df['propensity_score'] = ps_model.predict_proba(df[['age']])[:, 1]

print(f"\n3. Propensity Score Matching:")
print(f"PS range: [{df['propensity_score'].min():.3f}, {df['propensity_score'].max():.3f}]")

# Matching: find control for each treated unit
matched_pairs = []
treated_units = df[df['treatment'] == 1].index
for treated_idx in treated_units:
    treated_ps = df.loc[treated_idx, 'propensity_score']
    treated_age = df.loc[treated_idx, 'age']

    # Find closest control unit
    control_units = df[(df['treatment'] == 0) &
                      (df['propensity_score'] >= treated_ps - 0.1) &
                      (df['propensity_score'] <= treated_ps + 0.1)].index

    if len(control_units) > 0:
        closest_control = min(control_units,
                             key=lambda x: abs(df.loc[x, 'propensity_score'] - treated_ps))
        matched_pairs.append({
            'treated_idx': treated_idx,
            'control_idx': closest_control,
            'treated_salary': df.loc[treated_idx, 'salary'],
            'control_salary': df.loc[closest_control, 'salary'],
        })

matched_df = pd.DataFrame(matched_pairs)
psm_effect = (matched_df['treated_salary'] - matched_df['control_salary']).mean()
print(f"PSM Effect: ${psm_effect:.0f}")
print(f"Matched pairs: {len(matched_df)}")

# 4. Stratification by Propensity Score
df['ps_stratum'] = pd.qcut(df['propensity_score'], q=5, labels=False, duplicates='drop')

stratified_effects = []
for stratum in df['ps_stratum'].unique():
    stratum_data = df[df['ps_stratum'] == stratum]
    if (stratum_data['treatment'] == 0).sum() > 0 and (stratum_data['treatment'] == 1).sum() > 0:
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        stratified_effects.append(effect)

stratified_effect = np.mean(stratified_effects)
print(f"\n4. Stratification by PS: ${stratified_effect:.0f}")

# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Treatment distribution by age
ax = axes[0, 0]
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
ax.hist(control['age'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(treated['age'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')
ax.set_title('Age Distribution by Treatment')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Salary vs Age (colored by treatment)
ax = axes[0, 1]
ax.scatter(control['age'], control['salary'], alpha=0.5, label='Control', s=30)
ax.scatter(treated['age'], treated['salary'], alpha=0.5, label='Treated', s=30, color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Salary')
ax.set_title('Salary vs Age by Treatment')
ax.legend()
ax.grid(True, alpha=0.3)

# Propensity Score Distribution
ax = axes[1, 0]
ax.hist
📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

Causal Inference

Overview

Causal inference determines cause-and-effect relationships and estimates treatment effects, going beyond correlation to understand what causes what.

When to Use

  • Evaluating the impact of policy interventions or business decisions
  • Estimating treatment effects when randomized experiments aren't feasible
  • Controlling for confounding variables in observational data
  • Determining if a marketing campaign or product change caused an outcome
  • Analyzing heterogeneous treatment effects across different user segments
  • Making causal claims from non-experimental data using propensity scores or instrumental variables

Key Concepts

  • Treatment: Intervention or exposure
  • Outcome: Result or consequence
  • Confounding: Variables affecting both treatment and outcome
  • Causal Graph: Visual representation of relationships
  • Treatment Effect: Impact of intervention
  • Selection Bias: Non-random treatment assignment

Causal Methods

  • Randomized Controlled Trials (RCT): Gold standard
  • Propensity Score Matching: Balance treatment/control
  • Difference-in-Differences: Before/after comparison
  • Instrumental Variables: Handle endogeneity
  • Causal Forests: Heterogeneous treatment effects

Implementation with Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from scipy import stats

# Generate observational data with confounding
np.random.seed(42)

n = 1000

# Confounder: Age (affects both treatment and outcome)
age = np.random.uniform(25, 75, n)

# Treatment: Training program (more likely for younger people)
treatment_prob = 0.3 + 0.3 * (75 - age) / 50  # Inverse relationship with age
treatment = (np.random.uniform(0, 1, n) < treatment_prob).astype(int)

# Outcome: Salary (affected by both treatment and age)
# True causal effect of treatment: +$5000
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)

df = pd.DataFrame({
    'age': age,
    'treatment': treatment,
    'salary': salary,
})

print("Observational Data Summary:")
print(df.describe())
print(f"\nTreatment Rate: {df['treatment'].mean():.1%}")
print(f"Average Salary (Control): ${df[df['treatment']==0]['salary'].mean():.0f}")
print(f"Average Salary (Treatment): ${df[df['treatment']==1]['salary'].mean():.0f}")

# 1. Naive Comparison (BIASED - ignores confounding)
naive_effect = df[df['treatment']==1]['salary'].mean() - df[df['treatment']==0]['salary'].mean()
print(f"\n1. Naive Comparison: ${naive_effect:.0f} (BIASED)")

# 2. Regression Adjustment (Covariate Adjustment)
X = df[['treatment', 'age']]
y = df['salary']
model = LinearRegression()
model.fit(X, y)
regression_effect = model.coef_[0]

print(f"\n2. Regression Adjustment: ${regression_effect:.0f}")

# 3. Propensity Score Matching
# Estimate probability of treatment given covariates
ps_model = LogisticRegression()
ps_model.fit(df[['age']], df['treatment'])
df['propensity_score'] = ps_model.predict_proba(df[['age']])[:, 1]

print(f"\n3. Propensity Score Matching:")
print(f"PS range: [{df['propensity_score'].min():.3f}, {df['propensity_score'].max():.3f}]")

# Matching: find control for each treated unit
matched_pairs = []
treated_units = df[df['treatment'] == 1].index
for treated_idx in treated_units:
    treated_ps = df.loc[treated_idx, 'propensity_score']
    treated_age = df.loc[treated_idx, 'age']

    # Find closest control unit
    control_units = df[(df['treatment'] == 0) &
                      (df['propensity_score'] >= treated_ps - 0.1) &
                      (df['propensity_score'] <= treated_ps + 0.1)].index

    if len(control_units) > 0:
        closest_control = min(control_units,
                             key=lambda x: abs(df.loc[x, 'propensity_score'] - treated_ps))
        matched_pairs.append({
            'treated_idx': treated_idx,
            'control_idx': closest_control,
            'treated_salary': df.loc[treated_idx, 'salary'],
            'control_salary': df.loc[closest_control, 'salary'],
        })

matched_df = pd.DataFrame(matched_pairs)
psm_effect = (matched_df['treated_salary'] - matched_df['control_salary']).mean()
print(f"PSM Effect: ${psm_effect:.0f}")
print(f"Matched pairs: {len(matched_df)}")

# 4. Stratification by Propensity Score
df['ps_stratum'] = pd.qcut(df['propensity_score'], q=5, labels=False, duplicates='drop')

stratified_effects = []
for stratum in df['ps_stratum'].unique():
    stratum_data = df[df['ps_stratum'] == stratum]
    if (stratum_data['treatment'] == 0).sum() > 0 and (stratum_data['treatment'] == 1).sum() > 0:
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        stratified_effects.append(effect)

stratified_effect = np.mean(stratified_effects)
print(f"\n4. Stratification by PS: ${stratified_effect:.0f}")

# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Treatment distribution by age
ax = axes[0, 0]
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
ax.hist(control['age'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(treated['age'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')
ax.set_title('Age Distribution by Treatment')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Salary vs Age (colored by treatment)
ax = axes[0, 1]
ax.scatter(control['age'], control['salary'], alpha=0.5, label='Control', s=30)
ax.scatter(treated['age'], treated['salary'], alpha=0.5, label='Treated', s=30, color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Salary')
ax.set_title('Salary vs Age by Treatment')
ax.legend()
ax.grid(True, alpha=0.3)

# Propensity Score Distribution
ax = axes[1, 0]
ax.hist(df[df['treatment'] == 0]['propensity_score'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(df[df['treatment'] == 1]['propensity_score'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Propensity Score')
ax.set_ylabel('Frequency')
ax.set_title('Propensity Score Distribution')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Treatment Effect Comparison
ax = axes[1, 1]
methods = ['Naive', 'Regression', 'PSM', 'Stratified']
effects = [naive_effect, regression_effect, psm_effect, stratified_effect]
true_effect = 5000

ax.bar(methods, effects, color=['red', 'orange', 'yellow', 'lightgreen'], alpha=0.7, edgecolor='black')
ax.axhline(y=true_effect, color='green', linestyle='--', linewidth=2, label=f'True Effect (${true_effect:.0f})')
ax.set_ylabel('Treatment Effect ($)')
ax.set_title('Treatment Effect Estimates by Method')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

for i, effect in enumerate(effects):
    ax.text(i, effect + 200, f'${effect:.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

# 6. Doubly Robust Estimation
from sklearn.ensemble import RandomForestRegressor

# Propensity score model
ps_model_dr = LogisticRegression().fit(df[['age']], df['treatment'])
ps_scores = ps_model_dr.predict_proba(df[['age']])[:, 1]

# Outcome model
outcome_model = RandomForestRegressor(n_estimators=50, random_state=42)
outcome_model.fit(df[['treatment', 'age']], df['salary'])

# Doubly robust estimator
treated_mask = df['treatment'] == 1
control_mask = df['treatment'] == 0

# Adjust for propensity score
treated_adjusted = (treated_mask.astype(int) * df['salary']) / (ps_scores + 0.01)
control_adjusted = (control_mask.astype(int) * df['salary']) / (1 - ps_scores + 0.01)

# Outcome predictions
pred_treated = outcome_model.predict(df[['treatment', 'age']].replace({'treatment': 0, 1: 1}))
pred_control = outcome_model.predict(df[['treatment', 'age']].replace({'treatment': 1, 0: 0}))

dr_effect = treated_adjusted.sum() / treated_mask.sum() - control_adjusted.sum() / control_mask.sum()
print(f"\n6. Doubly Robust Estimation: ${dr_effect:.0f}")

# 7. Heterogeneous Treatment Effects
print(f"\n7. Heterogeneous Treatment Effects (by Age Quartile):")

for age_q in pd.qcut(df['age'], q=4, duplicates='drop').unique():
    mask = (df['age'] >= age_q.left) & (df['age'] < age_q.right)
    stratum_data = df[mask]

    if (stratum_data['treatment'] == 0).sum() > 0 and (stratum_data['treatment'] == 1).sum() > 0:
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean

        print(f"  Age {age_q.left:.0f}-{age_q.right:.0f}: ${effect:.0f}")

# 8. Sensitivity Analysis
print(f"\n8. Sensitivity Analysis (Hidden Confounder Impact):")

# Vary hidden confounder correlation with outcome
for hidden_effect in [1000, 2000, 5000, 10000]:
    adjusted_effect = regression_effect - hidden_effect * 0.1
    print(f"  If hidden confounder worth ${hidden_effect}: Effect = ${adjusted_effect:.0f}")

# 9. Summary Table
print(f"\n" + "="*60)
print("CAUSAL INFERENCE SUMMARY")
print("="*60)
print(f"True Treatment Effect: ${true_effect:,.0f}")
print(f"\nEstimates:")
print(f"  Naive (BIASED): ${naive_effect:,.0f}")
print(f"  Regression Adjustment: ${regression_effect:,.0f}")
print(f"  Propensity Score Matching: ${psm_effect:,.0f}")
print(f"  Stratification: ${stratified_effect:,.0f}")
print(f"  Doubly Robust: ${dr_effect:,.0f}")
print("="*60)

# 10. Causal Graph (Text representation)
print(f"\n10. Causal Graph (DAG):")
print(f"""
Age → Treatment ← (Selection Bias)
  ↓        ↓
  └─→ Salary

Interpretation:
- Age is a confounder
- Treatment causally affects Salary
- Age directly affects Salary
- Age affects probability of Treatment
""")

Causal Assumptions

  • Unconfoundedness: No unmeasured confounders
  • Overlap: Common support on propensity scores
  • SUTVA: No interference between units
  • Consistency: Single version of treatment

Treatment Effect Types

  • ATE: Average Treatment Effect (overall)
  • ATT: Average Treatment on Treated
  • CATE: Conditional Average Treatment Effect
  • HTE: Heterogeneous Treatment Effects

Method Strengths

  • RCT: Gold standard, controls all confounders
  • Matching: Balances groups, preserves overlap
  • Regression: Adjusts for covariates
  • Instrumental Variables: Handles endogeneity
  • Causal Forests: Learns heterogeneous effects

Deliverables

  • Causal graph visualization
  • Treatment effect estimates
  • Sensitivity analysis
  • Heterogeneous treatment effects
  • Covariate balance assessment
  • Propensity score diagnostics
  • Final causal inference report

同梱ファイル

※ ZIPに含まれるファイル一覧。`SKILL.md` 本体に加え、参考資料・サンプル・スクリプトが入っている場合があります。