🛠️ On-Call Handoff Patterns

on-call-handoff-patterns

A Skill that presents effective patterns for sharing information and keeping incident response continuous when on-call responsibilities change hands.

📜 Original English description (for reference)

Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

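⚡ Recommended: one-line install (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and places the files automatically.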

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o on-call-handoff-patterns.zip https://jpskill.com/download/3254.zip && unzip -o on-call-handoff-patterns.zip && rm on-call-handoff-patterns.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/3254.zip -OutFile "$d\on-call-handoff-patterns.zip"; Expand-Archive "$d\on-call-handoff-patterns.zip" -DestinationPath $d -Force; ri "$d\on-call-handoff-patterns.zip"

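When it finishes, restart Claude Code. The Skill then activates automatically when you make a related request in plain language, for example "write an on-call handoff summary".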

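💾 Manual download (if you prefer not to use the command line)

1. Click the blue button below to download on-call-handoff-patterns.zip
2. Double-click the ZIP file to extract it; this creates an on-call-handoff-patterns folder
3. Move that folder to C:\Users\YourName\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.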

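🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.

📦 How to install (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.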

Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1

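💬 Just say this: sample prompts

› Use On Call Handoff Patterns to show a minimal sample
› Explain the main uses of On Call Handoff Patterns and what to watch out for
› Explain how to integrate On Call Handoff Patterns into an existing project

Paste any of these into Claude Code and this Skill activates automatically.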

📜 Original SKILL.md (the English text that Claude reads)

On-Call Handoff Patterns

Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

Do not use this skill when

- The task is unrelated to on-call handoff patterns
- You need a different domain or tool outside this scope

Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

- Transitioning on-call responsibilities
- Writing shift handoff summaries
- Documenting ongoing investigations
- Establishing on-call rotation procedures
- Improving handoff quality
- Onboarding new on-call engineers

Core Concepts

1. Handoff Components

| Component | Purpose |
|-----------|---------|
| Active Incidents | What's currently broken |
| Ongoing Investigations | Issues being debugged |
| Recent Changes | Deployments, configs |
| Known Issues | Workarounds in place |
| Upcoming Events | Maintenance, releases |
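
The five components above can be treated as the fields of a small record. The following is a minimal sketch, assuming a hypothetical `Handoff` class and `to_markdown` method (neither is part of the Skill), that renders each component as a section and writes "None" explicitly for an empty one so the reader can tell it was checked rather than forgotten.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Hypothetical container for the five handoff components."""

    outgoing: str
    incoming: str
    active_incidents: list[str] = field(default_factory=list)
    ongoing_investigations: list[str] = field(default_factory=list)
    recent_changes: list[str] = field(default_factory=list)
    known_issues: list[str] = field(default_factory=list)
    upcoming_events: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render one Markdown section per component, in the table's order."""
        sections = [
            ("Active Incidents", self.active_incidents),
            ("Ongoing Investigations", self.ongoing_investigations),
            ("Recent Changes", self.recent_changes),
            ("Known Issues", self.known_issues),
            ("Upcoming Events", self.upcoming_events),
        ]
        lines = [f"# On-Call Handoff: {self.outgoing} → {self.incoming}", ""]
        for title, items in sections:
            lines.append(f"## {title}")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                # State emptiness explicitly instead of omitting the section.
                lines.append("- None")
            lines.append("")
        return "\n".join(lines)


if __name__ == "__main__":
    doc = Handoff(
        "@alice",
        "@bob",
        ongoing_investigations=["ENG-1234: intermittent API timeouts"],
    )
    print(doc.to_markdown())
```

Keeping the structure in data rather than free text makes it harder to drop a component by accident when a shift is quiet.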

2. Handoff Timing

Recommended: 30 min overlap between shifts

Outgoing:
├── 15 min: Write handoff document
└── 15 min: Sync call with incoming

Incoming:
├── 15 min: Review handoff document
├── 15 min: Sync call with outgoing
└── 5 min: Verify alerting setup
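
As a quick arithmetic check on the breakdown above, the sketch below totals each role's planned minutes against the recommended overlap. The durations are copied from the breakdown; the names `PLAN` and `minutes_over_budget` are hypothetical. The incoming engineer's tasks add up to 35 minutes, so one reading (an assumption, not stated in the source) is that the 5 minute alerting check happens just outside the 30 minute overlap.

```python
# Durations in minutes, copied from the timing breakdown above.
OVERLAP_MIN = 30
PLAN = {
    "outgoing": {
        "write handoff document": 15,
        "sync call with incoming": 15,
    },
    "incoming": {
        "review handoff document": 15,
        "sync call with outgoing": 15,
        "verify alerting setup": 5,
    },
}


def minutes_over_budget(plan: dict, overlap: int = OVERLAP_MIN) -> dict:
    """Return, per role, how many planned minutes exceed the overlap window."""
    return {
        role: max(0, sum(tasks.values()) - overlap)
        for role, tasks in plan.items()
    }


if __name__ == "__main__":
    print(minutes_over_budget(PLAN))
```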

Templates

Template 1: Shift Handoff Document

# On-Call Handoff: Platform Team

**Outgoing**: @alice (2024-01-15 to 2024-01-22)
**Incoming**: @bob (2024-01-22 to 2024-01-29)
**Handoff Time**: 2024-01-22 09:00 UTC

---

## 🔴 Active Incidents

### None currently active
No active incidents at handoff time.

---

## 🟡 Ongoing Investigations

### 1. Intermittent API Timeouts (ENG-1234)
**Status**: Investigating
**Started**: 2024-01-20
**Impact**: ~0.1% of requests timing out

**Context**:
- Timeouts correlate with database backup window (02:00-03:00 UTC)
- Suspect backup process causing lock contention
- Added extra logging in PR #567 (deployed 01/21)

**Next Steps**:
- [ ] Review new logs after tonight's backup
- [ ] Consider moving backup window if confirmed

**Resources**:
- Dashboard: [API Latency](https://grafana/d/api-latency)
- Thread: #platform-eng (01/20, 14:32)

---

### 2. Memory Growth in Auth Service (ENG-1235)
**Status**: Monitoring
**Started**: 2024-01-18
**Impact**: None yet (proactive)

**Context**:
- Memory usage growing ~5% per day
- No memory leak found in profiling
- Suspect connection pool not releasing properly

**Next Steps**:
- [ ] Review heap dump from 01/21
- [ ] Consider restart if usage > 80%

**Resources**:
- Dashboard: [Auth Service Memory](https://grafana/d/auth-memory)
- Analysis doc: [Memory Investigation](https://docs/eng-1235)

---

## 🟢 Resolved This Shift

### Payment Service Outage (2024-01-19)
- **Duration**: 23 minutes
- **Root Cause**: Database connection exhaustion
- **Resolution**: Rolled back v2.3.4, increased pool size
- **Postmortem**: [POSTMORTEM-89](https://docs/postmortem-89)
- **Follow-up tickets**: ENG-1230, ENG-1231

---

## 📋 Recent Changes

### Deployments
| Service | Version | Time | Notes |
|---------|---------|------|-------|
| api-gateway | v3.2.1 | 01/21 14:00 | Bug fix for header parsing |
| user-service | v2.8.0 | 01/20 10:00 | New profile features |
| auth-service | v4.1.2 | 01/19 16:00 | Security patch |

### Configuration Changes
- 01/21: Increased API rate limit from 1000 to 1500 RPS
- 01/20: Updated database connection pool max from 50 to 75

### Infrastructure
- 01/20: Added 2 nodes to Kubernetes cluster
- 01/19: Upgraded Redis from 6.2 to 7.0

---

## ⚠️ Known Issues & Workarounds

### 1. Slow Dashboard Loading
**Issue**: Grafana dashboards slow on Monday mornings
**Workaround**: Wait 5 min after 08:00 UTC for cache warm-up
**Ticket**: OPS-456 (P3)

### 2. Flaky Integration Test
**Issue**: `test_payment_flow` fails intermittently in CI
**Workaround**: Re-run failed job (usually passes on retry)
**Ticket**: ENG-1200 (P2)

---

## 📅 Upcoming Events

| Date | Event | Impact | Contact |
|------|-------|--------|---------|
| 01/23 02:00 | Database maintenance | 5 min read-only | @dba-team |
| 01/24 14:00 | Major release v5.0 | Monitor closely | @release-team |
| 01/25 | Marketing campaign | 2x traffic expected | @platform |

---

## 📞 Escalation Reminders

| Issue Type | First Escalation | Second Escalation |
|------------|------------------|-------------------|
| Payment issues | @payments-oncall | @payments-manager |
| Auth issues | @auth-oncall | @security-team |
| Database issues | @dba-team | @infra-manager |
| Unknown/severe | @engineering-manager | @vp-engineering |

---

## 🔧 Quick Reference

### Common Commands
```bash
# Check service health
kubectl get pods -A | grep -v Running

# Recent deployments
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

# Clear cache (emergency only)
redis-cli FLUSHDB
```

Important Links

Handoff Checklist

Outgoing Engineer

[x] Document active incidents
[x] Document ongoing investigations
[x] List recent changes
[x] Note known issues
[x] Add upcoming events
[x] Sync with incoming engineer

Incoming Engineer

[ ] Read this document
[ ] Join sync call
[ ] Verify PagerDuty is routing to you
[ ] Verify Slack notifications working
[ ] Check VPN/access working
[ ] Review critical dashboards
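
A checklist like the one above is easy to verify mechanically before the outgoing engineer signs off. This is a minimal sketch, assuming the handoff document is available as plain text; the function name `unchecked_items` is hypothetical. It accepts checkboxes written either bare (`[ ]`) or as Markdown list items (`- [ ]`).

```python
import re

# Matches "[ ] text", "[x] text", and the same with a leading "- " or "* ".
CHECKBOX = re.compile(r"^\s*(?:[-*]\s+)?\[( |x|X)\]\s+(.*)$")


def unchecked_items(text: str) -> list[str]:
    """Return the labels of all checklist items that are still unticked."""
    items = []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


if __name__ == "__main__":
    sample = "[x] Document active incidents\n[ ] Read this document\n- [ ] Join sync call"
    for item in unchecked_items(sample):
        print("still open:", item)
```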

Template 2: Quick Handoff (Async)

# Quick Handoff: @alice → @bob

## TL;DR
- No active incidents
- 1 investigation ongoing (API timeouts, see ENG-1234)
- Major release tomorrow (01/24) - be ready for issues

## Watch List
1. API latency around 02:00-03:00 UTC (backup window)
2. Auth service memory (restart if > 80%)

## Recent
- Deployed api-gateway v3.2.1 yesterday (stable)
- Increased rate limits to 1500 RPS

## Coming Up
- 01/23 02:00 - DB maintenance (5 min read-only)
- 01/24 14:00 - v5.0 release

## Questions?
I'll be available on Slack until 17:00 today.

Template 3: Incident Handoff (Mid-Incident)

# INCIDENT HANDOFF: Payment Service Degradation

**Incident Start**: 2024-01-22 08:15 UTC
**Current Status**: Mitigating
**Severity**: SEV2

---

## Current State
- Error rate: 15% (down from 40%)
- Mitigation in progress: scaling up pods
- ETA to resolution: ~30 min

## What We Know
1. Root cause: Memory pressure on payment-service pods
2. Triggered by: Unusual traffic spike (3x normal)
3. Contributing: Inefficient query in checkout flow

## What We've Done
- Scaled payment-service from 5 → 15 pods
- Enabled rate limiting on checkout endpoint
- Disabled non-critical features

## What Needs to Happen
1. Monitor error rate - should reach <1% in ~15 min
2. If not improving, escalate to @payments-manager
3. Once stable, begin root cause investigation

## Key People
- Incident Commander: @alice (handing off)
- Comms Lead: @charlie
- Technical Lead: @bob (incoming)

## Communication
- Status page: Updated at 08:45
- Customer support: Notified
- Exec team: Aware

## Resources
- Incident channel: #inc-20240122-payment
- Dashboard: [Payment Service](https://grafana/d/payments)
- Runbook: [Payment Degradation](https://wiki/runbooks/payments)

---

**Incoming on-call (@bob) - Please confirm that you:**
- [ ] Have joined #inc-20240122-payment
- [ ] Have access to dashboards
- [ ] Understand the current state
- [ ] Know the escalation path
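
The "What Needs to Happen" steps in this template amount to a small decision rule the incoming engineer can apply to each new error-rate reading. The sketch below is one possible interpretation, not the template's definition: it assumes "not improving" means the latest rate is no lower than the first reading, and it treats the 15 minute mark as a deadline. The function name `next_action` and its parameters are hypothetical.

```python
def next_action(samples, target_pct=1.0, deadline_min=15):
    """Pick the next step from error-rate readings taken after the handoff.

    samples: list of (minutes_since_handoff, error_rate_pct), oldest first.
    """
    minutes, rate = samples[-1]
    if rate < target_pct:
        return "stable: begin root cause investigation"
    # Assumption: "not improving" = latest rate is no lower than the first one.
    not_improving = len(samples) > 1 and rate >= samples[0][1]
    if minutes >= deadline_min or not_improving:
        return "escalate to @payments-manager"
    return "keep monitoring"


if __name__ == "__main__":
    print(next_action([(0, 15.0), (5, 8.0)]))
    print(next_action([(0, 15.0), (10, 4.0), (15, 2.0)]))
```

Writing the rule down, even informally, gives the incoming engineer a concrete point at which to stop waiting and escalate.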

Handoff Sync Meeting

Agenda (15 minutes)

## Handoff Sync: @alice → @bob

1. **Active Issues** (5 min)
   - Walk through any ongoing incidents
   - Discuss investigation status
   - Transfer context and theories

2. **Recent Changes** (3 min)
   - Deployments to watch
   - Config changes
   - Known regressions

3. **Upcoming Events** (3 min)
   - Maintenance windows
   - Expected traffic changes
   - Releases planned

4. **Questions** (4 min)
   - Clarify anything unclear
   - Confirm access and alerting
   - Exchange contact info

On-Call Best Practices

Before Your Shift

## Pre-Shift Checklist

### Access Verification
- [ ] VPN working
- [ ] kubectl access to all clusters
- [ ] Database read access
- [ ] Log aggregator access (Splunk/Datadog)
- [ ] PagerDuty app installed and logged in

### Alerting Setup
- [ ] PagerDuty schedule shows you as primary
- [ ] Phone notifications enabled
- [ ] Slack notifications for incident channels
- [ ] Test alert received and acknowledged

### Knowledge Refresh
- [ ] Review recent incidents (past 2 weeks)
- [ ] Check service changelog
- [ ] Skim critical runbooks
- [ ] Know escalation contacts

### Environment Ready
- [ ] Laptop charged and accessible
- [ ] Phone charged
- [ ] Quiet space available for calls
- [ ] Secondary contact identified (if traveling)

During Your Shift

## Daily On-Call Routine

### Morning (start of day)
- [ ] Check overnight alerts
- [ ] Review dashboards for anomalies
- [ ] Check for any P0/P1 tickets created
- [ ] Skim incident channels for context

### Throughout Day
- [ ] Respond to alerts within SLA
- [ ] Document investigation progress
- [ ] Update team on significant issues
- [ ] Triage incoming pages

### End of Day
- [ ] Hand off any active issues
- [ ] Update investigation docs
- [ ] Note anything for next shift

After Your Shift

## Post-Shift Checklist

- [ ] Complete handoff document
- [ ] Sync with incoming on-call
- [ ] Verify PagerDuty routing changed
- [ ] Close/update investigation tickets
- [ ] File postmortems for any incidents
- [ ] Take time off if shift was stressful

Escalation Guidelines

When to Escalate

## Escalation Triggers

### Immediate Escalation
- SEV1 incident declared
- Data breach suspected
- Unable to diagnose within 30 min
- Customer or legal escalation received

### Consider Escalation
- Issue spans multiple teams
- Requires expertise you don't have
- Business impact exceeds threshold
- You're uncertain about next steps

### How to Escalate
1. Page the appropriate escalation path
2. Provide brief context in Slack
3. Stay engaged until the escalation contact acknowledges
4. Hand off cleanly, don't just disappear
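
The two trigger lists above can be encoded as a simple classifier. This is a minimal sketch under stated assumptions: the `IncidentState` fields, the `"SEV3"` default, and the function `escalation_level` are hypothetical names chosen for illustration, and "Unable to diagnose within 30 min" is read as 30 or more minutes without a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    """Hypothetical snapshot of the facts the trigger lists refer to."""

    severity: str = "SEV3"  # assumed default; the source names SEV1 and SEV2
    data_breach_suspected: bool = False
    minutes_undiagnosed: int = 0
    customer_or_legal_escalation: bool = False
    spans_multiple_teams: bool = False
    needs_outside_expertise: bool = False
    impact_exceeds_threshold: bool = False
    unsure_of_next_steps: bool = False


def escalation_level(state: IncidentState) -> str:
    """Return "immediate", "consider", or "none" per the trigger lists."""
    immediate = (
        state.severity == "SEV1"
        or state.data_breach_suspected
        or state.minutes_undiagnosed >= 30
        or state.customer_or_legal_escalation
    )
    if immediate:
        return "immediate"
    consider = (
        state.spans_multiple_teams
        or state.needs_outside_expertise
        or state.impact_exceeds_threshold
        or state.unsure_of_next_steps
    )
    return "consider" if consider else "none"


if __name__ == "__main__":
    print(escalation_level(IncidentState(severity="SEV2", minutes_undiagnosed=35)))
```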

Best Practices

Do's

- Document everything - Future you will thank you
- Escalate early - Better safe than sorry
- Take breaks - Alert fatigue is real
- Keep handoffs synchronous - Async loses context
- Test your setup - Before incidents, not during

Don'ts

- Don't skip handoffs - Context loss causes incidents
- Don't play the hero - Escalate when needed
- Don't ignore alerts - Even if they seem minor
- Don't work sick - Swap shifts instead
- Don't disappear - Stay reachable during shift

Resources

Limitations

- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.