🛠️ 開発・MCP コミュニティ

debug-cuda-crash

Tutorial for debugging CUDA crashes using API logging

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o debug-cuda-crash.zip https://jpskill.com/download/19125.zip && unzip -o debug-cuda-crash.zip && rm debug-cuda-crash.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/19125.zip -OutFile "$d\debug-cuda-crash.zip"; Expand-Archive "$d\debug-cuda-crash.zip" -DestinationPath $d -Force; ri "$d\debug-cuda-crash.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)

1. 下の青いボタンを押して debug-cuda-crash.zip をダウンロード
2. ZIPファイルをダブルクリックで解凍 → debug-cuda-crash フォルダができる
3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
4. Claude Code を再起動

⬇ .zip でダウンロード(推奨) ⬇ .skill 形式(上級者用) 元のソース ↗

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
- · macOS / Linux: ~/.claude/skills/
- · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →

最終更新: 2026-05-18
取得日時: 2026-05-18
同梱ファイル: 1

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

[Skill 名] debug-cuda-crash

チュートリアル: APIロギングによるCUDAクラッシュのデバッグ

このチュートリアルでは、@flashinfer_apiロギングデコレーターを使用して、FlashInferにおけるCUDAクラッシュとエラーをデバッグする方法を説明します。

目標

CUDAエラー（不正なメモリアクセス、範囲外アクセス、NaN/Inf）でコードがクラッシュした場合、APIロギングを使用して以下を行います。

クラッシュ発生前の入力テンソルをキャプチャする
問題の原因となったデータを理解する
パイプラインを通じてテンソルの形状、dtype、値を追跡する
数値的な問題（NaN、Inf、誤った形状）を検出する

なぜAPIロギングを使用するのか？

問題: CUDAエラーはしばしばプログラムをクラッシュさせ、デバッグ情報を残しません。

解決策: FlashInferの@flashinfer_apiデコレーターは、実行前にログを記録するため、プログラムが終了した後でもクラッシュの原因を確認できます。

ステップ1: APIロギングを有効にする

基本的なロギング（関数名のみ）

export FLASHINFER_LOGLEVEL=1        # 関数名をログに記録
export FLASHINFER_LOGDEST=stdout    # コンソールにログを出力

python my_script.py

出力:

[2025-12-18 10:30:45] FlashInfer API Call: batch_decode_with_padded_kv_cache

詳細なロギング（メタデータ付きの入出力）

export FLASHINFER_LOGLEVEL=3        # メタデータ付きの入出力をログに記録
export FLASHINFER_LOGDEST=debug.log # ファイルに保存

python my_script.py

debug.log内の出力:

================================================================================
[2025-12-18 10:30:45] FlashInfer API Logging - System Information
================================================================================
FlashInfer version: 0.6.0
CUDA toolkit version: 12.1
GPU 0: NVIDIA H100 PCIe
  Compute capability: 9.0 (SM90)
PyTorch version: 2.1.0
================================================================================

================================================================================
[2025-12-18 10:30:46] FlashInfer API Call: batch_decode_with_padded_kv_cache
--------------------------------------------------------------------------------
Positional input arguments:
  arg[0]:
    Tensor(
      shape=(32, 8, 128)
      dtype=torch.bfloat16
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
Keyword input arguments:
  kv_cache=
    Tensor(
      shape=(1024, 2, 8, 128)
      dtype=torch.bfloat16
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )

完全なロギング（テンソル統計情報付き）

export FLASHINFER_LOGLEVEL=5        # min/max/mean/nan/infをログに記録
export FLASHINFER_LOGDEST=debug.log

python my_script.py

追加出力:

  Tensor(
    shape=(32, 8, 128)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
    min=-3.125000
    max=4.250000
    mean=0.015625
    nan_count=0
    inf_count=0
  )

ステップ2: クラッシュを再現する

例: 形状の不一致

コードが以下でクラッシュします。

RuntimeError: CUDA error: an illegal memory access was encountered

ロギングを有効にして再度実行します。

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=crash_log.txt

python my_script.py

ログにはクラッシュ前の入力が表示されます。

[2025-12-18 10:32:15] FlashInfer API Call: batch_decode_with_padded_kv_cache
Positional input arguments:
  arg[0]:
    Tensor(
      shape=(32, 8, 128)      # クエリテンソル
      ...
    )
Keyword input arguments:
  kv_cache=
    Tensor(
      shape=(1024, 2, 8, 64)  # ❌ 間違いです！ (..., 128)であるべきで、(..., 64)ではありません。
      ...
    )

バグを発見しました: head_dimの不一致（64 vs 128）

ステップ3: 一般的なCUDAエラーとそのデバッグ方法

エラー1: 不正なメモリアクセス

エラーメッセージ:

RuntimeError: CUDA error: an illegal memory access was encountered

ロギングを有効にする:

export FLASHINFER_LOGLEVEL=3
python my_script.py

ログで確認すべきこと:

✅ テンソルの形状が期待される次元と一致しているか
✅ すべてのテンソルがCUDA上にあるか（CPU上ではないか）
✅ テンソルのストライドが妥当か
✅ is_contiguous=Trueであるか（必要な場合）

一般的な原因:

テンソルの次元が間違っている
CPUテンソルがGPUカーネルに渡された
ストライドパターンが正しくない

エラー2: NaNまたはInf値

エラーメッセージ:

RuntimeError: Function ... returned nan or inf

統計ロギングを有効にする:

export FLASHINFER_LOGLEVEL=5        # レベル5はnan_count、inf_countを表示します
python my_script.py

ログで確認すべきこと:

Tensor(
  ...
  min=-1234567.000000     # ❌ 疑わしいほど大きい
  max=9876543.000000      # ❌ 疑わしいほど大きい
  mean=nan                # ❌ NaNが検出されました
  nan_count=128           # ❌ 128個のNaN値！
  inf_count=0
)

一般的な原因:

前の操作でのゼロ除算
数値のオーバーフロー/アンダーフロー
未初期化メモリ

エラー3: メモリ不足

エラーメッセージ:

RuntimeError: CUDA out of memory

ロギングを有効にする:

export FLASHINFER_LOGLEVEL=3
python my_script.py

ログで確認すべきこと:

✅ テンソルの形状（予期せず大きいか？）
✅ バッチサイズ
✅ シーケンス長

例:

Tensor(
  shape=(1024, 8192, 128, 128)  # ❌ 大きすぎます！ (1024, 128, 128)であるべきでは？
  ...
)

エラー4: Dtypeの間違い

エラーメッセージ:

RuntimeError: expected scalar type BFloat16 but found Float16

ロギングを有効にする:

export FLASHINFER_LOGLEVEL=3
python my_script.py

ログで確認すべきこと:

Tensor(
  dtype=torch.float16     # ❌ torch.bfloat16であるべきです
  ...
)

ステップ4: マルチプロセスデバッグ

複数のGPU/プロセスで実行する場合、%iパターンを使用します。

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug_rank_%i.txt    # %i = プロセスID

torchrun --nproc_per_node=4 my_script.py

これにより、個別のログが作成されます。

debug_rank_12345.txt (プロセス12345)
debug_rank_12346.txt (プロセス12346)
debug_rank_12347.txt (プロセス12347)
debug_rank_12348.txt (プロセス12348)

これで、各ランクを個別にデバッグできます。

ステップ5: compute-sanitizerによる高度なデバッグ

より困難なバグには、APIロギングとCUDAツールを組み合わせて使用します。

compute-sanitizerの使用（メモリ

(原文がここで切り詰められています)

📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

Tutorial: Debugging CUDA Crashes with API Logging

This tutorial shows you how to debug CUDA crashes and errors in FlashInfer using the @flashinfer_api logging decorator.

Goal

When your code crashes with CUDA errors (illegal memory access, out-of-bounds, NaN/Inf), use API logging to:

Capture input tensors BEFORE the crash occurs
Understand what data caused the problem
Track tensor shapes, dtypes, and values through your pipeline
Detect numerical issues (NaN, Inf, wrong shapes)

Why Use API Logging?

Problem: CUDA errors often crash the program, leaving no debugging information.

Solution: FlashInfer's @flashinfer_api decorator logs inputs BEFORE execution, so you can see what caused the crash even after the program terminates.

Step 1: Enable API Logging

Basic Logging (Function Names Only)

export FLASHINFER_LOGLEVEL=1        # Log function names
export FLASHINFER_LOGDEST=stdout    # Log to console

python my_script.py

Output:

[2025-12-18 10:30:45] FlashInfer API Call: batch_decode_with_padded_kv_cache

Detailed Logging (Inputs/Outputs with Metadata)

export FLASHINFER_LOGLEVEL=3        # Log inputs/outputs with metadata
export FLASHINFER_LOGDEST=debug.log # Save to file

python my_script.py

Output in debug.log:

================================================================================
[2025-12-18 10:30:45] FlashInfer API Logging - System Information
================================================================================
FlashInfer version: 0.6.0
CUDA toolkit version: 12.1
GPU 0: NVIDIA H100 PCIe
  Compute capability: 9.0 (SM90)
PyTorch version: 2.1.0
================================================================================

================================================================================
[2025-12-18 10:30:46] FlashInfer API Call: batch_decode_with_padded_kv_cache
--------------------------------------------------------------------------------
Positional input arguments:
  arg[0]:
    Tensor(
      shape=(32, 8, 128)
      dtype=torch.bfloat16
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
Keyword input arguments:
  kv_cache=
    Tensor(
      shape=(1024, 2, 8, 128)
      dtype=torch.bfloat16
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )

Full Logging (With Tensor Statistics)

export FLASHINFER_LOGLEVEL=5        # Log with min/max/mean/nan/inf
export FLASHINFER_LOGDEST=debug.log

python my_script.py

Additional output:

  Tensor(
    shape=(32, 8, 128)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
    min=-3.125000
    max=4.250000
    mean=0.015625
    nan_count=0
    inf_count=0
  )

Step 2: Reproduce the Crash

Example: Shape Mismatch

Your code crashes with:

RuntimeError: CUDA error: an illegal memory access was encountered

Enable logging and run again:

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=crash_log.txt

python my_script.py

The log shows inputs before the crash:

[2025-12-18 10:32:15] FlashInfer API Call: batch_decode_with_padded_kv_cache
Positional input arguments:
  arg[0]:
    Tensor(
      shape=(32, 8, 128)      # Query tensor
      ...
    )
Keyword input arguments:
  kv_cache=
    Tensor(
      shape=(1024, 2, 8, 64)  # ❌ Wrong! Should be (..., 128) not (..., 64)
      ...
    )

Found the bug: head_dim mismatch (64 vs 128)

Step 3: Common CUDA Errors and How to Debug

Error 1: Illegal Memory Access

Error Message:

RuntimeError: CUDA error: an illegal memory access was encountered

Enable logging:

export FLASHINFER_LOGLEVEL=3
python my_script.py

What to check in logs:

✅ Tensor shapes match expected dimensions
✅ All tensors are on CUDA (not CPU)
✅ Tensor strides are reasonable
✅ is_contiguous=True (if required)

Common causes:

Wrong tensor dimensions
CPU tensor passed to GPU kernel
Incorrect stride patterns

Error 2: NaN or Inf Values

Error Message:

RuntimeError: Function ... returned nan or inf

Enable statistics logging:

export FLASHINFER_LOGLEVEL=5        # Level 5 shows nan_count, inf_count
python my_script.py

What to check in logs:

Tensor(
  ...
  min=-1234567.000000     # ❌ Suspiciously large
  max=9876543.000000      # ❌ Suspiciously large
  mean=nan                # ❌ NaN detected
  nan_count=128           # ❌ 128 NaN values!
  inf_count=0
)

Common causes:

Division by zero in previous operation
Numerical overflow/underflow
Uninitialized memory

Error 3: Out of Memory

Error Message:

RuntimeError: CUDA out of memory

Enable logging:

export FLASHINFER_LOGLEVEL=3
python my_script.py

What to check in logs:

✅ Tensor shapes (are they unexpectedly large?)
✅ Batch size
✅ Sequence length

Example:

Tensor(
  shape=(1024, 8192, 128, 128)  # ❌ Way too large! Should be (1024, 128, 128)?
  ...
)

Error 4: Wrong Dtype

Error Message:

RuntimeError: expected scalar type BFloat16 but found Float16

Enable logging:

export FLASHINFER_LOGLEVEL=3
python my_script.py

What to check in logs:

Tensor(
  dtype=torch.float16     # ❌ Should be torch.bfloat16
  ...
)

Step 4: Multi-Process Debugging

When running with multiple GPUs/processes, use %i pattern:

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug_rank_%i.txt    # %i = process ID

torchrun --nproc_per_node=4 my_script.py

This creates separate logs:

debug_rank_12345.txt (process 12345)
debug_rank_12346.txt (process 12346)
debug_rank_12347.txt (process 12347)
debug_rank_12348.txt (process 12348)

Now you can debug each rank independently.

Step 5: Advanced Debugging with compute-sanitizer

For harder bugs, combine API logging with CUDA tools:

Use compute-sanitizer (Memory Checker)

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug.log

compute-sanitizer --tool memcheck python my_script.py

Output shows exact memory errors:

========= COMPUTE-SANITIZER
========= Invalid __global__ write of size 4 bytes
=========     at 0x1234 in ScaleKernel<float>
=========     by thread (256,0,0) in block (10,0,0)
=========     Address 0x7f1234567890 is out of bounds

Check debug.log to see what inputs caused this kernel to fail.

Use cuda-gdb (Debugger)

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug.log

cuda-gdb --args python my_script.py

In gdb:

(cuda-gdb) run
(cuda-gdb) where     # Show stack trace when it crashes

Check debug.log for the inputs that led to the crash.

Step 6: Kernel-Level Debugging with printf()

You can use printf() inside CUDA kernels for debugging:

Basic Usage

__global__ void MyKernel(const float* input, float* output, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;

  // Print from one thread to avoid spam
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    printf("n=%d, input[0]=%f\n", n, input[0]);
  }

  if (idx < n) {
    output[idx] = input[idx] * 2.0f;
  }
}

Important: Flush printf buffer after kernel:

my_kernel(input, output)
torch.cuda.synchronize()  # ← Flushes printf output

⚠️ Warp-Specialized Kernels: Choosing the Right Print Thread

Problem: threadIdx.x == 0 doesn't work for all warps (warp starting at thread 32 won't have thread 0).

Solution: Choose one representative thread per specialization group.

__global__ void WarpSpecializedKernel(...) {
  // Define your group's representative thread
  // e.g., first thread of each warp: threadIdx.x % 32 == 0
  // e.g., first thread of each 4-warp group: threadIdx.x % 128 == 0

  if (is_group_representative) {
    printf("Group %d processing\n", group_id);
  }
}

Common mistake ❌:

// ❌ Only warp 0 will print!
if (threadIdx.x == 0) {
  printf("Warp %d processing\n", threadIdx.x / 32);
}

Quick Reference

Kernel Type	Print Condition	Notes
Simple kernel	`threadIdx.x == 0`	One thread per block
Warp-specialized	One thread per group	Depends on kernel design

Other Kernel Debugging Tools

// Assert for invariants
assert(value >= 0.0f && "Value must be non-negative");

// Compile-time checks
static_assert(BLOCK_SIZE % 32 == 0, "BLOCK_SIZE must be multiple of warp size");

Environment Variables Reference

Variable	Values	Description
`FLASHINFER_LOGLEVEL`	`0`	No logging (default)
	`1`	Function names only
	`3`	Inputs/outputs with metadata
	`5`	+ Tensor statistics (min/max/mean/nan/inf)
`FLASHINFER_LOGDEST`	`stdout`	Log to console (default)
	`stderr`	Log to stderr
	`<path>`	Log to file
	`log_%i.txt`	Multi-process: %i = process ID

Best Practices

1. Always Start with Level 3

export FLASHINFER_LOGLEVEL=3

Level 3 provides tensor metadata (shape, dtype, device) without overwhelming output.

2. Use Level 5 for Numerical Issues

export FLASHINFER_LOGLEVEL=5

Only use level 5 when debugging NaN/Inf problems (adds statistics).

3. Log to File for Crashes

export FLASHINFER_LOGDEST=crash_log.txt

Console output may be lost when program crashes. File logs persist.

4. Compare Before/After

Enable logging and compare:

Last successful API call (inputs logged, outputs logged) ✅
First failed API call (inputs logged, no outputs) ❌ ← This is where it crashed!

5. Disable Logging in Production

unset FLASHINFER_LOGLEVEL   # or export FLASHINFER_LOGLEVEL=0

Logging has zero overhead when disabled (decorator returns original function).

Troubleshooting

No Logs Appearing

Problem: Set FLASHINFER_LOGLEVEL=3 but no logs appear

Solutions:

Check if API has the decorator: Not all FlashInfer APIs have @flashinfer_api yet (work in progress)

Verify environment variable:

echo $FLASHINFER_LOGLEVEL    # Should print "3"

Check log destination:

echo $FLASHINFER_LOGDEST     # Should print path or "stdout"

Too Much Output

Problem: Level 5 produces too much output

Solution: Use level 3 instead:

export FLASHINFER_LOGLEVEL=3   # Skip tensor statistics

Statistics Skipped in CUDA Graph

Warning: [statistics skipped: CUDA graph capture in progress]

What it means: Level 5 statistics are automatically skipped during CUDA graph capture (to avoid synchronization)

This is normal: The framework protects you from graph capture issues.

Quick Examples

Debug Shape Mismatch

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=stdout
python my_script.py
# Check tensor shapes in output

Debug NaN/Inf

export FLASHINFER_LOGLEVEL=5         # Show statistics
export FLASHINFER_LOGDEST=debug.log
python my_script.py
# Check nan_count and inf_count in debug.log

Debug Multi-GPU Training

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=rank_%i.log   # Separate log per rank
torchrun --nproc_per_node=8 train.py
# Check rank_*.log files

Combine with Memory Checker

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=inputs.log
compute-sanitizer --tool memcheck python my_script.py
# inputs.log shows what data caused the memory error

Example: Full Debug Session

Your code crashes:

import torch
from flashinfer import batch_decode_with_padded_kv_cache

q = torch.randn(32, 8, 128, dtype=torch.bfloat16, device="cuda")
kv = torch.randn(1024, 2, 8, 64, dtype=torch.bfloat16, device="cuda")  # Wrong dim!

output = batch_decode_with_padded_kv_cache(q, kv)  # ❌ Crashes

Enable logging:

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug.log
python test.py

Check debug.log:

[2025-12-18 10:45:23] FlashInfer API Call: batch_decode_with_padded_kv_cache
Positional input arguments:
  arg[0]:
    Tensor(
      shape=(32, 8, 128)
      dtype=torch.bfloat16
      ...
    )
  arg[1]:
    Tensor(
      shape=(1024, 2, 8, 64)    # ❌ Found it! Last dim should be 128
      dtype=torch.bfloat16
      ...
    )

Fix the bug:

kv = torch.randn(1024, 2, 8, 128, dtype=torch.bfloat16, device="cuda")  # ✅ Fixed

Success!

python test.py
# No crash, outputs logged successfully

Summary

Enable logging before the crash:

export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST=debug.log

Run your code - inputs are logged BEFORE crash
Check the log - last API call shows what caused the crash
Fix the issue based on logged input metadata
Disable logging when done:
```
export FLASHINFER_LOGLEVEL=0
```

debug-cuda-crash

🎯 このSkillでできること

📦 インストール方法 (3ステップ)

📖 Skill本文(日本語訳)

チュートリアル: APIロギングによるCUDAクラッシュのデバッグ

目標

なぜAPIロギングを使用するのか？

ステップ1: APIロギングを有効にする

基本的なロギング（関数名のみ）

詳細なロギング（メタデータ付きの入出力）

完全なロギング（テンソル統計情報付き）

ステップ2: クラッシュを再現する

例: 形状の不一致

ステップ3: 一般的なCUDAエラーとそのデバッグ方法

エラー1: 不正なメモリアクセス

エラー2: NaNまたはInf値

エラー3: メモリ不足

エラー4: Dtypeの間違い

ステップ4: マルチプロセスデバッグ

ステップ5: compute-sanitizerによる高度なデバッグ

compute-sanitizerの使用（メモリ

Tutorial: Debugging CUDA Crashes with API Logging

Goal

Why Use API Logging?

Step 1: Enable API Logging

Basic Logging (Function Names Only)

Detailed Logging (Inputs/Outputs with Metadata)

Full Logging (With Tensor Statistics)

Step 2: Reproduce the Crash

Example: Shape Mismatch

Step 3: Common CUDA Errors and How to Debug

Error 1: Illegal Memory Access

Error 2: NaN or Inf Values

Error 3: Out of Memory

Error 4: Wrong Dtype

Step 4: Multi-Process Debugging

Step 5: Advanced Debugging with compute-sanitizer

Use compute-sanitizer (Memory Checker)

Use cuda-gdb (Debugger)

Step 6: Kernel-Level Debugging with printf()

Basic Usage

⚠️ Warp-Specialized Kernels: Choosing the Right Print Thread

Quick Reference

Other Kernel Debugging Tools

Environment Variables Reference

Best Practices

1. Always Start with Level 3

2. Use Level 5 for Numerical Issues

3. Log to File for Crashes

4. Compare Before/After

5. Disable Logging in Production

Troubleshooting

No Logs Appearing

Too Much Output

Statistics Skipped in CUDA Graph

Quick Examples

Debug Shape Mismatch

Debug NaN/Inf

Debug Multi-GPU Training

Combine with Memory Checker

Example: Full Debug Session

Your code crashes:

Enable logging:

Check debug.log:

Fix the bug:

Success!

Summary

Related Documentation