🛠️ Development · MCP Community 🔴 For engineers 👤 Engineers & AI developers

🛠️ Hugging Face Model Trainer

hugging-face-model-trainer

Train language models on the Hugging Face platform.



📜 Original English description (reference)

Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.

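🇯🇵 Notes for Japanese creators

In a nutshell

Train language models on the Hugging Face platform.

※ Supplementary notes from the jpskill.com editorial team for Japanese business settings. They are reference information and independent of how the Skill itself behaves.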

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download → unzip → placement is fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o hugging-face-model-trainer.zip https://jpskill.com/download/2993.zip && unzip -o hugging-face-model-trainer.zip && rm hugging-face-model-trainer.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/2993.zip -OutFile "$d\hugging-face-model-trainer.zip"; Expand-Archive "$d\hugging-face-model-trainer.zip" -DestinationPath $d -Force; ri "$d\hugging-face-model-trainer.zip"

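When it finishes, restart Claude Code. After that, just ask for something related, such as "fine-tune a model on my dataset", and the Skill activates automatically.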

💾 Prefer to download manually? (for those who find commands difficult)

1. Click the blue button below to download hugging-face-model-trainer.zip
2. Double-click the ZIP file to extract it → a hugging-face-model-trainer folder is created
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for something in this area.

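📦 Installation (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you're done. You don't need to say "use this Skill…"; it is invoked automatically for related requests.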


Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 18

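💬 Just say something like this: sample prompts

› Using Hugging Face Model Trainer, show me sample code for a minimal setup
› Explain the main usage and caveats of Hugging Face Model Trainer
› How do I integrate Hugging Face Model Trainer into an existing project?

Paste any of these into Claude Code and this Skill activates automatically.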


📜 Original SKILL.md (the text Claude reads)

TRL Training on Hugging Face Jobs

Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.

TRL provides multiple training methods:

SFT (Supervised Fine-Tuning) - Standard instruction tuning
DPO (Direct Preference Optimization) - Alignment from preference data
GRPO (Group Relative Policy Optimization) - Online RL training
Reward Modeling - Train reward models for RLHF

For detailed TRL method documentation:

hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.

See also: references/training_methods.md for method overviews and selection guidance

When to Use This Skill

Use this skill when users want to:

Fine-tune language models on cloud GPUs without local infrastructure
Train with TRL methods (SFT, DPO, GRPO, etc.)
Run training jobs on Hugging Face Jobs infrastructure
Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
Ensure trained models are permanently saved to the Hub
Use modern workflows with optimized defaults

When to Use Unsloth

Use Unsloth (references/unsloth.md) instead of standard TRL when:

Limited GPU memory - Unsloth uses ~60% less VRAM
Speed matters - Unsloth is ~2x faster
Training large models (>13B) - memory efficiency is critical
Training Vision-Language Models (VLMs) - Unsloth has FastVisionModel support

See references/unsloth.md for complete Unsloth documentation and scripts/unsloth_sft_example.py for a production-ready training script.

Key Directives

When assisting with training jobs:

ALWAYS use hf_jobs() MCP tool - Submit jobs using hf_jobs("uv", {...}), NOT bash trl-jobs commands. The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to hf_jobs(). If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately using hf_jobs().
Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in scripts/ as templates.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Use example scripts as templates - Reference scripts/train_sft_example.py, scripts/train_dpo_example.py, etc. as starting points.

Local Script Execution

Repository scripts use PEP 723 inline dependencies. Run them with uv run:

uv run scripts/estimate_cost.py --help
uv run scripts/dataset_inspector.py --help

Prerequisites Checklist

Before starting any training job, verify:

✅ Account & Authentication

Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
Authenticated login: Check with hf_whoami()
HF_TOKEN for Hub Push ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost
Token must have write permissions
MUST pass secrets={"HF_TOKEN": "$HF_TOKEN"} in job config to make token available (the $HF_TOKEN syntax references your actual token value)

✅ Dataset Requirements

Dataset must exist on Hub or be loadable via datasets.load_dataset()
Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
ALWAYS validate unknown datasets before GPU training to prevent format failures (see Dataset Validation section below)
Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)

⚠️ Critical Settings

Timeout must exceed expected training time - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.
Hub push must be enabled - Config: push_to_hub=True, hub_model_id="username/model-name"; Job: secrets={"HF_TOKEN": "$HF_TOKEN"}

Asynchronous Job Guidelines

⚠️ IMPORTANT: Training jobs run asynchronously and can take hours

Action Required

When user requests training:

Create the training script with Trackio included (use scripts/train_sft_example.py as template)
Submit immediately using hf_jobs() MCP tool with script content inline - don't save to file unless user requests
Report submission with job ID, monitoring URL, and estimated time
Wait for user to request status checks - don't poll automatically

Ground Rules

Jobs run in background - Submission returns immediately; training continues independently
Initial logs delayed - Can take 30-60 seconds for logs to appear
User checks status - Wait for user to request status updates
Avoid polling - Check logs only on user request; provide monitoring links instead

After Submission

Provide to user:

✅ Job ID and monitoring URL
✅ Expected completion time
✅ Trackio dashboard URL
✅ Note that user can request status checks later

Example Response:

✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!
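If you want the post-submission report to look the same every time, you can build it from the four items listed under "After Submission". The helper below is a hypothetical sketch. `format_submission_report` and its parameters are illustrative names. The monitor URL pattern is copied from the example response and is not a documented API. The Trackio link assumes the dashboard lives in a Space such as `{username}/trackio`.

```python
from typing import Optional


def format_submission_report(job_id: str, username: str, expected_hours: float,
                             estimated_cost_usd: float,
                             trackio_space: Optional[str] = None) -> str:
    """Build the post-submission summary described above (hypothetical helper)."""
    # Monitor URL pattern copied from the example response; not a documented API.
    lines = [
        "✅ Job submitted successfully!",
        "",
        f"Job ID: {job_id}",
        f"Monitor: https://huggingface.co/jobs/{username}/{job_id}",
    ]
    if trackio_space:
        # Assumed: the Trackio dashboard is a Space such as "{username}/trackio".
        lines.append(f"Trackio: https://huggingface.co/spaces/{trackio_space}")
    lines += [
        "",
        f"Expected time: ~{expected_hours:g} hours",
        f"Estimated cost: ~${estimated_cost_usd:g}",
        "",
        "The job is running in the background. Ask me to check status/logs when ready!",
    ]
    return "\n".join(lines)


print(format_submission_report("abc123xyz", "username", 2, 10,
                               trackio_space="username/trackio"))
```

The helper only formats text. It does not query the job, so any times and costs shown are whatever estimates you pass in.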

Quick Start: Four Approaches

💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory. You'll still see training loss and learning progress.

Sequence Length Configuration

TRL config classes use max_length (not max_seq_length) to control tokenized sequence length:

# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)   # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)  # Longer context (2048 tokens)

# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512)  # TypeError!

Default behavior: max_length=1024 (truncates from right). This works well for most training.

When to override:

Longer context: Set higher (e.g., max_length=2048)
Memory constraints: Set lower (e.g., max_length=512)
Vision models: Set max_length=None (prevents cutting image tokens)

Usually you don't need to set this parameter at all - the examples below use the sensible default.
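"Truncates from right" means the tokens at the end of a sequence are dropped once it exceeds `max_length`. The stdlib sketch below illustrates that behavior on a plain list of token ids. It is not TRL's internal implementation, and `truncate_right` is a hypothetical name. It also shows why `max_length=None` matters for vision models: with truncation disabled, nothing is cut, so image tokens near the end survive.

```python
from typing import List, Optional


def truncate_right(token_ids: List[int], max_length: Optional[int] = 1024) -> List[int]:
    """Keep the first max_length tokens; None disables truncation (illustration only)."""
    if max_length is None:
        return list(token_ids)
    return list(token_ids[:max_length])


tokens = list(range(1500))                   # stand-in for one tokenized example
print(len(truncate_right(tokens)))           # 1024 with the default
print(len(truncate_right(tokens, 512)))      # 512 under memory constraints
print(len(truncate_right(tokens, None)))     # 1500, nothing is cut
```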

Approach 1: UV Scripts (Recommended—Default Choice)

UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.

hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project that groups related runs
        run_name="meaningful_run_name",     # descriptive name for this specific training run (Trackio)
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Benefits: Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control.
When to use: Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring hf_jobs().

Working with Scripts

⚠️ Important: The script parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.

Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:

Inline code (recommended for custom training)
Publicly accessible URLs
Private repo URLs (with HF_TOKEN)
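One way to catch the local-path mistake before submitting is to check the `script` value first. The function below is a hypothetical pre-flight check, not part of `hf_jobs()`. It uses one heuristic: an http(s) URL passes as a URL, and multi-line text (such as a PEP 723 header plus code) passes as inline code. Anything else is treated as a local path and rejected. Under that assumption, a genuine one-line inline script would also be rejected.

```python
from urllib.parse import urlparse


def classify_script_arg(script: str) -> str:
    """Return 'url' or 'inline'; raise ValueError for what looks like a local path."""
    parsed = urlparse(script)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"  # public or private (with HF_TOKEN) URL
    if "\n" in script:
        return "inline"  # e.g. "# /// script\n# dependencies = [...]\n# ///\n..."
    raise ValueError(
        f"{script!r} looks like a local path; job containers cannot see your filesystem. "
        "Pass the code inline or upload it and pass a URL."
    )


print(classify_script_arg("https://huggingface.co/user/repo/resolve/main/train.py"))  # url
```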

Common mistakes:

# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})

Correct approaches:

# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})

To use local scripts: Upload to HF Hub first:

hf repos create my-training-scripts --type model
hf upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py

Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battle-tested scripts for all methods. Can be run from URLs:

hf_jobs("uv", {
    "script": "https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Benefits: No code to write, maintained by the TRL team, production-tested.
When to use: Standard TRL training, quick experiments, no need for custom code.
Available: Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts

Finding More UV Scripts on Hub

The uv-scripts organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)

Popular collections: ocr, classification, synthetic-data, vllm, dataset-creation

Approach 3: HF Jobs CLI (Direct Terminal Commands)

When the hf_jobs() MCP tool is unavailable, use the hf jobs CLI directly.

⚠️ CRITICAL: CLI Syntax Rules

# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"

Key syntax rules:

Command order is hf jobs uv run (NOT hf jobs run uv)
All flags (--flavor, --timeout, --secrets) must come BEFORE the script URL
Use --secrets (plural), not --secret
Script URL must be the last positional argument
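If you script the CLI from Python, you can enforce all four rules by building the argument list in a fixed order. This is a sketch under stated assumptions. `build_hf_jobs_uv_command` is a hypothetical helper. It only uses the flags shown in this section, and it passes a single secret name.

```python
import shlex
from typing import List


def build_hf_jobs_uv_command(script_url: str, flavor: str, timeout: str,
                             secret: str = "HF_TOKEN") -> List[str]:
    """Assemble `hf jobs uv run` arguments with every flag before the script URL."""
    return [
        "hf", "jobs", "uv", "run",   # "uv run", not "run uv"
        "--flavor", flavor,
        "--timeout", timeout,
        "--secrets", secret,         # plural flag name
        script_url,                  # last positional argument
    ]


cmd = build_hf_jobs_uv_command(
    "https://huggingface.co/user/repo/resolve/main/train.py", "a10g-large", "2h")
print(shlex.join(cmd))
# hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN https://huggingface.co/user/repo/resolve/main/train.py
```

To actually submit, you could pass `cmd` to `subprocess.run(cmd, check=True)`. This assumes the `hf` CLI is installed and you are logged in.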

Complete CLI example:

hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"

Check job status via CLI:

hf jobs ps                        # List all jobs
hf jobs logs <job-id>             # View logs
hf jobs inspect <job-id>          # Job details
hf jobs cancel <job-id>           # Cancel a job

Approach 4: TRL Jobs Package (Simplified Training)

The trl-jobs package provides optimized defaults and one-liner training.

uvx trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara

Benefits: Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands.
When to use: User working in a terminal directly (not in a Claude Code context), quick local experimentation.
Repository: https://github.com/huggingface/trl-jobs

⚠️ In Claude Code context, prefer using hf_jobs() MCP tool (Approach 1) when available.

Hardware Selection

Model Size	Recommended Hardware	Cost (approx/hr)	Use Case
<1B params	`t4-small`	~$0.75	Demos, quick tests only without eval steps
1-3B params	`t4-medium`, `l4x1`	~$1.50-2.50	Development
3-7B params	`a10g-small`, `a10g-large`	~$3.50-5.00	Production training
7-13B params	`a10g-large`, `a100-large`	~$5-10	Large models (use LoRA)
13B+ params	`a100-large`, `a10g-largex2`	~$10-20	Very large (use LoRA)

GPU Flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

Guidelines:

Use LoRA/PEFT for models >7B to reduce memory
Multi-GPU automatically handled by TRL/Accelerate
Start with smaller hardware for testing
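The table and guidelines above can be encoded as a rough rule of thumb. The sketch below is illustrative only. `recommend_hardware` is a hypothetical name, and the row boundaries are my reading of overlapping ranges such as "3-7B" and "7-13B": exactly 7B falls in the a10g row, and LoRA is suggested only above 7B, matching "Use LoRA/PEFT for models >7B".

```python
from typing import Dict, List, Union


def recommend_hardware(params_billion: float) -> Dict[str, Union[List[str], bool]]:
    """Map a model size to the flavors suggested in the hardware table (rule of thumb)."""
    if params_billion < 1:
        flavors = ["t4-small"]
    elif params_billion <= 3:
        flavors = ["t4-medium", "l4x1"]
    elif params_billion <= 7:
        flavors = ["a10g-small", "a10g-large"]
    elif params_billion <= 13:
        flavors = ["a10g-large", "a100-large"]
    else:
        flavors = ["a100-large", "a10g-largex2"]
    # Guideline: use LoRA/PEFT for models larger than 7B.
    return {"flavors": flavors, "use_lora": params_billion > 7}


print(recommend_hardware(0.5))  # {'flavors': ['t4-small'], 'use_lora': False}
print(recommend_hardware(8))    # {'flavors': ['a10g-large', 'a100-large'], 'use_lora': True}
```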

See: references/hardware_guide.md for detailed specifications

Critical: Saving Results to Hub

⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB

The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, ALL TRAINING IS LOST.

Required Configuration

In training script/config:

SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",  # Optional: push checkpoints
)

In job submission:

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}

Verification Checklist

Before submitting:

[ ] push_to_hub=True set in config
[ ] hub_model_id includes username/repo-name
[ ] secrets parameter includes HF_TOKEN
[ ] User has write access to target repo
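You can automate most of this checklist against the keyword arguments you pass to the config and the job dict you pass to `hf_jobs()`. The function below is a hypothetical pre-submit check working on plain dicts. It also flags a missing timeout, following "Critical Settings". It cannot verify the last item (write access to the target repo); that still needs the token or `hf_whoami()`.

```python
from typing import Any, Dict, List


def check_before_submit(trainer_kwargs: Dict[str, Any], job: Dict[str, Any]) -> List[str]:
    """Return failing checklist items; an empty list means ready to submit."""
    problems = []
    if trainer_kwargs.get("push_to_hub") is not True:
        problems.append("push_to_hub=True is not set")
    parts = (trainer_kwargs.get("hub_model_id") or "").split("/")
    if len(parts) != 2 or not all(parts):
        problems.append("hub_model_id should be 'username/repo-name'")
    if "HF_TOKEN" not in job.get("secrets", {}):
        problems.append("job secrets do not include HF_TOKEN")
    if "timeout" not in job:
        problems.append("no timeout set; the 30-minute default is too short for most training")
    return problems


config = {"push_to_hub": True, "hub_model_id": "username/my-model"}
job = {"flavor": "a10g-large", "timeout": "2h", "secrets": {"HF_TOKEN": "$HF_TOKEN"}}
print(check_before_submit(config, job))  # []
```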

See: references/hub_saving.md for detailed troubleshooting

Timeout Management

⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING

Setting Timeouts

{
    "timeout": "2h"   # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}

Timeout Guidelines

Scenario	Recommended	Notes
Quick demo (50-100 examples)	10-30 min	Verify setup
Development training	1-2 hours	Small datasets
Production (3-7B model)	4-6 hours	Full datasets
Large model with LoRA	3-6 hours	Depends on dataset

Always add 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
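Applying the buffer is simple arithmetic. For example, a 100-minute estimate plus 30% gives 130 minutes. The sketch below parses the timeout formats listed above and adds the buffer. Both function names are hypothetical. It uses integer arithmetic for the rounding so that float error cannot push the result up by a minute.

```python
import re
from typing import Union


def timeout_to_seconds(timeout: Union[str, int]) -> int:
    """Parse the documented formats: "90m", "2h", "1.5h", or integer seconds."""
    if isinstance(timeout, int):
        return timeout
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([mh])", timeout.strip())
    if match is None:
        raise ValueError(f"unsupported timeout format: {timeout!r}")
    value, unit = float(match.group(1)), match.group(2)
    return round(value * (3600 if unit == "h" else 60))


def buffered_timeout(estimated_minutes: int, buffer_pct: int = 30) -> str:
    """Add a 20-30% safety buffer and round up to whole minutes, e.g. "130m"."""
    total = (estimated_minutes * (100 + buffer_pct) + 99) // 100  # integer ceiling
    return f"{total}m"


print(buffered_timeout(100))                      # 130m
print(timeout_to_seconds(buffered_timeout(100)))  # 7800
print(timeout_to_seconds("1.5h"))                 # 5400
```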

On timeout: Job killed immediately, all unsaved progress lost, must restart from beginning

Cost Estimation

Offer to estimate cost when planning jobs with known parameters. Use scripts/estimate_cost.py:

uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3

Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.

When to offer: User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
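When `scripts/estimate_cost.py` is not at hand, a back-of-the-envelope version is time = examples × epochs ÷ throughput, and cost = time × hourly rate. The sketch below is not the logic of `estimate_cost.py`. `estimate_job` is hypothetical, and `examples_per_sec` is an assumed throughput you would need to measure. The $5.00/hr rate is the upper end of the table's a10g row. The "offer" flag follows the >1 hour / >$5 rule above.

```python
from typing import Dict, Union


def estimate_job(num_examples: int, epochs: int, examples_per_sec: float,
                 hourly_usd: float) -> Dict[str, Union[float, bool]]:
    """Rough time/cost estimate under an assumed throughput."""
    hours = num_examples * epochs / examples_per_sec / 3600
    cost = hours * hourly_usd
    return {
        "hours": round(hours, 2),
        "cost_usd": round(cost, 2),
        # Offer a detailed estimate when the job runs >1 hour or costs >$5.
        "offer_estimate": hours > 1 or cost > 5,
    }


# Assumed throughput: 8 examples/s on a10g-large at ~$5.00/hr.
print(estimate_job(16000, 3, examples_per_sec=8.0, hourly_usd=5.0))
# {'hours': 1.67, 'cost_usd': 8.33, 'offer_estimate': True}
```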

Example Training Scripts

Production-ready templates with all best practices:

Load these scripts to set up training correctly:

scripts/train_sft_example.py - Complete SFT training with Trackio, LoRA, checkpoints
scripts/train_dpo_example.py - DPO training for preference learning
scripts/train_grpo_example.py - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to hf_jobs() or use as templates for custom scripts.

Monitoring and Tracking

Trackio provides real-time metrics visualization. See references/trackio_guide.md for complete setup guide.

Key points:

Add trackio to dependencies
Configure trainer with report_to="trackio" and run_name="meaningful_name"

Trackio Configuration Defaults

Use sensible defaults unless user specifies otherwise. When generating training scripts with Trackio:

Default Configuration:

Space ID: {username}/trackio (use "trackio" as default space name)
Run naming: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
Config: Keep minimal - only include hyperparameters and model/dataset info
Project Name: Use a Project Name to associate runs with a particular Project

User overrides: If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.

Applying these defaults consistently is useful for managing multiple jobs with the same configuration and for keeping training scripts portable.

See references/trackio_guide.md for complete documentation including grouping runs for experiments.
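The defaults above can be captured in one place so that every generated script uses them. Only the `{username}/trackio` Space ID comes from this guide. The project and run-name patterns below are a hypothetical naming scheme chosen to be recognizable, and `default_trackio_settings` is an illustrative name. Anything in `overrides` wins, as the "User overrides" rule requires. The `project` and `run_name` values correspond to the `SFTConfig` arguments used in Approach 1.

```python
from typing import Dict, Optional


def default_trackio_settings(username: str, method: str, model_id: str, dataset_id: str,
                             overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Defaults from the list above; user-specified values take precedence."""
    model_short = model_id.split("/")[-1].lower()
    dataset_short = dataset_id.split("/")[-1].lower()
    settings = {
        "space_id": f"{username}/trackio",                       # default Space name
        "project": f"{model_short}-{method}",                    # hypothetical scheme
        "run_name": f"{method}-{model_short}-{dataset_short}",   # hypothetical scheme
    }
    settings.update(overrides or {})
    return settings


print(default_trackio_settings("username", "sft", "Qwen/Qwen2.5-0.5B", "trl-lib/Capybara"))
# {'space_id': 'username/trackio', 'project': 'qwen2.5-0.5b-sft', 'run_name': 'sft-qwen2.5-0.5b-capybara'}
```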

Check Job Status

# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

Remember: Wait for user to request status checks. Avoid polling repeatedly.

Dataset Validation

Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.

Why Validate

50%+ of training failures are due to dataset format issues
DPO especially strict: requires exact column names (prompt, chosen, rejected)
Failed GPU jobs waste $1-10 and 30-60 minutes
Validation on CPU costs ~$0.01 and takes <1 minute

When to Validate

ALWAYS validate for:

Unknown or custom datasets
DPO training (CRITICAL - 90% of datasets need mapping)
Any dataset not explicitly TRL-compatible

Skip validation for known TRL datasets:

trl-lib/ultrachat_200k, trl-lib/Capybara, HuggingFaceH4/ultrachat_200k, etc.

Usage

hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})

The script is fast, and will usually complete synchronously.

Reading Results

The output shows compatibility for each training method:

✓ READY - Dataset is compatible, use directly
✗ NEEDS MAPPING - Compatible but needs preprocessing (mapping code provided)
✗ INCOMPATIBLE - Cannot be used for this method

When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste ready Python code.
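To make the three verdicts concrete, the sketch below reproduces them from column names alone. This is a simplification, not the inspector's actual logic. The accepted column sets follow the format requirements earlier in this guide (SFT: messages, text, or prompt/completion; DPO: prompt/chosen/rejected; GRPO: prompt). The alias table is a hypothetical example built from the DPO mismatch described below.

```python
from typing import Dict, Iterable, List, Set

# Column sets each method accepts, per the dataset format requirements.
ACCEPTED_COLUMNS: Dict[str, List[Set[str]]] = {
    "sft": [{"messages"}, {"text"}, {"prompt", "completion"}],
    "dpo": [{"prompt", "chosen", "rejected"}],
    "grpo": [{"prompt"}],
}

# Hypothetical alias table: source column name -> name TRL expects.
ALIASES = {"instruction": "prompt", "chosen_response": "chosen", "rejected_response": "rejected"}


def check_columns(method: str, columns: Iterable[str]) -> str:
    """Return READY, NEEDS MAPPING, or INCOMPATIBLE based on column names only."""
    cols = set(columns)
    accepted = ACCEPTED_COLUMNS[method]
    if any(required <= cols for required in accepted):
        return "READY"
    renamed = {ALIASES.get(c, c) for c in cols}
    if any(required <= renamed for required in accepted):
        return "NEEDS MAPPING"
    return "INCOMPATIBLE"


print(check_columns("dpo", ["instruction", "chosen_response", "rejected_response"]))  # NEEDS MAPPING
```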

Example Workflow

# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. Check output markers:
#    ✓ READY → proceed with training
#    ✗ NEEDS MAPPING → apply mapping code below
#    ✗ INCOMPATIBLE → choose different method/dataset

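# 3. If mapping is needed, apply it in the training script before creating the trainer:
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")

def format_for_dpo(example):
    # Rename source columns to the prompt/chosen/rejected names DPO expects
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)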

# 4. Launch training job with confidence

Common Scenario: DPO Format Mismatch

Most DPO datasets use non-standard column names. Example:

Dataset has: instruction, chosen_response, rejected_response
DPO expects: prompt, chosen, rejected

The validator detects this and provides exact mapping code to fix it.

Converting Models to GGUF

After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.

What is GGUF:

Optimized for CPU/GPU inference with llama.cpp
Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
Typically 2-8GB for 7B models (vs 14GB unquantized)
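The size figures follow from parameters × bits per weight ÷ 8 bytes. The worked example below computes that for 7B parameters, using 1 GB = 10^9 bytes. It is a rough estimate that ignores metadata and any tensors left unquantized. Real GGUF quantization types also tend to average somewhat more than their nominal bit width, so actual files run a bit larger.

```python
def approx_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough weight storage in GB (1e9 bytes); ignores metadata and overhead."""
    return num_params * bits_per_weight / 8 / 1e9


for label, bits in [("fp16", 16), ("8-bit", 8), ("5-bit", 5), ("4-bit", 4)]:
    print(f"{label}: {approx_size_gb(7e9, bits):.1f} GB")
# fp16: 14.0 GB
# 8-bit: 7.0 GB
# 5-bit: 4.4 GB
# 4-bit: 3.5 GB
```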

When to convert:

Running models locally with Ollama or LM Studio
Reducing model size with quantization
Deploying to edge devices
Sharing models for local-first use

See: references/gguf_conversion.md for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.

Quick conversion:

hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})

Common Training Patterns

See references/training_patterns.md for detailed examples including:

Quick demo (5-10 minutes)
Production with checkpoints
Multi-GPU training
DPO training (preference learning)
GRPO training (online RL)

Common Failure Modes

Out of Memory (OOM)

Fix (try in order):

Reduce batch size: per_device_train_batch_size=1, increase gradient_accumulation_steps=8. Effective batch size is per_device_train_batch_size x gradient_accumulation_steps. For best performance keep effective batch size close to 128.
Enable: gradient_checkpointing=True
Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large etc.
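When you shrink `per_device_train_batch_size`, you can pick `gradient_accumulation_steps` so the effective batch stays near a target such as 128. The helper below is a hypothetical sketch of that arithmetic. It adds one thing beyond the formula above: with data-parallel multi-GPU training, the effective batch is also multiplied by the number of devices, exposed here as `num_devices`.

```python
def grad_accum_steps(target_effective_batch: int, per_device_batch: int,
                     num_devices: int = 1) -> int:
    """Smallest gradient_accumulation_steps reaching at least the target effective batch."""
    per_step = per_device_batch * num_devices
    return max(1, -(-target_effective_batch // per_step))  # integer ceiling, at least 1


for per_device in (1, 2, 4):
    steps = grad_accum_steps(128, per_device)
    print(per_device, steps, per_device * steps)
# 1 128 128
# 2 64 128
# 4 32 128
```

Larger accumulation keeps memory low but makes each optimizer step slower, so an effective batch well below 128 may be an acceptable trade-off for quick demos.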

Dataset Misformatted

Fix:

Validate first with dataset inspector:

uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
  --dataset name --split train

Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
Apply mapping code from inspector output if needed

Job Timeout

Fix:

Check logs for actual runtime: hf_jobs("logs", {"job_id": "..."})
Increase timeout with buffer: "timeout": "3h" (add 30% to estimated time)
Or reduce training: lower num_train_epochs, use a smaller dataset, or set max_steps
Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"

Note: Default 30min is insufficient for real training. Minimum 1-2 hours.

Hub Push Failures

Fix:

Add to job: secrets={"HF_TOKEN": "$HF_TOKEN"}
Add to config: push_to_hub=True, hub_model_id="username/model-name"
Verify auth: mcp__huggingface__hf_whoami()
Check that the token has write permissions for the target namespace (the repo is created on the first push if it doesn't exist; set hub_private_repo=True to make it private)

Missing Dependencies

Fix: Add to PEP 723 header:

# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///

Troubleshooting

Common issues:

Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN
Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
Dataset format error → Validate with dataset inspector (see Dataset Validation section)
Import/module errors → Add PEP 723 header with dependencies, verify format
Authentication errors → Check mcp__huggingface__hf_whoami(), token permissions, secrets parameter

See: references/troubleshooting.md for complete troubleshooting guide

Resources

References (In This Skill)

references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
references/training_patterns.md - Common training patterns and examples
references/unsloth.md - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
references/gguf_conversion.md - Complete GGUF conversion guide
references/trackio_guide.md - Trackio monitoring setup
references/hardware_guide.md - Hardware specs and selection
references/hub_saving.md - Hub authentication troubleshooting
references/troubleshooting.md - Common issues and solutions
references/local_training_macos.md - Local training on macOS

Scripts (In This Skill)

scripts/train_sft_example.py - Production SFT template
scripts/train_dpo_example.py - Production DPO template
scripts/train_grpo_example.py - Production GRPO template
scripts/unsloth_sft_example.py - Unsloth text LLM training template (faster, less VRAM)
scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
scripts/convert_to_gguf.py - Complete GGUF conversion script

External Scripts

Dataset Inspector - Validate dataset format before training (use via uv run or hf_jobs)


Key Takeaways

Submit scripts inline - The script parameter accepts Python code directly; no file saving required unless user requests
Jobs are asynchronous - Don't wait/poll; let user check when ready
Always set timeout - Default 30 min is insufficient; minimum 1-2 hours recommended
Always enable Hub push - Environment is ephemeral; without push, all results lost
Include Trackio - Use example scripts as templates for real-time monitoring
Offer cost estimation - When parameters are known, use scripts/estimate_cost.py
Use UV scripts (Approach 1) - Default to hf_jobs("uv", {...}) with inline scripts; TRL maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code
Use hf_doc_fetch/hf_doc_search for latest TRL documentation
Validate dataset format before training with dataset inspector (see Dataset Validation section)
Choose appropriate hardware for model size; use LoRA for models >7B

Limitations

Use this skill only when the task clearly matches the scope described above.
Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.

Bundled files

※ Files included in the ZIP. Besides the `SKILL.md` body, the ZIP may contain reference material, samples, and scripts.

📄 SKILL.md (27,864 bytes)
📎 references/gguf_conversion.md (9,854 bytes)
📎 references/hardware_guide.md (6,778 bytes)
📎 references/hub_saving.md (8,501 bytes)
📎 references/local_training_macos.md (8,324 bytes)
📎 references/reliability_principles.md (10,867 bytes)
📎 references/trackio_guide.md (5,784 bytes)
📎 references/training_methods.md (5,011 bytes)
📎 references/training_patterns.md (6,127 bytes)
📎 references/troubleshooting.md (8,858 bytes)
📎 references/unsloth.md (8,005 bytes)
📎 scripts/convert_to_gguf.py (12,612 bytes)
📎 scripts/dataset_inspector.py (15,683 bytes)
📎 scripts/estimate_cost.py (4,854 bytes)
📎 scripts/train_dpo_example.py (3,118 bytes)
📎 scripts/train_grpo_example.py (2,384 bytes)
📎 scripts/train_sft_example.py (3,343 bytes)
📎 scripts/unsloth_sft_example.py (16,854 bytes)