video-processor

Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.

⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o video-processor.zip https://jpskill.com/download/18972.zip && unzip -o video-processor.zip && rm video-processor.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/18972.zip -OutFile "$d\video-processor.zip"; Expand-Archive "$d\video-processor.zip" -DestinationPath $d -Force; ri "$d\video-processor.zip"

When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language, for example "Convert this video to MP4."

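💾 Manual download (if you prefer not to use the command line)

1. Press the blue download button on this page to download video-processor.zip
2. Double-click the ZIP file to extract it; this creates a video-processor folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code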

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

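📦 Installation (3 steps)

1. Press the download button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.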

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 2

📜 Original SKILL.md (the text Claude reads)

Video Processor

Instructions

This skill provides video processing utilities including audio extraction, format conversion, and audio transcription using FFmpeg and OpenAI's Whisper model.

Prerequisites

Required tools (must be installed in your environment):

FFmpeg: Multimedia framework for video/audio processing

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Verify installation
ffmpeg -version

OpenAI Whisper: Speech-to-text transcription model

# Install via pip
pip install -U openai-whisper

# Verify installation
whisper --help

Python packages (included in script via PEP 723):

click (CLI framework)
ffmpeg-python (Python wrapper for FFmpeg)
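
To make the two kinds of prerequisites concrete, here is a minimal sketch of a script header. The comment block is PEP 723 inline metadata declaring the two Python packages, and the helper checks that the external tools are on PATH. The `requires-python` bound, the `REQUIRED_TOOLS` tuple, and the `missing_tools` helper are assumptions for illustration; the bundled script may be organized differently.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "click",
#     "ffmpeg-python",
# ]
# ///
# The comment block above is PEP 723 inline metadata. `uv run` reads it and
# installs the listed packages into an isolated environment before running.
import shutil

# External command-line tools that inline metadata cannot install (assumed list).
REQUIRED_TOOLS = ("ffmpeg", "whisper")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required tools that are not found on PATH.

    `which` is injectable so the check can be exercised without the tools.
    """
    return [name for name in tools if which(name) is None]
```

A caller could stop early with something like `raise SystemExit("Missing: " + ", ".join(missing_tools()))` when the returned list is not empty.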

Workflow

Use the scripts/video_processor.py script for all video processing tasks. The script provides a simple CLI with the following commands:

1. Extract Audio from Video

Extract the audio track from a video file:

uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav

Options:

--format: Output audio format (default: wav). Supports: wav, mp3, aac, flac
Output is suitable for transcription or standalone audio use
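
For orientation, an extraction like this usually comes down to a single FFmpeg call. The sketch below only builds the argument list. The function name `build_extract_audio_args`, the codec table, and the 16 kHz mono defaults are assumptions for illustration, not the bundled script's actual code.

```python
from pathlib import Path

# Assumed mapping from output format to an FFmpeg audio encoder.
AUDIO_CODECS = {
    "wav": "pcm_s16le",   # uncompressed 16-bit PCM
    "mp3": "libmp3lame",
    "aac": "aac",
    "flac": "flac",
}


def build_extract_audio_args(src, dst, fmt="wav", sample_rate=16000, channels=1):
    """Build an ffmpeg argv that drops the video stream and writes audio only."""
    if fmt not in AUDIO_CODECS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-i", str(Path(src)),
        "-vn",                         # discard the video stream
        "-acodec", AUDIO_CODECS[fmt],
        "-ar", str(sample_rate),       # sample rate in Hz
        "-ac", str(channels),          # 1 = mono
        str(Path(dst)),
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.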

2. Convert Video to MP4

Convert any video file to MP4 format:

uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4

Options:

--codec: Video codec (default: libx264). Common options: libx264, libx265, h264
--preset: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow
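
As a rough sketch of how these two options could map onto FFmpeg flags, the helper below validates the preset against the values listed above and builds the command. The name `build_mp4_args` and the choice of AAC for the audio track are assumptions; the actual script may pass different flags.

```python
# Presets listed in the options above, from fastest to slowest.
PRESETS = ("ultrafast", "fast", "medium", "slow", "veryslow")


def build_mp4_args(src, dst, codec="libx264", preset="medium"):
    """Build an ffmpeg argv that re-encodes `src` into an MP4 container."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", codec,       # video encoder, e.g. libx264 or libx265
        "-preset", preset,   # slower presets compress better at the same quality
        "-c:a", "aac",       # assumed audio codec for MP4 output
        dst,
    ]
```

Rejecting an unknown preset before FFmpeg starts gives a clearer error than the encoder's own message.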

3. Convert Video to WebM

Convert any video file to WebM format (web-optimized):

uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm

Options:

--codec: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9
WebM is optimized for web playback and streaming
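
The WebM case differs mainly in how quality is controlled. The sketch below assumes constant-quality mode for VP9 (`-crf` together with `-b:v 0`) and a fixed target bitrate for VP8. The function name, the CRF value of 32, the 1M bitrate, and the Opus audio codec are illustrative assumptions rather than the script's real defaults.

```python
WEBM_CODECS = ("libvpx", "libvpx-vp9")  # the two encoders listed above


def build_webm_args(src, dst, codec="libvpx-vp9", crf=32):
    """Build an ffmpeg argv that encodes `src` as WebM."""
    if codec not in WEBM_CODECS:
        raise ValueError(f"unsupported WebM codec: {codec!r}")
    args = ["ffmpeg", "-y", "-i", src, "-c:v", codec]
    if codec == "libvpx-vp9":
        # Constant-quality mode: CRF sets the quality, "-b:v 0" lifts the bitrate cap.
        args += ["-crf", str(crf), "-b:v", "0"]
    else:
        # VP8: use a plain target bitrate (assumed value).
        args += ["-b:v", "1M"]
    args += ["-c:a", "libopus", dst]
    return args
```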

4. Transcribe Audio with Whisper

Transcribe audio or video files to text using OpenAI's Whisper model:

# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt

# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt

Options:

--model: Whisper model size (default: base). Options:
- tiny: Fastest, lowest accuracy (~1 GB VRAM)
- base: Fast, good accuracy (~1 GB VRAM) [DEFAULT]
- small: Balanced (~2 GB VRAM)
- medium: High accuracy (~5 GB VRAM)
- large: Best accuracy, slowest (~10 GB VRAM)
--language: Language code (default: auto-detect). Examples: en, es, fr, de, zh
--format: Output format (default: txt). Options: txt, srt, vtt, json

Transcription workflow:

If input is video, FFmpeg extracts audio to temporary WAV file
Whisper processes the audio file
Transcription is saved in requested format
Temporary files are cleaned up automatically
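
The four steps above can be sketched as one function with a `try`/`finally` block, so the temporary WAV file is removed even when transcription fails. The extraction and transcription steps are passed in as callables, which keeps the sketch independent of FFmpeg and Whisper. All names here (`transcribe_file`, `VIDEO_SUFFIXES`, the callable signatures) are hypothetical and not taken from the bundled script.

```python
import os
import tempfile
from pathlib import Path

# Assumed set of extensions treated as video input.
VIDEO_SUFFIXES = {".mp4", ".webm", ".avi", ".mov", ".mkv"}


def transcribe_file(src, dst, extract_audio, transcribe):
    """Run the workflow: extract if needed, transcribe, save, clean up.

    extract_audio(src_path, wav_path) must write a WAV file to wav_path.
    transcribe(audio_path) must return the transcript as a string.
    """
    src = Path(src)
    tmp_path = None
    try:
        if src.suffix.lower() in VIDEO_SUFFIXES:
            # Step 1: extract audio to a temporary WAV file.
            fd, name = tempfile.mkstemp(suffix=".wav")
            os.close(fd)
            tmp_path = Path(name)
            extract_audio(src, tmp_path)
            audio = tmp_path
        else:
            audio = src
        # Step 2: transcribe the audio.
        text = transcribe(audio)
        # Step 3: save the transcript.
        Path(dst).write_text(text, encoding="utf-8")
        return text
    finally:
        # Step 4: remove the temporary file, on success or failure.
        if tmp_path is not None and tmp_path.exists():
            tmp_path.unlink()
```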

5. Combined Workflow Example

Process a video end-to-end:

# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav

# 2. Transcribe to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small

# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm

Key Technical Details

FFmpeg and Whisper Integration:

FFmpeg doesn't transcribe audio itself - it prepares audio for external transcription
The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video
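FFmpeg can also pipe decoded audio directly to Whisper, which avoids a temporary file (advanced use case). Whisper processes audio in 30-second windows, so this is near-real-time at best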

Audio Format for Transcription:

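Whisper reads audio through FFmpeg, so it accepts most common formats; WAV and MP3 are safe choices
Sample rate: Whisper resamples all input to 16 kHz mono internally, so extracting at 16 kHz avoids carrying unneeded data (script handles conversion automatically)
The script extracts audio with settings suited to Whisper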

Output Formats:

txt: Plain text transcript
srt: SubRip subtitle format (includes timestamps)
vtt: WebVTT subtitle format (web standard)
json: Detailed JSON with word-level timestamps
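
The two subtitle formats differ mostly in their timestamp separator and header. The sketch below renders a list of segments as SRT and as WebVTT. It assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which is the shape of the segments returned by Whisper's Python API. The function names are illustrative and not taken from the bundled script.

```python
def format_timestamp(seconds, decimal_marker=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments):
    """Render segments as SubRip text: numbered cues with comma timestamps."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """Render segments as WebVTT: a WEBVTT header and dot timestamps."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)
```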

Error Handling

The script includes comprehensive error handling:

Validates input files exist
Checks FFmpeg and Whisper are installed
Provides clear error messages for missing dependencies
Handles temporary file cleanup on errors

Performance Tips

Use tiny or base models for quick drafts
Use small or medium for production transcriptions
Use large only when maximum accuracy is required
For long videos, consider extracting audio first, then transcribe in segments
WebM conversion with VP9 takes longer but produces smaller files
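
For the long-video tip, one way to split the work is to plan fixed-length segments and cut each one with FFmpeg's `-ss` (start) and `-t` (duration) options. The sketch below is an assumption about how that could be done; the 10-minute default and both function names are hypothetical, and the bundled script may not offer segmenting at all.

```python
def plan_segments(total_seconds, segment_seconds=600.0):
    """Split a duration into (start, length) pairs that cover it without gaps."""
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("durations must be non-negative and segments positive")
    segments = []
    start = 0.0
    while start < total_seconds:
        length = min(segment_seconds, total_seconds - start)
        segments.append((start, length))
        start += segment_seconds
    return segments


def build_segment_args(src, dst, start, length):
    """Build an ffmpeg argv that copies one time range of `src` into `dst`."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),   # seek to the segment start
        "-t", str(length),   # keep this many seconds
        "-i", src,
        "-c", "copy",        # no re-encoding
        dst,
    ]
```

Cutting at fixed times can split a word across two segments, so overlapping the segments slightly or cutting at silences may give cleaner transcripts.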

Examples

Example 1: Quick Video to MP4 Conversion

User request:

I have an AVI file from my old camera. Can you convert it to MP4?

You would:

Use the to-mp4 command with default settings:

uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4

Confirm the conversion completed successfully
Inform the user about the output file location

Example 2: Extract Audio and Transcribe

User request:

I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?

You would:

First extract the audio:

uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav

Then transcribe the extracted audio using the base model (good balance of speed/accuracy):

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav transcript.txt --model base

Share the transcript.txt file with the user

Example 3: Create Web-Optimized Video with Subtitles

User request:

I need to put this video on my website with subtitles. Can you help?

You would:

Convert to WebM for web optimization:

uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm

Generate a WebVTT subtitle file (the format the HTML <track> element accepts):

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.vtt --format vtt --model small

Inform user they now have:
- presentation.webm (web-optimized video)
- subtitles.vtt (subtitle file for embedding)

Example 4: High-Quality Transcription with Language Specification

User request:

I have a Spanish interview video that needs an accurate transcript for publication.

You would:

Use a larger model with language specified for best accuracy:

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es

Optionally create SRT for review:

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es

Review the transcript with the user and make any necessary corrections

Example 5: Batch Processing Multiple Videos

User request:

I have a folder of training videos that all need to be converted to WebM and transcribed.

You would:

List all video files in the directory:
```
ls training_videos/*.mp4
```

For each video file, run the conversion and transcription:

# Convert and transcribe every MP4 in training_videos/
mkdir -p output
for f in training_videos/*.mp4; do
  name="$(basename "$f" .mp4)"
  uv run .claude/skills/video-processor/scripts/video_processor.py to-webm "$f" "output/$name.webm"
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe "$f" "output/$name.txt" --model base
done

Confirm all conversions and transcriptions completed
Provide summary of output files

Summary

The video-processor skill provides a unified interface for common video processing tasks:

Audio extraction: Extract audio tracks in various formats
Format conversion: Convert to MP4 (universal) or WebM (web-optimized)
Transcription: Speech-to-text with multiple output formats
Flexible: CLI arguments for model selection, language, and output formats

All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.

Bundled files

Note: the files included in the ZIP. In addition to `SKILL.md` itself, it may contain reference material, samples, and scripts.

📄 SKILL.md (8,823 bytes)
📎 scripts/video_processor.py (14,464 bytes)