
speech-to-text

Transcribe audio to text with Whisper models via the inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation, multi-language, timestamps. Use for: meeting transcription, subtitles, podcast transcripts, voice notes. Triggers: speech to text, transcription, whisper, audio to text, transcribe audio, voice to text, stt, automatic transcription, subtitles generation, transcribe meeting, audio transcription, whisper ai


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. When you ask Claude for something in this area, the Skill is triggered automatically.

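📦 Installation (3 steps)

1. Use the "Download" button on this Skill's page to get the .skill file.
2. Rename the file extension from .skill to .zip and extract it (macOS can extract it automatically).
3. Place the extracted folder in .claude/skills/ inside your home folder:
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you're done. You don't need to say "Use this Skill to…"; the Skill is invoked automatically for relevant requests.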


Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1

📜 Original SKILL.md (the English/Chinese text that Claude reads)

Speech-to-Text

Transcribe audio to text via the inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
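Inline JSON wrapped in single quotes breaks as soon as a URL or other value contains an apostrophe. The minimal Python sketch below relies only on the `infsh app run <app-id> --input '<json>'` form shown above. It assumes the app's result is printed to stdout. It builds the argument list with `json.dumps` and runs it without a shell, so no quoting is needed. The helper names `build_run_command` and `run_app` are hypothetical.

```python
import json
import shlex
import subprocess


def build_run_command(app_id: str, payload: dict) -> list[str]:
    """Build argv for `infsh app run <app_id> --input '<json>'`."""
    # json.dumps always emits valid JSON, and passing a list (not a string)
    # to subprocess means no shell quoting is involved at all.
    return ["infsh", "app", "run", app_id, "--input", json.dumps(payload)]


def run_app(app_id: str, payload: dict) -> str:
    """Run an inference.sh app and return its stdout.

    Requires the infsh CLI to be installed and `infsh login` to have been run.
    Assumption: the app's result is printed to stdout.
    """
    completed = subprocess.run(
        build_run_command(app_id, payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


if __name__ == "__main__":
    cmd = build_run_command("infsh/fast-whisper-large-v3", {"audio_url": "https://audio.mp3"})
    # Show the equivalent shell command; this does not call infsh.
    print(shlex.join(cmd))
```

Run as a script, the sketch only prints the same command as the Quick Start line. Calling `run_app(...)` executes it for real.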

Available Models

Model	App ID	Best For
Fast Whisper V3	`infsh/fast-whisper-large-v3`	Fast transcription
Whisper V3 Large	`infsh/whisper-v3-large`	Highest accuracy

Examples

Basic Transcription

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'

With Timestamps

infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json
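The sample file can also be patched programmatically instead of edited by hand. This is a minimal Python sketch. It assumes the file saved by `infsh app sample ... --save input.json` is a flat JSON object. It keeps whatever fields the sample already has and only sets `audio_url` and `timestamps`. `prepare_timestamped_input` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def prepare_timestamped_input(path: Path, audio_url: str) -> dict:
    """Set audio_url and timestamps in a saved sample input, then write it back."""
    # Start from the sample's existing fields (if the file exists) so any
    # other defaults it contains are preserved.
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data["audio_url"] = audio_url
    data["timestamps"] = True
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data


if __name__ == "__main__":
    # Demo in a temporary directory; in real use, point this at the
    # input.json written by `infsh app sample`.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "input.json"
        sample.write_text('{"audio_url": ""}', encoding="utf-8")  # stand-in sample
        print(prepare_timestamped_input(sample, "https://podcast.mp3"))
```

After patching, pass the file to `infsh app run infsh/fast-whisper-large-v3 --input input.json` as shown above.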

Translation (to English)

infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'
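Whisper's `translate` task always produces English and does not take a target language. The small Python sketch below builds the `--input` payload and rejects unknown task names before any remote run is attempted. Only `audio_url`, `task: "translate"`, and `timestamps` come from the examples above. The helper name is hypothetical, and omitting `task` for plain transcription is an assumption.

```python
import json

# Assumption: omitting "task" means plain transcription in the source language,
# so "task" is only sent when translating.
SUPPORTED_TASKS = ("transcribe", "translate")


def whisper_payload(audio_url: str, task: str = "transcribe", timestamps: bool = False) -> dict:
    """Build an input payload for the Whisper apps (hypothetical helper)."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"task must be one of {SUPPORTED_TASKS}, got {task!r}")
    payload: dict = {"audio_url": audio_url}
    if task == "translate":
        payload["task"] = "translate"  # output text will be English
    if timestamps:
        payload["timestamps"] = True
    return payload


if __name__ == "__main__":
    print(json.dumps(whisper_payload("https://french-audio.mp3", task="translate"), indent=2))
```

The printed JSON is the same payload as in the translation example, so it can be passed directly to `--input`.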

From Video

# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio (replace <audio-url> with the audio URL from audio.json)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
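The second command needs the audio URL produced by the first. This document does not show the extractor's output schema. The Python sketch below therefore assumes only that audio.json is JSON. It collects every http(s) string in the file. It prefers one whose key path mentions "audio" and otherwise falls back to the first URL. Treat this heuristic as a stopgap, and switch to the real field name once you have inspected an actual audio.json. The function names are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_urls(value: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (key_path, url) for every http(s) string in parsed JSON, depth-first."""
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from iter_urls(child, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_urls(child, f"{path}[{index}]")


def pick_audio_url(data: Any) -> str:
    """Heuristic: prefer a URL under a key path containing 'audio', else the first URL."""
    urls = list(iter_urls(data))
    for key_path, url in urls:
        if "audio" in key_path.lower():
            return url
    if urls:
        return urls[0][1]
    raise ValueError("no http(s) URL found in extractor output")


def audio_url_from_file(path: str = "audio.json") -> str:
    """Read the extractor's saved stdout and return the audio URL."""
    with open(path, encoding="utf-8") as f:
        return pick_audio_url(json.load(f))
```

The result fills the `<audio-url>` placeholder. For example, `{"audio_url": audio_url_from_file()}` becomes the `--input` for `infsh/fast-whisper-large-v3`.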

Workflow: Video Subtitles

# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
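Step 2 leaves the caption format open. If the caption app, or any other player, accepts SubRip subtitles, the timestamped segments from step 1 can be converted to SRT. This Python sketch assumes each segment is an object with `start` and `end` in seconds and a `text` field. Those field names are not documented above, so check them against your transcript.json. Whether `infsh/caption-videos` accepts SRT is also an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert segments with start/end/text (assumed field names) to SRT text."""
    cues = []
    for seg in segments:
        text = str(seg["text"]).strip()
        if not text:
            continue  # skip empty segments so cue numbers stay consecutive
        number = len(cues) + 1
        cues.append(
            f"{number}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # Illustrative segments only; real ones come from transcript.json.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello."},
        {"start": 2.5, "end": 4.0, "text": "World."},
    ]
    print(segments_to_srt(demo), end="")
```

With a real transcript, load transcript.json with `json.load` and pass its `segments` list (see Output Format) to `segments_to_srt`.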

Supported Languages

Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

Meetings: Transcribe recordings
Podcasts: Generate transcripts
Subtitles: Create captions for videos
Voice Notes: Convert to searchable text
Interviews: Transcribe for research
Accessibility: Make audio content accessible

Output Format

Returns JSON with:

text: Full transcription
segments: Timestamped segments (if requested)
language: Detected language
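This Python sketch reads that output. It assumes the saved stdout (for example, transcript.json) is a JSON object with these top-level fields. `segments` is treated as optional because it is present only when timestamps were requested. The `Transcript` class is a hypothetical convenience and is not part of inference.sh.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transcript:
    text: str
    language: Optional[str] = None
    segments: list[dict] = field(default_factory=list)

    @property
    def has_timestamps(self) -> bool:
        return bool(self.segments)


def parse_transcript(raw: str) -> Transcript:
    """Parse the app's JSON output into a Transcript."""
    data = json.loads(raw)
    return Transcript(
        text=data["text"],
        language=data.get("language"),
        # Absent (or null) unless timestamps were requested.
        segments=data.get("segments") or [],
    )


if __name__ == "__main__":
    # Illustrative output only; real values come from the app.
    result = parse_transcript('{"text": "Hello world.", "language": "en"}')
    print(result.language, result.has_timestamps, result.text)
```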

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# Text-to-speech (reverse direction)
npx skills add inferencesh/skills@text-to-speech

# Video generation (add captions)
npx skills add inferencesh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inferencesh/skills@ai-avatar-video

Browse all audio apps: infsh app list --category audio

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Complete transcription guide
Apps Overview - Understanding the app ecosystem
