ai-voice-cloning
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai
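⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.
🎯 What this Skill does
The description tells you what this Skill does for you. Claude invokes it automatically when you make a request in this area.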
📦 Installation (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder
  - · macOS / Linux: ~/.claude/skills/
  - · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
AI Voice Generation

Generate natural AI voices via inference.sh CLI.
Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login
# Generate speech
infsh app run infsh/kokoro-tts --input '{
"text": "Hello! This is an AI-generated voice that sounds natural and engaging.",
"voice": "af_sarah"
}'
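The examples wrap the JSON input in single quotes, so the text cannot contain an apostrophe without ending the shell string early. A minimal sketch of one workaround, assuming only the `--input` flag shown above: build the JSON in a small helper that escapes the text first. `json_escape` and `tts_payload` are hypothetical helper names, not part of the infsh CLI, and the sketch handles single-line text only (tabs and newlines are not escaped).

```shell
# json_escape TEXT: escape backslashes and double quotes so TEXT
# (apostrophes included) can sit inside a JSON string.
# Hypothetical helper, not part of the infsh CLI.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload TEXT VOICE: print a JSON input object with "text" and "voice".
tts_payload() {
  printf '{"text": "%s", "voice": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage (needs the infsh CLI and a logged-in session):
# infsh app run infsh/kokoro-tts --input "$(tts_payload "Hi, I'm your host." "af_sarah")"
```

Passing the payload in double quotes lets the shell expand the helper output while the text itself stays free to use apostrophes.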
Available Models
| Model | App ID | Best For |
|---|---|---|
| Kokoro TTS | `infsh/kokoro-tts` | Natural, multiple voices |
| DIA | `infsh/dia-tts` | Conversational, expressive |
| Chatterbox | `infsh/chatterbox` | Casual, entertainment |
| Higgs | `infsh/higgs-tts` | Professional narration |
| VibeVoice | `infsh/vibevoice` | Emotional range |
Kokoro Voice Library
American English
| Voice ID | Gender | Style |
|---|---|---|
| `af_sarah` | Female | Warm, friendly |
| `af_nicole` | Female | Professional |
| `af_sky` | Female | Youthful |
| `am_michael` | Male | Authoritative |
| `am_adam` | Male | Conversational |
| `am_echo` | Male | Clear, neutral |
British English
| Voice ID | Gender | Style |
|---|---|---|
| `bf_emma` | Female | Refined |
| `bf_isabella` | Female | Warm |
| `bm_george` | Male | Classic |
| `bm_lewis` | Male | Modern |
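The voice IDs in the two tables appear to follow a pattern: the first letter marks the accent (`a` American, `b` British) and the second the gender (`f` female, `m` male). A small sketch that decodes the prefix, with the mapping inferred from the tables only, not from official documentation. `describe_voice` is a hypothetical helper name.

```shell
# describe_voice VOICE_ID: decode the two-letter prefix of a Kokoro voice ID.
# Mapping inferred from the voice tables; treat it as an assumption.
describe_voice() {
  case "$1" in
    af_*) echo "American English, female" ;;
    am_*) echo "American English, male" ;;
    bf_*) echo "British English, female" ;;
    bm_*) echo "British English, male" ;;
    *)    echo "unknown"; return 1 ;;
  esac
}

# Usage:
# describe_voice bf_emma
```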
Voice Generation Examples
Professional Narration
infsh app run infsh/kokoro-tts --input '{
"text": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
"voice": "am_michael",
"speed": 1.0
}'
Conversational Style
infsh app run infsh/dia-tts --input '{
"text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
"voice": "conversational"
}'
Audiobook Narration
infsh app run infsh/kokoro-tts --input '{
"text": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
"voice": "bf_emma",
"speed": 0.9
}'
Video Voiceover
infsh app run infsh/kokoro-tts --input '{
"text": "Introducing the next generation of productivity. Work smarter, not harder.",
"voice": "af_nicole",
"speed": 1.1
}'
Podcast Host
infsh app run infsh/kokoro-tts --input '{
"text": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
"voice": "am_adam"
}'
Multi-Voice Conversation
# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{
"text": "Have you seen the latest AI developments? It is incredible how fast things are moving.",
"voice": "am_michael"
}' > speaker1.json
# Speaker 2
infsh app run infsh/kokoro-tts --input '{
"text": "I know, right? Just last week I tried that new image generator and was blown away.",
"voice": "af_sarah"
}' > speaker2.json
# Merge conversation
infsh app run infsh/media-merger --input '{
"audio_files": ["<speaker1-url>", "<speaker2-url>"],
"crossfade_ms": 300
}'
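The merge step needs the URLs of the generated clips, which the commands above save into speaker1.json and speaker2.json. A rough sketch for pulling a URL out of a saved file, assuming the output contains the audio URL as a quoted http(s) string. The actual output schema is not shown here, so inspect a saved file before relying on this. `first_url` is a hypothetical helper name.

```shell
# first_url FILE: print the first http(s) URL found in FILE.
# Assumption: the saved app output holds the audio URL as a quoted string
# and that URL is the first one in the file. The real schema may differ.
first_url() {
  grep -Eo 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Usage:
# SPEAKER1_URL=$(first_url speaker1.json)
# SPEAKER2_URL=$(first_url speaker2.json)
```

If the output lists several URLs, a JSON-aware tool that selects the field by name would be more reliable than a pattern match.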
Long-Form Content
Chunked Processing
For content over 5000 characters, split into chunks:
# Process long text in chunks
TEXT="Your very long text here..."
# Split and generate
# Chunk 1
infsh app run infsh/kokoro-tts --input '{
"text": "<chunk-1>",
"voice": "bf_emma"
}' > chunk1.json
# Chunk 2
infsh app run infsh/kokoro-tts --input '{
"text": "<chunk-2>",
"voice": "bf_emma"
}' > chunk2.json
# Merge chunks
infsh app run infsh/media-merger --input '{
"audio_files": ["<chunk1-url>", "<chunk2-url>"],
"crossfade_ms": 100
}'
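The split step is left as placeholders above. One possible way to produce the chunks, sketched with standard tools: break the text after sentence-ending punctuation, then pack whole sentences into chunks up to a size limit. `chunk_text` is a hypothetical helper name. A single sentence longer than the limit is emitted as its own oversized chunk, and `length` may count bytes rather than characters depending on the awk implementation.

```shell
# chunk_text MAX: read text on stdin and print chunks of at most MAX
# characters, one chunk per line, breaking only between sentences.
# Hypothetical helper; a sentence longer than MAX is printed unsplit.
chunk_text() {
  tr '\n' ' ' | sed 's/\([.!?]\) /\1\
/g' | awk -v max="$1" '
    $0 == "" { next }                      # skip empty fragments
    {
      if (chunk == "") {
        chunk = $0
      } else if (length(chunk) + 1 + length($0) <= max) {
        chunk = chunk " " $0               # sentence still fits
      } else {
        print chunk                        # flush and start a new chunk
        chunk = $0
      }
    }
    END { if (chunk != "") print chunk }'
}

# Usage:
# chunk_text 5000 < book.txt > chunks.txt
```

Each output line can then be sent as the "text" of its own TTS call, keeping the same voice across chunks.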
Voice + Video Workflow
Add Voiceover to Video
# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{
"text": "This stunning footage shows the beauty of nature in its purest form.",
"voice": "am_michael"
}' > voiceover.json
# 2. Merge with video
infsh app run infsh/media-merger --input '{
"video_url": "https://your-video.mp4",
"audio_url": "<voiceover-url>"
}'
Create Talking Head
# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{
"text": "Hi, I am excited to share some updates with you today.",
"voice": "af_sarah"
}' > speech.json
# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "<speech-url>"
}'
Speed and Pacing
| Speed | Effect | Use For |
|---|---|---|
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energy |
| 1.2 | Fast | Quick announcements |
# Slow narration
infsh app run infsh/kokoro-tts --input '{
"text": "Take a deep breath. Let yourself relax.",
"voice": "bf_emma",
"speed": 0.8
}'
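The speed table can be turned into a lookup so scripts pick a consistent value per content type. A minimal sketch; the keyword names are illustrative assumptions, and only the speed values come from the table.

```shell
# speed_for USE_CASE: suggest a "speed" value following the speed table.
# Keywords are hypothetical; unknown use cases fall back to normal speed.
speed_for() {
  case "$1" in
    audiobook|meditation) echo 0.8 ;;
    education|tutorial)   echo 0.9 ;;
    commercial)           echo 1.1 ;;
    announcement)         echo 1.2 ;;
    *)                    echo 1.0 ;;
  esac
}

# Usage:
# SPEED=$(speed_for tutorial)
```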
Punctuation for Pacing
Use punctuation to control speech rhythm:
| Punctuation | Effect |
|---|---|
| Period `.` | Full pause |
| Comma `,` | Brief pause |
| `...` | Extended pause |
| `!` | Emphasis |
| `?` | Question intonation |
| `-` | Quick break |
infsh app run infsh/kokoro-tts --input '{
"text": "Wait... Did you hear that? Something is coming. Something big!",
"voice": "am_adam"
}'
Best Practices
- Match voice to content - Professional voice for business, casual for social
- Use punctuation - Control pacing with periods and commas
- Keep sentences short - Shorter sentences are easier to generate and sound more natural
- Test different voices - Same text sounds different across voices
- Adjust speed - Slightly slower often sounds more natural
- Break long content - Process in chunks for consistency
Use Cases
- Voiceovers - Video narration, commercials
- Audiobooks - Full book narration
- Podcasts - AI hosts and guests
- E-learning - Course narration
- Accessibility - Screen reader content
- IVR - Phone system messages
- Content localization - Translate and voice
Related Skills
# All TTS models
npx skills add inferencesh/skills@text-to-speech
# Podcast creation
npx skills add inferencesh/skills@ai-podcast-creation
# AI avatars
npx skills add inferencesh/skills@ai-avatar-video
# Video generation
npx skills add inferencesh/skills@ai-video-generation
# Full platform skill
npx skills add inferencesh/skills@inference-sh
Browse audio apps: infsh app list --category audio