ai-voice-cloning

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI.

Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech.
Capabilities: multiple voices, emotions, accents, long-form narration, conversation.
Use for: voiceovers, audiobooks, podcasts, video narration, accessibility.
Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.

🎯 What this Skill does

The description below explains what this Skill does for you. When you ask Claude for help in this area, the Skill activates automatically.

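📦 Installation (3 steps)

  1. Click the "Download" button above to get the .skill file.
  2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically).
  3. Move the extracted folder into .claude/skills/ in your home folder.
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill..."; it is invoked automatically for related requests.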

Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1

AI Voice Generation

Generate natural AI voices via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{
  "text": "Hello! This is an AI-generated voice that sounds natural and engaging.",
  "voice": "af_sarah"
}'

Available Models

| Model | App ID | Best For |
|---|---|---|
| Kokoro TTS | infsh/kokoro-tts | Natural, multiple voices |
| DIA | infsh/dia-tts | Conversational, expressive |
| Chatterbox | infsh/chatterbox | Casual, entertainment |
| Higgs | infsh/higgs-tts | Professional narration |
| VibeVoice | infsh/vibevoice | Emotional range |

Kokoro Voice Library

American English

| Voice ID | Gender | Style |
|---|---|---|
| af_sarah | Female | Warm, friendly |
| af_nicole | Female | Professional |
| af_sky | Female | Youthful |
| am_michael | Male | Authoritative |
| am_adam | Male | Conversational |
| am_echo | Male | Clear, neutral |

British English

| Voice ID | Gender | Style |
|---|---|---|
| bf_emma | Female | Refined |
| bf_isabella | Female | Warm |
| bm_george | Male | Classic |
| bm_lewis | Male | Modern |

Voice Generation Examples

Professional Narration

infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

infsh app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'

Audiobook Narration

infsh app run infsh/kokoro-tts --input '{
  "text": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

infsh app run infsh/kokoro-tts --input '{
  "text": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'
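
The examples above place the text inside single-quoted JSON, so an apostrophe in the text would end the shell string and a double quote would break the JSON. A minimal sketch of one way around this, assuming only the `text` and `voice` input fields shown in this document: the helper names `json_escape` and `tts_payload` are hypothetical and not part of the inference.sh CLI.

```sh
# json_escape TEXT: escape backslashes and double quotes for use inside a
# JSON string. Newlines, tabs, and carriage returns are flattened to spaces;
# other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | tr '\n\t\r' '   ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload TEXT VOICE: print a JSON object with the "text" and "voice"
# fields used by the examples in this document.
tts_payload() {
  printf '{"text": "%s", "voice": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Possible usage (not executed here, requires the infsh CLI and a login):
# infsh app run infsh/kokoro-tts --input "$(tts_payload "I'm your host." am_adam)"
```

Because the payload is built inside double quotes, text containing apostrophes can be passed unchanged.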

Multi-Voice Conversation

# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{
  "text": "Have you seen the latest AI developments? It is incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
infsh app run infsh/kokoro-tts --input '{
  "text": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge conversation
infsh app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'
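
The `<speaker1-url>` and `<speaker2-url>` placeholders above have to be filled in from the saved JSON files. The output schema is not documented here, so the following is only a sketch under the assumption that each saved file contains the audio URL as an ordinary JSON string. The helper names `first_url` and `merge_payload` are hypothetical; the `audio_files` and `crossfade_ms` fields are the ones used in the merge example.

```sh
# first_url FILE: print the first JSON string in FILE that starts with
# http:// or https://. Assumption: the audio URL is stored as a plain string
# without escaped slashes; the real field name is not known from this document.
first_url() {
  tr '"' '\n' < "$1" | grep -E '^https?://' | head -n 1
}

# merge_payload CROSSFADE_MS URL...: print the media-merger input JSON.
# The function body runs in a subshell so its variables stay local.
merge_payload() (
  crossfade=$1
  shift
  list=""
  for url in "$@"; do
    list="${list}${list:+, }\"$url\""
  done
  printf '{"audio_files": [%s], "crossfade_ms": %s}' "$list" "$crossfade"
)

# Possible usage (not executed here, requires the infsh CLI):
# infsh app run infsh/media-merger --input \
#   "$(merge_payload 300 "$(first_url speaker1.json)" "$(first_url speaker2.json)")"
```

If the output contains several URLs, inspect the file and pick the audio field explicitly instead of relying on the first match.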

Long-Form Content

Chunked Processing

For content over 5000 characters, split into chunks:

# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
infsh app run infsh/kokoro-tts --input '{
  "text": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
infsh app run infsh/kokoro-tts --input '{
  "text": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks
infsh app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'
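
The `<chunk-1>` and `<chunk-2>` placeholders above assume the text has already been split. One possible way to do the split in plain shell is sketched below; `split_chunks` is a hypothetical helper, and it breaks only between words rather than at sentence boundaries, which may or may not be what you want for narration.

```sh
# split_chunks MAX TEXT: print TEXT as chunks of at most MAX characters,
# one chunk per line, breaking only between words. Runs of whitespace are
# collapsed to single spaces. A single word longer than MAX is printed as
# its own chunk. Note: some shells (e.g. dash) count bytes, not characters.
split_chunks() (
  max=$1
  chunk=""
  set -f    # disable globbing while the text is word-split
  for word in $2; do
    if [ -z "$chunk" ]; then
      chunk=$word
    elif [ $(( ${#chunk} + 1 + ${#word} )) -le "$max" ]; then
      chunk="$chunk $word"
    else
      printf '%s\n' "$chunk"
      chunk=$word
    fi
  done
  if [ -n "$chunk" ]; then printf '%s\n' "$chunk"; fi
)

# Possible usage: each output line becomes the "text" of one kokoro-tts run.
# split_chunks 5000 "$TEXT" | while IFS= read -r chunk; do
#   echo "chunk of ${#chunk} characters"
# done
```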

Voice + Video Workflow

Add Voiceover to Video

# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{
  "text": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
infsh app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'

Create Talking Head

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{
  "text": "Hi, I am excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'

Speed and Pacing

| Speed | Effect | Use For |
|---|---|---|
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energy |
| 1.2 | Fast | Quick announcements |

# Slow narration
infsh app run infsh/kokoro-tts --input '{
  "text": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'
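
When scripting several narrations, the speed table can be kept in one place. This is a small sketch only: `speed_for` and its content-type labels are hypothetical names chosen for illustration, while the numeric values are the ones listed in the table.

```sh
# speed_for CONTENT_TYPE: print the suggested "speed" value for a content
# type, following the table in this section. Unknown types fall back to 1.0.
speed_for() {
  case $1 in
    audiobook|meditation) echo 0.8 ;;
    education|tutorial)   echo 0.9 ;;
    commercial)           echo 1.1 ;;
    announcement)         echo 1.2 ;;
    *)                    echo 1.0 ;;
  esac
}

# Possible usage: pass the result as the "speed" field of the input JSON.
# speed=$(speed_for audiobook)
```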

Punctuation for Pacing

Use punctuation to control speech rhythm:

| Punctuation | Effect |
|---|---|
| Period . | Full pause |
| Comma , | Brief pause |
| ... | Extended pause |
| ! | Emphasis |
| ? | Question intonation |
| - | Quick break |

infsh app run infsh/kokoro-tts --input '{
  "text": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

  1. Match voice to content - Professional voice for business, casual for social
  2. Use punctuation - Control pacing with periods and commas
  3. Keep sentences short - Easier to generate and sounds more natural
  4. Test different voices - Same text sounds different across voices
  5. Adjust speed - Slightly slower often sounds more natural
  6. Break long content - Process in chunks for consistency
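
To apply the "keep sentences short" advice before generating audio, a script can flag sentences that run long. The sketch below is one rough way to do that; `long_sentences` is a hypothetical helper, the word limit is an arbitrary choice rather than a documented threshold, and a sentence is simply taken to end at a word finishing with `.`, `!`, or `?`.

```sh
# long_sentences MAX_WORDS TEXT: print each sentence in TEXT that has more
# than MAX_WORDS words, one per line. Prints nothing if all are short enough.
long_sentences() (
  max=$1
  sentence=""
  count=0
  set -f    # disable globbing while the text is word-split
  for word in $2; do
    sentence="${sentence}${sentence:+ }${word}"
    count=$((count + 1))
    case $word in
      *[.!?])
        if [ "$count" -gt "$max" ]; then printf '%s\n' "$sentence"; fi
        sentence=""
        count=0
        ;;
    esac
  done
  # Report trailing text that has no closing punctuation.
  if [ "$count" -gt "$max" ]; then printf '%s\n' "$sentence"; fi
)

# Possible usage: review anything printed before sending the text to a model.
# long_sentences 25 "$TEXT"
```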

Use Cases

  • Voiceovers - Video narration, commercials
  • Audiobooks - Full book narration
  • Podcasts - AI hosts and guests
  • E-learning - Course narration
  • Accessibility - Screen reader content
  • IVR - Phone system messages
  • Content localization - Translate and voice

Related Skills

# All TTS models
npx skills add inferencesh/skills@text-to-speech

# Podcast creation
npx skills add inferencesh/skills@ai-podcast-creation

# AI avatars
npx skills add inferencesh/skills@ai-avatar-video

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Full platform skill
npx skills add inferencesh/skills@inference-sh

Browse audio apps: infsh app list --category audio