
ai-voice-cloning

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai


⚠️ Download and use this Skill at your own risk. This site accepts no responsibility for its content, behavior, or safety.

🎯 What this Skill does

The Skill's description tells you what it can do for you. When you ask Claude for something in this area, the Skill is triggered automatically.

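📦 Installation (3 steps)

1. Download the .skill file from the site's download button
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Move the extracted folder into .claude/skills/ in your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.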
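On macOS or Linux, steps 2 and 3 can be done from a terminal. The following is a minimal sketch, not an official installer: `install_skill` is a hypothetical helper name, and it assumes the .skill file is an ordinary zip archive (as the steps imply) and that `unzip` is installed.

```shell
# Sketch: unpack a downloaded .skill archive into the Claude skills folder.
# Assumptions: the archive is a plain zip file and `unzip` is available.
install_skill() {
  local archive="$1"
  local dest="${2:-$HOME/.claude/skills}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # unzip reads the zip format regardless of the file extension,
  # so renaming to .zip is not needed on the command line.
  unzip -q -o "$archive" -d "$dest"
}

# Usage (the file name is a placeholder for wherever the download landed):
# install_skill ~/Downloads/ai-voice-cloning.skill
```

The optional second argument overrides the destination, which can be handy for inspecting the archive contents in a scratch directory first.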


Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1


AI Voice Generation

Generate natural AI voices via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{
  "text": "Hello! This is an AI-generated voice that sounds natural and engaging.",
  "voice": "af_sarah"
}'

Available Models

Model	App ID	Best For
Kokoro TTS	`infsh/kokoro-tts`	Natural, multiple voices
DIA	`infsh/dia-tts`	Conversational, expressive
Chatterbox	`infsh/chatterbox`	Casual, entertainment
Higgs	`infsh/higgs-tts`	Professional narration
VibeVoice	`infsh/vibevoice`	Emotional range

Kokoro Voice Library

American English

Voice ID	Gender	Style
`af_sarah`	Female	Warm, friendly
`af_nicole`	Female	Professional
`af_sky`	Female	Youthful
`am_michael`	Male	Authoritative
`am_adam`	Male	Conversational
`am_echo`	Male	Clear, neutral

British English

Voice ID	Gender	Style
`bf_emma`	Female	Refined
`bf_isabella`	Female	Warm
`bm_george`	Male	Classic
`bm_lewis`	Male	Modern

Voice Generation Examples

Professional Narration

infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

infsh app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'

Audiobook Narration

infsh app run infsh/kokoro-tts --input '{
  "text": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

infsh app run infsh/kokoro-tts --input '{
  "text": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'
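The examples wrap the JSON input in single quotes, so text containing an apostrophe would end the shell string early. One way around this, sketched below, is to escape the text for JSON and build the input inside a double-quoted string. `json_escape` is a hypothetical helper, not part of the infsh CLI, and it only handles backslashes and double quotes.

```shell
# json_escape (hypothetical helper): escape backslashes and double quotes
# so the text can sit inside a JSON string. It does not handle newlines
# or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

text="Welcome back to Tech Talk! I'm your host."
payload="{\"text\": \"$(json_escape "$text")\", \"voice\": \"am_adam\"}"
printf '%s\n' "$payload"

# The payload can then be passed the same way as in the examples:
# infsh app run infsh/kokoro-tts --input "$payload"
```

Backslashes are escaped before double quotes so that the backslashes added for the quotes are not escaped a second time.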

Multi-Voice Conversation

# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{
  "text": "Have you seen the latest AI developments? It is incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
infsh app run infsh/kokoro-tts --input '{
  "text": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge conversation
infsh app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'
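With more than two clips, writing the `audio_files` array by hand gets tedious. The sketch below assembles the merger input from a list of URLs. `merger_payload` is a hypothetical helper, the example.com URLs are placeholders, and the only fields used are the two shown in the merge step above.

```shell
# merger_payload (hypothetical helper): print a media-merger input using the
# "audio_files" and "crossfade_ms" fields from this guide.
# Usage: merger_payload CROSSFADE_MS URL [URL...]
# Assumption: URLs contain no double quotes or backslashes.
merger_payload() {
  local crossfade="$1"
  shift
  local list=""
  local url
  for url in "$@"; do
    # Add a ", " separator once the list already has an entry.
    list="${list:+$list, }\"$url\""
  done
  printf '{"audio_files": [%s], "crossfade_ms": %s}\n' "$list" "$crossfade"
}

# Placeholder URLs; substitute the URLs returned by the generation steps.
merger_payload 300 "https://example.com/speaker1.wav" "https://example.com/speaker2.wav"

# infsh app run infsh/media-merger --input "$(merger_payload 300 "$url1" "$url2")"
```

The clips are merged in the order the URLs are given, so pass them in speaking order.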

Long-Form Content

Chunked Processing

For content over 5000 characters, split into chunks:

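# Process long text in chunks
TEXT="Your very long text here..."

# Split TEXT into pieces and generate each piece separately;
# <chunk-1> and <chunk-2> below stand for successive pieces of TEXT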
# Chunk 1
infsh app run infsh/kokoro-tts --input '{
  "text": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
infsh app run infsh/kokoro-tts --input '{
  "text": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks
infsh app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'
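The split itself is left as placeholders above. One possible approach, sketched here under assumptions, is to break the text at sentence-ending punctuation and pack sentences into chunks up to a size limit. `split_text` is a hypothetical helper built on standard `awk`; the 5000-character limit comes from this guide, and a single sentence longer than the limit is emitted as its own oversized chunk.

```shell
# split_text (hypothetical helper): read text on stdin and print one chunk
# per line, each at most MAX characters where sentence lengths allow.
# Usage: split_text MAX < input.txt
split_text() {
  awk -v max="$1" '
    { text = text (NR > 1 ? " " : "") $0 }   # join input lines into one string
    END {
      chunk = ""
      while (length(text) > 0) {
        # A sentence ends at ., ! or ? followed by at least one space.
        if (match(text, /[.!?]+[ ]+/)) {
          sentence = substr(text, 1, RSTART + RLENGTH - 1)
          text = substr(text, RSTART + RLENGTH)
        } else {
          sentence = text
          text = ""
        }
        sub(/[ ]+$/, "", sentence)
        if (chunk != "" && length(chunk) + 1 + length(sentence) > max) {
          print chunk            # current chunk is full, start a new one
          chunk = sentence
        } else {
          chunk = (chunk == "" ? sentence : chunk " " sentence)
        }
      }
      if (chunk != "") print chunk
    }'
}

TEXT="Chapter One. The morning mist hung low over the valley. She had been walking for hours."
printf '%s\n' "$TEXT" | split_text 60
```

Splitting at sentence boundaries rather than at a fixed offset keeps each generated clip from starting or ending mid-sentence, which should make the merged audio less likely to have audible seams.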

Voice + Video Workflow

Add Voiceover to Video

# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{
  "text": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
infsh app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'
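The `<voiceover-url>` placeholder has to be filled in from the saved output. The response format is not described in this guide, so the sketch below simply takes the first http(s) URL found in the file. `first_url` is a hypothetical helper, and the assumption that the first URL is the generated audio should be checked against a real response.

```shell
# first_url (hypothetical helper): print the first http(s) URL in a file.
# Assumption: the JSON saved from `infsh app run` contains the generated
# file URL and it is the first URL in the output.
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Usage with the file saved in step 1:
# voiceover_url="$(first_url voiceover.json)"
# infsh app run infsh/media-merger --input "{\"video_url\": \"https://your-video.mp4\", \"audio_url\": \"$voiceover_url\"}"
```

If `jq` is available and the field name is known, selecting the field by name would be more robust than pattern matching.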

Create Talking Head

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{
  "text": "Hi, I am excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'

Speed and Pacing

Speed	Effect	Use For
0.8	Slow, deliberate	Audiobooks, meditation
0.9	Slightly slow	Education, tutorials
1.0	Normal	General purpose
1.1	Slightly fast	Commercials, energy
1.2	Fast	Quick announcements

# Slow narration
infsh app run infsh/kokoro-tts --input '{
  "text": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'

Punctuation for Pacing

Use punctuation to control speech rhythm:

Punctuation	Effect
Period `.`	Full pause
Comma `,`	Brief pause
`...`	Extended pause
`!`	Emphasis
`?`	Question intonation
`-`	Quick break

infsh app run infsh/kokoro-tts --input '{
  "text": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

Match voice to content - Professional voice for business, casual for social
Use punctuation - Control pacing with periods and commas
Keep sentences short - Short sentences are easier to generate and sound more natural
Test different voices - Same text sounds different across voices
Adjust speed - Slightly slower often sounds more natural
Break long content - Process in chunks for consistency

Use Cases

Voiceovers - Video narration, commercials
Audiobooks - Full book narration
Podcasts - AI hosts and guests
E-learning - Course narration
Accessibility - Screen reader content
IVR - Phone system messages
Content localization - Translate and voice

Related Skills

# All TTS models
npx skills add inferencesh/skills@text-to-speech

# Podcast creation
npx skills add inferencesh/skills@ai-podcast-creation

# AI avatars
npx skills add inferencesh/skills@ai-avatar-video

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Full platform skill
npx skills add inferencesh/skills@inference-sh

Browse audio apps: infsh app list --category audio