deepgram

Transcribe and analyze audio with the Deepgram API. Use when a user asks to convert speech to text, implement real-time transcription, analyze audio intelligence, detect languages, or build voice-enabled applications with Deepgram SDKs.

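🇯🇵 Notes for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.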

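⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and places the files automatically.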

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o deepgram.zip https://jpskill.com/download/14826.zip && unzip -o deepgram.zip && rm deepgram.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/14826.zip -OutFile "$d\deepgram.zip"; Expand-Archive "$d\deepgram.zip" -DestinationPath $d -Force; ri "$d\deepgram.zip"

When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language, for example "Transcribe this audio file."

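💾 Download manually (if the command line is difficult)

1. Press the blue button below to download deepgram.zip
2. Double-click the ZIP file to extract it; this creates a deepgram folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.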

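🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.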

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English source that Claude reads)

Deepgram — Real-Time Speech-to-Text API

Overview

You are an expert in Deepgram, the speech-to-text platform optimized for real-time transcription. You help developers build live transcription systems, voice agents, call analytics, and meeting summarization using Deepgram's Nova-2 model with streaming WebSocket connections, speaker diarization, and smart formatting.

Instructions

Streaming Transcription (Real-Time)

// Real-time transcription via WebSocket
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

async function transcribeLive(audioStream: ReadableStream) {
  const connection = deepgram.listen.live({
    model: "nova-2",                    // Model this skill recommends for speed and accuracy
    language: "en",
    smart_format: true,                 // Auto-punctuation, casing, numbers
    interim_results: true,              // Get partial results as user speaks
    utterance_end_ms: 1000,             // 1s gap after the last word = end of utterance
    vad_events: true,                   // Voice activity detection
    diarize: true,                      // Speaker identification
    endpointing: 500,                   // 500ms endpointing for responsiveness
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    const transcript = data.channel.alternatives[0];
    if (transcript.transcript) {
      if (data.is_final) {
        // With diarize enabled, each word carries a speaker index
        const speaker = transcript.words?.[0]?.speaker;
        console.log(`[Final] Speaker ${speaker}: ${transcript.transcript}`);
        // Send to LLM for response generation
      } else {
        console.log(`[Interim] ${transcript.transcript}`);
        // Show real-time text as user speaks
      }
    }
  });

  connection.on(LiveTranscriptionEvents.UtteranceEnd, () => {
    console.log("[Utterance complete: user stopped speaking]");
  });

  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error("[Deepgram error]", err);
  });

  // Wait for the socket to open (or fail) before sending audio
  await new Promise<void>((resolve, reject) => {
    connection.on(LiveTranscriptionEvents.Open, () => resolve());
    connection.on(LiveTranscriptionEvents.Error, reject);
  });

  // Pipe audio to Deepgram. Containerized formats (WAV, Ogg, ...) are detected
  // automatically; raw PCM also needs `encoding` and `sample_rate` in the options above.
  for await (const chunk of audioStream) {
    connection.send(chunk);
  }

  // Signal end of stream so Deepgram flushes the last results
  connection.requestClose(); // named finish() in older v3 releases
}
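The Transcript handler above distinguishes interim from final results. A voice agent usually also needs to know when a whole utterance is complete before calling an LLM. The following is a minimal sketch of one way to do that, assuming the events have already been parsed into plain dicts shaped like Deepgram's live results (`is_final`, `speech_final`, `channel.alternatives[0].transcript`). The function name `final_utterances` is hypothetical and not part of any SDK.

```python
from typing import Iterable, Iterator, List


def final_utterances(events: Iterable[dict]) -> Iterator[str]:
    """Yield one string per completed utterance from live transcript events.

    Interim results are skipped, finalized segments are buffered, and the
    buffer is emitted when an event is marked speech_final.
    """
    parts: List[str] = []
    for event in events:
        if not event.get("is_final"):
            continue  # interim result: display-only, a later event supersedes it
        text = event["channel"]["alternatives"][0]["transcript"]
        if text:
            parts.append(text)
        if event.get("speech_final") and parts:
            yield " ".join(parts)
            parts = []
    if parts:
        # The stream ended without a speech_final marker; flush what is buffered
        yield " ".join(parts)
```

Each yielded string is a reasonable unit to forward to response generation, while interim text can still be rendered separately for display.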

Pre-Recorded Transcription

# Batch transcription for recorded audio/video
import os

from deepgram import DeepgramClient, PrerecordedOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

options = PrerecordedOptions(
    model="nova-2",
    smart_format=True,
    diarize=True,                       # Identify speakers
    summarize="v2",                     # Auto-generate summary
    topics=True,                        # Extract topics
    intents=True,                       # Detect intent (question, command, statement)
    sentiment=True,                     # Sentiment per utterance
    paragraphs=True,                    # Auto-paragraph formatting
    utterances=True,                    # Split by speaker turns
)

# From URL
response = dg.listen.rest.v("1").transcribe_url(
    {"url": "https://example.com/meeting.mp3"}, options
)

# From file (this overwrites the URL response above; use whichever source you need)
with open("recording.wav", "rb") as f:
    response = dg.listen.rest.v("1").transcribe_file(
        {"buffer": f.read(), "mimetype": "audio/wav"}, options
    )

# Access results
transcript = response.results.channels[0].alternatives[0]
print(f"Transcript: {transcript.transcript}")
print(f"Confidence: {transcript.confidence}")
print(f"Summary: {response.results.summary.short}")  # present because summarize="v2"
for utterance in response.results.utterances:        # present because utterances=True
    print(f"Speaker {utterance.speaker}: {utterance.transcript}")
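When `utterances` is not requested, speaker turns can still be rebuilt from the word-level diarization labels. The sketch below assumes the word list is available as plain dicts with `word`, `punctuated_word`, `start`, `end`, and `speaker` keys, as in the JSON response. How you get dicts out of the SDK's response object depends on your SDK version, and `speaker_turns` is a hypothetical helper name.

```python
from typing import Dict, List


def speaker_turns(words: List[dict]) -> List[Dict[str, object]]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: List[Dict[str, object]] = []
    for w in words:
        token = w.get("punctuated_word", w["word"])  # prefer the formatted token
        speaker = w.get("speaker", 0)                # no label means diarize was off
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] = f'{turns[-1]["text"]} {token}'
            turns[-1]["end"] = w["end"]
        else:
            turns.append(
                {"speaker": speaker, "start": w["start"], "end": w["end"], "text": token}
            )
    return turns
```

Each turn keeps its start and end time, which is enough to render a timestamped meeting transcript.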

Text-to-Speech (Aura)

# Deepgram Aura TTS: low-latency voice synthesis
# (reuses the `dg` client created in the pre-recorded example)
response = dg.speak.rest.v("1").stream_raw(
    {"text": "Thanks for calling. How can I help you today?"},
    options={
        "model": "aura-asteria-en",     # Female, warm tone
        "encoding": "linear16",          # 16-bit PCM
        "sample_rate": 24000,
    },
)

# Stream audio chunks to speaker/WebRTC.
# `audio_output` is a placeholder for any writable sink (file, sound device, WebRTC track).
for chunk in response.iter_bytes():
    audio_output.write(chunk)
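For longer replies, a voice agent may want to synthesize sentence by sentence so the first audio arrives sooner and no single request exceeds the service's input limit. The sketch below splits text on sentence boundaries under a character budget. `chunk_for_tts` is a hypothetical helper, and the default of 2000 characters is an assumption to confirm against Deepgram's current documentation.

```python
import re
from typing import List


def chunk_for_tts(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the budget is hard-split
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed as the `text` field of its own synthesis request, in order.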

Installation

npm install @deepgram/sdk                # Node.js
pip install deepgram-sdk                  # Python

Examples

Example 1: User asks to set up deepgram

User: "Help me set up deepgram for my project"

The agent should:

1. Check system requirements and prerequisites
2. Install or configure deepgram
3. Set up initial project structure
4. Verify the setup works correctly
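The prerequisite and verification steps can be partly automated. This is a minimal sketch, assuming a Python project: it checks that the `deepgram` package is importable and that `DEEPGRAM_API_KEY` is set. The function name `check_deepgram_setup` is hypothetical, and the check does not confirm that the key is valid.

```python
import importlib.util
import os
from typing import List, Mapping


def check_deepgram_setup(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable setup problems (empty means none found)."""
    problems: List[str] = []
    # The pip package is deepgram-sdk, but it is imported as `deepgram`
    if importlib.util.find_spec("deepgram") is None:
        problems.append("deepgram-sdk is not installed (pip install deepgram-sdk)")
    if not env.get("DEEPGRAM_API_KEY", "").strip():
        problems.append("DEEPGRAM_API_KEY is not set")
    return problems
```

An empty list only means the local prerequisites are in place; a short real transcription request is still the definitive test.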

Example 2: User asks to build a feature with deepgram

User: "Create a dashboard using deepgram"

The agent should:

1. Scaffold the component or configuration
2. Connect to the appropriate data source
3. Implement the requested feature
4. Test and validate the output
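As one illustration of a dashboard metric, the sketch below totals talk time per speaker. It assumes diarized utterances are available as plain dicts with `speaker`, `start`, and `end` keys (times in seconds), as in the pre-recorded JSON response. `talk_time_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable


def talk_time_by_speaker(utterances: Iterable[dict]) -> Dict[int, float]:
    """Sum utterance durations in seconds for each speaker index."""
    totals: Dict[int, float] = defaultdict(float)
    for u in utterances:
        totals[u["speaker"]] += u["end"] - u["start"]
    # Round to milliseconds and sort by speaker index for stable display
    return {speaker: round(seconds, 3) for speaker, seconds in sorted(totals.items())}
```

The resulting mapping can feed a bar chart or a talk-ratio widget directly.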

Guidelines

- Nova-2 for everything: this skill treats Nova-2 as Deepgram's best model for accuracy and speed; use it unless you need a language-specific model
- Streaming for real-time: use WebSocket connections for live audio and the batch (pre-recorded) API for recorded files
- Endpointing tuning: set endpointing to 300-500ms for voice agents (responsive) or 1000ms for transcription (accurate)
- Smart formatting: always enable smart_format for proper capitalization, punctuation, and number formatting
- Diarization for meetings: enable diarize when multiple speakers are present; Deepgram identifies up to 10 speakers
- Interim results for UX: enable interim_results for real-time text display, showing partial transcripts as users speak
- Multichannel for calls: use multichannel: true for phone calls where each speaker is on a separate audio channel
- Callback for async: for large file transcription, pass a callback URL (the API's callback parameter); Deepgram POSTs the results to it when done
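Several of these guidelines reduce to choosing query parameters for the streaming endpoint. The sketch below encodes them as a small options builder and turns the result into a WebSocket URL. The use-case names and both function names are hypothetical, 300ms is simply one value from the recommended 300-500ms range, and turning diarization off when `multichannel` is on is a design choice of this sketch rather than a Deepgram requirement.

```python
from urllib.parse import urlencode

LISTEN_WS_URL = "wss://api.deepgram.com/v1/listen"


def live_options(use_case: str, multichannel: bool = False) -> dict:
    """Build streaming options for 'voice_agent' or 'transcription'."""
    if use_case not in ("voice_agent", "transcription"):
        raise ValueError(f"unknown use case: {use_case!r}")
    return {
        "model": "nova-2",
        "smart_format": True,
        "interim_results": True,
        # Short endpointing favors responsiveness, long favors complete sentences
        "endpointing": 300 if use_case == "voice_agent" else 1000,
        # With one speaker per channel, the channel index already identifies the speaker
        "diarize": not multichannel,
        "multichannel": multichannel,
    }


def listen_url(options: dict) -> str:
    """Serialize options into a query string, with booleans as lowercase true/false."""
    params = {k: str(v).lower() if isinstance(v, bool) else v for k, v in options.items()}
    return f"{LISTEN_WS_URL}?{urlencode(params)}"
```

The same dictionary can be passed to an SDK's live-connection call instead of being serialized by hand.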