groq

Groq is an LLM inference platform that provides the fastest token generation speeds available, powered by custom LPU (Language Processing Unit) hardware. This Skill gives expert guidance on integrating Groq's API into real-time AI applications where latency matters, such as chatbots, code completion, and streaming responses.

⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o groq.zip https://jpskill.com/download/14962.zip && unzip -o groq.zip && rm groq.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/14962.zip -OutFile "$d\groq.zip"; Expand-Archive "$d\groq.zip" -DestinationPath $d -Force; ri "$d\groq.zip"

When it finishes, restart Claude Code. The Skill then activates automatically when you make a related request in plain language, for example "Add Groq to my Next.js app for the AI chat feature."

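💾 Manual download (if you would rather not use the command line)

1. Download groq.zip from https://jpskill.com/download/14962.zip
2. Double-click the ZIP file to extract it; this creates a groq folder
3. Move that folder to C:\Users\YourName\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code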


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.

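📦 Installation (3 steps)

1. Download the .skill file for this Skill
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.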


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the source text Claude reads)

Groq — Ultra-Fast LLM Inference

Overview

Groq, the LLM inference platform that provides the fastest token generation speeds available, powered by custom LPU (Language Processing Unit) hardware. Helps developers integrate Groq's API for real-time AI applications where latency matters — chatbots, code completion, and streaming responses.

Instructions

Basic Chat Completion

// src/llm/groq-client.ts — Groq API (OpenAI-compatible)
import Groq from "groq-sdk";

const groq = new Groq({
  apiKey: process.env.GROQ_API_KEY!,
});

// Basic completion — fastest LLM inference available
async function chat(prompt: string): Promise<string> {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",     // Fast and capable
    messages: [
      {
        role: "system",
        content: "You are a helpful assistant. Be concise and direct.",
      },
      { role: "user", content: prompt },
    ],
    temperature: 0.7,
    max_tokens: 1024,
  });

  return completion.choices[0].message.content ?? "";
}

// Streaming for real-time UI
async function streamChat(
  prompt: string,
  onChunk: (text: string) => void
): Promise<string> {
  const stream = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let fullResponse = "";
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? "";
    fullResponse += text;
    onChunk(text);
  }
  return fullResponse;
}

Structured Output (JSON Mode)

// src/llm/structured.ts — Get structured JSON responses from Groq
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY! });

async function extractEntities(text: string) {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      {
        role: "system",
        content: `Extract entities from the text. Return JSON with this structure:
          { "people": [string], "organizations": [string], "locations": [string], "dates": [string] }`,
      },
      { role: "user", content: text },
    ],
    response_format: { type: "json_object" },
    temperature: 0,                        // Deterministic for extraction
  });

  return JSON.parse(completion.choices[0].message.content!);
}

// Tool use / function calling (the tool definitions are declared inline below)
async function chatWithTools(prompt: string) {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: prompt }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: {
              city: { type: "string", description: "City name" },
              unit: { type: "string", enum: ["celsius", "fahrenheit"] },
            },
            required: ["city"],
          },
        },
      },
      {
        type: "function",
        function: {
          name: "search_web",
          description: "Search the web for information",
          parameters: {
            type: "object",
            properties: {
              query: { type: "string", description: "Search query" },
            },
            required: ["query"],
          },
        },
      },
    ],
    tool_choice: "auto",
  });

  const toolCalls = completion.choices[0].message.tool_calls;
  if (toolCalls) {
    for (const call of toolCalls) {
      console.log(`Tool: ${call.function.name}, Args: ${call.function.arguments}`);
    }
  }
  return completion;
}
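
The function above only logs the tool calls the model requests. A minimal Python sketch of the next step, parsing each call's JSON-encoded arguments and dispatching to a local function, might look like the following. The `get_weather` and `search_web` bodies, the `TOOL_REGISTRY` name, and the plain-dict call shape are hypothetical stand-ins for illustration; in a real application the calls would come from the SDK response.

```python
import json
from typing import Any, Callable

# Hypothetical local implementations of the two tools declared above.
def get_weather(city: str, unit: str = "celsius") -> dict[str, Any]:
    return {"city": city, "unit": unit, "temperature": None}  # placeholder, no real lookup

def search_web(query: str) -> dict[str, Any]:
    return {"query": query, "results": []}  # placeholder, no real search

TOOL_REGISTRY: dict[str, Callable[..., dict[str, Any]]] = {
    "get_weather": get_weather,
    "search_web": search_web,
}

def dispatch_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Run each requested tool and build "tool" role messages for a follow-up request."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            result: dict[str, Any] = {"error": f"unknown tool: {name}"}
        else:
            try:
                # Arguments arrive as a JSON string, not as a parsed object.
                args = json.loads(call["function"]["arguments"])
                result = handler(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                result = {"error": str(exc)}
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages
```

The returned messages follow the OpenAI-compatible convention of answering each tool call with a `tool` role message that carries the matching `tool_call_id`. Errors are returned to the model as content rather than raised, so it can retry with corrected arguments.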

Audio Transcription (Whisper)

// src/audio/transcribe.ts — Fast audio transcription via Groq
import fs from "fs";
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY! });

async function transcribeAudio(filePath: string) {
  const transcription = await groq.audio.transcriptions.create({
    file: fs.createReadStream(filePath),
    model: "whisper-large-v3-turbo",       // Fastest Whisper model
    language: "en",                         // Optional: specify language
    response_format: "verbose_json",       // Includes timestamps
    temperature: 0,
  });

  return {
    text: transcription.text,
    segments: transcription.segments?.map((s: any) => ({
      start: s.start,
      end: s.end,
      text: s.text,
    })),
    duration: transcription.duration,
  };
}

// Translate audio to English
async function translateAudio(filePath: string) {
  const translation = await groq.audio.translations.create({
    file: fs.createReadStream(filePath),
    model: "whisper-large-v3-turbo",
    response_format: "text",
  });
  return translation;
}
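
Because `verbose_json` returns per-segment timestamps, one possible use of the `segments` list is caption generation. The Python sketch below formats segments shaped like the return value above (with `start` and `end` in seconds) as SRT-style captions. The helper names `format_timestamp` and `segments_to_srt` are hypothetical and not part of any SDK.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Turn [{"start": ..., "end": ..., "text": ...}, ...] into SRT caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Caption blocks are separated by a blank line.
    return "\n".join(blocks)
```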

Model Selection

// Groq models as listed for early 2026; availability changes over time, so confirm against Groq's current model list
const MODELS = {
  // Llama 3.3 — best balance of speed and quality
  "llama-3.3-70b-versatile": {
    contextWindow: 128_000,
    bestFor: "general-purpose, reasoning, coding",
    speed: "~350 tok/s",
  },
  // Llama 3.1 8B: same context window as the 70B model, smaller and faster
  "llama-3.1-8b-instant": {
    contextWindow: 128_000,
    bestFor: "simple tasks, classification, extraction",
    speed: "~750 tok/s",
  },
  // Mixtral — great for multilingual
  "mixtral-8x7b-32768": {
    contextWindow: 32_768,
    bestFor: "multilingual, creative writing",
    speed: "~500 tok/s",
  },
  // Gemma 2 — compact and efficient
  "gemma2-9b-it": {
    contextWindow: 8_192,
    bestFor: "lightweight tasks, edge deployment comparison",
    speed: "~600 tok/s",
  },
};

// Choose model based on task
function selectModel(task: "chat" | "extract" | "code" | "fast"): string {
  switch (task) {
    case "chat": return "llama-3.3-70b-versatile";
    case "extract": return "llama-3.1-8b-instant";     // Fast, good enough for extraction
    case "code": return "llama-3.3-70b-versatile";
    case "fast": return "llama-3.1-8b-instant";         // Lowest latency
  }
}
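
To make the speed column concrete: generation time is roughly the number of output tokens divided by throughput. The Python sketch below applies that arithmetic to two of the approximate figures from the table above. It ignores network and prompt-processing time, and the throughput numbers are the table's estimates rather than measurements; the function name is a hypothetical helper.

```python
# Approximate throughput (tokens per second), copied from the table above.
APPROX_TOKENS_PER_SECOND = {
    "llama-3.3-70b-versatile": 350,
    "llama-3.1-8b-instant": 750,
}

def estimate_generation_seconds(model: str, output_tokens: int) -> float:
    """Rough generation time: output tokens divided by throughput."""
    return output_tokens / APPROX_TOKENS_PER_SECOND[model]

for model in APPROX_TOKENS_PER_SECOND:
    seconds = estimate_generation_seconds(model, 1024)
    print(f"{model}: ~{seconds:.1f} s for 1024 output tokens")
```

With these figures, a full 1024-token response takes about 2.9 s on the 70B model and about 1.4 s on the 8B model, which is why the 8B model is the pick for the "fast" and "extract" tasks.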

Python Integration

# src/groq_client.py — Groq with Python (OpenAI-compatible)
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Basic completion
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain quantum computing in 3 sentences"}],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)

# With OpenAI SDK (drop-in replacement)
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)
# Same API, Groq's speed

Installation

# TypeScript/JavaScript
npm install groq-sdk

# Python
pip install groq

# Or use with any OpenAI-compatible SDK
# Just set base_url to https://api.groq.com/openai/v1

Examples

Example 1: Integrating Groq into an existing application

User request:

Add Groq to my Next.js app for the AI chat feature. I want streaming responses.

The agent installs the SDK, creates an API route that initializes the Groq client, configures streaming, selects an appropriate model, and wires up the frontend to consume the stream. It handles error cases and sets up proper environment variable management for the API key.

Example 2: Optimizing speed and cost

User request:

My Groq calls are slow and expensive. Help me optimize the setup.

The agent reviews the current implementation, identifies issues (wrong model selection, missing caching, inefficient prompting, no batching), and applies optimizations specific to Groq's capabilities — adjusting model parameters, adding response caching, and implementing retry logic with exponential backoff.
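
Two of the optimizations named above, response caching and retry with exponential backoff, can be sketched in Python without any SDK-specific code. Here `call_model` is a hypothetical stand-in for whatever function performs the Groq request, and the delay values, jitter amount, and in-memory dict cache are illustrative assumptions rather than recommended settings.

```python
import hashlib
import json
import random
import time
from typing import Callable

def retry_with_backoff(
    fn: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn, retrying on failure with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # simplification: narrow this to transient errors in practice
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Base delays of 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")

_cache: dict[str, str] = {}

def cached_completion(
    model: str, prompt: str, call_model: Callable[[str, str], str]
) -> str:
    """Return a cached response for an identical (model, prompt) pair, else call and store."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = retry_with_backoff(lambda: call_model(model, prompt))
    return _cache[key]
```

Caching identical requests only makes sense for deterministic settings such as `temperature: 0`; with sampling enabled, repeated calls are expected to differ.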

Guidelines

Use Groq for latency-sensitive tasks: chatbots, autocomplete, and real-time analysis; Groq is reported to be 5-10x faster than cloud GPU providers
Use llama-3.1-8b-instant for simple tasks: don't use the 70B model for classification or extraction; the 8B model is faster and cheaper
JSON mode for structured output: set response_format: { type: "json_object" } for reliable structured responses, and describe the expected JSON shape in the prompt as the extractEntities example does
Stream everything user-facing: Groq's speed already makes responses feel fast, and streaming improves perceived responsiveness further
Whisper for audio: Groq's hosted Whisper models transcribe audio files in seconds
Rate limits vary by model: check your limits in the Groq dashboard; 8B models have higher request limits
OpenAI SDK compatibility: switch from OpenAI to Groq by changing the base URL, API key, and model name; the rest of the code stays the same
Fallback to other providers: Groq can have capacity constraints during peak hours; keep a fallback to OpenAI or Anthropic
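
The fallback guideline can be sketched as an ordered list of providers tried in turn. In this Python sketch each provider is a hypothetical callable that takes a prompt and returns text; wiring the callables to the Groq, OpenAI, or Anthropic SDKs is left out, and the names `Provider` and `complete_with_fallback` are assumptions for illustration.

```python
from typing import Callable

# A provider is a (name, callable) pair; the callable maps a prompt to response text.
Provider = tuple[str, Callable[[str], str]]

def complete_with_fallback(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # simplification: narrow to capacity or rate-limit errors in practice
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Putting Groq first keeps the low-latency path as the default, while returning the provider name lets the caller log how often the fallback is actually used.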