💼 Business Community

topview-skill

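A Skill that brings the major AI models together in one toolkit, letting you automatically generate, edit, and collaborate on videos, images, and avatars from simple instructions.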

📜 Original English description (for reference)

Generate, Edit, Collaborate. Access all mainstream AI models in one toolkit. Simply describe your vision to create videos, images, and avatars—zero manual operations.

🇯🇵 Notes for Japanese creators


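Note: This commentary was added by the jpskill.com editorial team for Japanese business users. It is reference information, independent of how the Skill itself behaves.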

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into Terminal (Mac/Linux) or PowerShell (Windows). Downloading, extracting, and placing the files are all automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o topview-skill.zip https://jpskill.com/download/5497.zip && unzip -o topview-skill.zip && rm topview-skill.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/5497.zip -OutFile "$d\topview-skill.zip"; Expand-Archive "$d\topview-skill.zip" -DestinationPath $d -Force; ri "$d\topview-skill.zip"

When it finishes, restart Claude Code. After that, just ask naturally, for example "make me a video prompt", and the Skill activates automatically.

💾 Prefer to download manually? (for those who find commands difficult)

1. Click the blue button below to download topview-skill.zip
2. Double-click the ZIP file to extract it; a topview-skill folder is created
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced) Original source ↗

⚠️ Download and use at your own risk. This site assumes no responsibility for the content, behavior, or safety of this Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically whenever you ask Claude for something in this area.

📦 Installation (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder:
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you're done. You don't need to say "use this Skill to..."; it is invoked automatically for related requests.


Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 30


📜 Original SKILL.md (the English/Chinese text Claude reads)

Topview AI Skill

Modular Python toolkit for the Topview AI API.

✨ Generate. Edit. Collaborate. — All in One Place. ✨

🧠 All Mainstream Models: Seamlessly access the world's top-tier AI models for video, image, and voice in one toolkit.
🗣️ Describe to Create: Just tell the agent what you want. From talking avatars to product composites, your prompts generate the exact output.
⚡ Zero Manual Ops: No manual uploads, no tedious tweaking. Everything is automated straight to your shared board.

Execution Rule

Always use the Python scripts in scripts/. Do NOT use curl or direct HTTP calls.

User-Facing Reply Rules

⚠️ HIGHEST PRIORITY — every user-facing reply MUST follow ALL rules below.

Most users are non-technical. Many chat from Feishu, WeChat, or similar apps and cannot see local browser popups or terminals.

Keep replies short — give the result or next step directly. If one sentence is enough, don't write three.
Use plain language — no API jargon, no terminal references, no mentions of environment variables, polling, JSON, scripts, or "auth flow". Speak as if the user has never seen a command line.
Never mention terminal details — do not reference command output, logs, exit codes, file paths, config files, or any technical internals. These mean nothing to the user.
Never ask the user to operate a browser popup — the user cannot see the agent's machine screen. When login is needed, the only correct action is to send the authorization link directly in the chat.
Always send the direct login link — extract URL: ... from auth.py login output and use the login template below. Never say "browser opened" or similar. If the URL is not found in the output, re-run auth.py login to get a new link. Never skip sending the link.
Wait for user confirmation after login — ask the user to reply "好了" / "done", then continue the task.
Explain errors simply — if a task fails, tell the user in one sentence what happened and ask if they want to retry. Never paste error messages or technical details.
Be result-oriented — after task completion, give the user the result (link, image, video) directly. Do not describe intermediate steps.
Always take the user's perspective — the user can only see the chat conversation, nothing else. Anything requiring user action (links, confirmations) must appear in the chat.
Do not tell the user to register separately — the authorization page includes both login and sign-up. New users can register directly on that page. Never say "go to topview.ai to register first".
Act directly, don't ask which method — when login is needed, just run auth.py login and send the link. Don't ask "which method do you prefer?" or present multiple options. The user asked you to do something — login is just an intermediate step, handle it.
Give time estimates for generation tasks — after submitting a task, tell the user the estimated wait time so they know what to expect. Use the estimates from the "Estimated Generation Time" table below.

Estimated Generation Time

Tell the user the estimated wait time after submitting a task. Match the user's language.

Task Type	Model	Estimated Time
Video	Standard / Fast (Seedance 2.0)	~5–10 min
Video	All other video models (Kling, Sora, Veo, Vidu, etc.)	~3–5 min
Image	GPT Image 1.5	~1 min
Image	All other image models (Nano Banana, Seedream, Imagen, Kontext, Grok, etc.)	~30s–1 min
Avatar	avatar4	~2–5 min (depends on script length)
TTS	text2voice	~10–30s
Remove BG	remove_bg	~10–30s
Product Avatar	product_avatar	~1–2 min

Example messages after submitting:

Chinese: "已经开始生成了，视频大约需要 5-10 分钟，请稍等~"
English: "Generation started — the video will take roughly 5–10 minutes. I'll send it to you as soon as it's ready."

Required login message template

Replace <LOGIN_URL> with the actual link. Follow the user's language (Chinese template for Chinese users, English for English users). Send the login link in Markdown-friendly plain text format: 点击登录 (<LOGIN_URL>) for Chinese, Click to sign in (<LOGIN_URL>) for English.

Chinese template:

安装完成，Topview Skill 已连接到你的智能助手。

点击下方登录链接，登录后将解锁以下能力：

点击登录 (<LOGIN_URL>)

🎬 视频生成
文字转视频、图片转视频、参考视频生成，自动配音配乐。
视频模型：Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI 图片生成与编辑
文字生图、AI 修图、风格转换，最高支持 4K。
图片模型：Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image

🧑‍💼 口播数字人
上传一张照片 + 文案，自动生成真人口播视频，支持多语种。

✂️ 背景移除
一键抠图，产品图、人像、任意图片秒去背景。

👗 产品模特图
把你的产品图放到模特身上，自动生成带货展示图。

🎙️ 语音与配音
文字转语音、声音克隆，支持多语种配音输出。

登录完成后回我一句"好了"，我马上继续。

English template:

Installation complete. Topview Skill is now connected to your agent.

Click the sign-in link below. After signing in, the following capabilities will be unlocked.

Click to sign in (<LOGIN_URL>)

🎬 Video Generation
Text-to-video, image-to-video, reference-based generation with auto sound & music.
Models: Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI Image Generation & Editing
Text-to-image, AI retouching, style transfer — up to 4K resolution.
Models: Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image

🧑‍💼 Talking Avatar
Upload a photo + script to auto-generate presenter-style talking head videos.

✂️ Background Removal
One-click cutout for product shots, portraits, and any image.

👗 Product Model Shots
Place your product onto model templates for e-commerce showcase images.

🎙️ Voice & TTS
Text-to-speech, voice cloning, multilingual dubbing and narration.

Once you've signed in, just reply "done" and I'll continue right away.

Banned phrases (including any variations):

"Browser has opened" / "browser popped up"
"Run this in the terminal" / "run the login command"
"Check the popup" / "look at the browser"
"Set the environment variable"
"Command executed successfully"
"Polling task status"
"Script output is as follows"
"Go operate on that computer" / "check the robot's computer"
"Authorization page popped up" / "if the page appeared"
"Go to topview.ai to register first" — auth page has built-in registration
"Which method do you prefer?" / "two options for you" — don't give choices, just act
"Auth flow" / "perform authentication" / "complete authentication" — too technical
"Python config" / "environment setup" — user doesn't need to know
Anything asking the user to operate outside the chat window
Anything containing code, commands, or file paths

Fallback when login URL is not captured:

If auth.py login output does not contain a URL: line (e.g. background execution missed the output), re-run auth.py login to get a fresh link. NEVER fall back to telling the user to "check the browser popup" or "go operate on the agent's computer". The user cannot see it.

Prerequisites

Python 3.8+
Authenticated — see references/auth.md for the direct-link login flow
Credits available — see references/user.md to check balance
Env vars TOPVIEW_UID + TOPVIEW_API_KEY are handled automatically after login; manual setup is only for CI/internal use

pip install -r {baseDir}/scripts/requirements.txt
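A minimal pre-flight sketch for the prerequisites above, assuming you want to check them from Python before calling any script. It checks the interpreter version and whether `TOPVIEW_UID` and `TOPVIEW_API_KEY` are both set (the manual CI/internal path described above). The function name `check_prerequisites` is hypothetical, and it does not inspect the credentials saved by the login flow, since where those are stored is not documented here.

```python
import os
import sys

# Variable names come from the Prerequisites list; the check itself is illustrative.
REQUIRED_ENV_VARS = ("TOPVIEW_UID", "TOPVIEW_API_KEY")


def check_prerequisites(env=None, version_info=None):
    """Report Python version support and whether CI-style env credentials are set.

    `env` and `version_info` default to the real process values; they are
    parameters only so the check can be exercised with fake inputs.
    Hypothetical helper: it does not look at credentials saved by login.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    python_ok = tuple(version_info[:2]) >= (3, 8)
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    return {
        "python_ok": python_ok,
        # True only when both variables are set (the manual CI/internal path).
        "env_credentials": not missing,
        "missing_env_vars": missing,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```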

Agent Workflow Rules

These rules apply to ALL generation modules (avatar4, video_gen, ai_image, remove_bg, product_avatar, text2voice).

Always start with run — it submits the task and polls automatically until done. This is the default and correct choice in almost all situations.
Do NOT ask the user to check the task status themselves. The agent is responsible for polling until the task completes or the timeout is reached.
Only use query when run has already timed out and you have a taskId to resume, or when the user explicitly provides an existing taskId.
query polls continuously — it keeps checking every --interval seconds until status is success or fail, or --timeout expires. It does not stop after one check.
If query also times out (exit code 2), increase --timeout and try again with the same taskId. Do not resubmit unless the task has actually failed.

Decision tree:
  → New request?           use `run`
  → run timed out?         use `query --task-id <id>`
  → query timed out?       use `query --task-id <id> --timeout 1200`
  → task status=fail?      resubmit with `run`
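The decision tree above can be encoded as a small pure function. This is a sketch, not part of the toolkit: `next_command` is a hypothetical helper that returns only the subcommand and the flags named in the tree (`run`, `query`, `--task-id`, `--timeout`); module-specific arguments for `run` are left to the caller. It assumes exit code 2 signals a timeout, as stated in the rules above.

```python
from typing import List, Optional

EXTENDED_TIMEOUT = 1200  # seconds, the value used in the decision tree


def next_command(last: Optional[str], exit_code: Optional[int] = None,
                 status: Optional[str] = None,
                 task_id: Optional[str] = None) -> Optional[List[str]]:
    """Return the next subcommand fragment, or None when the task is done.

    last: None for a new request, otherwise "run" or "query".
    exit_code: 2 means the previous run/query timed out.
    status: final task status if known ("success" or "fail").
    """
    if last is None:
        return ["run"]                      # new request
    if status == "success":
        return None                         # deliver the result
    if status == "fail":
        return ["run"]                      # resubmit only on an actual failure
    if exit_code == 2 and task_id:
        if last == "run":
            return ["query", "--task-id", task_id]
        if last == "query":
            return ["query", "--task-id", task_id, "--timeout", str(EXTENDED_TIMEOUT)]
    raise ValueError("state not covered by the decision tree")
```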

Task Status:

Status	Description
`init`	Task is queued, waiting to be processed
`running`	Task is actively being processed
`success`	Task completed successfully
`fail`	Task failed
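A sketch of the polling behavior the rules describe for `query`: keep checking every `--interval` seconds until the status is `success` or `fail`, or until the timeout expires, using the four statuses from the table. `fetch_status` is a hypothetical callable standing in for however the real scripts read the task status; the clock and sleep functions are injectable only so the loop can be exercised without waiting.

```python
import time
from enum import Enum
from typing import Callable


class TaskStatus(str, Enum):
    INIT = "init"
    RUNNING = "running"
    SUCCESS = "success"
    FAIL = "fail"

    @property
    def is_terminal(self) -> bool:
        return self in (TaskStatus.SUCCESS, TaskStatus.FAIL)


class PollTimeout(Exception):
    """Raised when the task is still init/running after `timeout` seconds."""


def poll_until_done(fetch_status: Callable[[], str], interval: float, timeout: float,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> TaskStatus:
    """Check repeatedly until a terminal status; `fetch_status` is hypothetical."""
    deadline = clock() + timeout
    while True:
        status = TaskStatus(fetch_status())
        if status.is_terminal:
            return status
        if clock() >= deadline:
            raise PollTimeout(f"still {status.value} after {timeout}s")
        sleep(interval)
```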

Board ID Protocol

Every generation task should include a --board-id so results are organized and viewable on the web.

Session start — before submitting the first task, run board.py list --default -q to get the default board ID ("My First Board"). This only needs to be done once per session.
Pass to all tasks — add --board-id <id> to every generation command (avatar4.py, video_gen.py, ai_image.py, product_avatar.py, text2voice.py).
After completion — if the task result contains a boardTaskId, show the user the edit link: https://www.topview.ai/board/{boardId}?boardResultId={boardTaskId}. Tell the user they can view and edit the result via this link.
User wants a new board — run board.py create --name "..." and use the returned board ID for subsequent tasks.
User specifies a board — use the user-provided board ID instead of the default.
Forgot the board ID? — run board.py list --default -q again.

Session flow:
  1. BOARD_ID=$(board.py list --default -q)
  2. avatar4.py run --board-id "$BOARD_ID" ...
  3. video_gen.py run --board-id "$BOARD_ID" ...
  4. (the result includes the edit link with boardTaskId)
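A small helper for the edit link described in the Board ID Protocol, assuming board and task IDs are plain strings. The URL pattern comes from the protocol above; the percent-encoding via `urllib.parse` is an added safety measure, not something the toolkit is documented to do, and `board_edit_link` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BOARD_BASE = "https://www.topview.ai/board/"


def board_edit_link(board_id: str, board_task_id: Optional[str]) -> Optional[str]:
    """Build the post-completion edit link, or None when there is no boardTaskId."""
    if not board_task_id:
        return None  # the protocol only shows the link when boardTaskId exists
    query = urlencode({"boardResultId": board_task_id})
    return f"{BOARD_BASE}{quote(board_id, safe='')}?{query}"
```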

Modules

Module	Script	Reference	Description
Auth	`scripts/auth.py`	auth.md	OAuth 2.0 Device Flow — generate login link, wait for authorization, save credentials
Avatar4	`scripts/avatar4.py`	avatar4.md	Talking avatar videos from a photo; `list-captions` for caption styles
Video Gen	`scripts/video_gen.py`	video_gen.md	Image-to-video, text-to-video, omni reference (video generation from a reference video, image, audio, and text)
AI Image	`scripts/ai_image.py`	ai_image.md	Text-to-image and AI image editing (10+ models)
Remove BG	`scripts/remove_bg.py`	remove_bg.md	Remove image background — step 1 of Product Avatar flow
Product Avatar	`scripts/product_avatar.py`	product_avatar.md	Product showcase images on a model; `list-avatars`/`list-categories` for template browsing
Text2Voice	`scripts/text2voice.py`	text2voice.md	Text-to-speech audio generation
Voice	`scripts/voice.py`	voice.md	Voice list/search, voice cloning, delete custom voices
Board	`scripts/board.py`	board.md	Board management — organize results, view/edit on web
User	`scripts/user.py`	user.md	Credit balance and usage history

Read individual reference docs for usage, options, and code examples. Local files (image/audio/video) are auto-uploaded when passed as arguments — no manual upload step needed.

Creative Guide

Core Principle: Start from the user's intent, not from the API. Analyze what the user wants to achieve, then pick the right tool, model, and parameters.

Step 1 — Intent Analysis

Every time a user requests content, identify:

Dimension	Ask Yourself	Fallback
Output Type	Image? Video? Audio? Composite?	Must ask
Purpose	Marketing? Education? Social media? Personal?	General social media
Source Material	What does the user have? What's missing?	Must ask
Style / Tone	Professional? Casual? Playful? Authoritative?	Professional & friendly
Duration	How long should the output be?	5–15s for clips, 30–60s for avatar
Language	What language? Need captions?	Match user's language
Channel	Where will it be published?	General purpose
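The intent table can be read as a set of defaults plus two dimensions that must always be asked. A minimal sketch with hypothetical dimension keys: known answers are kept, missing ones fall back to the table's defaults, and `output_type` and `source_material` are returned as questions instead of being guessed.

```python
from typing import Dict, List, Optional, Tuple

MUST_ASK = None  # sentinel: the table says "Must ask" instead of giving a default

# Fallbacks copied from the intent-analysis table; the keys are hypothetical.
INTENT_FALLBACKS: Dict[str, Optional[str]] = {
    "output_type": MUST_ASK,
    "purpose": "general social media",
    "source_material": MUST_ASK,
    "style": "professional & friendly",
    "duration": "5-15s for clips, 30-60s for avatar",
    "language": "match user's language",
    "channel": "general purpose",
}


def resolve_intent(known: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Fill unknown dimensions with fallbacks; return (resolved, questions_to_ask)."""
    resolved: Dict[str, str] = {}
    to_ask: List[str] = []
    for dim, fallback in INTENT_FALLBACKS.items():
        if known.get(dim):
            resolved[dim] = known[dim]
        elif fallback is MUST_ASK:
            to_ask.append(dim)
        else:
            resolved[dim] = fallback
    return resolved, to_ask
```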

Step 2 — Tool Selection

What does the user need?
│
├─ A person speaking to camera (talking head)?
│  → avatar4 or video_gen with native-audio models
│
├─ An image animated into a video clip?
│  → video_gen --type i2v
│
├─ A video generated purely from text?
│  → video_gen --type t2v
│
├─ A new video based on reference materials (style transfer, editing)?
│  → video_gen --type omni
│
├─ An image generated from a text prompt?
│  → ai_image --type text2image
│
├─ An existing image edited / modified with AI?
│  → ai_image --type image_edit
│
├─ Remove background from an image (e.g. product cutout)?
│  → remove_bg
│
├─ A product placed into a model/avatar scene?
│  → product_avatar (use remove_bg first if product has background)
│  → product_avatar list-avatars to browse public templates
│
├─ Browse available caption styles for avatar videos?
│  → avatar4 list-captions
│
├─ Text converted to speech audio?
│  → text2voice
│
├─ Need to find a voice / list available voices?
│  → voice list
│
├─ Clone a custom voice from audio sample?
│  → voice clone
│
├─ Delete a custom voice?
│  → voice delete
│
├─ Manage boards / view results on web?
│  → board (list, create, detail, tasks)
│
├─ A combination (e.g., talking head + product clips)?
│  → Use a recipe (see Step 3)
│
└─ Outside current capabilities?
   → See Capability Boundaries below

Quick-reference routing table:

User says...	Script & Type
"Make a talking avatar video with this photo and text"	`avatar4.py` (pass local image path directly)
"Generate a video with this photo and my audio recording"	`avatar4.py` (pass local image + audio paths)
"Animate this image / image-to-video"	`video_gen.py --type i2v` (pass local image path)
"Generate a video about..."	`video_gen.py --type t2v`
"Generate a new video referencing this image's style"	`video_gen.py --type omni`
"Generate an image / text-to-image"	`ai_image.py --type text2image`
"Modify this image / change background"	`ai_image.py --type image_edit`
"Remove image background / cutout"	`remove_bg.py`
"Put this product on a model image"	`product_avatar.py` (use `remove_bg.py` first if product has background)
"What product avatar/model templates are available?"	`product_avatar.py list-avatars`
"What caption styles are available?"	`avatar4.py list-captions`
"Convert this text to speech / audio"	`text2voice.py`
"What voices are available? / Find a female voice"	`voice.py list --gender female`
"Clone a voice from this audio recording"	`voice.py clone --audio <file>`
"Delete this custom voice"	`voice.py delete --voice-id <id>`
"View my board / check what was generated"	`board.py list` or `board.py tasks --board-id <id>`
"Create a new board"	`board.py create --name "..."`
"Check how many credits I have left"	`user.py credit`

Video model selection — see references/video_gen.md § Model Recommendation.
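The routing table above can also be kept as data. In this sketch the intent keys (`animate_image`, `check_credits`, and so on) are hypothetical labels the agent would assign after reading the request; the script names and arguments are copied from the table, which omits subcommands such as `run` and module-specific flags, so the sketch omits them too.

```python
from typing import Dict, List

# Script/argument pairs from the quick-reference routing table; keys are hypothetical.
ROUTES: Dict[str, List[str]] = {
    "talking_avatar": ["avatar4.py"],
    "animate_image": ["video_gen.py", "--type", "i2v"],
    "text_to_video": ["video_gen.py", "--type", "t2v"],
    "reference_video": ["video_gen.py", "--type", "omni"],
    "text_to_image": ["ai_image.py", "--type", "text2image"],
    "edit_image": ["ai_image.py", "--type", "image_edit"],
    "remove_background": ["remove_bg.py"],
    "product_on_model": ["product_avatar.py"],  # run remove_bg.py first if needed
    "list_product_templates": ["product_avatar.py", "list-avatars"],
    "list_caption_styles": ["avatar4.py", "list-captions"],
    "text_to_speech": ["text2voice.py"],
    "list_voices": ["voice.py", "list"],
    "clone_voice": ["voice.py", "clone"],
    "delete_voice": ["voice.py", "delete"],
    "create_board": ["board.py", "create"],
    "check_credits": ["user.py", "credit"],
}


def route(intent: str) -> List[str]:
    """Return a copy of the script and fixed arguments for a classified intent."""
    try:
        return list(ROUTES[intent])
    except KeyError:
        raise ValueError(f"no module for {intent!r}; see Capability Boundaries") from None
```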

Image model tip: For all image tasks, default to Nano Banana 2 — strongest all-round model with best quality, 14 aspect ratios, up to 4K, and 14 reference images for editing. See references/ai_image.md § Model Recommendation.

Product Avatar workflow: For best results, use the 2-step flow: remove_bg.py to get a bgRemovedImageFileId, then product_avatar.py with --product-image-no-bg. Use product_avatar.py list-avatars to browse public templates and get an avatarId. See references/product_avatar.md § Full Workflow.

Caption styles for avatar4: Use avatar4.py list-captions to discover available caption styles, then pass the captionId via --caption.

Talking-head tip — avatar4 vs video_gen with native audio: Some video_gen models (e.g. Standard, Kling V3, Veo 3.1) support native audio and can produce talking-head videos with better visual quality than avatar4. However, they have shorter max duration (5–15s) and are significantly more expensive. Avatar4 supports up to 120s per segment at much lower cost. Rule of thumb: Default to avatar4 for most talking-head needs. Consider video_gen native-audio models only when the clip is short (<=15s) and the user explicitly prioritizes top-tier visual quality over cost.
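The rule of thumb above as a function, using the limits it states (native-audio clips up to 15s, avatar4 up to 120s per segment). `choose_talking_head_tool` is a hypothetical helper; it returns the tool and how many tasks the script would need.

```python
import math
from typing import Tuple

AVATAR4_MAX_SEGMENT_S = 120  # per-segment limit stated for avatar4
NATIVE_AUDIO_MAX_S = 15      # upper end of the 5-15s range for native-audio models


def choose_talking_head_tool(duration_s: float,
                             user_prioritizes_quality: bool) -> Tuple[str, int]:
    """Return (tool, number_of_tasks) following the rule of thumb."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if user_prioritizes_quality and duration_s <= NATIVE_AUDIO_MAX_S:
        return "video_gen", 1
    # avatar4 is the default; longer scripts are split into segments of at most 120s.
    return "avatar4", math.ceil(duration_s / AVATAR4_MAX_SEGMENT_S)
```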

Step 3 — Simple vs Complex

Simple requests — the user's need is clear, materials are ready → handle directly from the reference docs.

Complex requests — the user gives a goal (e.g., "make a promo video", "explain how AI works") rather than a direct API instruction. Follow this universal workflow:

1. Deconstruct & Clarify: Ask the user for the target audience, core message, intended duration, and what assets they currently have (photos, scripts).
2. Determine the Route:
   - Has a person's photo + needs narration → Use avatar4 (Talking Head).
   - Has a product/reference photo → Use video_gen --type i2v or omni.
   - No assets, purely visual concept → Use video_gen --type t2v.
   - Requires both → Plan a Hybrid approach (Avatar narration + B-roll inserts).
3. Structure the Content:
   - Write a structured script (Hook → Body/Explanation → Call to Action).
   - Add <break time="0.5s"/> tags to TTS scripts for natural pacing.
   - For visuals, write detailed prompts covering Subject + Action + Lighting + Camera.
4. Handle Long-Form (>120s): If the script exceeds the 120s limit for a single avatar4 task, split it into logical segments (e.g., 60s each) at natural sentence boundaries. Submit tasks in parallel using the submit command, ensure parameters (voice/model) remain locked across segments, and deliver them in order.
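A sketch of the long-form split described in the Handle Long-Form item above, assuming an English script and a speaking rate of about 2.5 words per second (roughly 150 words per minute). Both the rate and the sentence regex are assumptions to tune per language and voice, and `<break>` tags are counted as ordinary words. Whole sentences are packed greedily so no segment is cut mid-sentence.

```python
import re
from typing import List

WORDS_PER_SECOND = 2.5  # assumed English speaking rate; not a toolkit constant


def estimate_seconds(text: str, wps: float = WORDS_PER_SECOND) -> float:
    return len(text.split()) / wps


def split_script(script: str, max_seconds: float = 60,
                 wps: float = WORDS_PER_SECOND) -> List[str]:
    """Greedily pack whole sentences into segments of at most `max_seconds`.

    A single sentence longer than the limit is kept whole rather than cut.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    segments: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_seconds(candidate, wps) > max_seconds:
            segments.append(" ".join(current))  # close the segment before overflowing
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        segments.append(" ".join(current))
    return segments
```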

Pre-Execution Protocol

Follow this before EVERY generation task.

Estimate cost — use video_gen.py estimate-cost for video tasks and ai_image.py estimate-cost for image tasks; avatar4 cost depends on video length; product_avatar costs a fixed 0.5 credits; text2voice costs a fixed 0.1 credits.
Validate parameters — ensure model, aspect ratio, resolution, and duration are compatible (use list-models to check)
Ask about missing key parameters — if the user has not specified important parameters that affect the output, ask before proceeding. Key parameters by module:
- video_gen: duration, aspect ratio, model
- ai_image: aspect ratio, resolution, model, number of images
- avatar4: (usually determined by input, but confirm voice if not specified)
- text2voice: voice selection
- Do NOT silently pick defaults for these — always confirm with the user.
Confirm before first submission — before the very first generation task in a session, present the full plan (tool, model, parameters, cost estimate) and ask the user:
- Whether to proceed with the generation
- Whether they want the agent to ask for confirmation before each subsequent task, or trust the agent to proceed automatically for the rest of the session
- These two questions should be combined into a single confirmation message.
- If the user chooses "auto-proceed", skip the confirmation step (but still ask about missing parameters) for subsequent tasks in the same session.
- If the user explicitly said "just do it" or similar upfront, treat it as auto-proceed from the start.
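The parameter check and confirmation rules above can be sketched as one pre-flight function. The parameter keys (`aspect_ratio`, `num_images`, and so on) and the `SessionPolicy` class are hypothetical names, not the toolkit's option names. Missing key parameters are always reported, even in auto-proceed mode, while plan confirmation is skipped once the user has opted into auto-proceed.

```python
from dataclasses import dataclass
from typing import Dict, List

# Key parameters to confirm per module, from the list above; keys are hypothetical.
KEY_PARAMS: Dict[str, List[str]] = {
    "video_gen": ["duration", "aspect_ratio", "model"],
    "ai_image": ["aspect_ratio", "resolution", "model", "num_images"],
    "avatar4": ["voice"],
    "text2voice": ["voice"],
}


@dataclass
class SessionPolicy:
    submitted_before: bool = False
    auto_proceed: bool = False  # set when the user chooses it or says "just do it"


def preflight(module: str, params: Dict[str, object],
              session: SessionPolicy) -> Dict[str, object]:
    """Decide what to ask before submitting; key parameters are never filled silently."""
    missing = [p for p in KEY_PARAMS.get(module, []) if params.get(p) in (None, "")]
    needs_confirmation = not session.auto_proceed
    return {
        "ask_missing": missing,  # asked even in auto-proceed mode
        "confirm_plan": needs_confirmation,
        # Only the first confirmation also offers auto-proceed for the session.
        "offer_auto_proceed": needs_confirmation and not session.submitted_before,
    }
```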

Agent Behavior Protocol

During Execution

Pass local paths directly — scripts auto-upload local files to S3 before submitting tasks
Parallelize independent steps — independent generation tasks can run concurrently
Keep consistency across segments — when generating multiple segments, use identical parameters
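A sketch of running independent segments concurrently with identical parameters. `submit` is a hypothetical callable (for example a wrapper around one `run` invocation) that returns a result URL; `Executor.map` returns results in input order, which matches delivering segments in order. The read-only parameter mapping is one way to keep voice/model locked across segments, not something the scripts require.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType
from typing import Callable, List, Mapping


def generate_segments(segments: List[str], shared_params: Mapping[str, str],
                      submit: Callable[[str, Mapping[str, str]], str],
                      max_workers: int = 4) -> List[str]:
    """Submit every segment with the same read-only parameters; keep input order."""
    frozen = MappingProxyType(dict(shared_params))  # same voice/model for all segments
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `segments`, not completion order.
        return list(pool.map(lambda text: submit(text, frozen), segments))
```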

After Execution

Use the structured result templates below. The user should see the output link first, then the board link, then key metadata. Keep it clean and scannable.

Chinese video result template:

🎬 视频已生成完成

视频地址：<VIDEO_URL>
• 时长：<DURATION>
• 画幅：<ASPECT_RATIO>
• 模型：<MODEL_NAME>
• 消耗：<COST> credits

🔗 项目链接
https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID>
可在项目中查看、编辑和下载。

不满意的话可以告诉我，我帮你调整后重新生成。

Chinese image result template:

🖼️ 图片已生成完成

图片地址：<IMAGE_URL>
• 分辨率：<RESOLUTION>
• 模型：<MODEL_NAME>
• 消耗：<COST> credits

🔗 项目链接
https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID>
可在项目中查看、编辑和下载。

不满意的话可以告诉我，我帮你调整后重新生成。

English video result template:

🎬 Video generated

Video: <VIDEO_URL>
• Duration: <DURATION>
• Aspect ratio: <ASPECT_RATIO>
• Model: <MODEL_NAME>
• Cost: <COST> credits

🔗 Project link
https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID>
View, edit, and download in the project.

Not happy with the result? Let me know and I'll adjust and regenerate.

English image result template:

🖼️ Image generated

Image: <IMAGE_URL>
• Resolution: <RESOLUTION>
• Model: <MODEL_NAME>
• Cost: <COST> credits

🔗 Project link
https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID>
View, edit, and download in the project.

Not happy with the result? Let me know and I'll adjust and regenerate.

Rules:

Result link first — always show the video/image URL at the very top.
Board link second — if boardTaskId is available, show the board edit link.
Key metadata only — duration, aspect ratio/resolution, model, cost. Don't dump raw JSON or extra fields.
Offer iteration — end with a short note that the user can ask for adjustments, and remind them that regeneration costs additional credits.
Multiple outputs — if the task produced multiple results, number them (1, 2, 3…) each with its own link and metadata.
Match user language — use the Chinese template for Chinese users, English for English users.
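A sketch that renders the English video result template and applies the rules above: result link first, board link only when a boardTaskId exists, numbering for multiple outputs, and a regeneration-cost reminder. The result field names (`url`, `duration`, `aspect_ratio`, `model`, `cost`, `board_task_id`) are assumptions, not the scripts' actual output keys.

```python
from typing import List, Mapping


def render_video_results(results: List[Mapping[str, object]], board_id: str) -> str:
    """Render the English video template; field names are hypothetical."""
    blocks = []
    for i, r in enumerate(results, start=1):
        prefix = f"{i}. " if len(results) > 1 else ""  # number multiple outputs
        lines = [
            f"{prefix}Video: {r['url']}",  # result link first
            f"• Duration: {r['duration']}",
            f"• Aspect ratio: {r['aspect_ratio']}",
            f"• Model: {r['model']}",
            f"• Cost: {r['cost']} credits",
        ]
        task_id = r.get("board_task_id")
        if task_id:  # board link only when a boardTaskId exists
            lines += ["", "🔗 Project link",
                      f"https://www.topview.ai/board/{board_id}?boardResultId={task_id}",
                      "View, edit, and download in the project."]
        blocks.append("\n".join(lines))
    footer = ("Not happy with the result? Let me know and I'll adjust and regenerate. "
              "(Regenerating uses additional credits.)")
    return "\n\n".join(["🎬 Video generated", *blocks, footer])
```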

Error Handling

See references/error_handling.md for error codes, task-level failures, and recovery decision tree.

Capability Boundaries

Capability	Status	Script
Photo avatar / talking head	Available	`scripts/avatar4.py`
Caption styles	Available	`scripts/avatar4.py list-captions`
Credit management	Available	`scripts/user.py`
Image-to-video (i2v)	Available	`scripts/video_gen.py --type i2v`
Text-to-video (t2v)	Available	`scripts/video_gen.py --type t2v`
Omni reference video	Available	`scripts/video_gen.py --type omni`
Text-to-image	Available	`scripts/ai_image.py --type text2image`
Image editing	Available	`scripts/ai_image.py --type image_edit`
Remove background	Available	`scripts/remove_bg.py`
Product avatar / image replace	Available	`scripts/product_avatar.py`
Product avatar templates	Available	`scripts/product_avatar.py list-avatars` / `list-categories`
Text-to-speech (TTS)	Available	`scripts/text2voice.py`
Voice list / search	Available	`scripts/voice.py list`
Voice cloning	Available	`scripts/voice.py clone`
Delete custom voice	Available	`scripts/voice.py delete`
Board management	Available	`scripts/board.py`
Board task browsing	Available	`scripts/board.py tasks` / `task-detail`
Marketing video (m2v)	No module	Suggest topview.ai web UI

Never promise capabilities that don't exist as modules.

Bundled files

Note: The list of files included in the ZIP. Besides the main `SKILL.md`, it may contain reference material, samples, and scripts.

📄 SKILL.md (25,773 bytes)
📎 LICENSE.txt (11,357 bytes)
📎 README.md (5,953 bytes)
📎 references/ai_image.md (4,481 bytes)
📎 references/auth.md (4,959 bytes)
📎 references/avatar4.md (3,751 bytes)
📎 references/board.md (4,628 bytes)
📎 references/error_handling.md (2,193 bytes)
📎 references/product_avatar.md (5,241 bytes)
📎 references/remove_bg.md (2,834 bytes)
📎 references/text2voice.md (2,953 bytes)
📎 references/user.md (1,594 bytes)
📎 references/video_gen.md (6,592 bytes)
📎 references/voice.md (4,979 bytes)
📎 scripts/__init__.py (0 bytes)
📎 scripts/ai_image.py (26,206 bytes)
📎 scripts/auth.py (10,760 bytes)
📎 scripts/avatar4.py (13,776 bytes)
📎 scripts/board.py (15,232 bytes)
📎 scripts/product_avatar.py (17,256 bytes)
📎 scripts/remove_bg.py (9,639 bytes)
📎 scripts/requirements.txt (38 bytes)
📎 scripts/shared/__init__.py (66 bytes)
📎 scripts/shared/client.py (5,878 bytes)
📎 scripts/shared/config.py (2,123 bytes)
📎 scripts/shared/upload.py (2,511 bytes)
📎 scripts/text2voice.py (11,186 bytes)
📎 scripts/user.py (3,581 bytes)
📎 scripts/video_gen.py (39,149 bytes)
📎 scripts/voice.py (12,913 bytes)