🎨 Image generation with Stable Diffusion
A Skill for generating and editing images with Stable Diffusion through HuggingFace Diffusers.
📜 Original English description (for reference)
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.
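🇯🇵 Explanation for Japanese creators
A Skill for generating and editing images with Stable Diffusion through HuggingFace Diffusers.
※ This explanation was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of how the Skill itself behaves.
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, operation, or safety.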
🎯 What this Skill can do
The description below explains what this Skill does for you. When you ask Claude for something in this area, the Skill is triggered automatically.
📦 Installation (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for relevant requests.
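The manual steps above can also be scripted. The following is a minimal sketch, assuming the .skill file is an ordinary ZIP archive (as step 2 implies); the function name `install_skill` and the choice to extract into a folder named after the archive are illustrative assumptions, not something the Skill itself prescribes.

```python
import zipfile
from pathlib import Path


def install_skill(skill_file, skills_dir=None):
    """Extract a .skill archive (a renamed ZIP) into the Claude skills folder."""
    skill_path = Path(skill_file)
    # Default to ~/.claude/skills/ (on Windows this resolves under %USERPROFILE%).
    target_root = Path(skills_dir) if skills_dir else Path.home() / ".claude" / "skills"
    # Assumption: one folder per Skill, named after the archive's file stem.
    target = target_root / skill_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(skill_path) as archive:
        archive.extractall(target)
    return target
```

If the archive already contains a top-level folder, this produces one extra level of nesting; in that case extracting straight into the skills folder may be the better choice. Restart Claude Code afterwards, as in the manual procedure.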
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 3
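💬 Just say this: sample prompts
- › Create a magazine-style image with SDXL of a woman in a kimono in a Kyoto alley in autumn
- › An image of a minimalist laptop for a blog header
- › Generate model images for a fashion e-commerce site with SD, white background
- › Create a product-packaging-shoot style image featuring wagashi (Japanese sweets)
- › Create an anime-style character icon: a cat-eared maid
Paste any of these into Claude Code and this Skill is triggered automatically.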
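📺 Real usage example (input/output sample)
Input
"A woman in her 30s in a white dress in a lavender field in Hokkaido in autumn, sunset in the background, magazine-cover style"
SD prompt assembled by Claude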
Positive: masterpiece, best quality, professional fashion photography,
asian woman in her 30s, flowing white dress, standing in lavender field,
golden hour sunset, hokkaido landscape, magazine cover composition,
shallow depth of field, 85mm lens
Negative: bad anatomy, low quality, blurry, watermark, text
Steps: 30, CFG: 7, Sampler: DPM++ 2M Karras
📖 Original SKILL.md that Claude reads (expanded)
This text is the original (English or Chinese) that the AI (Claude) reads. Japanese translations are being added over time.
Stable Diffusion Image Generation
Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
When to use Stable Diffusion
Use Stable Diffusion when:
- Generating images from text descriptions
- Performing image-to-image translation (style transfer, enhancement)
- Inpainting (filling in masked regions)
- Outpainting (extending images beyond boundaries)
- Creating variations of existing images
- Building custom image generation workflows
Key features:
- Text-to-Image: Generate images from natural language prompts
- Image-to-Image: Transform existing images with text guidance
- Inpainting: Fill masked regions with context-aware content
- ControlNet: Add spatial conditioning (edges, poses, depth)
- LoRA Support: Efficient fine-tuning and style adaptation
- Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support
Use alternatives instead:
- DALL-E 3: For API-based generation without GPU
- Midjourney: For artistic, stylized outputs
- Imagen: For Google Cloud integration
- Leonardo.ai: For web-based creative workflows
Quick start
Installation
pip install diffusers transformers accelerate torch
pip install xformers # Optional: memory-efficient attention
Basic text-to-image
from diffusers import DiffusionPipeline
import torch
# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe.to("cuda")
# Generate image
image = pipe(
"A serene mountain landscape at sunset, highly detailed",
num_inference_steps=50,
guidance_scale=7.5
).images[0]
image.save("output.png")
Using SDXL (higher quality)
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16"
)
# Enable memory optimization; offloading manages device placement itself,
# so pipe.to("cuda") is not needed alongside it
pipe.enable_model_cpu_offload()
image = pipe(
prompt="A futuristic city with flying cars, cinematic lighting",
height=1024,
width=1024,
num_inference_steps=30
).images[0]
Architecture overview
Three-pillar design
Diffusers is built around three core components:
Pipeline (orchestration)
├── Model (neural networks)
│ ├── UNet / Transformer (noise prediction)
│ ├── VAE (latent encoding/decoding)
│ └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
Pipeline inference flow
Text Prompt → Text Encoder → Text Embeddings
↓
Random Noise → [Denoising Loop] ← Scheduler
↓
Predicted Noise
↓
VAE Decoder → Final Image
Core concepts
Pipelines
Pipelines orchestrate complete workflows:
| Pipeline | Purpose |
|---|---|
| `StableDiffusionPipeline` | Text-to-image (SD 1.x/2.x) |
| `StableDiffusionXLPipeline` | Text-to-image (SDXL) |
| `StableDiffusion3Pipeline` | Text-to-image (SD 3.0) |
| `FluxPipeline` | Text-to-image (Flux models) |
| `StableDiffusionImg2ImgPipeline` | Image-to-image |
| `StableDiffusionInpaintPipeline` | Inpainting |
Schedulers
Schedulers control the denoising process:
| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| `EulerDiscreteScheduler` | 20-50 | Good | Default choice |
| `EulerAncestralDiscreteScheduler` | 20-50 | Good | More variation |
| `DPMSolverMultistepScheduler` | 15-25 | Excellent | Fast, high quality |
| `DDIMScheduler` | 50-100 | Good | Deterministic |
| `LCMScheduler` | 4-8 | Good | Very fast |
| `UniPCMultistepScheduler` | 15-25 | Excellent | Fast convergence |
Swapping schedulers
from diffusers import DPMSolverMultistepScheduler
# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
pipe.scheduler.config
)
# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
Generation parameters
Key parameters
| Parameter | Default | Description |
|---|---|---|
| `prompt` | Required | Text description of the desired image |
| `negative_prompt` | None | What to avoid in the image |
| `num_inference_steps` | 50 | Denoising steps (more steps usually improve quality but take longer, with diminishing returns) |
| `guidance_scale` | 7.5 | Prompt adherence (7-12 typical) |
| `height`, `width` | 512/1024 | Output dimensions in pixels (multiples of 8); 512 for SD 1.x, 1024 for SDXL |
| `generator` | None | Torch generator for reproducibility |
| `num_images_per_prompt` | 1 | Number of images generated per prompt |
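Because `height` and `width` should be multiples of 8, it can help to snap an arbitrary requested size before calling the pipeline. The helper below is a minimal sketch; the names `snap_to_multiple` and `snap_size` are hypothetical and not part of Diffusers, and rounding down (rather than to the nearest multiple) is an arbitrary choice.

```python
def snap_to_multiple(value, multiple=8):
    """Round a dimension down to a multiple of `multiple` (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


def snap_size(width, height, multiple=8):
    """Snap both dimensions so they are valid pipeline sizes."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


print(snap_size(1000, 700))  # (1000, 696)
```

The result can be passed as `width=` and `height=` to the pipeline call.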
Reproducible generation
import torch
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
prompt="A cat wearing a top hat",
generator=generator,
num_inference_steps=50
).images[0]
Negative prompts
image = pipe(
prompt="Professional photo of a dog in a garden",
negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
guidance_scale=7.5
).images[0]
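To see how `negative_prompt` and `guidance_scale` interact: Stable Diffusion pipelines use classifier-free guidance, where the negative prompt (or an empty prompt) supplies the unconditional noise prediction and `guidance_scale` controls how far the result is pushed toward the prompt. The sketch below applies that formula to plain Python lists standing in for tensors; `apply_guidance` is an illustrative name, not a Diffusers function.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy predictions: one from the negative-prompt branch, one from the prompt branch.
noise_negative = [0.0, 1.0, -1.0]
noise_prompt = [1.0, 1.0, 0.0]

print(apply_guidance(noise_negative, noise_prompt, 1.0))  # [1.0, 1.0, 0.0]
print(apply_guidance(noise_negative, noise_prompt, 7.5))  # [7.5, 1.0, 6.5]
```

At a scale of 1.0 the output equals the prompt branch, so the negative prompt has no effect; larger values move the prediction further away from whatever the negative prompt describes.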
Image-to-image
Transform existing images with text guidance:
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image
pipe = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("input.jpg").resize((512, 512))
image = pipe(
prompt="A watercolor painting of the scene",
image=init_image,
strength=0.75, # How much to transform (0-1)
num_inference_steps=50
).images[0]
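The `strength` value also determines how many denoising steps actually run: the Diffusers image-to-image pipelines skip the early part of the schedule and execute roughly `num_inference_steps * strength` steps. A minimal sketch of that arithmetic follows; the function name `effective_img2img_steps` is hypothetical.

```python
def effective_img2img_steps(num_inference_steps, strength):
    """Number of denoising steps an img2img call actually performs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_img2img_steps(50, 0.75))  # 37
print(effective_img2img_steps(50, 0.5))   # 25
print(effective_img2img_steps(50, 1.0))   # 50
```

This is why a low `strength` combined with a small step count can leave only a handful of steps; raising `num_inference_steps` compensates.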
Inpainting
Fill masked regions:
import torch
from diffusers import AutoPipelineForInpainting
from PIL import Image
pipe = AutoPipelineForInpainting.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-inpainting",
torch_dtype=torch.float16
).to("cuda")
image = Image.open("photo.jpg")
mask = Image.open("mask.png") # White = inpaint region
result = pipe(
prompt="A red car parked on the street",
image=image,
mask_image=mask,
num_inference_steps=50
).images[0]
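If no mask file exists yet, one can be drawn with Pillow. The sketch below builds a rectangular mask following the convention noted above (white marks the region to inpaint, black is kept); `make_box_mask` and the box coordinates are illustrative assumptions.

```python
from PIL import Image, ImageDraw


def make_box_mask(size, box):
    """Return a grayscale mask: white inside `box` (inpaint), black elsewhere (keep)."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask


# A 512x512 mask whose central square will be regenerated.
mask = make_box_mask((512, 512), (128, 128, 384, 384))
```

The mask should have the same dimensions as the input image; it can be passed directly as `mask_image` or saved with `mask.save("mask.png")`.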
ControlNet
Add spatial conditioning for precise control:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/control_v11p_sd15_canny",
torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
).to("cuda")
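# Use Canny edge image as control
# get_canny_image is a helper you must define yourself (it is not part of diffusers);
# input_image is the PIL image to extract edges from
control_image = get_canny_image(input_image)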
image = pipe(
prompt="A beautiful house in the style of Van Gogh",
image=control_image,
num_inference_steps=30
).images[0]
Available ControlNets
| ControlNet | Input Type | Use Case |
|---|---|---|
| `canny` | Edge maps | Preserve structure |
| `openpose` | Pose skeletons | Human poses |
| `depth` | Depth maps | 3D-aware generation |
| `normal` | Normal maps | Surface details |
| `mlsd` | Line segments | Architectural lines |
| `scribble` | Rough sketches | Sketch-to-image |
LoRA adapters
Load fine-tuned style adapters:
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")
# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]
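# Fuse the LoRA into the base weights at reduced strength (speeds up inference)
pipe.fuse_lora(lora_scale=0.8)
# Undo the fusion, then unload the LoRA weights
pipe.unfuse_lora()
pipe.unload_lora_weights()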
Multiple LoRAs
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")
# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
image = pipe("A portrait").images[0]
Memory optimization
Enable CPU offloading
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()
# Sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
Attention slicing
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()
# Or use "max" for the largest memory savings (one slice at a time, slower)
pipe.enable_attention_slicing("max")
xFormers memory-efficient attention
# Requires xformers package
pipe.enable_xformers_memory_efficient_attention()
VAE slicing and tiling
# Decode a batch one image at a time to lower peak memory
pipe.enable_vae_slicing()
# Process large images in tiles
pipe.enable_vae_tiling()
Model variants
Loading different precisions
# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
"model-id",
torch_dtype=torch.float16,
variant="fp16"
)
# BF16 (wider dynamic range than FP16, so fewer overflows; requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
"model-id",
torch_dtype=torch.bfloat16
)
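As a rough guide to what the precision choice saves, weight memory scales with bytes per parameter: 4 for float32 and 2 for float16 or bfloat16. The sketch below estimates weight memory only (activations and other buffers are not counted); the parameter count is an illustrative round number, not a figure for any specific model.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


example_params = 1_000_000_000  # illustrative: one billion parameters
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(example_params, dtype):.2f} GiB")
# float32: 3.73 GiB
# float16: 1.86 GiB
# bfloat16: 1.86 GiB
```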
Loading specific components
import torch
from diffusers import AutoencoderKL, DiffusionPipeline
# Load custom VAE in the same dtype as the rest of the pipeline
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
vae=vae,
torch_dtype=torch.float16
)
Batch generation
Generate multiple images efficiently:
# Multiple prompts
prompts = [
"A cat playing piano",
"A dog reading a book",
"A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images
# Multiple images per prompt
images = pipe(
"A beautiful sunset",
num_images_per_prompt=4,
num_inference_steps=30
).images
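Passing a long prompt list in one call raises peak memory with the batch size. A minimal sketch of splitting the list into smaller batches follows; `generate_in_chunks` is a hypothetical helper, and it assumes only that `pipe` accepts a list of prompts and returns an object with an `.images` list, as in the calls above.

```python
def generate_in_chunks(pipe, prompts, batch_size=2, **kwargs):
    """Run the pipeline over `prompts` in batches of at most `batch_size`."""
    images = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        # Extra keyword arguments (steps, guidance, ...) are forwarded unchanged.
        images.extend(pipe(chunk, **kwargs).images)
    return images


# Usage with a loaded pipeline (not executed here):
# images = generate_in_chunks(pipe, prompts, batch_size=2, num_inference_steps=30)
```

Images come back in the same order as the prompts, so results can be matched to inputs by index.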
Common workflows
Workflow 1: High-quality generation
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch
# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# CPU offload handles device placement, so pipe.to("cuda") is not needed
pipe.enable_model_cpu_offload()
# 2. Generate with quality settings
image = pipe(
prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
negative_prompt="blurry, low quality, cartoon, anime, sketch",
num_inference_steps=30,
guidance_scale=7.5,
height=1024,
width=1024
).images[0]
Workflow 2: Fast prototyping
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch
# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()
# Generate in 4 steps (around a second on a fast GPU; timing depends on hardware)
image = pipe(
"A beautiful landscape",
num_inference_steps=4,
guidance_scale=1.0
).images[0]
Common issues
CUDA out of memory:
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
Black/noise images:
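# Check the VAE: in float16 it can overflow to NaN and produce black images;
# try loading the VAE (or the whole pipeline) in float32
# The safety checker blacks out images it flags; disable it only if appropriate
pipe.safety_checker = None
# Ensure every component uses the same dtype
pipe = pipe.to(dtype=torch.float16)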
Slow generation:
# Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
References
- Advanced Usage - Custom pipelines, fine-tuning, deployment
- Troubleshooting - Common issues and solutions
Resources
- Documentation: https://huggingface.co/docs/diffusers
- Repository: https://github.com/huggingface/diffusers
- Model Hub: https://huggingface.co/models?library=diffusers
- Discord: https://discord.gg/diffusers
Bundled files
※ List of the files in the ZIP. In addition to `SKILL.md` itself, it may include reference material, samples, and scripts.
- 📄 SKILL.md (12,989 bytes)
- 📎 references/advanced-usage.md (17,690 bytes)
- 📎 references/troubleshooting.md (12,401 bytes)