agentdb-reinforcement-learning-training
Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production.
Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download → extract → install, fully automated.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o agentdb-reinforcement-learning-training.zip https://jpskill.com/download/18732.zip && unzip -o agentdb-reinforcement-learning-training.zip && rm agentdb-reinforcement-learning-training.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/18732.zip -OutFile "$d\agentdb-reinforcement-learning-training.zip"; Expand-Archive "$d\agentdb-reinforcement-learning-training.zip" -DestinationPath $d -Force; ri "$d\agentdb-reinforcement-learning-training.zip"
When it finishes, restart Claude Code. Then just make a normal request in this area (e.g. "train an RL agent") and the Skill activates automatically.
💾 Manual download (if the commands are too difficult)
- 1. Click the blue button below to download agentdb-reinforcement-learning-training.zip
- 2. Double-click the ZIP file to extract it → this creates an agentdb-reinforcement-learning-training folder
- 3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
- 4. Restart Claude Code
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.
🎯 What This Skill Can Do
The description below explains what this Skill will do for you. When you give Claude a request in this area, the Skill activates automatically.
📦 Installation (3 Steps)
- 1. Click the "Download" button above to get the .skill file
- 2. Rename the extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you're done. Even without saying "use this Skill...", it is invoked automatically for related requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Included files: 2
📖 Original SKILL.md (the text Claude reads)
The text below is the original (English or Chinese) that the AI (Claude) reads. A Japanese translation is being added over time.
AgentDB Reinforcement Learning Training
Overview
Train AI agents with AgentDB's 9 reinforcement learning algorithms, including Decision Transformer, Q-Learning, SARSA, Actor-Critic, PPO, and more. Build self-learning agents, implement RL training loops with experience replay, and optimize agent behavior through experience.
When to Use This Skill
Use this skill when you need to:
- Train autonomous agents that learn from experience
- Implement reinforcement learning systems
- Optimize agent behavior through trial and error
- Build self-improving AI systems
- Deploy RL agents in production environments
- Benchmark and compare RL algorithms
Available RL Algorithms
- Q-Learning - Value-based, off-policy
- SARSA - Value-based, on-policy
- Deep Q-Network (DQN) - Deep RL with experience replay
- Actor-Critic - Policy gradient with value baseline
- Proximal Policy Optimization (PPO) - Trust region policy optimization
- Decision Transformer - Offline RL with transformers
- Advantage Actor-Critic (A2C) - Synchronous advantage estimation
- Twin Delayed DDPG (TD3) - Continuous control
- Soft Actor-Critic (SAC) - Maximum entropy RL
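To make the value-based entries concrete: Q-Learning bootstraps from the greedy next action (off-policy), while SARSA bootstraps from the action the policy actually takes next (on-policy). A minimal tabular sketch of the two updates (illustrative only, not the AgentDB API):

```typescript
// Illustrative tabular updates -- not AgentDB internals.
type QTable = Map<string, number[]>; // state key -> per-action values

function qValues(q: QTable, state: string, numActions: number): number[] {
  if (!q.has(state)) q.set(state, new Array(numActions).fill(0));
  return q.get(state)!;
}

// Q-Learning (off-policy): target uses the max over next actions
function qLearningUpdate(
  q: QTable, s: string, a: number, r: number, s2: string,
  alpha = 0.1, gamma = 0.99, numActions = 4
): void {
  const qs = qValues(q, s, numActions);
  const qs2 = qValues(q, s2, numActions);
  qs[a] += alpha * (r + gamma * Math.max(...qs2) - qs[a]);
}

// SARSA (on-policy): target uses the next action actually chosen
function sarsaUpdate(
  q: QTable, s: string, a: number, r: number, s2: string, a2: number,
  alpha = 0.1, gamma = 0.99, numActions = 4
): void {
  const qs = qValues(q, s, numActions);
  const qs2 = qValues(q, s2, numActions);
  qs[a] += alpha * (r + gamma * qs2[a2] - qs[a]);
}
```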
SOP Framework: 5-Phase RL Training Deployment
Phase 1: Initialize Learning Environment (1-2 hours)
Objective: Set up the AgentDB learning infrastructure and configure the environment
Agent: ml-developer
Steps:
1. **Install AgentDB Learning Module**

```bash
npm install agentdb-learning@latest
npm install @agentdb/rl-algorithms @agentdb/environments
```

2. **Initialize learning database**

```typescript
import { AgentDB, LearningPlugin } from 'agentdb-learning';

const learningDB = new AgentDB({
  name: 'rl-training-db',
  dimensions: 512, // State embedding dimension
  learning: {
    enabled: true,
    persistExperience: true,
    replayBufferSize: 100000
  }
});

await learningDB.initialize();

// Create learning plugin
const learningPlugin = new LearningPlugin({
  database: learningDB,
  algorithms: ['q-learning', 'dqn', 'ppo', 'actor-critic'],
  config: {
    batchSize: 64,
    learningRate: 0.001,
    discountFactor: 0.99,
    explorationRate: 1.0,
    explorationDecay: 0.995
  }
});

await learningPlugin.initialize();
```
3. **Define environment**
```typescript
import { Environment } from '@agentdb/environments';
const environment = new Environment({
name: 'grid-world',
stateSpace: {
type: 'continuous',
shape: [10, 10],
bounds: [[0, 10], [0, 10]]
},
actionSpace: {
type: 'discrete',
actions: ['up', 'down', 'left', 'right']
},
rewardFunction: (state, action, nextState) => {
// Distance to goal reward
const goalDistance = Math.sqrt(
Math.pow(nextState[0] - 9, 2) +
Math.pow(nextState[1] - 9, 2)
);
return -goalDistance + (goalDistance === 0 ? 100 : 0);
},
terminalCondition: (state) => {
return state[0] === 9 && state[1] === 9; // Reached goal
}
});
await environment.initialize();
```
4. **Set up monitoring**

```typescript
const monitor = learningPlugin.createMonitor({
  metrics: ['reward', 'loss', 'exploration-rate', 'episode-length'],
  logInterval: 100, // Log every 100 episodes
  saveCheckpoints: true,
  checkpointInterval: 1000
});

monitor.on('episode-complete', (episode) => {
  console.log('Episode:', episode.number, 'Reward:', episode.totalReward);
});
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/environment', {
name: environment.name,
stateSpace: environment.stateSpace,
actionSpace: environment.actionSpace,
initialized: Date.now()
});
```
Validation:
- Learning database initialized
- Environment configured and tested
- Monitor capturing metrics
- Configuration stored in memory
Phase 2: Configure RL Algorithm (1-2 hours)
Objective: Select and configure RL algorithm for the learning task
Agent: ml-developer
Steps:
1. **Select algorithm**

```typescript
// Example: Deep Q-Network (DQN)
const dqnAgent = learningPlugin.createAgent({
  algorithm: 'dqn',
  config: {
    networkArchitecture: {
      layers: [
        { type: 'dense', units: 128, activation: 'relu' },
        { type: 'dense', units: 128, activation: 'relu' },
        { type: 'dense', units: environment.actionSpace.size, activation: 'linear' }
      ]
    },
    learningRate: 0.001,
    batchSize: 64,
    replayBuffer: {
      size: 100000,
      prioritized: true,
      alpha: 0.6,
      beta: 0.4
    },
    targetNetwork: {
      updateFrequency: 1000,
      tauSync: 0.001 // Soft update
    },
    exploration: {
      initial: 1.0,
      final: 0.01,
      decay: 0.995
    },
    training: {
      startAfter: 1000, // Start training after 1000 experiences
      updateFrequency: 4
    }
  }
});

await dqnAgent.initialize();
```
2. **Configure hyperparameters**
```typescript
const hyperparameters = {
// Learning parameters
learningRate: 0.001,
discountFactor: 0.99, // Gamma
batchSize: 64,
// Exploration
epsilonStart: 1.0,
epsilonEnd: 0.01,
epsilonDecay: 0.995,
// Experience replay
replayBufferSize: 100000,
minReplaySize: 1000,
prioritizedReplay: true,
// Training
maxEpisodes: 10000,
maxStepsPerEpisode: 1000,
targetUpdateFrequency: 1000,
// Evaluation
evalFrequency: 100,
evalEpisodes: 10
};
dqnAgent.setHyperparameters(hyperparameters);
```
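One sanity check worth doing on this schedule: with multiplicative decay of 0.995 per episode, epsilon falls from `epsilonStart` to `epsilonEnd` after about `ln(0.01) / ln(0.995) ≈ 919` episodes, so exploration effectively ends within the first tenth of the 10,000-episode budget. A quick sketch to verify:

```typescript
// Quick check of the exploration schedule defined above.
function episodesUntil(epsStart: number, epsEnd: number, decay: number): number {
  return Math.ceil(Math.log(epsEnd / epsStart) / Math.log(decay));
}

console.log(episodesUntil(1.0, 0.01, 0.995)); // ~919 of 10000 episodes
```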
3. **Set up experience replay**

```typescript
import { PrioritizedReplayBuffer } from '@agentdb/rl-algorithms';

const replayBuffer = new PrioritizedReplayBuffer({
  capacity: 100000,
  alpha: 0.6, // Prioritization exponent
  beta: 0.4, // Importance sampling
  betaIncrement: 0.001,
  epsilon: 0.01 // Small constant for stability
});

dqnAgent.setReplayBuffer(replayBuffer);
```
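The `alpha` and `beta` settings follow the standard prioritized experience replay scheme (Schaul et al., 2016): `alpha` controls how strongly high-error transitions are favored, and `beta` controls the importance-sampling correction. Purely as an illustration of what those exponents do (this is not AgentDB's internal code):

```typescript
// Illustration of prioritized replay weighting -- not AgentDB internals.
// P(i) = p_i^alpha / sum_k p_k^alpha  (sampling probability)
function samplingProbabilities(priorities: number[], alpha: number): number[] {
  const scaled = priorities.map((p) => Math.pow(p, alpha));
  const total = scaled.reduce((a, b) => a + b, 0);
  return scaled.map((s) => s / total);
}

// w_i = (N * P(i))^-beta  (importance-sampling weight; in practice
// normalized by the maximum weight in the batch)
function importanceWeight(prob: number, bufferSize: number, beta: number): number {
  return Math.pow(bufferSize * prob, -beta);
}
```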
4. **Configure training loop**
```typescript
const trainingConfig = {
episodes: 10000,
stepsPerEpisode: 1000,
warmupSteps: 1000,
trainFrequency: 4,
targetUpdateFrequency: 1000,
saveFrequency: 1000,
evalFrequency: 100,
earlyStoppingPatience: 500,
earlyStoppingThreshold: 0.01
};
dqnAgent.setTrainingConfig(trainingConfig);
```
**Memory Pattern:**

```typescript
await agentDB.memory.store('agentdb/learning/algorithm-config', {
  algorithm: 'dqn',
  hyperparameters: hyperparameters,
  trainingConfig: trainingConfig,
  configured: Date.now()
});
```
Validation:
- Algorithm selected and configured
- Hyperparameters validated
- Replay buffer initialized
- Training config set
Phase 3: Train Agents (3-4 hours)
Objective: Execute training iterations and optimize agent behavior
Agent: safla-neural
Steps:
1. **Start training loop**

```typescript
async function trainAgent() {
  console.log('Starting RL training...');

  const trainingStats = {
    episodes: [],
    totalReward: [],
    episodeLength: [],
    loss: [],
    explorationRate: []
  };

  for (let episode = 0; episode < trainingConfig.episodes; episode++) {
    let state = await environment.reset();
    let episodeReward = 0;
    let episodeLength = 0;
    let episodeLoss = 0;

    for (let step = 0; step < trainingConfig.stepsPerEpisode; step++) {
      // Select action
      const action = await dqnAgent.selectAction(state, { explore: true });

      // Execute action
      const { nextState, reward, done } = await environment.step(action);

      // Store experience
      await dqnAgent.storeExperience({ state, action, reward, nextState, done });

      // Train if enough experiences
      if (dqnAgent.canTrain()) {
        const loss = await dqnAgent.train();
        episodeLoss += loss;
      }

      episodeReward += reward;
      episodeLength += 1;
      state = nextState;

      if (done) break;
    }

    // Update target network
    if (episode % trainingConfig.targetUpdateFrequency === 0) {
      await dqnAgent.updateTargetNetwork();
    }

    // Decay exploration
    dqnAgent.decayExploration();

    // Log progress
    trainingStats.episodes.push(episode);
    trainingStats.totalReward.push(episodeReward);
    trainingStats.episodeLength.push(episodeLength);
    trainingStats.loss.push(episodeLoss / episodeLength);
    trainingStats.explorationRate.push(dqnAgent.getExplorationRate());

    if (episode % 100 === 0) {
      console.log(`Episode ${episode}:`, {
        reward: episodeReward.toFixed(2),
        length: episodeLength,
        loss: (episodeLoss / episodeLength).toFixed(4),
        epsilon: dqnAgent.getExplorationRate().toFixed(3)
      });
    }

    // Save checkpoint
    if (episode % trainingConfig.saveFrequency === 0) {
      await dqnAgent.save(`checkpoint-${episode}`);
    }

    // Evaluate (evaluateAgent is defined in Phase 4 and returns { meanReward, ... })
    if (episode % trainingConfig.evalFrequency === 0) {
      const { meanReward } = await evaluateAgent(dqnAgent, environment, hyperparameters.evalEpisodes);
      console.log(`Evaluation at episode ${episode}: ${meanReward.toFixed(2)}`);
    }

    // Early stopping
    if (checkEarlyStopping(trainingStats, episode)) {
      console.log('Early stopping triggered');
      break;
    }
  }

  return trainingStats;
}

const trainingStats = await trainAgent();
```
2. **Monitor training progress**
```typescript
monitor.on('training-update', (stats) => {
// Calculate moving averages
const window = 100;
const recentRewards = stats.totalReward.slice(-window);
const avgReward = recentRewards.reduce((a, b) => a + b, 0) / recentRewards.length;
// Store metrics
agentDB.memory.store('agentdb/learning/training-progress', {
episode: stats.episodes[stats.episodes.length - 1],
avgReward: avgReward,
explorationRate: stats.explorationRate[stats.explorationRate.length - 1],
timestamp: Date.now()
});
// Plot learning curve (if visualization enabled)
if (monitor.visualization) {
monitor.plot('reward-curve', stats.episodes, stats.totalReward);
monitor.plot('loss-curve', stats.episodes, stats.loss);
}
});
```
3. **Handle convergence**

```typescript
function checkConvergence(stats, windowSize = 100, threshold = 0.01) {
  if (stats.totalReward.length < windowSize * 2) {
    return false;
  }

  const recent = stats.totalReward.slice(-windowSize);
  const previous = stats.totalReward.slice(-windowSize * 2, -windowSize);

  const recentAvg = recent.reduce((a, b) => a + b, 0) / recent.length;
  const previousAvg = previous.reduce((a, b) => a + b, 0) / previous.length;

  const improvement = (recentAvg - previousAvg) / Math.abs(previousAvg);

  return improvement < threshold;
}
```
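The training loop in step 1 calls `checkEarlyStopping`, which this skill never defines. A minimal sketch, assuming the intent is to stop once the moving-average reward has improved by less than `earlyStoppingThreshold` over the last `earlyStoppingPatience` episodes (a hypothetical helper, not part of the AgentDB API):

```typescript
// Hypothetical helper -- not part of AgentDB; mirrors the
// earlyStoppingPatience / earlyStoppingThreshold fields of trainingConfig.
function checkEarlyStopping(stats, episode, window = 100) {
  const { earlyStoppingPatience, earlyStoppingThreshold } = trainingConfig;
  if (episode < earlyStoppingPatience + window) return false;

  const rewards = stats.totalReward;
  const movingAvg = (end) =>
    rewards.slice(end - window, end).reduce((a, b) => a + b, 0) / window;

  // Compare the current moving average with the one `patience` episodes ago
  const current = movingAvg(rewards.length);
  const past = movingAvg(rewards.length - earlyStoppingPatience);
  const improvement = (current - past) / Math.abs(past);

  return improvement < earlyStoppingThreshold;
}
```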
4. **Save trained model**
```typescript
await dqnAgent.save('trained-agent-final', {
includeReplayBuffer: false,
includeOptimizer: false,
metadata: {
trainingStats: trainingStats,
hyperparameters: hyperparameters,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1]
}
});
console.log('Training complete. Model saved.');
```
**Memory Pattern:**

```typescript
await agentDB.memory.store('agentdb/learning/training-results', {
  algorithm: 'dqn',
  episodes: trainingStats.episodes.length,
  finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1],
  converged: checkConvergence(trainingStats),
  modelPath: 'trained-agent-final',
  timestamp: Date.now()
});
```
Validation:
- Training completed or converged
- Reward curve shows improvement
- Model saved successfully
- Training stats stored
Phase 4: Validate Performance (1-2 hours)
Objective: Benchmark trained agent and validate performance
Agent: performance-benchmarker
Steps:
1. **Load trained agent**

```typescript
const trainedAgent = await learningPlugin.loadAgent('trained-agent-final');
```

2. **Run evaluation episodes**

```typescript
async function evaluateAgent(agent, env, numEpisodes = 100) {
  const results = {
    rewards: [],
    episodeLengths: [],
    successRate: 0
  };

  for (let i = 0; i < numEpisodes; i++) {
    let state = await env.reset();
    let episodeReward = 0;
    let episodeLength = 0;
    let success = false;

    for (let step = 0; step < 1000; step++) {
      const action = await agent.selectAction(state, { explore: false });
      const { nextState, reward, done } = await env.step(action);

      episodeReward += reward;
      episodeLength += 1;
      state = nextState;

      if (done) {
        success = env.isSuccessful(state);
        break;
      }
    }

    results.rewards.push(episodeReward);
    results.episodeLengths.push(episodeLength);
    if (success) results.successRate += 1;
  }

  results.successRate /= numEpisodes;

  return {
    meanReward: results.rewards.reduce((a, b) => a + b, 0) / results.rewards.length,
    stdReward: calculateStd(results.rewards),
    meanLength: results.episodeLengths.reduce((a, b) => a + b, 0) / results.episodeLengths.length,
    successRate: results.successRate,
    results: results
  };
}

const evalResults = await evaluateAgent(trainedAgent, environment, 100);
console.log('Evaluation results:', evalResults);
```
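`evaluateAgent` relies on a `calculateStd` helper that is not defined anywhere in this skill; a minimal sketch computing the sample standard deviation:

```typescript
// Hypothetical helper (not defined elsewhere in this skill):
// sample standard deviation of an array of numbers.
function calculateStd(values: number[]): number {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance);
}
```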
3. **Compare with baseline**
```typescript
// Random policy baseline
const randomAgent = learningPlugin.createAgent({ algorithm: 'random' });
const randomResults = await evaluateAgent(randomAgent, environment, 100);
// Calculate improvement
const improvement = {
rewardImprovement: (evalResults.meanReward - randomResults.meanReward) / Math.abs(randomResults.meanReward),
lengthImprovement: (randomResults.meanLength - evalResults.meanLength) / randomResults.meanLength,
successImprovement: evalResults.successRate - randomResults.successRate
};
console.log('Improvement over random:', improvement);
```
4. **Run comprehensive benchmarks**

```typescript
const benchmarks = {
  performanceMetrics: {
    meanReward: evalResults.meanReward,
    stdReward: evalResults.stdReward,
    successRate: evalResults.successRate,
    meanEpisodeLength: evalResults.meanLength
  },
  algorithmComparison: {
    dqn: evalResults,
    random: randomResults,
    improvement: improvement
  },
  inferenceTiming: {
    actionSelection: 0,
    totalEpisode: 0
  }
};

// Measure inference speed
const timingTrials = 1000;
const startTime = performance.now();
for (let i = 0; i < timingTrials; i++) {
  const state = await environment.randomState();
  await trainedAgent.selectAction(state, { explore: false });
}
const endTime = performance.now();
benchmarks.inferenceTiming.actionSelection = (endTime - startTime) / timingTrials;

await agentDB.memory.store('agentdb/learning/benchmarks', benchmarks);
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/validation', {
evaluated: true,
meanReward: evalResults.meanReward,
successRate: evalResults.successRate,
improvement: improvement,
timestamp: Date.now()
});
```
Validation:
- Evaluation completed (100 episodes)
- Mean reward exceeds threshold
- Success rate acceptable
- Improvement over baseline demonstrated
Phase 5: Deploy Trained Agents (1-2 hours)
Objective: Deploy trained agents to production environment
Agent: ml-developer
Steps:
1. **Export production model**

```typescript
await trainedAgent.export('production-agent', {
  format: 'onnx', // or 'tensorflowjs', 'pytorch'
  optimize: true,
  quantize: 'int8', // Quantization for faster inference
  includeMetadata: true
});
```

2. **Create inference API**

```typescript
import express from 'express';

const app = express();
app.use(express.json());

// Load production agent
const productionAgent = await learningPlugin.loadAgent('production-agent');

app.post('/api/predict', async (req, res) => {
  try {
    const { state } = req.body;

    const action = await productionAgent.selectAction(state, {
      explore: false,
      returnProbabilities: true
    });

    res.json({
      action: action.action,
      probabilities: action.probabilities,
      confidence: action.confidence
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => {
  console.log('RL agent API running on port 3000');
});
```
3. **Set up monitoring**
```typescript
import { ProductionMonitor } from '@agentdb/monitoring';
const prodMonitor = new ProductionMonitor({
agent: productionAgent,
metrics: ['inference-latency', 'action-distribution', 'reward-feedback'],
alerting: {
latencyThreshold: 100, // ms
anomalyDetection: true
}
});
await prodMonitor.start();
```
4. **Create deployment pipeline**

```typescript
const deploymentPipeline = {
  stages: [
    {
      name: 'validation',
      steps: [
        'Load trained model',
        'Run validation suite',
        'Check performance metrics',
        'Verify inference speed'
      ]
    },
    {
      name: 'export',
      steps: [
        'Export to production format',
        'Optimize model',
        'Quantize weights',
        'Package artifacts'
      ]
    },
    {
      name: 'deployment',
      steps: [
        'Deploy to staging',
        'Run smoke tests',
        'Deploy to production',
        'Monitor performance'
      ]
    }
  ]
};

await agentDB.memory.store('agentdb/learning/deployment-pipeline', deploymentPipeline);
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/production', {
deployed: true,
modelPath: 'production-agent',
apiEndpoint: 'http://localhost:3000/api/predict',
monitoring: true,
timestamp: Date.now()
});
```
Validation:
- Model exported successfully
- API running and responding
- Monitoring active
- Deployment pipeline documented
Integration Scripts
Complete Training Script
```bash
#!/bin/bash
# train-rl-agent.sh
set -e

echo "AgentDB RL Training Script"
echo "=========================="

# Phase 1: Initialize
echo "Phase 1: Initializing learning environment..."
npm install agentdb-learning @agentdb/rl-algorithms @agentdb/environments

# Phase 2: Configure
echo "Phase 2: Configuring algorithm..."
node config-algorithm.js

# Phase 3: Train
echo "Phase 3: Training agent..."
node train-agent.js

# Phase 4: Validate
echo "Phase 4: Validating performance..."
node evaluate-agent.js

# Phase 5: Deploy
echo "Phase 5: Deploying to production..."
node deploy-agent.js

echo "Training complete!"
```
Quick Start Script
```typescript
// quickstart-rl.ts
import { setupRLTraining } from './setup';

async function quickStart() {
  console.log('Starting RL training quick setup...');

  // Setup
  const { learningDB, environment, agent } = await setupRLTraining({
    algorithm: 'dqn',
    environment: 'grid-world',
    episodes: 1000
  });

  // Train
  console.log('Training agent...');
  const stats = await agent.train(environment, {
    episodes: 1000,
    logInterval: 100
  });

  // Evaluate
  console.log('Evaluating agent...');
  const results = await agent.evaluate(environment, {
    episodes: 100
  });

  console.log('Results:', results);

  // Save
  await agent.save('quickstart-agent');

  console.log('Quick start complete!');
}

quickStart().catch(console.error);
```
Evidence-Based Success Criteria
- **Training Convergence (Self-Consistency)**
  - Reward curve stabilizes
  - Moving-average improvement < 1%
  - Agent achieves consistent performance
- **Performance Benchmarks (Quantitative)**
  - Mean reward exceeds baseline by 50%
  - Success rate > 80%
  - Inference time < 10 ms per action
- **Algorithm Validation (Chain-of-Verification)**
  - Hyperparameters validated
  - Exploration-exploitation balanced
  - Experience replay functioning
- **Production Readiness (Multi-Agent Consensus)**
  - Model exported successfully
  - API responds within latency threshold
  - Monitoring active and alerting
  - Deployment pipeline documented
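As an illustration, the quantitative criteria above could be checked programmatically against the Phase 4 results (a hypothetical helper using the `evalResults`, `randomResults`, and `benchmarks` objects from Phase 4):

```typescript
// Hypothetical check against the quantitative criteria listed above.
function meetsSuccessCriteria(evalResults, randomResults, inferenceMs) {
  const rewardGain =
    (evalResults.meanReward - randomResults.meanReward) /
    Math.abs(randomResults.meanReward);

  return {
    rewardAboveBaseline: rewardGain > 0.5,      // exceeds baseline by 50%
    successRate: evalResults.successRate > 0.8, // > 80%
    inferenceFastEnough: inferenceMs < 10       // < 10 ms per action
  };
}

console.log(
  meetsSuccessCriteria(evalResults, randomResults,
    benchmarks.inferenceTiming.actionSelection)
);
```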
Additional Resources
- AgentDB Learning Documentation: https://agentdb.dev/docs/learning
- RL Algorithms Guide: https://agentdb.dev/docs/rl-algorithms
- Training Best Practices: https://agentdb.dev/docs/training
- Production Deployment: https://agentdb.dev/docs/deployment
Included Files
Note: list of files contained in the ZIP. In addition to `SKILL.md` itself, it may include reference materials, samples, and scripts.
- 📄 SKILL.md (21,282 bytes)
- 📎 README.md (1,613 bytes)