🛠️ Scikit Learn
Pythonを使って、データの分類や予測、グループ分けといった機械
📺 まず動画で見る(YouTube)
▶ 【衝撃】最強のAIエージェント「Claude Code」の最新機能・使い方・プログラミングをAIで効率化する超実践術を解説! ↗
※ jpskill.com 編集部が参考用に選んだ動画です。動画の内容と Skill の挙動は厳密には一致しないことがあります。
📜 元の英語説明(参考)
Machine learning in Python with scikit-learn. Use for classification, regression, clustering, model evaluation, and ML pipelines.
🇯🇵 日本人クリエイター向け解説
Pythonを使って、データの分類や予測、グループ分けといった機械
※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。
下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o scikit-learn.zip https://jpskill.com/download/3415.zip && unzip -o scikit-learn.zip && rm scikit-learn.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/3415.zip -OutFile "$d\scikit-learn.zip"; Expand-Archive "$d\scikit-learn.zip" -DestinationPath $d -Force; ri "$d\scikit-learn.zip"
完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。
💾 手動でダウンロードしたい(コマンドが難しい人向け)
- 1. 下の青いボタンを押して
scikit-learn.zipをダウンロード - 2. ZIPファイルをダブルクリックで解凍 →
scikit-learnフォルダができる - 3. そのフォルダを
C:\Users\あなたの名前\.claude\skills\(Win)または~/.claude/skills/(Mac)へ移動 - 4. Claude Code を再起動
⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。
🎯 このSkillでできること
下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。
📦 インストール方法 (3ステップ)
- 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
- 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
- 3. 展開してできたフォルダを、ホームフォルダの
.claude/skills/に置く- · macOS / Linux:
~/.claude/skills/ - · Windows:
%USERPROFILE%\.claude\skills\
- · macOS / Linux:
Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。
詳しい使い方ガイドを見る →- 最終更新
- 2026-05-17
- 取得日時
- 2026-05-17
- 同梱ファイル
- 1
💬 こう話しかけるだけ — サンプルプロンプト
- › Scikit Learn を使って、最小構成のサンプルコードを示して
- › Scikit Learn の主な使い方と注意点を教えて
- › Scikit Learn を既存プロジェクトに組み込む方法を教えて
これをClaude Code に貼るだけで、このSkillが自動発動します。
📖 Skill本文(日本語訳)
※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。
Scikit-learn
概要
このスキルは、古典的な機械学習のための業界標準Pythonライブラリであるscikit-learnを使用した機械学習タスクに関する包括的なガイダンスを提供します。分類、回帰、クラスタリング、次元削減、前処理、モデル評価、および本番環境対応のMLパイプライン構築にこのスキルをご利用ください。
インストール
# uv を使用して scikit-learn をインストール
uv uv pip install scikit-learn
# オプション: 可視化の依存関係をインストール
uv uv pip install matplotlib seaborn
# 一般的に使用されるライブラリ
uv uv pip install pandas numpy
このスキルを使用する場面
以下の状況でscikit-learnスキルを使用してください。
- 分類モデルまたは回帰モデルを構築する場合
- クラスタリングまたは次元削減を実行する場合
- 機械学習のためにデータを前処理および変換する場合
- 交差検定でモデルの性能を評価する場合
- グリッドサーチまたはランダムサーチでハイパーパラメータをチューニングする場合
- 本番ワークフロー用のMLパイプラインを作成する場合
- タスクに対して異なるアルゴリズムを比較する場合
- 構造化データ(表形式)とテキストデータの両方を扱う場合
- 解釈可能な古典的な機械学習アプローチが必要な場合
クイックスタート
分類例
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
# Preprocess
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
# Evaluate
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
混合データによる完全なパイプライン
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
# Define feature types
numeric_features = ['age', 'income']
categorical_features = ['gender', 'occupation']
# Create preprocessing pipelines
numeric_transformer = Pipeline([
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
])
categorical_transformer = Pipeline([
('imputer', SimpleImputer(strategy='most_frequent')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
])
# Combine transformers
preprocessor = ColumnTransformer([
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)
])
# Full pipeline
model = Pipeline([
('preprocessor', preprocessor),
('classifier', GradientBoostingClassifier(random_state=42))
])
# Fit and predict
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
主要な機能
1. 教師あり学習
分類および回帰タスクのための包括的なアルゴリズム。
主要なアルゴリズム:
- 線形モデル: Logistic Regression, Linear Regression, Ridge, Lasso, ElasticNet
- ツリーベース: Decision Trees, Random Forest, Gradient Boosting
- サポートベクターマシン: SVC, SVR (様々なカーネル)
- アンサンブルメソッド: AdaBoost, Voting, Stacking
- ニューラルネットワーク: MLPClassifier, MLPRegressor
- その他: Naive Bayes, K-Nearest Neighbors
使用する場面:
- 分類: 離散的なカテゴリの予測(スパム検出、画像分類、不正検出)
- 回帰: 連続値の予測(価格予測、需要予測)
参照: 詳細なアルゴリズムのドキュメント、パラメータ、使用例については references/supervised_learning.md をご覧ください。
2. 教師なし学習
クラスタリングと次元削減を通じて、ラベルなしデータ内のパターンを発見します。
クラスタリングアルゴリズム:
- パーティションベース: K-Means, MiniBatchKMeans
- 密度ベース: DBSCAN, HDBSCAN, OPTICS
- 階層的: AgglomerativeClustering
- 確率的: Gaussian Mixture Models
- その他: MeanShift, SpectralClustering, BIRCH
次元削減:
- 線形: PCA, TruncatedSVD, NMF
- 多様体学習: t-SNE, UMAP, Isomap, LLE
- 特徴抽出: FastICA, LatentDirichletAllocation
使用する場面:
- 顧客セグメンテーション、異常検出、データ可視化
- 特徴量の次元削減、探索的データ分析
- トピックモデリング、画像圧縮
参照: 詳細なドキュメントについては references/unsupervised_learning.md をご覧ください。
3. モデル評価と選択
堅牢なモデル評価、交差検定、ハイパーパラメータチューニングのためのツール。
交差検定戦略:
- KFold, StratifiedKFold (分類)
- TimeSeriesSplit (時系列データ)
- GroupKFold (グループ化されたサンプル)
ハイパーパラメータチューニング:
- GridSearchCV (網羅的探索)
- RandomizedSearchCV (ランダムサンプリング)
- HalvingGridSearchCV (逐次半減)
メトリクス:
- 分類: accuracy, precision, recall, F1-score, ROC AUC, confusion matrix
- 回帰: MSE, RMSE, MAE, R², MAPE
- クラスタリング: silhouette score, Calinski-Harabasz, Davies-Bouldin
使用する場面:
- モデルの性能を客観的に比較する場合
- 最適なハイパーパラメータを見つける場合
- 交差検定による過学習の防止
- 学習曲線によるモデルの挙動の理解
参照: 包括的なメトリクスとチューニング戦略については references/model_evaluation.md をご覧ください。
4. データ前処理
生データを機械学習に適した形式に変換します。
スケーリングと正規化:
- StandardScaler (平均0、分散1)
- MinMaxScaler (範囲を制限)
- RobustScaler (外れ値に頑健)
- Normalizer (サンプルごとの正規化)
カテゴリ変数のエンコーディング:
- OneHotEncoder (名義カテゴリ)
- OrdinalEncoder (順序カテゴリ)
- LabelEncoder (ターゲットエンコーディング)
欠損値の処理:
- SimpleImputer (平均、中央値、最頻値)
- KNNImputer (k-nearest
📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開
Scikit-learn
Overview
This skill provides comprehensive guidance for machine learning tasks using scikit-learn, the industry-standard Python library for classical machine learning. Use this skill for classification, regression, clustering, dimensionality reduction, preprocessing, model evaluation, and building production-ready ML pipelines.
Installation
# Install scikit-learn using uv
uv uv pip install scikit-learn
# Optional: Install visualization dependencies
uv uv pip install matplotlib seaborn
# Commonly used with
uv uv pip install pandas numpy
When to Use This Skill
Use the scikit-learn skill when:
- Building classification or regression models
- Performing clustering or dimensionality reduction
- Preprocessing and transforming data for machine learning
- Evaluating model performance with cross-validation
- Tuning hyperparameters with grid or random search
- Creating ML pipelines for production workflows
- Comparing different algorithms for a task
- Working with both structured (tabular) and text data
- Need interpretable, classical machine learning approaches
Quick Start
Classification Example
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
# Preprocess
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
# Evaluate
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
Complete Pipeline with Mixed Data
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
# Define feature types
numeric_features = ['age', 'income']
categorical_features = ['gender', 'occupation']
# Create preprocessing pipelines
numeric_transformer = Pipeline([
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
])
categorical_transformer = Pipeline([
('imputer', SimpleImputer(strategy='most_frequent')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
])
# Combine transformers
preprocessor = ColumnTransformer([
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)
])
# Full pipeline
model = Pipeline([
('preprocessor', preprocessor),
('classifier', GradientBoostingClassifier(random_state=42))
])
# Fit and predict
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Core Capabilities
1. Supervised Learning
Comprehensive algorithms for classification and regression tasks.
Key algorithms:
- Linear models: Logistic Regression, Linear Regression, Ridge, Lasso, ElasticNet
- Tree-based: Decision Trees, Random Forest, Gradient Boosting
- Support Vector Machines: SVC, SVR with various kernels
- Ensemble methods: AdaBoost, Voting, Stacking
- Neural Networks: MLPClassifier, MLPRegressor
- Others: Naive Bayes, K-Nearest Neighbors
When to use:
- Classification: Predicting discrete categories (spam detection, image classification, fraud detection)
- Regression: Predicting continuous values (price prediction, demand forecasting)
See: references/supervised_learning.md for detailed algorithm documentation, parameters, and usage examples.
2. Unsupervised Learning
Discover patterns in unlabeled data through clustering and dimensionality reduction.
Clustering algorithms:
- Partition-based: K-Means, MiniBatchKMeans
- Density-based: DBSCAN, HDBSCAN, OPTICS
- Hierarchical: AgglomerativeClustering
- Probabilistic: Gaussian Mixture Models
- Others: MeanShift, SpectralClustering, BIRCH
Dimensionality reduction:
- Linear: PCA, TruncatedSVD, NMF
- Manifold learning: t-SNE, UMAP, Isomap, LLE
- Feature extraction: FastICA, LatentDirichletAllocation
When to use:
- Customer segmentation, anomaly detection, data visualization
- Reducing feature dimensions, exploratory data analysis
- Topic modeling, image compression
See: references/unsupervised_learning.md for detailed documentation.
3. Model Evaluation and Selection
Tools for robust model evaluation, cross-validation, and hyperparameter tuning.
Cross-validation strategies:
- KFold, StratifiedKFold (classification)
- TimeSeriesSplit (temporal data)
- GroupKFold (grouped samples)
Hyperparameter tuning:
- GridSearchCV (exhaustive search)
- RandomizedSearchCV (random sampling)
- HalvingGridSearchCV (successive halving)
Metrics:
- Classification: accuracy, precision, recall, F1-score, ROC AUC, confusion matrix
- Regression: MSE, RMSE, MAE, R², MAPE
- Clustering: silhouette score, Calinski-Harabasz, Davies-Bouldin
When to use:
- Comparing model performance objectively
- Finding optimal hyperparameters
- Preventing overfitting through cross-validation
- Understanding model behavior with learning curves
See: references/model_evaluation.md for comprehensive metrics and tuning strategies.
4. Data Preprocessing
Transform raw data into formats suitable for machine learning.
Scaling and normalization:
- StandardScaler (zero mean, unit variance)
- MinMaxScaler (bounded range)
- RobustScaler (robust to outliers)
- Normalizer (sample-wise normalization)
Encoding categorical variables:
- OneHotEncoder (nominal categories)
- OrdinalEncoder (ordered categories)
- LabelEncoder (target encoding)
Handling missing values:
- SimpleImputer (mean, median, most frequent)
- KNNImputer (k-nearest neighbors)
- IterativeImputer (multivariate imputation)
Feature engineering:
- PolynomialFeatures (interaction terms)
- KBinsDiscretizer (binning)
- Feature selection (RFE, SelectKBest, SelectFromModel)
When to use:
- Before training any algorithm that requires scaled features (SVM, KNN, Neural Networks)
- Converting categorical variables to numeric format
- Handling missing data systematically
- Creating non-linear features for linear models
See: references/preprocessing.md for detailed preprocessing techniques.
5. Pipelines and Composition
Build reproducible, production-ready ML workflows.
Key components:
- Pipeline: Chain transformers and estimators sequentially
- ColumnTransformer: Apply different preprocessing to different columns
- FeatureUnion: Combine multiple transformers in parallel
- TransformedTargetRegressor: Transform target variable
Benefits:
- Prevents data leakage in cross-validation
- Simplifies code and improves maintainability
- Enables joint hyperparameter tuning
- Ensures consistency between training and prediction
When to use:
- Always use Pipelines for production workflows
- When mixing numerical and categorical features (use ColumnTransformer)
- When performing cross-validation with preprocessing steps
- When hyperparameter tuning includes preprocessing parameters
See: references/pipelines_and_composition.md for comprehensive pipeline patterns.
Example Scripts
Classification Pipeline
Run a complete classification workflow with preprocessing, model comparison, hyperparameter tuning, and evaluation:
python scripts/classification_pipeline.py
This script demonstrates:
- Handling mixed data types (numeric and categorical)
- Model comparison using cross-validation
- Hyperparameter tuning with GridSearchCV
- Comprehensive evaluation with multiple metrics
- Feature importance analysis
Clustering Analysis
Perform clustering analysis with algorithm comparison and visualization:
python scripts/clustering_analysis.py
This script demonstrates:
- Finding optimal number of clusters (elbow method, silhouette analysis)
- Comparing multiple clustering algorithms (K-Means, DBSCAN, Agglomerative, Gaussian Mixture)
- Evaluating clustering quality without ground truth
- Visualizing results with PCA projection
Reference Documentation
This skill includes comprehensive reference files for deep dives into specific topics:
Quick Reference
File: references/quick_reference.md
- Common import patterns and installation instructions
- Quick workflow templates for common tasks
- Algorithm selection cheat sheets
- Common patterns and gotchas
- Performance optimization tips
Supervised Learning
File: references/supervised_learning.md
- Linear models (regression and classification)
- Support Vector Machines
- Decision Trees and ensemble methods
- K-Nearest Neighbors, Naive Bayes, Neural Networks
- Algorithm selection guide
Unsupervised Learning
File: references/unsupervised_learning.md
- All clustering algorithms with parameters and use cases
- Dimensionality reduction techniques
- Outlier and novelty detection
- Gaussian Mixture Models
- Method selection guide
Model Evaluation
File: references/model_evaluation.md
- Cross-validation strategies
- Hyperparameter tuning methods
- Classification, regression, and clustering metrics
- Learning and validation curves
- Best practices for model selection
Preprocessing
File: references/preprocessing.md
- Feature scaling and normalization
- Encoding categorical variables
- Missing value imputation
- Feature engineering techniques
- Custom transformers
Pipelines and Composition
File: references/pipelines_and_composition.md
- Pipeline construction and usage
- ColumnTransformer for mixed data types
- FeatureUnion for parallel transformations
- Complete end-to-end examples
- Best practices
Common Workflows
Building a Classification Model
-
Load and explore data
import pandas as pd df = pd.read_csv('data.csv') X = df.drop('target', axis=1) y = df['target'] -
Split data with stratification
from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, stratify=y, random_state=42 ) -
Create preprocessing pipeline
from sklearn.pipeline import Pipeline from sklearn.preprocessing import StandardScaler from sklearn.compose import ColumnTransformer # Handle numeric and categorical features separately preprocessor = ColumnTransformer([ ('num', StandardScaler(), numeric_features), ('cat', OneHotEncoder(), categorical_features) ]) -
Build complete pipeline
model = Pipeline([ ('preprocessor', preprocessor), ('classifier', RandomForestClassifier(random_state=42)) ]) -
Tune hyperparameters
from sklearn.model_selection import GridSearchCV param_grid = { 'classifier__n_estimators': [100, 200], 'classifier__max_depth': [10, 20, None] } grid_search = GridSearchCV(model, param_grid, cv=5) grid_search.fit(X_train, y_train) -
Evaluate on test set
from sklearn.metrics import classification_report best_model = grid_search.best_estimator_ y_pred = best_model.predict(X_test) print(classification_report(y_test, y_pred))
Performing Clustering Analysis
-
Preprocess data
from sklearn.preprocessing import StandardScaler scaler = StandardScaler() X_scaled = scaler.fit_transform(X) -
Find optimal number of clusters
from sklearn.cluster import KMeans from sklearn.metrics import silhouette_score scores = [] for k in range(2, 11): kmeans = KMeans(n_clusters=k, random_state=42) labels = kmeans.fit_predict(X_scaled) scores.append(silhouette_score(X_scaled, labels)) optimal_k = range(2, 11)[np.argmax(scores)] -
Apply clustering
model = KMeans(n_clusters=optimal_k, random_state=42) labels = model.fit_predict(X_scaled) -
Visualize with dimensionality reduction
from sklearn.decomposition import PCA pca = PCA(n_components=2) X_2d = pca.fit_transform(X_scaled) plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap='viridis')
Best Practices
Always Use Pipelines
Pipelines prevent data leakage and ensure consistency:
# Good: Preprocessing in pipeline
pipeline = Pipeline([
('scaler', StandardScaler()),
('model', LogisticRegression())
])
# Bad: Preprocessing outside (can leak information)
X_scaled = StandardScaler().fit_transform(X)
Fit on Training Data Only
Never fit on test data:
# Good
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test) # Only transform
# Bad
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(np.vstack([X_train, X_test]))
Use Stratified Splitting for Classification
Preserve class distribution:
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
Set Random State for Reproducibility
model = RandomForestClassifier(n_estimators=100, random_state=42)
Choose Appropriate Metrics
- Balanced data: Accuracy, F1-score
- Imbalanced data: Precision, Recall, ROC AUC, Balanced Accuracy
- Cost-sensitive: Define custom scorer
Scale Features When Required
Algorithms requiring feature scaling:
- SVM, KNN, Neural Networks
- PCA, Linear/Logistic Regression with regularization
- K-Means clustering
Algorithms not requiring scaling:
- Tree-based models (Decision Trees, Random Forest, Gradient Boosting)
- Naive Bayes
Troubleshooting Common Issues
ConvergenceWarning
Issue: Model didn't converge
Solution: Increase max_iter or scale features
model = LogisticRegression(max_iter=1000)
Poor Performance on Test Set
Issue: Overfitting Solution: Use regularization, cross-validation, or simpler model
# Add regularization
model = Ridge(alpha=1.0)
# Use cross-validation
scores = cross_val_score(model, X, y, cv=5)
Memory Error with Large Datasets
Solution: Use algorithms designed for large data
# Use SGD for large datasets
from sklearn.linear_model import SGDClassifier
model = SGDClassifier()
# Or MiniBatchKMeans for clustering
from sklearn.cluster import MiniBatchKMeans
model = MiniBatchKMeans(n_clusters=8, batch_size=100)
Additional Resources
- Official Documentation: https://scikit-learn.org/stable/
- User Guide: https://scikit-learn.org/stable/user_guide.html
- API Reference: https://scikit-learn.org/stable/api/index.html
- Examples Gallery: https://scikit-learn.org/stable/auto_examples/index.html
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.