
feature-engineering

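A Skill that builds features for machine-learning trading models from market data such as price, volume, on-chain data, and microstructure, helping improve strategy accuracy and surface new insights.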

📜 Original English description (for reference)

Feature construction from market data for ML trading models including price, volume, on-chain, and microstructure features


⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o feature-engineering.zip https://jpskill.com/download/10410.zip && unzip -o feature-engineering.zip && rm feature-engineering.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10410.zip -OutFile "$d\feature-engineering.zip"; Expand-Archive "$d\feature-engineering.zip" -DestinationPath $d -Force; ri "$d\feature-engineering.zip"

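When it finishes, restart Claude Code. The Skill then activates automatically when you make a related request in plain language, for example asking Claude to build features from OHLCV data.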

💾 Manual download (for those who find the command difficult)

1. Press the blue button below to download feature-engineering.zip
2. Double-click the ZIP file to extract it; this creates a feature-engineering folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) · ⬇ .skill format (advanced users) · Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill does

Read the description below to see what this Skill does for you. It activates automatically when you ask Claude for something in this area.

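📦 Installation (3 steps)

1. Press the Download button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.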


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English/Chinese text that Claude reads)

Feature Engineering for Trading ML

Feature engineering is the single highest-leverage activity in building ML trading models. Model selection (XGBoost vs. neural net vs. logistic regression) matters far less than the quality and diversity of input features. A simple model on great features will outperform a complex model on raw prices every time.

This skill covers constructing, validating, and selecting features from market data for use in classification (signal-classification) and regression models targeting crypto/Solana token trading.

Why Features Beat Models

Raw OHLCV data is non-stationary, noisy, and high-dimensional. Models trained directly on price series will overfit. Feature engineering transforms raw data into stationary, informative signals that capture distinct aspects of market behavior:

Compression: Reduce thousands of price bars to dozens of descriptive statistics
Stationarity: Convert non-stationary prices into stationary returns and ratios
Domain knowledge: Encode trader intuition (support/resistance, volume climax) as computable quantities
Regime awareness: Features that behave differently in trending vs. ranging markets help models adapt

Feature Categories

1. Price Features

Derived purely from OHLCV price columns. These capture trend, momentum, and volatility from the price series itself.

Feature	Formula	Lookback
`log_return`	`ln(close_t / close_{t-1})`	1 bar
`abs_return`	`abs(log_return)`	1 bar
`return_volatility`	`std(log_return, N)`	20 bars
`momentum_N`	`close_t / close_{t-N} - 1`	5, 10, 20
`acceleration`	`momentum_5 - momentum_5[5]`	10 bars
`high_low_range`	`(high - low) / close`	1 bar
`close_position`	`(close - low) / (high - low)`	1 bar
`gap`	`open_t / close_{t-1} - 1`	1 bar
`rolling_skew`	`skew(log_return, N)`	20 bars
`rolling_kurtosis`	`kurtosis(log_return, N)`	20 bars
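
The formulas in the table above can be sketched in pandas roughly as follows. This is a minimal illustration, not the Skill's bundled script: the function name `price_features`, the column names `open`/`high`/`low`/`close`, and the synthetic random-walk data are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def price_features(df: pd.DataFrame, vol_window: int = 20) -> pd.DataFrame:
    """Compute the price features from the table on OHLC columns.

    Assumes `df` has float columns open, high, low, close (hypothetical names).
    """
    out = pd.DataFrame(index=df.index)
    out["log_return"] = np.log(df["close"] / df["close"].shift(1))
    out["abs_return"] = out["log_return"].abs()
    out["return_volatility"] = out["log_return"].rolling(vol_window).std()
    for n in (5, 10, 20):
        out[f"momentum_{n}"] = df["close"] / df["close"].shift(n) - 1
    # momentum_5 now minus momentum_5 five bars ago (10 bars of lookback)
    out["acceleration"] = out["momentum_5"] - out["momentum_5"].shift(5)
    out["high_low_range"] = (df["high"] - df["low"]) / df["close"]
    # A bar with high == low has no range; leave close_position as NaN there
    bar_range = (df["high"] - df["low"]).replace(0.0, np.nan)
    out["close_position"] = (df["close"] - df["low"]) / bar_range
    out["gap"] = df["open"] / df["close"].shift(1) - 1
    out["rolling_skew"] = out["log_return"].rolling(vol_window).skew()
    out["rolling_kurtosis"] = out["log_return"].rolling(vol_window).kurt()
    return out


# Synthetic random-walk bars, only to show the call
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200))))
open_ = close.shift(1).fillna(close.iloc[0])
demo = pd.DataFrame({
    "open": open_,
    "high": np.maximum(open_, close) * 1.002,
    "low": np.minimum(open_, close) * 0.998,
    "close": close,
})
features = price_features(demo)
```

Every column uses only the current and earlier bars, so the leading rows are NaN until each lookback is filled.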

2. Volume Features

Volume confirms or contradicts price movements. Divergences between price and volume are among the most reliable signals in short-term trading.

Feature	Formula	Lookback
`volume_ratio`	`volume_t / mean(volume, N)`	20 bars
`volume_ma_ratio`	`sma(volume, 5) / sma(volume, 20)`	20 bars
`obv_slope`	`slope(OBV, N)`	10 bars
`vwap_deviation`	`(close - VWAP) / VWAP`	intraday
`volume_acceleration`	`volume_ratio_t - volume_ratio_{t-1}`	21 bars
`buy_volume_ratio`	`buy_volume / total_volume`	1 bar
`dollar_volume`	`close * volume`	1 bar
`volume_cv`	`std(volume, N) / mean(volume, N)`	20 bars
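
A possible pandas sketch of the bar-level volume features above. `vwap_deviation` and `buy_volume_ratio` are left out because they need intraday VWAP and buy/sell volume that plain OHLCV bars do not carry. The function name, the column names, and the least-squares definition of `slope(OBV, N)` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def volume_features(df: pd.DataFrame, window: int = 20, slope_window: int = 10) -> pd.DataFrame:
    """Volume features from `close` and `volume` columns (hypothetical names)."""
    out = pd.DataFrame(index=df.index)
    vol = df["volume"]
    out["volume_ratio"] = vol / vol.rolling(window).mean()
    out["volume_ma_ratio"] = vol.rolling(5).mean() / vol.rolling(window).mean()
    # Needs window + 1 bars: one volume_ratio plus its previous value
    out["volume_acceleration"] = out["volume_ratio"].diff()
    out["dollar_volume"] = df["close"] * vol
    out["volume_cv"] = vol.rolling(window).std() / vol.rolling(window).mean()
    # OBV: cumulative volume signed by the close-to-close direction
    direction = np.sign(df["close"].diff()).fillna(0.0)
    obv = (direction * vol).cumsum()
    # Least-squares slope of OBV against bar position 0..slope_window-1
    x_centered = np.arange(slope_window) - (slope_window - 1) / 2
    denom = (x_centered ** 2).sum()
    out["obv_slope"] = obv.rolling(slope_window).apply(
        lambda w: float(np.dot(x_centered, w) / denom), raw=True
    )
    return out


# Synthetic data, only to show the call
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "close": 100 + np.cumsum(rng.normal(0, 1, 100)),
    "volume": rng.uniform(1_000, 5_000, 100),
})
vf = volume_features(demo)
```

Note that `dollar_volume` is kept in raw form here, as in the table; it trends with price and activity, so it would still need a ratio transform before modeling.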

3. Technical Features

Standard technical indicators computed via pandas-ta. Use the pandas-ta skill for full parameter documentation.

Feature	Source	Lookback
`rsi`	RSI(14)	14 bars
`macd_histogram`	MACD(12,26,9) histogram	33 bars
`bb_position`	`(close - BB_lower) / (BB_upper - BB_lower)`	20 bars
`bb_width`	`(BB_upper - BB_lower) / BB_mid`	20 bars
`atr_ratio`	`ATR(14) / close`	14 bars
`adx`	ADX(14)	14 bars
`stoch_k`	Stochastic %K(14,3)	14 bars
`cci`	CCI(20)	20 bars
`mfi`	MFI(14)	14 bars
`supertrend_direction`	Supertrend direction (+1/-1)	10 bars

4. Microstructure Features

Derived from trade-level data (individual swaps/transactions). Require on-chain or DEX API data.

Feature	Description
`trade_count_ratio`	Trades this bar / avg trades per bar
`avg_trade_size`	Mean trade size in USD
`large_trade_pct`	% of volume from trades > $10k
`unique_traders`	Count of distinct wallet addresses
`buy_count_ratio`	Buy trades / total trades
`trade_size_entropy`	Shannon entropy of trade size distribution
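
As an illustration, the per-bar microstructure features could be aggregated from a trade table like this. The schema (`timestamp`, `wallet`, `side`, `usd`), the 60-minute bar, the order-of-magnitude size buckets used for the entropy, and the 20-bar averaging window are all assumptions; a real DEX or on-chain feed will differ.

```python
import numpy as np
import pandas as pd


def shannon_entropy(sizes: np.ndarray) -> float:
    """Shannon entropy (bits) of trade sizes bucketed by order of magnitude."""
    buckets = np.floor(np.log10(sizes))
    _, counts = np.unique(buckets, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def microstructure_features(trades: pd.DataFrame, freq: str = "60min",
                            large_usd: float = 10_000.0) -> pd.DataFrame:
    # Hypothetical schema: timestamp (datetime64), wallet, side ("buy"/"sell"), usd (> 0)
    bar = trades["timestamp"].dt.floor(freq)
    g = trades.groupby(bar)
    out = pd.DataFrame({
        "trade_count": g.size(),
        "avg_trade_size": g["usd"].mean(),
        "unique_traders": g["wallet"].nunique(),
        "buy_count_ratio": g["side"].apply(lambda s: (s == "buy").mean()),
        "large_trade_pct": g["usd"].apply(lambda u: u[u > large_usd].sum() / u.sum()),
        "trade_size_entropy": g["usd"].apply(lambda u: shannon_entropy(u.to_numpy())),
    })
    # Count relative to the trailing average (up to 20 bars, current bar included)
    avg_count = out["trade_count"].rolling(20, min_periods=1).mean()
    out["trade_count_ratio"] = out["trade_count"] / avg_count
    return out


# Four made-up trades spanning two hourly bars
trades = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:05", "2024-01-01 00:20",
        "2024-01-01 00:40", "2024-01-01 01:10",
    ]),
    "wallet": ["A", "B", "A", "C"],
    "side": ["buy", "sell", "buy", "buy"],
    "usd": [50.0, 500.0, 20_000.0, 300.0],
})
micro = microstructure_features(trades)
```

Bars with no trades do not appear in the result, so reindex onto the OHLCV bar index before joining with other features.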

5. On-Chain Features

Derived from blockchain state changes. Require Helius or Solana RPC data.

Feature	Description
`holder_count_change`	Change in unique holders over N periods
`whale_net_flow`	Net tokens moved by top-10 holders
`token_velocity`	Transfer volume / circulating supply
`liquidity_change`	Change in DEX liquidity pool TVL

6. Cross-Asset Features

Capture relationships between the target token and broader market.

Feature	Description
`sol_correlation`	Rolling correlation of token returns with SOL returns
`btc_beta`	Rolling beta to BTC returns
`sector_momentum`	Average return of tokens in same sector
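
A small sketch of the rolling correlation and beta features. It works on returns rather than price levels, which is an assumption consistent with the stationarity advice in this document; the 60-bar window and the helper names are also illustrative.

```python
import numpy as np
import pandas as pd


def rolling_corr(asset_ret: pd.Series, market_ret: pd.Series, window: int = 60) -> pd.Series:
    """Rolling Pearson correlation between two return series."""
    return asset_ret.rolling(window).corr(market_ret)


def rolling_beta(asset_ret: pd.Series, market_ret: pd.Series, window: int = 60) -> pd.Series:
    """Rolling beta: cov(asset, market) / var(market) over the trailing window."""
    cov = asset_ret.rolling(window).cov(market_ret)
    var = market_ret.rolling(window).var()
    return cov / var


# Synthetic returns: the token moves about 2x the market plus small noise
rng = np.random.default_rng(2)
btc_ret = pd.Series(rng.normal(0, 0.02, 300))
token_ret = 2.0 * btc_ret + pd.Series(rng.normal(0, 0.001, 300))

btc_beta = rolling_beta(token_ret, btc_ret)
btc_corr = rolling_corr(token_ret, btc_ret)
```

Both series must share the same bar index; align and forward-check timestamps before computing, since a one-bar misalignment silently turns this into a lead/lag measure.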

7. Time Features

Cyclical encoding of calendar time. Use sin/cos encoding to preserve cyclical continuity (hour 23 is close to hour 0).

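import numpy as np

# hour (0-23) and day (0-6) come from the bar timestamp,
# e.g. df.index.hour and df.index.dayofweek on a DatetimeIndex
hour_sin = np.sin(2 * np.pi * hour / 24)
hour_cos = np.cos(2 * np.pi * hour / 24)
# Encode the day with both sin and cos so it is a point on the unit circle, like the hour
dow_sin = np.sin(2 * np.pi * day / 7)
dow_cos = np.cos(2 * np.pi * day / 7)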

Stationarity

Non-stationary features will cause your model to fail on new data. A feature is stationary if its statistical properties (mean, variance) don't change over time.

Testing for Stationarity

Use the Augmented Dickey-Fuller (ADF) test:

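from statsmodels.tsa.stattools import adfuller

# Null hypothesis: the series has a unit root (non-stationary)
result = adfuller(feature_series.dropna())
p_value = result[1]
# Rejecting the null at the 5% level means we treat the feature as stationary
is_stationary = p_value < 0.05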

Making Features Stationary

Non-Stationary	Stationary Transform
Price	Log return
Volume	Volume ratio (vol / avg vol)
OBV	OBV slope (regression coefficient)
Holder count	Holder count change
RSI	Already stationary (bounded 0-100)
Dollar volume	Dollar volume / rolling mean

Rule: If a feature trends upward or downward over time, it is non-stationary. Transform it into a ratio, difference, or rate of change.

Normalization

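After computing features, normalize them so that all features have comparable scales. This is critical for distance-based models (KNN, SVM). Tree models are insensitive to monotonic rescaling, but rolling normalization can still help them by removing slow drift in a feature's level.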

Method	Formula	When to Use
Z-score	`(x - mean) / std`	Gaussian-like distributions
Min-max	`(x - min) / (max - min)`	Bounded features (RSI, BB position)
Rank	`rank(x) / len(x)`	Heavy-tailed distributions

Critical: Use rolling statistics for normalization. Never use full-sample mean/std — that introduces lookahead bias.

# CORRECT: rolling z-score
z = (feature - feature.rolling(60).mean()) / feature.rolling(60).std()

# WRONG: full-sample z-score (lookahead bias!)
z = (feature - feature.mean()) / feature.std()
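
The same rolling rule applies to the other two methods in the table. Below is a hedged sketch of trailing-window versions of all three; the helper names and the 60-bar default are assumptions, and the rank variant uses "share of window values at or below the current value" as its definition of `rank(x) / len(x)`.

```python
import numpy as np
import pandas as pd


def rolling_zscore(x: pd.Series, window: int = 60) -> pd.Series:
    mean = x.rolling(window).mean()
    std = x.rolling(window).std()
    # A flat window has zero std; return NaN instead of dividing by zero
    return (x - mean) / std.replace(0.0, np.nan)


def rolling_minmax(x: pd.Series, window: int = 60) -> pd.Series:
    lo = x.rolling(window).min()
    hi = x.rolling(window).max()
    return (x - lo) / (hi - lo).replace(0.0, np.nan)


def rolling_rank(x: pd.Series, window: int = 60) -> pd.Series:
    # Fraction of values in the trailing window that are <= the current value
    return x.rolling(window).apply(lambda w: (w <= w[-1]).mean(), raw=True)


# Simple increasing series, only to show the calls
x = pd.Series(np.arange(100.0))
z = rolling_zscore(x, 10)
mm = rolling_minmax(x, 10)
rk = rolling_rank(x, 10)
```

Because each value is scaled only by its own trailing window, changing later observations leaves earlier normalized values untouched, which is the property the full-sample version lacks.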

No-Lookahead Guarantee

The most dangerous bug in trading ML is lookahead bias — using future information to compute features or targets. Follow these rules absolutely:

Rolling calculations only: Never use .mean() or .std() on the full series. Always use .rolling(N).mean().
Shift targets forward, not features backward: The target is close.shift(-N) / close - 1 (future return), not close / close.shift(N) - 1 (past return used as target).
No future index alignment: When joining feature and target DataFrames, verify that feature row t is paired with target row t (where target already contains the forward shift).
Train/test split by time: Never random split. Always train = data[:split_idx], test = data[split_idx:].
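
Put together, the rules above might look like the following sketch. The horizon `N = 5`, the 80/20 split, the two example features, and the synthetic price series are assumptions chosen only to make the example runnable.

```python
import numpy as np
import pandas as pd

N = 5  # forecast horizon in bars (assumed)

# Synthetic close prices, only for demonstration
rng = np.random.default_rng(3)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

# Features at row t use bars <= t only
features = pd.DataFrame({
    "log_return": np.log(close / close.shift(1)),
    "momentum_5": close / close.shift(5) - 1,
})

# Target at row t is the return from t to t+N (the forward shift lives in the target)
target = (close.shift(-N) / close - 1).rename("forward_return")

# Row t of the features pairs with row t of the target;
# dropna removes the warm-up rows and the last N rows that have no future bar
data = features.join(target).dropna()

# Chronological split: never shuffle
split_idx = int(len(data) * 0.8)
train, test = data.iloc[:split_idx], data.iloc[split_idx:]
```

A quick sanity check is that `train.index.max() < test.index.min()` and that the first target equals the hand-computed forward return for that row.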

Feature Selection

After computing many features, select the most predictive and least redundant:

Step 1: Remove Low-Variance Features

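from sklearn.feature_selection import VarianceThreshold

# Fit on the training window only, then keep the surviving columns by name
selector = VarianceThreshold(threshold=0.01)
selector.fit(X_train)
X_filtered = X.loc[:, selector.get_support()]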

Step 2: Correlation Filter

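Remove features with > 0.9 absolute correlation to another feature. Ideally keep the member of each pair with the higher target correlation; note that the simple filter here drops whichever column appears later in X.

import numpy as np

corr_matrix = X.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
# Upper triangle only: each pair is checked once and the later column is dropped
to_drop = [col for col in upper.columns if any(upper[col] > 0.9)]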
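
The text asks to keep the feature with the higher target correlation, which the upper-triangle filter does not check. One way to add that tie-break is sketched below; the function name `correlation_filter` and the greedy pairwise order are assumptions, and in practice it should be run on training data only.

```python
import numpy as np
import pandas as pd


def correlation_filter(X: pd.DataFrame, y: pd.Series, threshold: float = 0.9) -> list:
    """Return columns to drop, keeping the member of each highly correlated
    pair that has the higher absolute correlation with the target."""
    corr = X.corr().abs()
    target_corr = X.corrwith(y).abs()
    to_drop = set()
    cols = list(X.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in to_drop or b in to_drop:
                continue  # one member of this pair is already gone
            if corr.loc[a, b] > threshold:
                to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return sorted(to_drop)


# Synthetic example: "a" is a noisier copy of "b", "c" is unrelated
rng = np.random.default_rng(4)
signal = rng.normal(size=1000)
X_demo = pd.DataFrame({
    "a": signal + rng.normal(scale=0.4, size=1000),
    "b": signal,
    "c": rng.normal(size=1000),
})
y_demo = pd.Series(signal + rng.normal(scale=0.3, size=1000))

dropped = correlation_filter(X_demo, y_demo)
```

Here the noisier column `a` is dropped even though it appears first, whereas the positional filter would have dropped `b`.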

Step 3: Feature Importance

Train a random forest and rank by importance:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Impurity-based importances, labeled with the columns the model was fit on
importances = pd.Series(rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)

Step 4: Mutual Information

Non-linear alternative to correlation:

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

mi = mutual_info_classif(X_train, y_train, random_state=42)
mi_scores = pd.Series(mi, index=X_train.columns).sort_values(ascending=False)

Label Creation

Labels (targets) define what the model learns to predict.

Binary Classification

# The last N rows have no future bar; drop them so NaN is not silently labeled 0
forward_return = (close.shift(-N) / close - 1).dropna()
label = (forward_return > threshold).astype(int)  # 1 = up, 0 = not up

Typical thresholds: 1% for 1h bars, 3% for 4h bars, 5% for daily bars.

Multi-Class Classification

label = pd.cut(forward_return,
               bins=[-np.inf, -threshold, threshold, np.inf],
               labels=[0, 1, 2])  # 0=down, 1=flat, 2=up

Regression

target = forward_return  # Predict exact return magnitude

Binary classification is recommended for initial models — it's simpler and more robust to noise.

Integration with Other Skills

pandas-ta: Compute technical indicators that become features
birdeye-api: Fetch OHLCV and trade data for feature computation
helius-api: Fetch on-chain data for holder/whale features
signal-classification: Use engineered features as model inputs
regime-detection: Regime labels as features or for regime-conditional models
ohlcv-processing: Clean and resample raw data before feature computation

Files

References

references/feature_catalog.md — Complete catalog of ~40 features with formulas, lookbacks, stationarity status, and interpretation notes
references/pitfalls.md — Common mistakes in trading feature engineering: lookahead bias, overfitting, survivorship bias, data snooping, non-stationarity

Scripts

scripts/build_features.py — Compute 25+ features from OHLCV data with stationarity testing and quality reporting. Supports demo mode with synthetic data or live data via Birdeye API.
scripts/feature_importance.py — Rank features by predictive power using tree-based importance and permutation importance. Identifies redundant features via correlation analysis.