etl-pipelines
Design and implement ETL pipelines for extracting, transforming, and loading data in data engineering workflows.
Copy the command below and paste it into your terminal (Mac/Linux) or PowerShell (Windows). It handles everything automatically: download → extract → install.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o etl-pipelines.zip https://jpskill.com/download/22183.zip && unzip -o etl-pipelines.zip && rm etl-pipelines.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/22183.zip -OutFile "$d\etl-pipelines.zip"; Expand-Archive "$d\etl-pipelines.zip" -DestinationPath $d -Force; ri "$d\etl-pipelines.zip"
When it finishes, restart Claude Code. Then just ask normally, e.g. "build an ETL pipeline", and the skill activates automatically.
💾 Manual download (for those who find the command difficult)
1. Click the blue button below to download etl-pipelines.zip
2. Double-click the ZIP file to extract it; an etl-pipelines folder appears
3. Move that folder to C:\Users\あなたの名前\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code
⚠️ Download and use at your own risk. This site accepts no responsibility for the skill's content, behavior, or safety.
🎯 What this Skill can do
The description below explains what this Skill will do for you. When you ask Claude for a task in this area, it activates automatically.
📦 Installation (3 steps)
1. Click the "Download" button above to get the .skill file
2. Rename the extension from .skill to .zip and extract it (macOS can extract automatically)
3. Place the extracted folder in .claude/skills/ under your home folder:
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill" — it is invoked automatically for related requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Included files: 1
📖 Original SKILL.md read by Claude
The text below is the original (English or Chinese) that the AI (Claude) reads. A Japanese translation is being added progressively.
etl-pipelines
Purpose
This skill enables OpenClaw to design and implement ETL pipelines for data extraction, transformation, and loading in data engineering workflows. It focuses on handling structured data sources like databases, files, and APIs, ensuring efficient data flow for analytics and reporting.
When to Use
Use this skill when building data pipelines for batch processing, real-time data ingestion, or data migration. Apply it in scenarios involving large datasets (e.g., >1TB), integrating with tools like Apache Spark or AWS Glue, or automating ETL for BI dashboards.
Key Capabilities
- Extract data from sources like CSV, JSON files, SQL databases, or APIs using connectors (e.g., JDBC for databases).
- Transform data with operations such as filtering, aggregation, or SQL queries (e.g., via Pandas or Spark DataFrames).
- Load data into targets like PostgreSQL, BigQuery, or S3 buckets with schema validation and error logging.
- Support for scheduling pipelines with cron-like expressions or integration with orchestration tools like Airflow.
- Handle incremental loads by tracking last processed timestamps or change data capture (CDC).
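The incremental-load capability in the last bullet (tracking a last-processed timestamp) can be sketched generically. The sketch below is an illustration using a SQLite watermark table, not the OpenClaw implementation; the `orders`/`etl_state` table names are assumptions:

```python
import sqlite3

# Timestamp-watermark incremental extraction (illustrative sketch).
# A small state table records the newest timestamp already processed;
# each run extracts only rows newer than that watermark.

def ensure_state(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_state (key TEXT PRIMARY KEY, value TEXT)"
    )

def get_watermark(conn, default="1970-01-01T00:00:00"):
    row = conn.execute(
        "SELECT value FROM etl_state WHERE key = 'last_ts'"
    ).fetchone()
    return row[0] if row else default

def set_watermark(conn, ts):
    conn.execute(
        "INSERT OR REPLACE INTO etl_state (key, value) VALUES ('last_ts', ?)",
        (ts,),
    )

def extract_incremental(conn):
    wm = get_watermark(conn)
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (wm,),
    ).fetchall()
    if rows:
        set_watermark(conn, rows[-1][2])  # advance watermark to newest row seen
    return rows
```

Running this twice without new source rows yields an empty second batch, which is the property an incremental load needs.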
Usage Patterns
To use this skill, invoke OpenClaw with specific ETL commands. Start by defining a pipeline configuration in JSON format, then execute it via CLI or API. For example, pass a config file like this:
```json
{
  "source": {"type": "file", "path": "data/input.csv"},
  "transform": {"operations": ["filter column='id' > 100"]},
  "destination": {"type": "postgres", "table": "processed_data"}
}
```
Structure pipelines modularly: extract first, then transform in memory or in a distributed environment, and finally load with retry mechanisms. Always set environment variables for authentication, e.g., `export OPENCLAW_API_KEY=your_key`.
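The modular structure described above can be sketched as three plain functions with a retry wrapper on the load step. This is an illustrative, self-contained sketch; the function names and the in-memory sink are assumptions, not part of the OpenClaw SDK:

```python
import csv
import io
import time

def extract(csv_text):
    # Extract step: parse CSV text into a list of dict rows.
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    # Transform step: mirror the example config's filter, id > 100.
    return [r for r in rows if int(r["id"]) > 100]

def load_with_retry(rows, sink, retries=3, delay=0.1):
    # Load step: call the sink, retrying on transient (OSError) failures.
    for attempt in range(1, retries + 1):
        try:
            sink(rows)
            return
        except OSError:
            if attempt == retries:
                raise
            time.sleep(delay)
```

Keeping the three stages separate makes each one testable on its own and lets the retry policy wrap only the side-effecting load.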
Common Commands/API
Use the OpenClaw CLI for ETL tasks. For instance:
- Create a pipeline: `openclaw etl create --config path/to/config.json --env $OPENCLAW_API_KEY` (flags: `--config` for the JSON file, `--env` for auth; outputs a pipeline ID).
- Run a pipeline: `openclaw etl run <pipeline-id> --params '{"batch_size": 1000}'` (flags: `--params` for runtime overrides; monitors progress via stdout).
- API endpoints: `POST /v1/etl/pipelines` to create a pipeline, with the JSON config as the body; `GET /v1/etl/pipelines/{id}` to retrieve status.
For code integration, use OpenClaw's Python SDK:

```python
import os
import openclaw

client = openclaw.Client(api_key=os.environ['OPENCLAW_API_KEY'])
pipeline = client.etl.create(config={'source': 'file.csv', 'transform': 'sql_query'})
client.etl.run(pipeline.id)
```

Always validate configs with `openclaw etl validate --file path/to/config.json` before execution.
Integration Notes
Integrate this skill with data tools by referencing dependencies in your config, e.g., specify "engine": "spark" for distributed processing. For AWS, set env vars like $AWS_ACCESS_KEY_ID and use connectors like S3 for sources. Chain with other OpenClaw skills by passing outputs, e.g., pipe ETL results to a machine-learning skill. Ensure compatibility by matching data formats (e.g., Parquet for big data). For multi-tool setups, use webhooks: configure POST /v1/etl/webhook to trigger on external events.
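As an illustration of the `"engine"` setting and an AWS source mentioned above, a config might look like the following. The field names are extrapolated from the examples elsewhere in this document and should be treated as assumptions, not a documented schema:

```json
{
  "engine": "spark",
  "source": {"type": "s3", "path": "s3://my-bucket/input/", "format": "parquet"},
  "destination": {"type": "postgres", "table": "processed_data"}
}
```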
Error Handling
Handle errors by wrapping commands in try-catch blocks or using built-in flags like `--retry 3` for automatic retries on transient failures (e.g., network issues). Check API responses for error codes (e.g., 400 for a bad config, 500 for server errors) and log the details. In code, use:

```python
try:
    client.etl.run(pipeline.id)
except openclaw.EtlError as e:
    print(f"Error: {e.code} - {e.message}; Retrying...")
```

Monitor logs with `openclaw etl logs <pipeline-id>` and set failure thresholds, e.g., abort if more than 10% of records fail validation. Use env vars for custom error handlers, like `$ETL_ERROR_WEBHOOK_URL`.
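The ">10% of records fail validation → abort" policy mentioned above can be sketched as a small guard. This is an illustrative sketch, not an OpenClaw API; the validator callable and threshold default are assumptions:

```python
# Abort the batch when the failure ratio exceeds the threshold,
# otherwise return only the records that passed validation.

def validate_batch(rows, validator, max_fail_ratio=0.10):
    failed = [r for r in rows if not validator(r)]
    ratio = len(failed) / len(rows) if rows else 0.0
    if ratio > max_fail_ratio:
        raise RuntimeError(
            f"aborting: {ratio:.0%} of records failed validation"
        )
    return [r for r in rows if validator(r)]
```

A threshold like this keeps a pipeline from silently loading a mostly-bad batch while still tolerating a small number of malformed records.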
Concrete Usage Examples
1. Extract from a CSV file, transform with SQL, and load into PostgreSQL.
   First, create the config (sales_config.json):

```json
{
  "source": {"type": "file", "path": "sales.csv"},
  "transform": {"sql": "SELECT * FROM data WHERE amount > 100"},
  "destination": {"type": "postgres", "table": "sales_filtered", "conn_str": "dbname=mydb"}
}
```

   Then run: `openclaw etl create --config sales_config.json; openclaw etl run <id> --env $OPENCLAW_API_KEY`
   This processes 1M rows in under 5 minutes on a standard setup.

2. Incremental ETL from a database source to BigQuery.
   Config:

```json
{
  "source": {"type": "mysql", "query": "SELECT * FROM orders WHERE updated_at > '2023-01-01'"},
  "transform": {"operations": [{"add_column": "processed_at=NOW()"}]},
  "destination": {"type": "bigquery", "dataset": "my_dataset", "table": "orders_incremental"}
}
```

   Execute: `openclaw etl run <pipeline-id> --params '{"incremental": true}'`
   This handles daily updates, appending only new records.
Graph Relationships
- Related to: data-engineering cluster (e.g., skills like data-warehousing, big-data-processing)
- Connected via tags: etl (links to data-pipelines), data-engineering (links to analytics-tools)
- Dependencies: Requires authentication with $OPENCLAW_API_KEY; integrates with external tools like Spark or Airflow for orchestration.