
csv-data-wrangler

A Skill that uses Python, DuckDB, and command-line tools for high-speed CSV processing, parsing, and data cleansing.

📜 Original English description (for reference)

Expert in high-performance CSV processing, parsing, and data cleaning using Python, DuckDB, and command-line tools. Use when working with CSV files, cleaning data, transforming datasets, or processing large tabular data files.

🇯🇵 Notes for Japanese creators

In a nutshell

A Skill that uses Python, DuckDB, and command-line tools for high-speed CSV processing, parsing, and data cleansing.

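※ This explanation was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of how the Skill itself behaves.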

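⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Downloading, extracting, and placing the files are all automatic.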

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o csv-data-wrangler.zip https://jpskill.com/download/6636.zip && unzip -o csv-data-wrangler.zip && rm csv-data-wrangler.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/6636.zip -OutFile "$d\csv-data-wrangler.zip"; Expand-Archive "$d\csv-data-wrangler.zip" -DestinationPath $d -Force; ri "$d\csv-data-wrangler.zip"

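When it finishes, restart Claude Code. After that, just ask as you normally would (for example, "clean up this CSV file") and the Skill activates automatically.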

💾 Manual download (if the command is difficult for you)

1. Click the blue button below to download csv-data-wrangler.zip
2. Double-click the ZIP file to extract it; this creates a csv-data-wrangler folder
3. Move that folder to C:\Users\YourName\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

Read the description below to see what this Skill can do for you. It activates automatically whenever you ask Claude for help in this area.

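📦 Installation (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ inside your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill…"; it is invoked automatically for related requests.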


Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1

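📖 Skill body (Japanese translation)

※ The original (English/Chinese) text was translated into Japanese with Gemini. Claude itself reads the original. If you suspect a mistranslation, check the original text.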

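[Skill name] csv-data-wrangler

CSV Data Wrangler

Purpose

Provides expertise in efficient CSV file processing, data cleaning, and transformation. Handles large files, encoding problems, malformed data, and performance optimization for tabular-data workflows.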

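When to Use

Processing large CSV files efficiently
Cleaning and validating CSV data
Transforming and reshaping datasets
Handling encoding and delimiter issues
Merging or splitting CSV files
Converting between tabular formats
Querying CSV with SQL (DuckDB)

Quick Start

Invoke this skill when:

Processing large CSV files efficiently
Cleaning and validating CSV data
Transforming and reshaping datasets
Handling encoding and delimiter issues
Querying CSV with SQL

Do NOT invoke when:

Building formatted Excel files (use xlsx-skill)
Performing statistical analysis of data (use data-analyst)
Building data pipelines (use data-engineer)
Performing database operations (use sql-pro)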

Decision Framework

Tool Selection by File Size:
├── < 100MB → pandas
├── 100MB - 1GB → pandas with chunking or polars
├── 1GB - 10GB → DuckDB or polars
├── > 10GB → DuckDB, Spark, or streaming
└── Quick exploration → csvkit or xsv CLI

Processing Type:
├── SQL-like queries → DuckDB
├── Complex transforms → pandas/polars
├── Simple filtering → csvkit/xsv
└── Streaming → Python csv module

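Core Workflows

1. Large CSV Processing

Profile the file (size, encoding, delimiter)
Choose a tool suited to the scale
Process in chunks if memory is constrained
Handle encoding issues (UTF-8, Latin-1)
Validate data types per column
Write output with proper quoting

2. Data Cleaning Pipeline

Load a sample to understand the structure
Identify missing and malformed values
Define cleaning rules per column
Apply the transformations
Validate output quality
Log cleaning statistics

3. Querying CSV with DuckDB

Point DuckDB at the CSV file(s)
Let DuckDB infer the schema
Write SQL queries directly
Export results to a new CSV
Persist as Parquet if needed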

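Best Practices

Always specify the encoding explicitly.
Use chunked reading for large files.
Profile before choosing tools.
Preserve original files and write output to new files.
Validate record counts before and after processing.
Handle quoted fields and escapes properly.

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Loading everything into memory	OOM on large files	Use chunked processing or streaming
Guessing the encoding	Garbled characters (mojibake)	Detect it with chardet first
Ignoring quoting	Broken field parsing	Use a proper CSV parser
No validation	Silent data corruption	Validate row/column counts
Manual string splitting	Breaks on edge cases	Use the csv module or pandas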

📜 Original SKILL.md (the English/Chinese text Claude reads)

CSV Data Wrangler

Purpose

Provides expertise in efficient CSV file processing, data cleaning, and transformation. Handles large files, encoding issues, malformed data, and performance optimization for tabular data workflows.

When to Use

Processing large CSV files efficiently
Cleaning and validating CSV data
Transforming and reshaping datasets
Handling encoding and delimiter issues
Merging or splitting CSV files
Converting between tabular formats
Querying CSV with SQL (DuckDB)

Quick Start

Invoke this skill when:

Processing large CSV files efficiently
Cleaning and validating CSV data
Transforming and reshaping datasets
Handling encoding and delimiter issues
Querying CSV with SQL

Do NOT invoke when:

Building Excel files with formatting (use xlsx-skill)
Statistical analysis of data (use data-analyst)
Building data pipelines (use data-engineer)
Database operations (use sql-pro)

Decision Framework

Tool Selection by File Size:
├── < 100MB → pandas
├── 100MB - 1GB → pandas with chunking or polars
├── 1GB - 10GB → DuckDB or polars
├── > 10GB → DuckDB, Spark, or streaming
└── Quick exploration → csvkit or xsv CLI
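The size thresholds above can be written as a small lookup. `suggest_tool` is a hypothetical helper that only restates the tree; it assumes binary units (1 MB = 1024² bytes) and puts a file that sits exactly on a boundary into the larger tier.

```python
import os

# Assumption: the tree's "MB"/"GB" are read as binary units (1 MB = 1024**2 bytes).
MB = 1024 ** 2
GB = 1024 ** 3


def suggest_tool(size_bytes: int) -> str:
    """Return the tool tier from the size-based decision tree (hypothetical helper).

    A size exactly on a boundary (e.g. 100 MB) is assigned to the larger tier.
    """
    if size_bytes < 100 * MB:
        return "pandas"
    if size_bytes < 1 * GB:
        return "pandas with chunking or polars"
    if size_bytes < 10 * GB:
        return "DuckDB or polars"
    return "DuckDB, Spark, or streaming"


def suggest_tool_for_file(path: str) -> str:
    """Apply suggest_tool to the on-disk size of the file at `path`."""
    return suggest_tool(os.path.getsize(path))
```

The "Quick exploration" branch depends on what you want to do rather than on file size, so the sketch leaves it out. The thresholds are rules of thumb; in practice available memory matters as much as the raw file size.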

Processing Type:
├── SQL-like queries → DuckDB
├── Complex transforms → pandas/polars
├── Simple filtering → csvkit/xsv
└── Streaming → Python csv module
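The "Streaming → Python csv module" branch can be sketched as a row-at-a-time filter. `stream_filter` and the `city` column used below are hypothetical; the point is that `csv.DictReader` and `csv.DictWriter` let you process a file of any size while holding only one row in memory.

```python
import csv
from typing import Callable, Dict, TextIO


def stream_filter(src: TextIO, dst: TextIO, keep: Callable[[Dict[str, str]], bool]) -> int:
    """Copy the rows of `src` for which keep(row) is true into `dst`, one row at a time.

    Only the current row is held in memory, so memory use does not grow with file size.
    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.DictReader(src)
    if reader.fieldnames is None:  # empty input: nothing to copy
        return 0
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    written = 0
    for row in reader:
        if keep(row):
            writer.writerow(row)
            written += 1
    return written
```

With real files, open both sides with `newline=""` and an explicit `encoding`, as the `csv` module documentation recommends, then call, for example, `stream_filter(src, dst, lambda row: row["city"] == "Tokyo")`.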

Core Workflows

1. Large CSV Processing

Profile file (size, encoding, delimiter)
Choose appropriate tool for scale
Process in chunks if memory-constrained
Handle encoding issues (UTF-8, Latin-1)
Validate data types per column
Write output with proper quoting
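The profiling step can be sketched with the standard library alone. `profile_csv` is a hypothetical helper; it assumes a 64 KiB byte sample is representative, uses "decodes as UTF-8" as the encoding test with Latin-1 as the fallback (Latin-1 accepts any byte sequence, so the fallback never fails but can still be wrong), and relies on `csv.Sniffer`, whose delimiter and header detection is heuristic and raises `csv.Error` when it cannot decide.

```python
import codecs
import csv
import os


def profile_csv(path: str, sample_bytes: int = 64 * 1024) -> dict:
    """Report size, a best-guess encoding, the delimiter, and header presence (heuristic)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw = f.read(sample_bytes)

    # An incremental decoder with final=False tolerates a multi-byte character
    # that is cut off at the end of the sample instead of reporting it as invalid.
    try:
        text = codecs.getincrementaldecoder("utf-8")().decode(raw, final=False)
        encoding = "utf-8"
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # assumption: Latin-1 is the fallback encoding
        encoding = "latin-1"

    if text.startswith("\ufeff"):  # UTF-8 byte-order mark, e.g. from Excel exports
        encoding = "utf-8-sig"
        text = text[1:]

    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text, delimiters=",;\t|")  # raises csv.Error if unsure
    return {
        "size_bytes": size,
        "encoding": encoding,
        "delimiter": dialect.delimiter,
        "has_header": sniffer.has_header(text),
    }
```

The returned encoding and delimiter can then be passed explicitly to whichever tool the decision tree suggests, rather than letting each tool guess again.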

2. Data Cleaning Pipeline

Load sample to understand structure
Identify missing and malformed values
Define cleaning rules per column
Apply transformations
Validate output quality
Log cleaning statistics
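The cleaning pipeline can be sketched as per-column rule functions plus a counter of what each rule rejected. Everything here is hypothetical: the rule names, the choice to blank out values that cannot be repaired, and the assumption in `clean_int` that a comma is always a thousands separator (so "1,2" would become 12).

```python
import logging
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A cleaning rule maps a raw string to a cleaned string, or to None when the
# value is missing or cannot be repaired.
Rule = Callable[[str], Optional[str]]


def clean_text(value: str) -> Optional[str]:
    """Trim and collapse runs of whitespace; an empty result counts as missing."""
    cleaned = " ".join(value.split())
    return cleaned or None


def clean_int(value: str) -> Optional[str]:
    """Parse an integer, assuming ',' is only ever a thousands separator."""
    try:
        return str(int(value.strip().replace(",", "")))
    except ValueError:
        return None


def clean_rows(rows: Iterable[Dict[str, str]],
               rules: Dict[str, Rule]) -> Tuple[List[Dict[str, str]], Counter]:
    """Apply per-column rules and count missing/malformed values per column."""
    cleaned: List[Dict[str, str]] = []
    stats: Counter = Counter()
    for row in rows:
        out = dict(row)
        for column, rule in rules.items():
            raw = row.get(column) or ""
            value = rule(raw)
            if value is None:
                kind = "missing" if not raw.strip() else "malformed"
                stats[f"{column}:{kind}"] += 1
                out[column] = ""  # blank out values that could not be cleaned
            else:
                out[column] = value
        cleaned.append(out)
        stats["rows"] += 1
    # Log the statistics so every run leaves a record of what was changed.
    logging.getLogger(__name__).info("cleaning stats: %s", dict(stats))
    return cleaned, stats
```

Comparing `stats["rows"]` with the input row count is a cheap first check on output quality.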

3. CSV Query with DuckDB

Point DuckDB at CSV file(s)
Let DuckDB infer schema
Write SQL queries directly
Export results to new CSV
Optionally persist as Parquet

Best Practices

Always specify encoding explicitly
Use chunked reading for large files
Profile before choosing tools
Preserve original files; write output to new files
Validate row counts before/after
Handle quoted fields and escapes properly
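Several of these practices (explicit encoding, writing to a new file, proper quoting, and checking record counts) can be combined in one stdlib sketch. `rewrite_csv` is hypothetical; it assumes the output should always be UTF-8, and it counts records by re-parsing the output, because counting physical lines would overcount whenever a quoted field contains a newline.

```python
import csv
import os


def rewrite_csv(src: str, dst: str, encoding: str = "utf-8") -> int:
    """Copy `src` to a NEW file `dst` with standard quoting and verify no records were lost."""
    if os.path.abspath(src) == os.path.abspath(dst):
        raise ValueError("refusing to overwrite the original file")

    rows_in = 0
    with open(src, newline="", encoding=encoding) as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout, quoting=csv.QUOTE_MINIMAL)  # quote only when needed
        for row in reader:
            writer.writerow(row)
            rows_in += 1

    # Re-read the output with a real CSV parser to count records, not lines.
    with open(dst, newline="", encoding="utf-8") as f:
        rows_out = sum(1 for _ in csv.reader(f))
    if rows_in != rows_out:
        raise RuntimeError(f"row count mismatch: {rows_in} in, {rows_out} out")
    return rows_out
```

For a file with a multi-line quoted field, this returns the number of CSV records, which is smaller than the number of physical lines.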

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Loading everything into memory	OOM on large files	Use chunking or streaming
Guessing encoding	Corrupted characters	Detect with chardet first
Ignoring quoting	Broken field parsing	Use proper CSV parser
No validation	Silent data corruption	Validate row/column counts
Manual string splitting	Breaks on edge cases	Use csv module or pandas
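The "Manual string splitting" and "Ignoring quoting" rows can be seen on a single line of input. This sketch compares `str.split` with the standard `csv` module; the sample line is made up.

```python
import csv

line = 'Aiko,"Shibuya, Tokyo",2024'

# Manual splitting ignores quoting, so the comma inside the quoted field splits it in two.
naive = line.split(",")

# csv.reader understands double-quoted fields and keeps the field intact.
parsed = next(csv.reader([line]))

print(naive)   # ['Aiko', '"Shibuya', ' Tokyo"', '2024']
print(parsed)  # ['Aiko', 'Shibuya, Tokyo', '2024']
```

The naive version yields four fields instead of three, a silent error that a row/column count check would catch.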