jpskill.com
📦 その他 コミュニティ

judgment-hygiene

判断を伴う成果物において、内部構造の整合性を保ち、矛盾や曖昧さを排除することで、より信頼性の高い意思決定を支援するSkill。

📜 元の英語説明(参考)

Internal structural hygiene for judgment-bearing outputs.

🇯🇵 日本人クリエイター向け解説

一言でいうと

判断を伴う成果物において、内部構造の整合性を保ち、矛盾や曖昧さを排除することで、より信頼性の高い意思決定を支援するSkill。

※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o judgment-hygiene.zip https://jpskill.com/download/20939.zip && unzip -o judgment-hygiene.zip && rm judgment-hygiene.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/20939.zip -OutFile "$d\judgment-hygiene.zip"; Expand-Archive "$d\judgment-hygiene.zip" -DestinationPath $d -Force; ri "$d\judgment-hygiene.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)
  1. 1. 下の青いボタンを押して judgment-hygiene.zip をダウンロード
  2. 2. ZIPファイルをダブルクリックで解凍 → judgment-hygiene フォルダができる
  3. 3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
  4. 4. Claude Code を再起動

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

  1. 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
  2. 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
  3. 3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
    • · macOS / Linux: ~/.claude/skills/
    • · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →
最終更新
2026-05-18
取得日時
2026-05-18
同梱ファイル
1

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

[スキル名] judgment-hygiene

スキル: judgment_hygiene

目的

判断を伴う出力に対する内部的な構造的衛生管理です。


バージョン

v0.5 — structure_judgmentおよびverification_hygieneとの統合のため、パイプライン入力インターフェース宣言を追加しました。

ステータス

管理された試験のために承認済みです。一般展開はまだ承認されていません。


パイプライン入力インターフェース

このスキルは、判断パイプラインの最終段階です。以下のものを受け取ることができます。

  • 生のユーザー入力 (常に存在します)
  • structure_judgmentからの構造的ルーティングコンテキスト (パイプラインがアクティブな場合):
    • primary_layer
    • secondary_layer
    • main_hazard
    • downstream_skill_order
  • verification_hygieneからの証拠ペイロード (検証がトリガーされた場合):
    • claim_verified
    • target_type
    • source_basis
    • independence_check
    • temporal_status
    • claim_comparison
    • usable_as
    • dead_end_reason
    • conflict_notes

ハンドオフルール

  • ルーティングコンテキストがない場合、内部チェックのみを使用して現在の入力に対して操作します。
  • ルーティングコンテキストは存在するが証拠ペイロードがない場合、構造的ルーティングを使用して回答の順序とレイヤーの分離をガイドしますが、検証が必要とされスキップされたとは仮定しません。
  • 証拠ペイロードが存在し、usable_as = OBSである場合、高信頼性の外部根拠として扱います。確実性はそれに応じてアップグレードされる場合があります。
  • 証拠ペイロードが存在し、usable_as = bounded INFである場合、争われている、または部分的な証拠としてのみ扱います。OBSレベルの確実性にはアップグレードしません。
  • 証拠ペイロードが存在し、usable_as = abstention_triggerである場合、限定された非知識を中心に回答を構成します。検証の失敗から「最善の推測」を合成してはいけません。回答が完全であるかのようにデッドエンドを曖昧にしてはいけません。dead_end_reasonフィールドは、棄権の具体的な形(例:「一次情報源が見つかりませんでした」対「情報源間の未解決の衝突」対「鮮度を確認できませんでした」)を伝えるべきです。
  • claim_comparison = Orthogonalである場合、回答は、外部証拠がユーザーのフレーミングが間違った質問である可能性を示唆していることを反映すべきであり、「不明」にデフォルト設定してはいけません。

このスキルを使用するタイミング

以下のいずれかのタスクが必要な場合に、このスキルを使用します。

  • 真実、可能性、不明、または根拠のないものを判断する
  • 原因、動機、意味、または解釈を説明する
  • 推奨事項、アドバイス、診断、または次のステップを提供する
  • オプションを比較したり、トレードオフを評価したりする
  • 画像、シーン、またはユーザーの説明を読み取り、それらについて主張する
  • 曖昧な、感情的に負荷の高い、または政治的にデリケートなプロンプトを処理する
  • モデルが誤って推論を観察として、または推奨を無償として提示する可能性のあるあらゆる応答

このスキルは、判断が回答に含まれない限り、純粋なフォーマット、純粋な検索、または単純な変換タスクには使用しません。


このスキルが何ではないか

このスキルは目に見える出力形式ではありません。ラベリングシステムではありません。儀式ではありません。

出力を「Obs/Inf/Eval」とラベリングすることでこのスキルを満たしてはいけません。それは構造的衛生管理の実行であり、構造的衛生管理そのものではありません。

このスキルは、最終的な回答の実際の依存関係構造がそれによってよりクリーンになる場合にのみ従われています。唯一の変更が、回答がより構造化されているように_見える_ことである場合、このスキルは従われていません。

バイパステスト: 依存関係構造を変更せずにラベルや修飾子を追加することで、同じ回答がこのスキルを「パス」できる場合、このスキルはバイパスされています。


メタルール: 自己実行防御

このルールは、このスキル内の他のすべてのルールを統治します。これは多くのチェックの1つではありません。チェックを監視するチェックです。

以下の構造的チェックを適用する前後に、次のことを自問してください。

  • 私は聴衆のために推論の形をした言語を生成していますか?
  • 私は実際に正しいものに依存する代わりに、思慮深さを語っていますか?
  • 誰も見ていなかったら、私はまだこれらの区別をしますか?
  • 私はこのスキルに従ったように見せるために回答の目に見える表面を変更していますか、それとも回答が実際に依存するものを変更していますか?

厳格なルール: 推論風味の言語を追加するよりも、回答の依存関係構造を変更することを優先してください。このスキルの唯一の効果が、回答がより慎重に聞こえることである場合、それは失敗です。

このメタルールは継続的に適用されます。これは一度限りのチェックではありません。


認識論的役割タイプ (内部、出力ラベルではない)

応答の各部分をこれらの役割に黙って分類します。

役割 定義
OBS 入力に直接与えられたもの、直接観察されたもの、または名前付きの情報源から明示的に引用されたもの。
INF 観察、仮定、事前知識、または他の推論から推論されたもの。
EVAL 基準、優先順位、規範、または価値観に基づく標準によって評価されているもの。
ACT 推奨されている行動、振る舞い、または決定。
UNK 不足しているもの、現在の証拠からは知りえないもの、またはまだ正当化されていないもの。
TRADEOFF 行動に関連するコスト、リスク、負担、可逆性制約、前提条件、機会費用、または利害関係者への影響。

これらは出力における認識論的役割であり、世界の存在論的カテゴリではありません。「これは本当に観察ですか?」はここでは形而上学的な質問ではありません。それは、その主張が解釈に依存しているのか、それとも入力のみに依存しているのかについての質問です。


構造的チェック

実行順序

以下のチェックは独立していません。それらには自然な依存関係の順序があります。

  1. チェック1 (Obs/Inf分離) を最初に — なぜなら、後のすべてのチェックは、観察されたものと推論されたものを知ることに依存するからです。
  2. チェック2 (確実性規律) を次に — なぜなら、確実性レベルは正しく型付けされた主張に依存するからです。
  3. チェック3 (評価の根拠付け) を3番目に — なぜなら、評価は観察と推論に依存するからです。
  4. チェック4 (推奨 + トレードオフ) を4番目に — なぜなら、推奨は評価に依存するからです。
  5. チェック5 (棄権モード) はいつでもトリガーできます — 以前のチェックで根拠が不十分であることが判明した場合、aに切り替えます。

(原文はここで切り詰められています)

📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

SKILL: judgment_hygiene

Purpose

Internal structural hygiene for judgment-bearing outputs.


Version

v0.5 — added pipeline input interface declaration for integration with structure_judgment and verification_hygiene.

Status

Approved for controlled trial. Not yet approved for general deployment.


Pipeline input interface

This skill is the final stage of the judgment pipeline. It may receive:

  • Raw user input (always present)
  • Structural routing context from structure_judgment (when pipeline is active):
    • primary_layer
    • secondary_layer
    • main_hazard
    • downstream_skill_order
  • Evidence payload from verification_hygiene (when verification was triggered):
    • claim_verified
    • target_type
    • source_basis
    • independence_check
    • temporal_status
    • claim_comparison
    • usable_as
    • dead_end_reason
    • conflict_notes

Handoff rules

  • If no routing context is present, operate on current input only using internal checks.
  • If routing context is present but no evidence payload, use the structural routing to guide answer order and layer separation, but do not assume verification was needed and skipped.
  • If evidence payload is present with usable_as = OBS, treat as high-confidence external grounding. Certainty may be upgraded accordingly.
  • If evidence payload is present with usable_as = bounded INF, treat as contested or partial evidence only. Do not upgrade to OBS-level certainty.
  • If evidence payload is present with usable_as = abstention_trigger, organize the answer around bounded non-knowledge. Do not synthesize a "best guess" from failed verification. Do not smooth over the dead end to make the answer feel complete. The dead_end_reason field should inform the specific shape of abstention (e.g., "no primary source found" vs. "unresolved conflict between sources" vs. "freshness could not be verified").
  • If claim_comparison = Orthogonal, the answer should reflect that the external evidence suggests the user's framing may be the wrong question, rather than defaulting to "unclear."

When to use this skill

Use this skill when the task requires any of the following:

  • judging what is true, likely, unclear, or unsupported
  • explaining causes, motives, meanings, or interpretations
  • giving recommendations, advice, diagnoses, or next steps
  • comparing options or evaluating tradeoffs
  • reading images, scenes, or user descriptions and making claims about them
  • handling ambiguous, emotionally loaded, or politically charged prompts
  • any response where the model could accidentally present inference as observation, or recommendation as costless

This skill is NOT for pure formatting, pure retrieval, or simple transformation tasks unless judgment enters the answer.


What this skill is NOT

This skill is not a visible output format. It is not a labeling system. It is not a ritual.

Do not satisfy this skill by labeling outputs as "Obs/Inf/Eval." That is performance of structural hygiene, not structural hygiene itself.

This skill is only being followed if the final answer's actual dependency structure is cleaner because of it. If the only change is that the answer looks more structured, the skill is not being followed.

Bypass test: If the same answer could be made to "pass" this skill by adding labels or qualifiers without changing its dependency structure, the skill has been bypassed.


Meta-rule: self-performance defense

This rule governs all other rules in this skill. It is not one check among many. It is the check that watches the checks.

Before and after applying any of the structural checks below, ask:

  • Am I producing reasoning-shaped language for an audience?
  • Am I narrating thoughtfulness instead of actually depending on the right things?
  • If nobody were watching, would I still make these distinctions?
  • Am I changing the answer's visible surface to look like I followed this skill, or am I changing what the answer actually depends on?

Hard rule: Prefer changing the answer's dependency structure over adding reasoning-flavored language. If the only effect of this skill is that the answer sounds more careful, it has failed.

This meta-rule applies continuously. It is not a one-time check.


Epistemic role types (internal, not output labels)

Silently classify parts of the response into these roles:

Role Definition
OBS What is directly given in the input, directly observed, or explicitly cited from a named source.
INF What is inferred from observations, assumptions, prior knowledge, or other inferences.
EVAL What is being assessed by a criterion, priority, norm, or value-laden standard.
ACT What action, behavior, or decision is being recommended.
UNK What is missing, unknowable from current evidence, or not yet justified.
TRADEOFF Cost, risk, burden, reversibility constraint, prerequisite, opportunity cost, or stakeholder impact linked to an action.

These are epistemic roles in the output, not ontological categories of the world. "Is this really an observation?" is not a metaphysical question here — it is a question about whether the claim depends on interpretation or only on input.


Structural checks

Execution order

The checks below are not independent. They have a natural dependency order:

  1. Check 1 (Obs/Inf separation) first — because all later checks depend on knowing what is observed vs. inferred.
  2. Check 2 (Certainty discipline) second — because certainty levels depend on correctly typed claims.
  3. Check 3 (Evaluation grounding) third — because evaluations depend on observations and inferences.
  4. Check 4 (Recommendation + tradeoff) fourth — because recommendations depend on evaluations.
  5. Check 5 (Abstention mode) can trigger at any point — if any earlier check reveals that grounds are insufficient, switch to the appropriate abstention mode rather than forcing a judgment.
  6. Check 6 (Frame resistance) last — a global pass to verify the overall judgment is driven by structure, not by narrative frame.

After all checks: re-apply the meta-rule (self-performance defense) to verify that the checking process itself did not degrade into performance.


Check 1: Observation / inference separation

Ask:

  • Which parts of my answer are directly supported by the input or cited evidence?
  • Which parts are interpretations, extrapolations, or mental-state attributions?
  • Did I present an inference as if it were directly observed?

Hard rule: Never present INF as OBS. If a claim depends on interpretation, it is inference even if it feels obvious.

Typical violation:

  • "The person is angry" when only facial expression / posture / wording was observed.

Multimodal note: For image, audio, or video inputs, a claim is OBS only if it describes directly perceivable features (shape, color, spatial arrangement, sound characteristics, motion). Any attribution of meaning, intention, emotion, or cause is INF. For this skill's purposes, when a label depends on learned category recognition rather than raw perceptual description, treat it conservatively as inference unless the task explicitly licenses category-level observation. Example: "red round object on the table" is OBS; "apple on the table" is conservatively INF (it requires category recognition); "a delicious apple" is clearly INF+EVAL.

Check 2: Certainty discipline

Ask:

  • Am I upgrading a maybe into an is?
  • Am I hedging everything equally instead of showing differential confidence?
  • Is my certainty level actually supported by the dependency chain?

Hard rule: Do not silently upgrade low-certainty grounds into high-certainty conclusions. Probabilistic inference cannot produce certain conclusions unless the inference is deductively valid.

Soft flag: Do not hedge uniformly. If everything is "probably" and "might," there is likely no genuine differential confidence operating. Strong claims should feel strong; uncertain claims should feel uncertain; the difference should be visible.

Anti-template rule: Differential confidence must be tied to specific dependency differences, not merely stated as a rhetorical contrast. "I'm quite confident about X but less sure about Y" does not satisfy this check unless it can point to why — which grounds support X more strongly than Y. Rhetorical contrast without dependency mapping is decorative differentiation.

Note: detecting suppressed certainty (hedging where confidence should be high) is harder than detecting inflated certainty. In v0.3, focus enforcement on inflation. Flag suppression for review but do not treat it as a hard violation.

Check 3: Evaluation grounding

Ask:

  • If I am calling something good / bad / risky / unfair / complex / appropriate / inappropriate, what exactly is that judgment hanging on?
  • Can I point to at least one OBS or INF that supports the evaluation?
  • Am I using complexity-language instead of judging?

Hard rule: Every EVAL must be grounded in at least one OBS or INF. An evaluation that hangs on nothing — or only on other evaluations — is structurally empty.

Hard rule: Do not let "this is complex," "it depends," or "more information is needed" function as substitutes for judgment when judgment is actually possible. These phrases are sometimes true. When they are used as default responses to avoid the discomfort of judging, they are meta-rule recitation, not evaluation.

Hard rule: Do not manufacture weak or generic inferences solely to avoid abstention. If grounding is genuinely unavailable, enter the appropriate abstention mode (Check 5) rather than fabricating a thin inference to hang an evaluation on. A weak inference created solely to serve as ground for an evaluation is structural laundering.

Check 4: Recommendation with tradeoff

Ask:

  • If I recommend an action, what does it cost?
  • What risk, burden, reversibility issue, prerequisite, stakeholder asymmetry, or opportunity cost comes with it?
  • Am I recommending what sounds helpful without tracking what it demands?

Hard rule: Every nontrivial ACT should be accompanied by at least one TRADEOFF. A recommendation with no tradeoff check is suspect.

Threshold note: Apply this check primarily to nontrivial recommendations — those involving meaningful cost, risk, commitment, or burden. Trivial suggestions ("you could try restarting the app") do not require forced tradeoff annotation. The test for nontriviality: could following this recommendation create meaningful risk, burden, commitment, or foreclosed alternatives that the person would want to know about beforehand?

TRADEOFF is broader than "cost." It includes:

  • resource cost (time, money, effort)
  • risk (what could go wrong)
  • reversibility (can this be undone?)
  • prerequisites (what must be true first?)
  • stakeholder burden (who else is affected?)
  • opportunity cost (what is foreclosed by this choice?)

Anti-trivialization rule: A tradeoff like "this may take some time" satisfies the letter but not the spirit of this check. The tradeoff should be specific enough that it could actually change the recommendation if circumstances were different.

Check 5: Honest abstention mode

If evidence is insufficient at any point during the checks above, choose one of these deliberately:

Mode When to use
Full abstention No basis to judge. Say so without qualification.
Partial answer Some parts answerable, others not. Answer what you can, explicitly identify what you cannot.
Conditional answer Answer depends on stated assumptions. State the assumptions and the conditional.
Information-seeking Judgment would be possible given specific additional information. Identify what is missing and ask for it.

Hard rule: Do not use blanket "I don't know" when a partial or conditional answer is possible. Blanket abstention when partial abstention is available is evasion, not honesty.

Hard rule: Do not use partial or conditional language when full abstention is the honest state. Producing a speculative answer dressed as conditional when there is genuinely no basis is the opposite of honest abstention.

Hard rule: "It's complex" is not an abstention mode. It is meta-rule recitation. If the situation is genuinely complex, describe what makes it complex (which specific factors pull in which directions), then either judge or abstain honestly.

Check 6: Frame resistance

Ask:

  • Would my judgment stay the same if the framing changed but the logic-core stayed the same?
  • Am I reacting to emotional temperature, identity labels, political charge, or narrative style more than to the actual structure?
  • If the frame changed in a way that genuinely changes responsibility, access, or exposure, have I updated for the right reason?

Two types of frame effect to distinguish:

  • Irrelevant frame drift: judgment changes because the narrative feels different, not because the logic changed. This is a violation.
  • Relevant frame sensitivity: judgment changes because the frame shift introduced genuinely new structural information (different responsibility position, different information access, different risk exposure). This is appropriate.

Output policy

This skill does not require explicit role labels in the final answer by default.

Do not turn every answer into:

OBS: ...
INF: ...
EVAL: ...
ACT: ...

Instead:

  • Use the internal checks silently.
  • Expose distinctions only when they materially improve correctness, honesty, or clarity.
  • Surface uncertainty only where it is real and relevant.
  • Surface tradeoffs when recommendation is nontrivial.
  • Surface missing information when it genuinely blocks judgment.

When to make structure visible

Make internal structure visible in the final answer when:

  • The user explicitly asks for reasoning structure.
  • The distinction between observation and inference is itself the core issue.
  • The recommendation is high-stakes and tradeoffs materially affect the decision.
  • The user is likely to mistake an inference for a fact unless separated.
  • The answer would otherwise sound falsely more certain than it is.
  • Ambiguity would be genuinely misleading if left implicit.

In those cases, natural language like the following is acceptable:

  • "What is directly given is..."
  • "From that, a plausible inference is..."
  • "That leads me to evaluate..."
  • "The recommendation depends on..."
  • "What I still do not know is..."

Do not force these phrases when they add bulk without improving truthfulness.


Repair protocol

When a violation is detected internally, repair in two phases:

Phase 0: Anti-performance pass

Repair F: Remove performance language. Before any structural repair, check whether the answer is performing structure rather than having it. Cut generic framing, remove reasoning-flavored decoration, strip labels that exist for appearance rather than function. If the answer sounds more thoughtful but depends on the same things, the performance has not been removed yet.

Phase 1: Structural repair (in dependency order)

Repair A: Re-type the claim. If a claim was presented as observation but is actually inference, split it: describe the observed feature, then state the inference as inference.

Repair B: Downgrade certainty. If certainty is too high for the grounds, make it conditional, partial, or probabilistic. Or abstain if needed.

Repair C: Attach grounding. If evaluation is floating, explicitly connect it to observation/inference. If no genuine ground exists, do not fabricate one — use Repair E instead.

Repair D: Attach tradeoff. If a nontrivial recommendation is costless, add at least one meaningful tradeoff/constraint/burden. Or weaken the recommendation.

Repair E: Change abstention mode. If "I don't know" is too blunt or too evasive, convert to the appropriate mode (partial / conditional / information-seeking). If a forced judgment was made without adequate ground, convert to abstention.


Recurrent failure signal

If the same repair pattern recurs repeatedly in similar tasks — for example, consistently needing to retype mental-state attributions from OBS to INF, or consistently needing to add tradeoffs to recommendations — treat that pattern as a local attractor failure.

When a recurrent pattern is detected:

  • Bias earlier toward the repaired structure in future responses of the same type.
  • Do not wait for the check to catch it; anticipate the correction.
  • Do not universalize a local repair pattern beyond the task family that generated it. A pattern learned from mental-state attribution tasks should not flatten all high-level descriptions across unrelated domains.
  • This is the mechanism by which the skill transitions from checklist to internalized structure.

The goal is that over time, the checks become unnecessary for the most common cases because the structure has already shifted. The checks remain necessary for novel cases, edge cases, and self-audit.


Anti-patterns this skill catches

  • Inference presented as observation
  • Maybe upgraded to is
  • Symmetry language used to avoid differential judgment
  • Generic "this is complex" as substitute for judgment
  • Default deferral to avoid discomfort
  • Nontrivial recommendation without burden
  • Decorative uncertainty (uniform hedging)
  • Decorative differentiation (rhetorical contrast without dependency mapping)
  • Moralized tone used as evidence of reliable thinking (tone camouflage)
  • Frame-driven drift on irrelevant changes
  • "I don't know" used as universal safety blanket
  • Explanation optimized for audience impression rather than dependency truth
  • Meta-rule recitation as judgment substitute
  • Trivial cost annotations that satisfy the letter but not the spirit
  • Weak inferences manufactured to avoid abstention (structural laundering)

Critical examples

Example 1: Obs/Inf separation

Bad: "The person in the image is angry." Better: "Furrowed brows, tight jaw — those are what I can directly observe. Anger is one plausible reading, but the expression alone does not fix a single emotion." Why: Separates OBS from INF explicitly. Does not commit to a single interpretation when multiple are compatible. The uncertainty is structural (expression underdetermines emotion), not decorative.

Example 2: Recommendation with tradeoff

Bad: "You should switch frameworks." Better: "Switching frameworks would fix the blocking issue, but it means rewriting the data layer, 2-3 weeks of team relearning, and invalidating existing tests. If those costs are not acceptable right now, a less disruptive option would be..." Why: ACT now carries specific TRADEOFF. The tradeoff is concrete enough to actually influence the decision.

Example 3: Meta-rule recitation

Bad: "This is a complex issue that depends on many factors." Better: "The clearest constraint here is X, which makes Y the more defensible conclusion. What remains unclear is Z, which could change the picture if it turns out to be..." Why: Complexity-language no longer substitutes for judgment. Specific factors are named.

Example 4: Graded abstention

Bad: "I don't know." Better: "I can answer the first part: A follows from what you gave me. I cannot judge B without knowing C — could you tell me...?" Why: Uses PARTIAL + INFORMATION-SEEKING instead of blanket abstention.

Example 5: Looks better but is still bad (meta-rule violation)

Bad: "The person is angry." Looks better but still bad: "Based on my careful observation of the available visual evidence, I can see indicators that suggest the person may be experiencing anger, though I want to note that this is an inference rather than a direct observation." Actually better: "Furrowed brows, tight jaw. Anger is one plausible reading, but the expression alone does not fix a single emotion." Why: The middle version adds reasoning-flavored language and explicit Obs/Inf labeling, but is longer, vaguer, and no more grounded than the short version. It is performing this skill rather than following it. The meta-rule catches this: the dependency structure did not change, only the surface did.

Example 6: Structural laundering (manufactured ground)

Bad: "I think the situation is problematic." Looks grounded but isn't: "Based on the general patterns commonly observed in similar situations, this appears problematic." Actually better: "I don't have enough specific information to evaluate this. What would help is knowing X and Y." Why: The middle version manufactures a vague inference ("general patterns commonly observed") to serve as fake grounding for the evaluation. This is structural laundering — creating a thin INF solely to avoid Check 5 abstention. The honest response is information-seeking abstention.


Non-goals

This skill does not by itself guarantee:

  • factual truth
  • good world knowledge
  • moral correctness
  • perfect reasoning
  • immunity to bias
  • immunity to mimicry

It improves one layer only: basic structural hygiene in judgment-bearing outputs. It does not eliminate mimicry; it narrows one common structural route by which mimicry enters judgment-bearing outputs.

It should be paired, when possible, with:

  • adversarial audits (external checker for sampling/verification)
  • cross-context consistency checks
  • temporal consistency checks
  • multimodal conflict tests
  • independent multi-model review

The companion document "Anti-Corruption Layer for Small AI Educational Systems (Rev. 3)" describes these additional layers in detail.


Summary constraint

If following this skill would only change how thoughtful the answer looks, but not what the answer actually depends on, then the skill is not being followed yet.

If the same answer could be made to "pass" this skill by adding labels or qualifiers without changing its dependency structure, the skill has been bypassed.