# aicodingpricing.com — SEO Copy Freeze Package

Date: 2026-05-28
Task: t_072e9f4c
Site: aicodingpricing.com
Source PRD: /root/.hermes/reports/aicodingpricing-leaderboard-20260528/prd.md
Data contract: /root/.hermes/reports/aicodingpricing-leaderboard-20260528/data-contract.md
Seed dataset: /root/.hermes/reports/aicodingpricing-leaderboard-20260528/model-leaderboard-seed.json

## Freeze decision

These three pages are frozen as P0 existing-site inner pages:

1. /llm-leaderboard
2. /best-llm-for-coding
3. /coding-agent-cost-calculator

Positioning lock:
AICodingPricing is not a generic LLM leaderboard. These pages help developers choose AI coding models by workflow evidence, token pricing, context, caveats, and estimated task cost.

Copy safety rules:
- Never claim one model is the best for every coding task.
- Never merge different benchmarks into a fake universal score.
- Never invent benchmark, speed, context, price, or task-cost values.
- Unknown values must appear as `not disclosed`, `not publicly benchmarked`, `source needs recheck`, or `partial evidence`.
- Token price and task cost must stay separate.
- Every recommendation must carry a caveat, evidence source, confidence label, and calculator CTA.

---

# Page 1: /llm-leaderboard

## SEO fields

Path: /llm-leaderboard
Index policy: index
Canonical: self
Primary keyword: ai coding model leaderboard
Semantic keywords: best llm for coding, coding model benchmark, coding agent model, llm coding benchmark, AI coding model comparison, coding model cost, coding agent cost
Target visible word count: 1,200–1,800
Primary density range: semantic cluster 2.2%–3.4%; do not stuff exact-match `ai coding model leaderboard`

SEO Title:
AI Coding Model Leaderboard: Compare Coding LLMs by Cost & Benchmarks

Meta Description:
Compare AI coding models by public coding benchmarks, API pricing, context, speed signals, caveats, and estimated task cost for coding agents.

OG Title:
AI Coding Model Leaderboard for Coding Agents

OG Description:
Compare coding LLMs by workflow evidence, token price, context, caveats, and task-cost assumptions.

H1:
AI Coding Model Leaderboard for Coding Agents and Developer Workflows

Subhead:
Compare coding LLMs by public benchmark evidence, API pricing, cache pricing, context window, speed signals when available, and estimated task cost for real developer workflows.

Primary CTA:
Estimate Task Cost

Secondary CTA:
Read Methodology

Row CTA:
Calculate This Model’s Task Cost

## Above-fold answer block

The best AI coding model depends on your workflow, not a single universal score. Use this leaderboard to compare public coding benchmark evidence, API pricing, cache pricing, context window, speed signals when available, and estimated task cost across models for coding agents, frontend generation, repo refactors, bug fixing, code review, and test generation.

## Hero copy blocks

Eyebrow:
AI coding model decision engine

Headline:
Compare Coding Models by Workflow Cost

Subhead:
Raw token price is only the starting point. AICodingPricing connects coding benchmark evidence, API pricing, cache rules, context limits, model caveats, and task-cost assumptions so you can choose a model for the work you are actually running.

CTA Primary:
Estimate Task Cost

CTA Secondary:
See Benchmark Methodology

Trust strip:
Public sources only · Official pricing preferred · Missing values marked · No fake universal score

## H2 / H3 outline

H2: Best AI coding models by workflow
- H3: Coding agents and long-horizon edits
- H3: Frontend generation and UI code
- H3: Repo-level refactor and code review
- H3: Low-cost automation and test generation
- H3: Chinese coding workflows with partial evidence labels

H2: Coding benchmark sources and what each benchmark measures
- H3: SWE-bench and real issue resolution
- H3: Aider and polyglot code editing
- H3: LiveCodeBench, LiveBench, and contamination-aware coding tasks
- H3: Arena and usage signals are not deterministic correctness scores

H2: API price, cache price, and task-cost comparison
- H3: Why cheap tokens can still create expensive coding tasks
- H3: Cache, batch, retry, and output-token assumptions

H2: Context window and long-context caveats
- H3: Advertised context is not the same as reliable repo-level editing
- H3: When to prefer smaller context and lower retry risk

H2: Speed and latency signals
- H3: Use TTFT and tokens/sec only when a public source exists
- H3: Why speed stays `not disclosed` when exact sources are missing

H2: How to choose a model for coding agents
- H3: Start from workflow, then check evidence, price, context, and caveat
- H3: Move from leaderboard row to calculator estimate

H2: FAQ

## Final copy blocks

### Section: Short answer
Design priority: above_fold
Copy:
There is no single coding model that wins every workflow. A model can lead one benchmark, cost less per token, or offer a larger context window, but still be the wrong choice if it retries more often, produces longer trajectories, lacks exact source-backed pricing, or has weak evidence for your task type.

### Section: Filterable leaderboard intro
Design priority: above_fold
Copy:
Use the filters to compare models by workflow, not by hype. Filters can highlight candidates for coding agents, frontend generation, repo refactor, long-context review, low-cost automation, and Chinese coding workflows. A filter is not an absolute ranking. It is a lens over source-backed fields, confidence labels, and caveats.

Recommended filter labels:
- Best coding evidence
- Cheapest good-enough
- Best for coding agents
- Best long context
- Best for frontend generation
- Best for repo refactor
- Best Chinese coding workflow
- Best open / low-cost model

### Section: Leaderboard table helper copy
Design priority: table_context
Copy:
Each row should show the model, provider, benchmark evidence, input price, output price, cache price, context window, speed signal when available, best-for labels, caveat, source, last checked date, confidence, and data status. If a field is unknown, show the reason instead of leaving it blank.

Empty value copy:
- Not disclosed: the provider or source did not publish this value in the verified source.
- Not publicly benchmarked: this exact model was not found in the selected public benchmark source.
- Source needs recheck: a source exists, but exact model alias, price mode, or context value was not verified.
- Partial evidence: the row has useful data, but not enough to support a strong recommendation.
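
For implementers, the empty-value rules above could be enforced with a small rendering helper. This is a minimal sketch, not the data contract's schema: `DataStatus`, `FieldValue`, and `render_field` are hypothetical names, and only the four label strings come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataStatus(Enum):
    """Display labels for unknown or weak values, taken from the copy safety rules."""
    NOT_DISCLOSED = "not disclosed"
    NOT_PUBLICLY_BENCHMARKED = "not publicly benchmarked"
    SOURCE_NEEDS_RECHECK = "source needs recheck"
    PARTIAL_EVIDENCE = "partial evidence"


@dataclass(frozen=True)
class FieldValue:
    """Hypothetical cell model: a verified value, a status label, or both."""
    value: Optional[str] = None
    status: Optional[DataStatus] = None


def render_field(field: FieldValue) -> str:
    """Return the text for one table cell; an empty cell is never allowed."""
    has_value = field.value is not None and field.value.strip() != ""
    if has_value and field.status is not None:
        # A value with a status keeps both, so the caveat stays visible.
        return f"{field.value} ({field.status.value})"
    if has_value:
        return field.value
    if field.status is None:
        # An unknown value without a reason is a data bug, not a blank cell.
        raise ValueError("unknown field must carry a status label")
    return field.status.value
```

Raising on a missing status is one possible design choice; it surfaces unlabeled gaps at build time instead of shipping a blank cell.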

### Section: Methodology caveat
Design priority: near_table
Copy:
This leaderboard does not combine SWE-bench, Aider, LiveCodeBench, arena scores, pricing tables, and usage signals into one fake universal score. Benchmarks measure different things. Pricing pages use different token and cache rules. Some providers publish exact API prices; others require model alias verification. We keep evidence fields separate, show confidence labels, and explain caveats where the public data is incomplete.

### Section: Source and freshness block
Design priority: near_table
Copy:
Pricing data should prefer official provider pages. Benchmark data should link to the original public benchmark or a clearly labeled third-party source. Every source-backed field needs source name, source URL, last checked date, confidence, and update policy. If a model changes, a provider reprices, or a benchmark updates, stale rows should be marked before they are used for recommendations.
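
For implementers, the freshness rule above could be checked before a row is used for a recommendation. This is a sketch under assumptions: the field names in `REQUIRED_SOURCE_FIELDS` are hypothetical stand-ins for whatever the data contract defines, and the 30-day window is a placeholder because this document does not fix a recheck cadence.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical field names; align them with the data contract before use.
REQUIRED_SOURCE_FIELDS = (
    "source_name",
    "source_url",
    "last_checked",
    "confidence",
    "update_policy",
)

# Placeholder recheck window; the actual cadence is a product decision.
MAX_AGE = timedelta(days=30)


def missing_source_fields(row: Dict[str, object]) -> List[str]:
    """List required source metadata that is absent or empty on a row."""
    return [name for name in REQUIRED_SOURCE_FIELDS if not row.get(name)]


def is_stale(last_checked: date, today: date, max_age: timedelta = MAX_AGE) -> bool:
    """A row is stale when its last check is older than the allowed window."""
    return (today - last_checked) > max_age


def usable_for_recommendation(row: Dict[str, object], today: date) -> bool:
    """Only complete, fresh rows may back a recommendation."""
    if missing_source_fields(row):
        return False
    return not is_stale(row["last_checked"], today)
```

A stale row does not need to disappear from the table; the check only gates whether it can support a recommendation label.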

### Section: Token price vs task cost
Design priority: conversion_bridge
Copy:
A cheaper token price does not always mean a cheaper coding task. A low-cost model can become expensive if it needs more retries, produces longer outputs, fails more trajectories, or requires extra human review. Use the leaderboard to understand evidence and unit price, then use the calculator to estimate task cost under your workflow assumptions.

### Section: CTA bridge
Design priority: above_and_after_table
Copy:
Found a candidate model? Open the calculator with this model prefilled and test your own workflow assumptions: input tokens, output tokens, retry rate, cache hit rate, and monthly task volume.

CTA Primary:
Calculate This Model’s Task Cost

CTA Secondary:
Compare Another Workflow

## FAQ

Q: What is an AI coding model leaderboard?
A: An AI coding model leaderboard compares language models for coding workflows such as coding agents, frontend generation, repo refactors, code review, bug fixing, and test generation. This page uses public benchmark evidence, pricing, context, caveats, and confidence labels instead of a single generic intelligence score.

Q: What is the best LLM for coding agents?
A: There is no universal best LLM for coding agents. Start with models that have public coding benchmark evidence, source-backed API pricing, reliable context handling, and acceptable retry behavior for your workflow. Then estimate task cost before choosing a default model.

Q: Is a coding benchmark the same as real task cost?
A: No. A coding benchmark measures performance under a specific test setup. Real task cost depends on input length, output length, retries, tool calls, cache behavior, batch mode, failure rate, and human review time. Use benchmarks for evidence, then use task-cost assumptions for budgeting.

Q: What is the cheapest good-enough coding model?
A: The cheapest good-enough model depends on the workflow and the minimum evidence threshold. A low input/output price can be attractive for high-volume automation, but only if retry rate, output length, and failure cleanup do not erase the savings.

Q: Why do benchmark rankings disagree?
A: Benchmarks disagree because they test different tasks, datasets, evaluation harnesses, dates, and scoring rules. SWE-bench, Aider, LiveCodeBench, arena scores, and usage signals should be read as separate evidence sources, not blended into one unquestioned ranking.

Q: How often should leaderboard data be updated?
A: Pricing and benchmark rows should be rechecked on a fixed cadence and whenever major providers launch or reprice models. Every visible row should show a last checked date, confidence label, and status such as available, partial, not disclosed, or source needs recheck.

Q: Why does the page show `not disclosed` or `not publicly benchmarked`?
A: Those labels protect the user from fake precision. If an exact price, context window, speed signal, or benchmark result was not verified from a public source, the page should say so instead of copying values from a similar model or making an assumption.

## Schema copy

Use:
- WebPage
- BreadcrumbList
- FAQPage for visible FAQs only
- Dataset if the leaderboard table is versioned or downloadable
- ItemList only for selected or filtered views that carry a clear methodology caveat

Do not use:
- AggregateRating
- Review schema without real reviews
- Schema that claims a universal model ranking
- Hidden FAQ schema
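
For implementers, the "visible FAQs only" rule could be applied when the JSON-LD is generated. This is a sketch that assumes each FAQ entry carries a hypothetical `visible` flag set by the rendering layer; the `FAQPage`, `Question`, and `Answer` types are standard schema.org vocabulary.

```python
import json
from typing import Dict, List


def faq_jsonld(faqs: List[Dict[str, object]]) -> str:
    """Build FAQPage JSON-LD from entries that are actually rendered on the page."""
    # "visible" is an assumed flag; entries without it are left out of the schema.
    visible = [faq for faq in faqs if faq.get("visible")]
    document = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": faq["question"],
                "acceptedAnswer": {"@type": "Answer", "text": faq["answer"]},
            }
            for faq in visible
        ],
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

Filtering at generation time keeps the schema and the rendered page from drifting apart when an FAQ is hidden or removed.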

---

# Page 2: /best-llm-for-coding

## SEO fields

Path: /best-llm-for-coding
Index policy: index
Canonical: self
Primary keyword: best llm for coding
Semantic keywords: best AI model for coding, best coding LLM, best model for coding agents, best LLM for code review, best LLM for frontend generation, cheapest coding model, coding model benchmark
Target visible word count: 1,200–1,600
Primary density range: semantic cluster 2.0%–3.2%; use `best for X`, not `best overall`

SEO Title:
Best LLM for Coding: Choose by Workflow, Evidence & Cost

Meta Description:
Find the best LLM for coding by workflow: agents, refactors, frontend generation, code review, low-cost automation, and task-cost assumptions.

OG Title:
Best LLM for Coding by Workflow

OG Description:
Compare coding models by scenario, evidence, price, context, caveats, and calculator-ready task-cost assumptions.

H1:
Best LLM for Coding by Workflow, Evidence, and Cost

Subhead:
Choose a coding model by the job you need done: agent loops, frontend generation, repo refactor, code review, testing, Chinese coding workflows, or low-cost automation.

Primary CTA:
Compare the Leaderboard

Secondary CTA:
Estimate Coding-Agent Cost

## Above-fold answer block

There is no single best LLM for all coding tasks. The right choice depends on the workflow, public benchmark evidence, API pricing, context needs, retry risk, and output length. Use this page to shortlist models by scenario, then compare source-backed rows and estimate task cost before committing to a model.

## Hero copy blocks

Eyebrow:
Scenario-based coding model guide

Headline:
Choose the Best LLM for Your Coding Workflow

Subhead:
Stop asking which model is best in general. Pick the model that fits the job: coding agents, repo refactors, frontend generation, code review, test generation, Chinese coding workflows, or budget-sensitive automation.

CTA Primary:
Compare Source-Backed Models

CTA Secondary:
Calculate My Coding Cost

Trust strip:
Best for workflow · Evidence visible · Caveats included · Cost estimate next

## H2 / H3 outline

H2: Short answer: the best coding LLM depends on the job
- H3: Why “best overall” is the wrong question
- H3: How to read scenario recommendations

H2: Best LLM for coding agents
- H3: What matters for long-horizon agent tasks
- H3: Evidence, caveat, and calculator handoff

H2: Best LLM for repo-level refactor and code review
- H3: Context window vs effective long-context reliability
- H3: Why caveats matter for large repositories

H2: Best LLM for frontend generation and UI code
- H3: Use benchmark evidence plus human inspection
- H3: When speed and iteration cost matter more than peak reasoning

H2: Best cheap model for coding automation
- H3: Token price is not the same as task cost
- H3: Cheap model checklist before high-volume use

H2: Best LLM for Chinese coding workflows
- H3: Use partial evidence honestly
- H3: Kimi, Qwen, and DeepSeek-style rows need exact source labels

H2: How we make recommendations
- H3: Source-backed fields used
- H3: Why confidence labels can lower a recommendation

H2: FAQ

## Final copy blocks

### Section: Short answer
Design priority: above_fold
Copy:
The best LLM for coding is the one that fits your workflow with enough evidence, acceptable cost, and clear caveats. A model that looks strong for repo-level refactor may not be the cheapest option for high-volume test generation. A model with a low token price may still cost more if it creates longer failed trajectories.

### Section: Scenario card — coding agents
Design priority: recommendation_cards
Copy:
Best for coding agents means strong evidence for multi-step coding tasks, reliable tool-following behavior, manageable output cost, and visible caveats around retries, context, and failure cleanup. Use this label for a shortlist, not a crown. Open the leaderboard row to inspect benchmark source, pricing source, confidence, and last checked date.

CTA:
Compare Coding-Agent Candidates

### Section: Scenario card — repo refactor and code review
Design priority: recommendation_cards
Copy:
Repo-level refactor and long-context code review need more than a large advertised context window. Look for source-backed context data, benchmark evidence that matches editing or issue-resolution tasks, and caveats about effective long-context reliability. If context or exact model alias is not verified, the page should say so.

CTA:
Compare Long-Context Candidates

### Section: Scenario card — frontend generation
Design priority: recommendation_cards
Copy:
Choosing a model for frontend generation relies partly on benchmark evidence and partly on practical inspection. Use available coding and web development signals, but do not treat preference scores as guaranteed correctness. Compare output quality, cost, speed when a public source exists, and how often the model needs follow-up prompts.

CTA:
Compare Frontend Models

### Section: Scenario card — cheapest good-enough
Design priority: recommendation_cards
Copy:
A cheap coding model is only good enough if it clears the minimum evidence bar for your task. Before using it for high-volume automation, check benchmark coverage, retry assumptions, output-token behavior, cache options, and failure cleanup cost. If benchmark data is missing, label the recommendation as partial.

CTA:
Estimate Low-Cost Task Spend

### Section: Scenario card — Chinese coding workflow
Design priority: recommendation_cards
Copy:
Chinese coding workflow recommendations should be explicit about data coverage. Kimi, Qwen, DeepSeek, and similar candidates can be useful, but the page must separate source-backed pricing, exact benchmark evidence, context data, and unknown fields. Partial evidence is acceptable. Fake certainty is not.

CTA:
View Chinese Workflow Candidates

### Section: Evidence table intro
Design priority: evidence_table
Copy:
Use the evidence table to see what supports each recommendation. The table should show benchmark/source, price availability, context status, data status, confidence, caveat, and calculator handoff. Recommendations can be reordered as data improves, but unsupported claims should not be shipped.

### Section: Methodology copy
Design priority: methodology
Copy:
Recommendations on this page are editorial labels derived from visible source-backed fields. They are not universal rankings. A model can be recommended for a workflow only when the page can show the evidence used, the missing fields, the caveat, and the confidence level. When evidence is incomplete, the recommendation should be labeled `partial evidence` or state that public data is insufficient.

### Section: Calculator CTA bridge
Design priority: conversion_bridge
Copy:
Once you have a shortlist, the next question is cost. Open the coding-agent cost calculator with a model and workflow prefilled, then adjust input tokens, output tokens, retry rate, cache assumptions, and monthly task volume.

CTA Primary:
Estimate Coding-Agent Cost

CTA Secondary:
Return to Leaderboard

## FAQ

Q: What is the best LLM for coding?
A: There is no single best LLM for every coding task. The best choice depends on your workflow, benchmark evidence, price, context, retry risk, and output length. Use scenario labels such as best for coding agents, best for refactor, or cheapest good-enough.

Q: Which LLM should I use for a coding agent?
A: Start with models that have public coding evidence, reliable context behavior, visible caveats, and source-backed pricing. Then estimate the cost of your agent loop. Agent tasks can become expensive through retries, tool calls, long outputs, and failed trajectories.

Q: Which model is best for repo-level refactor?
A: For repo-level refactor, prioritize editing evidence, context handling, caveat visibility, and confidence labels. A large context window helps, but it does not guarantee reliable repo-level changes. Treat unverified context or benchmark fields as partial evidence.

Q: What is the cheapest LLM for coding?
A: The cheapest LLM for coding depends on task complexity and retry rate. A low price per million tokens can be attractive, but the real cost depends on how many prompts, outputs, retries, cache reads, and human corrections the workflow needs.

Q: Should I trust coding benchmarks?
A: Trust them as evidence, not as the whole decision. Each benchmark measures a different task under a different setup. The safest approach is to compare multiple source-specific benchmark columns, then combine them with pricing, context, caveats, and your own workflow test.

Q: How should I compare Claude, GPT, Gemini, DeepSeek, Kimi, and Qwen for coding?
A: Compare them by workflow and source coverage. Separate API token pricing from subscription-tool pricing, keep benchmark sources separate, mark unknown fields, and avoid forcing all models into one absolute winner list.

Q: Can I use this page for production buying decisions?
A: Use it as a shortlist and cost-estimation aid, not a procurement guarantee. Always recheck official pricing, model availability, regional terms, and your own workload before making a production decision.

## Schema copy

Use:
- WebPage
- BreadcrumbList
- FAQPage for visible FAQs only
- ItemList only for scenario shortlists if each item has visible caveat/methodology

Do not use:
- `best overall` claims in schema
- AggregateRating
- Hidden FAQ schema
- Claims that recommendations are provider-neutral audited rankings

---

# Page 3: /coding-agent-cost-calculator

## SEO fields

Path: /coding-agent-cost-calculator
Index policy: index
Canonical: self
Primary keyword: coding agent cost calculator
Semantic keywords: AI coding cost calculator, LLM coding cost, coding model pricing, coding agent token cost, monthly AI coding cost, API model cost calculator
Target visible word count: 900–1,300 plus calculator UI
Primary density range: semantic cluster 2.0%–3.0%; do not over-repeat `calculator`

SEO Title:
Coding Agent Cost Calculator: Estimate LLM Task & Monthly Cost

Meta Description:
Estimate coding-agent task cost and monthly LLM spend from model pricing, input/output tokens, retries, cache assumptions, and task volume.

OG Title:
Coding Agent Cost Calculator

OG Description:
Turn token pricing into estimated coding-agent task cost and monthly spend with visible assumptions and caveats.

H1:
Coding Agent Cost Calculator

Subhead:
Estimate how much an AI coding workflow may cost per task and per month using model price, input tokens, output tokens, cache assumptions, retry rate, and monthly task volume.

Primary CTA:
Estimate Monthly Coding-Agent Cost

Secondary CTA:
Compare AI Coding Models

## Above-fold answer block

Token price is not the same as coding-agent cost. A real coding task can include long prompts, repository context, tool calls, generated diffs, retries, cache reads, batch or standard pricing, and failed attempts. This calculator turns visible assumptions into an estimated task cost and monthly spend.

## Hero copy blocks

Eyebrow:
From token price to task cost

Headline:
Estimate Coding-Agent Cost Before You Ship

Subhead:
Use source-backed model pricing where available, then adjust workflow assumptions: input tokens, output tokens, retry rate, cache hit rate, batch mode, and tasks per month.

CTA Primary:
Estimate My Monthly Cost

CTA Secondary:
Compare Models First

Trust strip:
Estimate, not billing quote · Assumptions visible · Unknown prices labeled, not guessed · Model rows can prefill inputs

## H2 / H3 outline

H2: Estimate coding-agent task cost
- H3: Select a model and workflow
- H3: Enter input, output, retry, cache, and volume assumptions

H2: Formula: token price is not task cost
- H3: Input, output, cache, and retry components
- H3: Monthly cost from tasks per month

H2: Sample coding workflow scenarios
- H3: Coding agent loop
- H3: Repo-level refactor
- H3: Test generation and bug fixing
- H3: Low-cost automation

H2: Sensitivity range and uncertainty
- H3: Retry rate changes cost faster than users expect
- H3: Unknown pricing should not be silently filled

H2: When to return to the leaderboard
- H3: Compare alternatives before committing
- H3: Use caveats to lower risk

H2: FAQ

## Final copy blocks

### Section: Calculator intro
Design priority: above_fold
Copy:
Start with a model from the leaderboard or choose one manually. If source-backed API prices exist, the calculator can prefill input, output, and cache prices. If exact pricing is not verified, leave the price value unfilled and show `not disclosed` or `source needs recheck` in its place instead of guessing.

### Section: Input helper copy
Design priority: form_helpers
Copy:
Required inputs:
- Selected model
- Workflow type
- Average input tokens per task
- Average output or reasoning tokens per task
- Retry rate
- Tasks or sessions per month
- Cache hit assumption
- Batch or standard mode when provider supports it

Helper note:
If you do not know your token counts yet, start with conservative estimates and run a sensitivity check. The point is not fake precision. The point is to see which assumptions control monthly spend.

### Section: Formula copy
Design priority: formula_block
Copy:
Estimated task cost = input token cost + output token cost + cache write cost + cache read cost, adjusted by retry multiplier and pricing mode where applicable.

Monthly estimate = estimated task cost × tasks per month.

Show the components separately so users can see whether cost is driven by input context, output length, retries, cache behavior, or volume.
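
For implementers, the formula above could be computed as separate components so the UI can show each one. This is a sketch under stated assumptions, not the approved calculator logic: prices are taken as quoted per million tokens, the retry multiplier is modeled as `1 + retry_rate` applied to every component, cache hits are modeled as a share of input tokens, and pricing mode is a single multiplier. `Prices`, `Workload`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumption: prices are quoted per million tokens


@dataclass(frozen=True)
class Prices:
    """Source-backed prices per million tokens; callers must not pass guesses."""
    input: float
    output: float
    cache_write: float = 0.0
    cache_read: float = 0.0


@dataclass(frozen=True)
class Workload:
    """Workflow assumptions entered by the user."""
    input_tokens: int
    output_tokens: int
    cache_hit_rate: float = 0.0    # share of input tokens served from cache
    cache_write_tokens: int = 0    # tokens written to cache per attempt
    retry_rate: float = 0.0        # expected extra attempts per task
    mode_multiplier: float = 1.0   # 1.0 for standard; lower if a batch discount applies


def task_cost_components(prices: Prices, work: Workload) -> Dict[str, float]:
    """Return each cost component for one task, already adjusted for retries and mode."""
    cached_tokens = work.input_tokens * work.cache_hit_rate
    fresh_tokens = work.input_tokens - cached_tokens
    attempts = 1.0 + work.retry_rate  # simplification: a retry costs a full attempt
    scale = attempts * work.mode_multiplier / TOKENS_PER_PRICE_UNIT
    return {
        "input": fresh_tokens * prices.input * scale,
        "output": work.output_tokens * prices.output * scale,
        "cache_write": work.cache_write_tokens * prices.cache_write * scale,
        "cache_read": cached_tokens * prices.cache_read * scale,
    }


def monthly_estimate(prices: Prices, work: Workload, tasks_per_month: int) -> float:
    """Monthly estimate = estimated task cost x tasks per month."""
    return sum(task_cost_components(prices, work).values()) * tasks_per_month
```

Returning a dictionary instead of one total keeps input, output, and cache costs visible side by side. Rows whose prices are `not disclosed` should be filtered out before this function is called.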

### Section: Required calculator caveat
Design priority: near_result
Copy:
This is an estimate, not a billing quote. Actual bills can change with model updates, prompt length, tool calls, retries, cache hit rate, regional availability, batch mode, taxes, provider pricing changes, and failure cleanup time.

### Section: Sensitivity copy
Design priority: result_explanation
Copy:
Small changes in retry rate can change the economics of a coding agent. A cheaper model can lose its advantage if it needs repeated fixes or produces longer failed trajectories. A more expensive model can be cheaper for the task if it finishes in fewer attempts. Always compare task cost, not only token price.
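
For implementers, the sensitivity argument above can be made concrete with a break-even calculation. This is a sketch that assumes every retry costs one full attempt; the function names are hypothetical, and any numbers passed in are user assumptions, not measured retry rates.

```python
from typing import Iterable, List, Tuple


def expected_task_cost(cost_per_attempt: float, retry_rate: float) -> float:
    """Expected cost of one task when each retry costs a full attempt."""
    return cost_per_attempt * (1.0 + retry_rate)


def break_even_retry_rate(
    cheap_attempt_cost: float,
    strong_attempt_cost: float,
    strong_retry_rate: float,
) -> float:
    """Retry rate at which the cheaper model costs the same per task as the stronger one."""
    strong_task_cost = expected_task_cost(strong_attempt_cost, strong_retry_rate)
    return strong_task_cost / cheap_attempt_cost - 1.0


def sweep(cost_per_attempt: float, retry_rates: Iterable[float]) -> List[Tuple[float, float]]:
    """Pair each assumed retry rate with its expected task cost for a sensitivity table."""
    return [(rate, expected_task_cost(cost_per_attempt, rate)) for rate in retry_rates]
```

With placeholder inputs of 1.0 per attempt for the cheaper model and 2.0 per attempt at a 0.5 retry rate for the stronger one, `break_even_retry_rate(1.0, 2.0, 0.5)` evaluates to 2.0: above two extra attempts per task, the cheaper model stops being cheaper under these assumptions.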

### Section: Sample scenario copy — coding agent loop
Design priority: sample_cards
Copy:
Use this scenario when your workflow reads files, calls tools, edits code, runs tests, and retries after failures. Increase input tokens for larger repository context and increase retry rate if the agent often needs multiple repair loops.

### Section: Sample scenario copy — repo refactor
Design priority: sample_cards
Copy:
Repo-level refactor usually has higher input context and output length. Check whether the selected model has source-backed context information and whether long-context reliability is only advertised or actually supported by evidence.

### Section: Sample scenario copy — low-cost automation
Design priority: sample_cards
Copy:
Low-cost automation works when the task is repetitive, the model has enough coding evidence, and retries stay low. If a model has a low token price but missing benchmark evidence, label the estimate as a cost experiment, not a safe recommendation.

### Section: Leaderboard return CTA
Design priority: conversion_bridge
Copy:
If the estimate looks too expensive or too uncertain, go back to the leaderboard and compare alternatives for the same workflow. Look for lower output price, cache support, stronger coding evidence, or lower retry risk.

CTA Primary:
Compare Alternatives on the Leaderboard

CTA Secondary:
Adjust Assumptions

## FAQ

Q: What is a coding agent cost calculator?
A: A coding agent cost calculator estimates task and monthly spend for AI coding workflows. It uses model pricing and assumptions such as input tokens, output tokens, retries, cache usage, pricing mode, and task volume. It is an estimate, not a billing quote.

Q: Why is token price not enough?
A: Token price is only the unit cost. Coding-agent cost also depends on repository context, generated diffs, retries, tool calls, cache hit rate, failures, and how many tasks you run per month. A cheap token can still create an expensive workflow.

Q: What inputs do I need for a useful estimate?
A: You need a selected model, workflow type, average input tokens, average output tokens, retry rate, tasks per month, and cache assumptions. If model pricing is not source-backed, do not fill the price fields; mark them as `not disclosed` or `source needs recheck`.

Q: How should I estimate retry rate?
A: Start with a conservative assumption and test sensitivity. Retry rate should reflect failed attempts, repair loops, tool-call corrections, and prompts needed after bad output. For uncertain models, compare several retry rates before deciding.

Q: Does the calculator include cache pricing?
A: It should include cache read and cache write pricing only when the provider publishes source-backed values and the workflow can use caching. If cache terms are unclear, show `not disclosed` or `not applicable` instead of blending them into the base input price.

Q: Can this predict my exact bill?
A: No. It estimates cost under visible assumptions. Real bills can change with provider pricing, model updates, regional availability, taxes, prompt length, output length, cache behavior, and production traffic.

Q: What should I do after getting an estimate?
A: Compare at least one cheaper alternative and one safer alternative on the leaderboard. Then run a small real workload test before using the estimate for production budgeting.

## Schema copy

Use:
- WebPage
- BreadcrumbList
- FAQPage for visible FAQs only
- SoftwareApplication if the calculator is interactive and actually available

Do not use:
- FinancialProduct
- AggregateRating
- FAQ schema for hidden content
- Any property implying exact billing accuracy

---

# Cross-page CTA map

Primary journey:
/llm-leaderboard → row CTA `Calculate This Model’s Task Cost` → /coding-agent-cost-calculator?model=<model>&workflow=<workflow>

Decision journey:
/best-llm-for-coding → `Compare Source-Backed Models` → /llm-leaderboard
/best-llm-for-coding → `Estimate Coding-Agent Cost` → /coding-agent-cost-calculator

Calculator return journey:
/coding-agent-cost-calculator → `Compare Alternatives on the Leaderboard` → /llm-leaderboard
/coding-agent-cost-calculator → `Compare AI Coding Models` → /llm-leaderboard

Methodology support:
/llm-leaderboard → `Read Methodology` → /coding-model-benchmark when that P1 page exists
/best-llm-for-coding → internal link anchor `see coding benchmark methodology` → /coding-model-benchmark when complete

Pricing support:
/coding-agent-cost-calculator → internal link anchor `compare source-backed API prices` → /llm-api-pricing-comparison when complete

Fallback if P1 pages are not built:
Keep methodology and pricing explainer sections on P0 pages; do not link to empty or noindex pages from prominent CTAs.
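
For implementers, the prefilled calculator link in the primary journey could be built as follows. This is a sketch: the path and the `model` and `workflow` query parameters come from the CTA map above, while the function name and any slug values are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

CALCULATOR_PATH = "/coding-agent-cost-calculator"


def calculator_url(model: Optional[str] = None, workflow: Optional[str] = None) -> str:
    """Build the row CTA link; omit parameters that are not known."""
    params = {
        key: value
        for key, value in (("model", model), ("workflow", workflow))
        if value
    }
    if not params:
        return CALCULATOR_PATH
    # urlencode escapes spaces and reserved characters in slug values.
    return f"{CALCULATOR_PATH}?{urlencode(params)}"
```

Dropping empty parameters avoids links such as `?model=&workflow=`, which would prefill nothing and could create duplicate URL variants.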

---

# Design handoff notes

- Design must fit the answer block, filters, table helper copy, source/freshness labels, caveats, and calculator CTA above or near the first meaningful table/result.
- FAQ can be collapsible, but only visible FAQ content should be marked in FAQPage schema.
- At a 390px mobile viewport, source and caveat labels and the row CTA must stay visible. Do not hide caveats behind desktop-only hover states.
- The leaderboard table must be crawlable HTML or accessible card markup, not canvas-only.
- Unknown-state labels need their own visual treatment: `not disclosed`, `not publicly benchmarked`, `source needs recheck`, `partial evidence`.

# Implementation copy guardrails

Forbidden wording:
- best LLM overall
- most authoritative global leaderboard
- guaranteed cheapest coding model
- complete ranking
- 100% accurate ranking
- real billing quote
- official benchmark owned by AICodingPricing

Allowed wording:
- best for coding agents
- best for repo-level refactor
- candidate for low-cost automation
- source-backed pricing where available
- public benchmark evidence
- estimated task cost
- partial evidence
- not disclosed
- not publicly benchmarked
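
For implementers, the forbidden-wording list above could be checked automatically before copy ships. This is a sketch using plain case-insensitive substring matching; `find_forbidden` is a hypothetical helper, and the phrase list is copied from this section.

```python
from typing import List

FORBIDDEN_PHRASES = [
    "best LLM overall",
    "most authoritative global leaderboard",
    "guaranteed cheapest coding model",
    "complete ranking",
    "100% accurate ranking",
    "real billing quote",
    "official benchmark owned by AICodingPricing",
]


def find_forbidden(text: str) -> List[str]:
    """Return every forbidden phrase that appears in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase.lower() in lowered]
```

Substring matching also flags negated uses such as "this is not a complete ranking", so hits are best treated as prompts for human review, not automatic failures.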

# Metadata required by Kanban

```json
{
  "page_copy_blocks": {
    "/llm-leaderboard": [
      "seo_fields",
      "above_fold_answer_block",
      "hero_copy",
      "h2_h3_outline",
      "filterable_leaderboard_intro",
      "leaderboard_table_helper_copy",
      "methodology_caveat",
      "source_freshness_block",
      "token_price_vs_task_cost",
      "cta_bridge",
      "faq",
      "schema_copy"
    ],
    "/best-llm-for-coding": [
      "seo_fields",
      "above_fold_answer_block",
      "hero_copy",
      "h2_h3_outline",
      "scenario_cards",
      "evidence_table_intro",
      "methodology_copy",
      "calculator_cta_bridge",
      "faq",
      "schema_copy"
    ],
    "/coding-agent-cost-calculator": [
      "seo_fields",
      "above_fold_answer_block",
      "hero_copy",
      "h2_h3_outline",
      "calculator_intro",
      "input_helper_copy",
      "formula_copy",
      "required_calculator_caveat",
      "sensitivity_copy",
      "sample_scenarios",
      "leaderboard_return_cta",
      "faq",
      "schema_copy"
    ]
  },
  "faq": {
    "/llm-leaderboard": 7,
    "/best-llm-for-coding": 7,
    "/coding-agent-cost-calculator": 7
  },
  "schema_copy": {
    "/llm-leaderboard": ["WebPage", "BreadcrumbList", "FAQPage", "Dataset_if_versioned_or_downloadable", "ItemList_only_with_methodology_caveat"],
    "/best-llm-for-coding": ["WebPage", "BreadcrumbList", "FAQPage", "ItemList_only_for_visible_scenario_shortlists"],
    "/coding-agent-cost-calculator": ["WebPage", "BreadcrumbList", "FAQPage", "SoftwareApplication_if_interactive"]
  },
  "cta_map": {
    "primary_flow": "/llm-leaderboard -> /coding-agent-cost-calculator?model=<model>&workflow=<workflow>",
    "decision_flow": "/best-llm-for-coding -> /llm-leaderboard -> /coding-agent-cost-calculator",
    "calculator_return_flow": "/coding-agent-cost-calculator -> /llm-leaderboard",
    "p1_methodology_flow": "/llm-leaderboard -> /coding-model-benchmark when complete",
    "p1_pricing_flow": "/coding-agent-cost-calculator -> /llm-api-pricing-comparison when complete"
  },
  "residual_risk": [
    "P0 seed dataset is partial; implementation must keep partial/not_disclosed/not_publicly_benchmarked labels visible.",
    "Exact model alias, context, speed, and pricing verification remains incomplete for some OpenAI, Gemini, Kimi, and Qwen rows.",
    "P1 methodology/pricing pages should not receive prominent CTAs until implemented with unique indexable content.",
    "Calculator defaults for task token counts should stay blank or clearly marked site_default until product approves assumptions.",
    "SEO copy depends on source/freshness UI; if design hides caveats or source labels, SEO/Product gate should block."
  ]
}
```
