# aicodingpricing.com AI Coding Model Leaderboard SEO/SERP + Data-Source Plan

Date: 2026-05-28
Task: t_980da6a3
Canonical site: https://aicodingpricing.com
Verdict: Conditional Go

## 1. Executive decision

Build the AI Coding Model Leaderboard as a core inner-page cluster under aicodingpricing.com. Do not create a separate domain and do not position it as a generic LLM leaderboard.

Why Conditional Go:

- Go: SERP has clear demand for coding-specific model selection, benchmark comparison, and API pricing/value queries.
- Go: The existing site already owns AI coding pricing / calculator intent, so the leaderboard can extend the user journey from “what does it cost?” to “which model should I pick for this coding workflow?”
- Conditional: Generic `llm leaderboard` is dominated by broad benchmark/data platforms; aicodingpricing should avoid that head term as primary positioning.
- Conditional: Benchmark, speed, context, and pricing fields must be source-led. Missing data must render as `not disclosed` / `not publicly benchmarked`, not inferred.
- Conditional: The page must keep pricing and task-cost conversion in the primary UX; otherwise it becomes another low-differentiation benchmark aggregator.

## 2. SERP intent validation

### Target keyword cluster verdict

| Query | Intent | SERP pattern observed | Fit for aicodingpricing | Verdict |
|---|---|---|---|---|
| `ai coding model leaderboard` | Coding-specific ranking / model selection | Kilo leaderboard, Onyx best LLM for coding, LM Arena code/webdev, LiveBench, Scale Labs | Strong fit if framed as coding workflow + cost/value, not generic intelligence | P0 target |
| `best llm for coding` | Recommendation / comparison | Blog-style rankings, Onyx benchmark pages, developer articles, YouTube/local LLM content | Strong fit with scenario filters and source-led evidence | P0 target |
| `coding model benchmark` | Benchmark evidence / methodology | Artificial Analysis Coding Index, Kilo, Onyx, benchmark explainers, SWE/Terminal/LiveCodeBench references | Fit as methodology/supporting page, not just a table | P1 target |
| `llm api pricing comparison` | Pricing table / cost comparison | CostGoat, CloudZero, Cloudidr, official pricing comparisons, broad LLM cost guides | Existing site fit is strong; should remain pricing-intent anchor | P0/P1 target |
| `cheapest coding model` | Budget model / cost-performance | Kilo low-cost article, OpenCode Go, Reddit/HN, Medium tests | Fit if page ties cheap models to coding capability and caveats | P1 target |
| `claude vs gpt coding` | Comparison / decision support | Blog comparisons, tool-vs-tool articles, social/forum results | Fit if source-led and workflow-specific, but avoid unsupported winner claims | P1 target |

### SERP opportunity

The opportunity is not “make a better generic leaderboard.” The opportunity is a decision-engine page that combines:

1. coding benchmark evidence,
2. API/subscription pricing,
3. context/cache/batch economics,
4. speed/latency if public,
5. task-cost calculator handoff,
6. workflow recommendations for coding agents, refactors, frontend generation, bug fixing, code review, test generation, and Chinese coding workflows.

### SERP risks

- Broad leaderboard SERPs already have specialized authorities: LM Arena, Artificial Analysis, LiveBench, SWE-bench, Kilo, Onyx, Scale Labs, OpenRouter-style directories.
- `llm leaderboard` and generic model ranking terms are too broad and authority-heavy for P0.
- Pages that claim one universal “best model” without benchmark comparability will look thin and untrustworthy, and AI search systems may be less willing to cite them.
- Google and AI search are likely to prefer pages that show sources, last-checked dates, methodology, caveats, and direct answers. A black-box score without citations is a risk.

## 3. URL matrix, canonical / alias / redirect / index policy

Principle: keep one canonical leaderboard page for the cluster, avoid duplicate near-synonym pages, and use comparison/supporting pages only when they have unique intent and unique content.

| Proposed URL | Role | Primary keyword | Index policy | Canonical | Alias / redirect policy | Notes |
|---|---|---|---|---|---|---|
| `/llm-leaderboard` | P0 core page | `ai coding model leaderboard` | index | self | Keep as canonical if brief requires this as first priority; title must clarify “for coding” | Main leaderboard + recommender + cost handoff. Do not use generic H1 “LLM Leaderboard” alone. |
| `/ai-coding-model-leaderboard` | Alias / possible landing | `ai coding model leaderboard` | noindex or 301 | `/llm-leaderboard` | Prefer 301 to `/llm-leaderboard` unless product chooses this as canonical | Cleaner keyword match, but duplicate risk. If chosen as canonical, redirect `/llm-leaderboard` here instead. Pick one only. |
| `/best-llm-for-coding` | P0 decision page | `best llm for coding` | index | self | No redirect | Editorial/decision page: best by workflow, not same table copy. Internally links to `/llm-leaderboard` and calculator. |
| `/coding-agent-cost-calculator` | P0 conversion page | `coding agent cost calculator` | index | self | No redirect | Calculator page; must connect leaderboard model choice to monthly task cost. |
| `/coding-model-benchmark` | P1 methodology hub | `coding model benchmark` | index if unique methodology content exists; otherwise noindex until complete | self | No redirect | Explain SWE-bench, Terminal-Bench, LiveCodeBench, Aider, Artificial Analysis. No fake composite score. |
| `/llm-api-pricing-comparison` | P1 pricing hub | `llm api pricing comparison` | index | self | No redirect | Strong site fit. Should expand beyond coding tools to coding-relevant API models and link to calculator. |
| `/cheapest-coding-model` | P1 value page | `cheapest coding model` | index after data table exists | self | No redirect | Rank by cost bucket + benchmark threshold, not price alone. |
| `/claude-vs-gpt-for-coding` | P1 comparison page | `claude vs gpt coding` | index | self | Alias `/claude-vs-chatgpt-coding` can 301 or canonical here | Must compare API models and coding-tool subscriptions separately. Avoid absolute winner. |
| `/kimi-vs-qwen-vs-deepseek-coding` | P2 China/open-model comparison | `kimi vs qwen vs deepseek coding` | index only if data coverage is sufficient; otherwise noindex | self | No redirect | Good for Chinese coding workflow angle. Needs public benchmark/pricing/context data. |
| `/models/{model}` | P2 model detail pages | `{model} pricing`, `{model} coding benchmark`, `{model} context window` | index only for models with unique data + source links | self | stale/empty model pages noindex | Each page needs pricing, benchmark references, context, caveats, calculator CTA. Avoid doorway pages. |

Recommended canonical decision:

- If preserving the owner brief exactly: keep `/llm-leaderboard` as canonical, but Title/H1 must include coding qualifier.
- SEO-clean alternative: use `/ai-coding-model-leaderboard` as canonical and 301 `/llm-leaderboard` to it. This is stronger keyword alignment but changes the owner’s proposed P0 priority.
- Do not index both `/llm-leaderboard` and `/ai-coding-model-leaderboard` with similar content.
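
The "pick one only" rule above can be expressed as a tiny redirect map. This is a minimal sketch, assuming a hypothetical helper named `leaderboard_redirects`; how the site actually serves 301s (framework config, edge rules, etc.) is not specified in this plan.

```python
# Hypothetical helper: exactly one of the two leaderboard URLs is canonical,
# and the other one becomes the source of a 301 redirect.
LEADERBOARD_URLS = ("/llm-leaderboard", "/ai-coding-model-leaderboard")


def leaderboard_redirects(canonical: str) -> dict[str, str]:
    """Return {alias: canonical} so only one leaderboard URL stays indexable."""
    if canonical not in LEADERBOARD_URLS:
        raise ValueError(f"canonical must be one of {LEADERBOARD_URLS}")
    return {url: canonical for url in LEADERBOARD_URLS if url != canonical}


# Owner-brief option: keep /llm-leaderboard as canonical.
print(leaderboard_redirects("/llm-leaderboard"))
```

Switching to the SEO-clean alternative only changes the argument, which keeps the decision in one place rather than scattered across page templates.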

## 4. Page-level SEO requirements

### P0: `/llm-leaderboard`

Recommended title:
AI Coding Model Leaderboard: Compare Coding LLMs by Cost & Benchmarks

Recommended meta description:
Compare AI coding models by public coding benchmarks, API pricing, context window, speed signals, and estimated task cost for coding agents and developer workflows.

H1:
AI Coding Model Leaderboard for Coding Agents and Developer Workflows

Above-fold blocks:

- Short answer, 40-70 words:
  “The best AI coding model depends on your workflow, not a single universal score. Use this leaderboard to compare public coding benchmark results, API pricing, context window, speed signals, and estimated task cost across models for coding agents, frontend generation, repo refactors, bug fixing, code review, and test generation.”
- Filter chips: Best coding, cheapest good-enough, best long context, best agent model, best Chinese coding workflow, best open/low-cost model.
- Primary CTA: “Estimate task cost” -> `/coding-agent-cost-calculator`.
- Secondary CTA: “Read methodology” -> `/coding-model-benchmark`.

Required H2 structure:

1. Best AI coding models by workflow
2. Coding benchmark sources and what each benchmark measures
3. API price, cache price, and task-cost comparison
4. Context window and long-context caveats
5. Speed and latency signals
6. How to choose a model for coding agents
7. FAQ

Indexable content requirement:

- 800+ words minimum.
- At least 3 H2s and 2 H3s; more are recommended given the comparison intent.
- Table must be crawlable HTML, not canvas-only.
- Every row must include source, last checked date, and confidence.
- Include update date near table.
- Include methodology/caveat block above or below table.

### P0: `/best-llm-for-coding`

Intent: recommendation page for people who do not want a raw leaderboard.

Required modules:

- Short answer: “There is no single best LLM for all coding tasks…”
- Scenario sections:
  - best for coding agents,
  - best for repo-level refactor,
  - best for frontend generation,
  - best cheap model,
  - best long-context model,
  - best Chinese coding workflow,
  - best open/low-cost model if supported.
- Evidence table per scenario: benchmark/source, price, context, caveat.
- Internal links to `/llm-leaderboard`, `/coding-agent-cost-calculator`, `/coding-model-benchmark`, `/llm-api-pricing-comparison`.

Avoid:

- “X is the best coding model” as a universal claim.
- Recommendations without source/caveat.

### P0: `/coding-agent-cost-calculator`

Intent: conversion and differentiation.

Required modules:

- Input fields:
  - selected model,
  - workflow type,
  - average trajectory input tokens,
  - average output/reasoning tokens where available,
  - retry rate,
  - sessions/tasks per month,
  - cache hit assumption,
  - batch/standard mode toggle where provider supports it.
- Output:
  - estimated task cost,
  - estimated monthly cost,
  - sensitivity range,
  - caveat: estimate, not billing quote.
- Internal link from each leaderboard row: “Calculate this model’s task cost.”
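
The inputs and outputs listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the site's billing formula: the `Prices` type, the function names, and the way retries are modeled (each task costs `1 + retry_rate` attempts on average) are all illustrative choices, and the prices in the example are placeholders rather than real provider rates.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """USD per 1M tokens for one billing mode (standard or batch)."""
    input: float
    cached_input: float
    output: float


def estimate_task_cost(prices: Prices, input_tokens: float, output_tokens: float,
                       cache_hit_rate: float = 0.0, retry_rate: float = 0.0) -> float:
    """Estimated USD cost of one task. An estimate, not a billing quote."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if retry_rate < 0.0:
        raise ValueError("retry_rate must be non-negative")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    per_attempt = (uncached * prices.input
                   + cached * prices.cached_input
                   + output_tokens * prices.output) / 1_000_000
    # Assumption: retries are modeled as extra full attempts on average.
    return per_attempt * (1.0 + retry_rate)


def estimate_monthly_cost(task_cost: float, tasks_per_month: int) -> float:
    return task_cost * tasks_per_month


# Placeholder prices, NOT real provider rates.
standard = Prices(input=2.0, cached_input=0.5, output=10.0)
task = estimate_task_cost(standard, input_tokens=200_000, output_tokens=10_000,
                          cache_hit_rate=0.5, retry_rate=0.25)
print(f"task: ${task:.4f}, monthly: ${estimate_monthly_cost(task, 100):.2f}")
```

A sensitivity range can be produced by calling `estimate_task_cost` again with lower and higher retry or cache-hit assumptions, and the batch toggle can pass a second `Prices` instance where the provider publishes batch rates.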

### P1: `/coding-model-benchmark`

Intent: methodology and trust.

Required modules:

- Explain benchmark differences:
  - SWE-bench: real GitHub issue resolution; reports % resolved and avg cost in official leaderboard.
  - Aider polyglot: code editing benchmark across multiple languages; reports percent correct, cost, edit format, malformed rate, time.
  - LiveCodeBench: contamination-aware coding problems from LeetCode/AtCoder/Codeforces; useful for code generation/self-repair/test output prediction.
  - LiveBench: broader benchmark with coding and agentic coding averages; contamination-resistant with regular refresh.
  - Artificial Analysis Coding Index: currently combines Terminal-Bench Hard and SciCode; includes cost breakdown and token composition.
  - LM Arena code/webdev: preference-based arena; good for webdev preference signal, not deterministic task success.
- Explain why scores should not be merged blindly.
- Provide a transparent mapping from source benchmark to page fields.
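
One way to keep the source-to-field mapping transparent is a plain lookup table with one column per benchmark. The sketch below is illustrative only: the dictionary name, keys, and function are hypothetical, only three of the sources are shown, and the score in the example is a placeholder.

```python
# Illustrative mapping from benchmark source to the page field it may populate.
# Metric labels follow the descriptions above; the structure is hypothetical.
BENCHMARK_FIELDS = {
    "SWE-bench": {"metric": "% resolved", "signal": "real GitHub issue resolution"},
    "Aider polyglot": {"metric": "percent correct", "signal": "multi-language code editing"},
    "LM Arena code/webdev": {"metric": "arena score", "signal": "human preference"},
}


def benchmark_columns(scores: dict[str, float]) -> list[dict]:
    """Build one column per benchmark; scores are never averaged together."""
    columns = []
    for source, spec in BENCHMARK_FIELDS.items():
        value = scores.get(source)
        columns.append({
            "source": source,
            "metric": spec["metric"],
            "signal": spec["signal"],
            "value": value if value is not None else "not publicly benchmarked",
        })
    return columns


# Placeholder score, not a real result.
for column in benchmark_columns({"SWE-bench": 50.0}):
    print(column["source"], "|", column["metric"], "|", column["value"])
```

Because each benchmark keeps its own metric label, a percent-resolved figure is never rendered next to an arena score as if the two were comparable.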

### P1: `/llm-api-pricing-comparison`

Intent: defend and expand pricing ownership.

Required modules:

- Official pricing source table by provider.
- Separate input, cached input, cache write, output, batch/flex/priority pricing.
- Last checked dates.
- “Pricing is not task cost” explainer.
- CTA to calculator.

### P1: `/cheapest-coding-model`

Intent: low-cost coding model discovery.

Required modules:

- Cheapest by input/output price.
- Cheapest that clears minimum coding evidence threshold.
- Cheapest for high-volume agent tasks.
- Caveats: cheap model may increase retry count, output tokens, failed trajectories, and human review time.
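
The "cheapest that clears a minimum coding evidence threshold" module could be computed as a filter followed by a sort. This is a hedged sketch: `ModelRow`, `blended_price`, and the 80/20 input/output token mix are assumptions made for illustration, the threshold applies to a single named benchmark rather than a composite, and the rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    name: str
    input_price: float               # USD per 1M tokens
    output_price: float              # USD per 1M tokens
    benchmark_score: Optional[float]  # one named benchmark; None = not publicly benchmarked


def blended_price(row: ModelRow, input_share: float = 0.8) -> float:
    """Assumed token mix: input_share of tokens are input, the rest output."""
    return input_share * row.input_price + (1.0 - input_share) * row.output_price


def cheapest_clearing_threshold(rows: list[ModelRow], min_score: float) -> list[ModelRow]:
    """Drop models without evidence or below the threshold, then sort by price."""
    eligible = [r for r in rows
                if r.benchmark_score is not None and r.benchmark_score >= min_score]
    return sorted(eligible, key=blended_price)


# Hypothetical rows with placeholder numbers.
rows = [
    ModelRow("model-a", 1.0, 4.0, 60.0),
    ModelRow("model-b", 0.2, 0.8, 30.0),   # cheap, but below the threshold
    ModelRow("model-c", 0.5, 2.0, 55.0),
    ModelRow("model-d", 0.1, 0.4, None),   # cheapest, but not publicly benchmarked
]
print([r.name for r in cheapest_clearing_threshold(rows, min_score=50.0)])
```

Models with no public benchmark are excluded rather than given an inferred score, which matches the rule that missing data is never filled in.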

### P1: `/claude-vs-gpt-for-coding`

Intent: comparison with high search demand but risky claims.

Required modules:

- Split API model comparison from subscription tool comparison.
- Compare benchmark sources side-by-side, not a single winner.
- Compare context/cache/pricing/task cost.
- Use “better for X” format:
  - complex repo refactor,
  - frontend/UI generation,
  - fast iteration,
  - long-context review,
  - budget-sensitive agent loops.

## 5. Public data-source policy

Hard rule: data table fields must have `source_url`, `source_name`, `last_checked`, `confidence`, and `update_policy`. Unknown values are displayed as `not disclosed` or `not publicly benchmarked`.

### Field policy

| Field | Acceptable sources | Display rule | Update cadence | Confidence default |
|---|---|---|---|---|
| model | provider docs, benchmark pages, API docs | exact model name + API name if available | weekly for P0 | high if official provider, medium if third-party |
| provider | official provider docs | provider name | weekly | high |
| coding score | SWE-bench, Aider, LiveCodeBench, LiveBench, Artificial Analysis, LM Arena code/webdev, Kilo usage where labeled | keep separate by benchmark; do not force into one fake score | weekly/monthly depending on source | high for official benchmark, medium for aggregator |
| input price | official provider pricing first; OpenRouter only if provider official unavailable and labeled | USD / 1M tokens, standard mode; include batch/flex separately | weekly | high official, medium aggregator |
| output price | official provider pricing first | USD / 1M tokens; clarify reasoning/thinking inclusion if provider states it | weekly | high official |
| cache price | official provider docs only | cache read/write separated; unknown if not disclosed | weekly | high official |
| context | official model docs/provider pricing page | max advertised context; include effective long-context caveat separately | monthly or release-triggered | high official |
| speed | Artificial Analysis, public benchmark pages, provider docs if available | TTFT/tokens/sec only if source publishes it; otherwise `not disclosed` | weekly/monthly | medium unless independently measured |
| task cost | internal calculator using source pricing + disclosed token assumptions | estimate only; show formula and assumptions | generated on page | medium; depends on assumptions |
| best for | derived editorial label | must cite evidence fields used | manual review | medium |
| caveat | source limitations, missing data, model/tool limits | required for every row | every update | high |
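
The hard rule and field policy above can be sketched as a record type that refuses to display a value without its source. This is a minimal sketch, not the site's data model: the `SourcedField` class and its validation rules are assumptions, the `data_status` values follow the guardrails in this plan, and the example value is a placeholder rather than a real price.

```python
from dataclasses import dataclass
from typing import Optional

DATA_STATUSES = {"available", "not disclosed", "not publicly benchmarked", "stale"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass(frozen=True)
class SourcedField:
    value: Optional[str]
    data_status: str
    source_name: Optional[str] = None
    source_url: Optional[str] = None
    last_checked: Optional[str] = None   # ISO date, e.g. "2026-05-28"
    confidence: str = "medium"
    update_policy: str = "weekly"

    def __post_init__(self) -> None:
        if self.data_status not in DATA_STATUSES:
            raise ValueError(f"unknown data_status: {self.data_status}")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence}")
        if self.data_status in ("available", "stale"):
            required = ("value", "source_name", "source_url", "last_checked")
            missing = [name for name in required if getattr(self, name) is None]
            if missing:
                raise ValueError(f"sourced value is missing: {missing}")

    def display(self) -> str:
        """Render the value, or the explicit unknown state; never an inferred number."""
        if self.data_status == "available":
            return self.value
        if self.data_status == "stale":
            return f"{self.value} (stale, last checked {self.last_checked})"
        return self.data_status


# Placeholder value, not a real price.
price = SourcedField(value="2.00", data_status="available",
                     source_name="OpenAI API pricing",
                     source_url="https://openai.com/api/pricing/",
                     last_checked="2026-05-28", confidence="high")
speed = SourcedField(value=None, data_status="not disclosed")
print(price.display(), "|", speed.display())
```

Validating at construction time means a row with a number but no source fails during the data build instead of reaching the rendered table.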

### Recommended benchmark sources

Use as source candidates, not as instructions to copy rankings blindly:

- SWE-bench official leaderboard: real-world GitHub issue resolution, `% Resolved`, avg cost, agent/harness, date.
- Aider LLM Leaderboards: polyglot code editing benchmark; percent correct, edit format, cost, malformed output, seconds/case.
- LiveCodeBench: contamination-free coding evaluation from recent programming contest problems; useful for code generation/self-repair/test-output prediction.
- LiveBench: contamination-resistant benchmark with coding and agentic coding categories.
- Artificial Analysis Coding Index: Coding Index currently references Terminal-Bench Hard and SciCode, with cost/token breakdown.
- LM Arena code/webdev: human preference / arena score for webdev coding; useful as preference signal, not deterministic correctness.
- Kilo leaderboard: real-world Kilo Code usage by mode plus linked performance/pricing sources; useful as usage signal, not objective benchmark.
- Onyx best LLM for coding: competitor/reference page; useful for SERP structure and benchmark mix, not primary source of truth.

### Recommended pricing sources

Primary official sources:

- OpenAI API pricing: https://openai.com/api/pricing/
- Anthropic Claude API pricing: https://docs.anthropic.com/en/docs/about-claude/pricing
- Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing
- DeepSeek API pricing: https://api-docs.deepseek.com/quick_start/pricing
- Kimi/Moonshot API pricing: https://platform.moonshot.ai/docs/pricing
- Add Qwen/Alibaba Cloud, xAI, Mistral, OpenRouter only when specific models enter the P0 table.

Pricing rules:

- Official provider docs win over aggregators.
- OpenRouter pricing can be shown only as route-specific pricing and must not be treated as official provider API pricing.
- Subscription tool pricing (Claude Code, Codex, Cursor, Copilot) must be separated from API model token pricing.
- Cache and batch pricing must be separate columns, not blended into base price.
- If a provider says preview/regional/priority/flex pricing varies, display the mode explicitly.

## 6. Schema / GEO / AEO requirements

### Schema suggestions

Use JSON-LD where appropriate:

- `WebPage` for all core pages.
- `BreadcrumbList` for navigation hierarchy.
- `FAQPage` for visible FAQ only.
- `Dataset` for leaderboard table if downloadable/exportable or versioned.
- `ItemList` for ranked/filtered model lists, with caveat that ranking is by selected methodology/filter.
- `SoftwareApplication` only for the calculator page if the calculator is an actual interactive tool.
- `HowTo` is optional for “how to choose a coding model” if written as steps.

Avoid:

- Fake `AggregateRating` or review schema.
- Claiming first-party benchmark ownership in schema.
- Marking hidden FAQ content that users cannot see.
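
As an illustration of the "visible FAQ only" rule, the JSON-LD can be generated from the same list that renders the on-page FAQ, so the markup cannot contain questions users do not see. This is a sketch: the function name is hypothetical and the answer text is placeholder copy, while the `FAQPage` / `Question` / `acceptedAnswer` structure follows schema.org.

```python
import json


def faq_jsonld(visible_faqs: list[tuple[str, str]]) -> dict:
    """Build FAQPage JSON-LD from the (question, answer) pairs rendered on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in visible_faqs
        ],
    }


# Placeholder answer copy; the real text must match the visible FAQ exactly.
faqs = [
    ("Is a coding benchmark the same as real task cost?",
     "No. Benchmark scores measure task success; task cost also depends on "
     "token usage, retries, caching, and pricing mode."),
]
print(json.dumps(faq_jsonld(faqs), indent=2))
```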

### AI citation blocks

Each core page should include:

- Definition block: “What is an AI coding model leaderboard?”
- Short answer block: 40-70 words near top.
- Methodology block: source list + scoring rules + caveats.
- Data freshness block: last updated, last checked per source.
- FAQ block with direct answers:
  - What is the best LLM for coding agents?
  - What is the cheapest coding model?
  - Is a coding benchmark the same as real task cost?
  - Why do benchmark rankings disagree?
  - How should I compare Claude, GPT, Gemini, DeepSeek, Kimi, and Qwen for coding?

## 7. Internal linking and conversion flow

Primary flow:

`/llm-leaderboard` -> model row CTA -> `/coding-agent-cost-calculator` -> existing pricing/coding tool pages -> conversion.

Supporting flow:

- `/best-llm-for-coding` -> `/llm-leaderboard` and `/coding-agent-cost-calculator`
- `/coding-model-benchmark` -> `/llm-leaderboard`
- `/llm-api-pricing-comparison` -> `/coding-agent-cost-calculator`
- `/models/{model}` -> calculator prefilled with model assumptions

Anchor text examples:

- “compare AI coding model pricing and benchmark sources”
- “estimate coding agent task cost for this model”
- “see coding benchmark methodology”
- “compare Claude and GPT for coding workflows”

Avoid generic anchors like “click here.”

## 8. SEO-Copy Freeze inputs for design/product

### `/llm-leaderboard`

- primary keyword: `ai coding model leaderboard`
- semantic keywords: best llm for coding, coding model benchmark, coding agent model, llm coding benchmark, ai coding model comparison, coding model cost
- target words: 1,200-1,800
- required visible blocks: short answer, filterable leaderboard, methodology caveat, source/freshness block, task-cost CTA, FAQ
- FAQ count: 5-7
- indexable: yes

### `/best-llm-for-coding`

- primary keyword: `best llm for coding`
- semantic keywords: best ai model for coding, best coding llm, best model for coding agents, best llm for code review, best llm for frontend generation
- target words: 1,200-1,600
- required visible blocks: scenario recommendations, evidence table, no-single-best caveat, calculator CTA, FAQ
- indexable: yes

### `/coding-agent-cost-calculator`

- primary keyword: `coding agent cost calculator`
- semantic keywords: ai coding cost calculator, llm coding cost, coding model pricing, coding agent token cost, monthly ai coding cost
- target words: 900-1,300 plus calculator UI
- required visible blocks: calculator, assumptions, formula, sample scenarios, caveats, FAQ
- indexable: yes

### `/coding-model-benchmark`

- primary keyword: `coding model benchmark`
- semantic keywords: llm coding benchmark, swe-bench, aider leaderboard, livecodebench, terminal-bench, artificial analysis coding index
- target words: 1,000-1,500
- required visible blocks: benchmark comparison table, methodology caveat, contamination/comparability explanation, FAQ
- indexable: yes if unique content exists

### `/llm-api-pricing-comparison`

- primary keyword: `llm api pricing comparison`
- semantic keywords: llm api cost, openai api pricing, claude api pricing, gemini api pricing, deepseek api pricing, cheapest llm api
- target words: 1,000-1,500
- required visible blocks: pricing table, official source links, cache/batch explanation, task-cost CTA, FAQ
- indexable: yes

## 9. Implementation guardrails

- Add `last_checked` to every source-backed field.
- Add `source_confidence`: high / medium / low.
- Add `data_status`: available / not disclosed / not publicly benchmarked / stale.
- Do not list empty model detail pages in the sitemap.
- Do not include noindex pages in the sitemap.
- Add canonical self-reference for every indexable page.
- Add `og:title`, `og:description`, `og:url`, `twitter:card` for core pages.
- Add an SEO audit gate before deploy covering title, meta description, canonical, H1/H2/H3, word count, sitemap inclusion, noindex, image alt, social meta, schema presence, source/freshness block, and crawlable table content.
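
The sitemap guardrails above could be enforced by deriving the sitemap from the URL matrix rather than maintaining it by hand. The sketch below assumes the `index_policy` strings used in the metadata snapshot of this plan; the `sitemap_urls` function and the `conditions_met` parameter are hypothetical, and a real audit gate would also check titles, headings, schema, and the other items listed.

```python
# Subset of the URL matrix, using the index_policy values from this plan.
URL_MATRIX = [
    {"url": "/llm-leaderboard", "index_policy": "index", "canonical": "self"},
    {"url": "/ai-coding-model-leaderboard", "index_policy": "301_or_noindex",
     "canonical": "/llm-leaderboard"},
    {"url": "/coding-model-benchmark", "index_policy": "index_if_unique", "canonical": "self"},
    {"url": "/cheapest-coding-model", "index_policy": "index_after_data", "canonical": "self"},
]


def sitemap_urls(matrix: list[dict], conditions_met: frozenset = frozenset()) -> list[str]:
    """List only indexable, self-canonical URLs.

    Conditional policies (index_if_..., index_after_...) count as indexable only
    when the URL has been explicitly confirmed in conditions_met.
    """
    urls = []
    for page in matrix:
        policy = page["index_policy"]
        confirmed = policy.startswith("index_") and page["url"] in conditions_met
        if (policy == "index" or confirmed) and page["canonical"] == "self":
            urls.append(page["url"])
    return urls


print(sitemap_urls(URL_MATRIX))
print(sitemap_urls(URL_MATRIX, conditions_met=frozenset({"/coding-model-benchmark"})))
```

Defaulting conditional pages to "excluded" means an incomplete methodology or data page stays out of the sitemap until someone confirms it is ready.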

## 10. Risks and mitigations

| Risk | Severity | Mitigation |
|---|---:|---|
| Duplicate `/llm-leaderboard` and `/ai-coding-model-leaderboard` | High | Pick one canonical; redirect or noindex the other. |
| Generic leaderboard competition too strong | High | Position as AI coding workflow + cost/value decision engine. |
| Fabricated or stale benchmark/pricing data | High | Source-led schema, last_checked, confidence, unknown states, audit checks. |
| Fake composite score reduces trust | High | Keep benchmark columns separate; derive editorial recommendations with caveats. |
| Pricing intent diluted | Medium | Put calculator CTA in every leaderboard row and above fold. |
| Thin programmatic model pages | Medium | Only index model pages with unique sourced data and answer blocks. |
| AI Overview/citation pulls wrong claims | Medium | Use direct answer blocks, caveats, source labels, update dates. |

## 11. Next inputs needed for downstream product/dev/content

1. Choose canonical URL: `/llm-leaderboard` vs `/ai-coding-model-leaderboard`.
2. Confirm P0 model list: recommended 12-20 models max for first release.
3. Confirm source set for P0 benchmark fields.
4. Provide existing repository/data model path for aicodingpricing.
5. Confirm whether calculator can prefill model assumptions from leaderboard rows.
6. Decide whether `/models/{model}` pages ship in V1 or remain noindex until data depth is sufficient.
7. Confirm update cadence owner: manual weekly update vs semi-automated source refresh.

## 12. Recommended P0 build order

1. `/llm-leaderboard` canonical page with source-led table and calculator CTA.
2. `/coding-agent-cost-calculator` integrated with leaderboard row assumptions.
3. `/best-llm-for-coding` decision page using the same data source.
4. `/coding-model-benchmark` methodology page.
5. `/llm-api-pricing-comparison` expansion page.

## 13. Metadata snapshot

```json
{
  "project_slug": "aicodingpricing-leaderboard",
  "canonical_site": "aicodingpricing.com",
  "seo_verdict": "Conditional Go",
  "primary_keyword": "ai coding model leaderboard",
  "target_keywords": [
    "ai coding model leaderboard",
    "best llm for coding",
    "coding model benchmark",
    "llm api pricing comparison",
    "cheapest coding model",
    "claude vs gpt coding",
    "coding agent cost calculator"
  ],
  "url_matrix": [
    {"url": "/llm-leaderboard", "role": "P0 core", "index_policy": "index", "canonical": "self"},
    {"url": "/ai-coding-model-leaderboard", "role": "alias", "index_policy": "301_or_noindex", "canonical": "/llm-leaderboard"},
    {"url": "/best-llm-for-coding", "role": "P0 decision", "index_policy": "index", "canonical": "self"},
    {"url": "/coding-agent-cost-calculator", "role": "P0 conversion", "index_policy": "index", "canonical": "self"},
    {"url": "/coding-model-benchmark", "role": "P1 methodology", "index_policy": "index_if_unique", "canonical": "self"},
    {"url": "/llm-api-pricing-comparison", "role": "P1 pricing", "index_policy": "index", "canonical": "self"},
    {"url": "/cheapest-coding-model", "role": "P1 value", "index_policy": "index_after_data", "canonical": "self"},
    {"url": "/claude-vs-gpt-for-coding", "role": "P1 comparison", "index_policy": "index", "canonical": "self"},
    {"url": "/kimi-vs-qwen-vs-deepseek-coding", "role": "P2 comparison", "index_policy": "index_if_data_sufficient", "canonical": "self"},
    {"url": "/models/{model}", "role": "P2 model detail", "index_policy": "index_only_with_unique_sourced_data", "canonical": "self"}
  ],
  "data_sources": [
    "SWE-bench official leaderboard",
    "Aider LLM Leaderboards",
    "LiveCodeBench",
    "LiveBench",
    "Artificial Analysis Coding Index",
    "LM Arena code/webdev leaderboard",
    "Kilo coding model leaderboard as usage signal",
    "Official provider API pricing pages: OpenAI, Anthropic, Google Gemini, DeepSeek, Kimi/Moonshot; add Qwen/xAI/Mistral/OpenRouter as models enter scope"
  ],
  "index_policy": "Only index pages with unique intent, crawlable content, self canonical, sitemap inclusion, source/freshness blocks, and no noindex. Redirect or noindex duplicate aliases.",
  "risks": [
    "generic leaderboard SERP authority is high",
    "duplicate URL risk between /llm-leaderboard and /ai-coding-model-leaderboard",
    "benchmark incomparability",
    "stale or fabricated pricing/benchmark data",
    "pricing intent dilution if calculator flow is weak",
    "thin model detail pages"
  ],
  "next_inputs": [
    "canonical URL choice",
    "P0 model list",
    "P0 benchmark source set",
    "existing repo/data model path",
    "calculator prefill capability",
    "V1 decision for /models/{model}",
    "source update cadence owner"
  ],
  "handoff_path": "/root/.hermes/reports/aicodingpricing-leaderboard-20260528/seo-serp-plan.md",
  "next_assignee": "moce"
}
```
