TOPZLE — The fastest visual Wikipedia list engine
TOPZLE turns Wikipedia tables/lists into clean, fast, searchable pages with auto-charts, timelines, choropleths, and ranked lists — without writing custom chart logic per page. The hard part isn’t rendering one page — it’s keeping thousands of pages fresh, deterministic, and stable at scale.
Lists → Collages → Charts (automatically)
Wikipedia tables are messy: nested headers, mixed units, year ranges, currencies, and inconsistent naming. TOPZLE normalizes the data, infers chart intent, and renders the best visualization per page — fast, stable, and SEO-ready.
1) Problem statement
Most “list sites” fail in one of two ways: they either hardcode pages manually, or their ingestion becomes inconsistent, expensive, and impossible to reason about over time. TOPZLE’s goal is a repeatable publishing system: scrape/normalize structured data and render it using reusable chart primitives — not one-off chart code per page.
2) Data sources & ingestion
TOPZLE is primarily built on Wikipedia list & table data (structured HTML tables and list pages). The platform’s ingestion layer extracts: titles, slugs, table headers, rows, thumbnail/collage images (when present), and link metadata — then converts it into a stable internal model for rendering.
A. Source signals kept for trust
- Provenance: preserve where a value came from and when it was last refreshed.
- Stable identity: stable IDs per entity/page/row so re-fetching doesn’t drift.
- Schema evolution: allow new metrics/columns without breaking older pages.
The product assumption is simple: “tables are a dataset, not a screenshot”. Once you treat Wikipedia tables as datasets, correctness, determinism, and normalization become the core engineering work.
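As a rough illustration of the three signals above, the sketch below shows one way a row model could carry provenance and a stable ID. All type names, field names, and the choice of an FNV-1a hash are assumptions made for this example, not TOPZLE's actual model.

```typescript
// Hypothetical internal model; names are illustrative only.
interface Provenance {
  sourceUrl: string; // where the value came from
  fetchedAt: string; // ISO timestamp of the last refresh
}

interface DatasetRow {
  id: string; // stable across re-fetches
  cells: Record<string, string>; // keyed by header label, so new columns are additive
  provenance: Provenance;
}

// FNV-1a (32-bit) over UTF-16 code units: small, dependency-free, deterministic.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Assumption: a row is identified by its page slug plus its normalized name,
// so whitespace or casing noise in the source does not produce a new ID.
function stableRowId(pageSlug: string, rowName: string): string {
  const key = rowName.trim().toLowerCase().replace(/\s+/g, " ");
  return `${pageSlug}:${fnv1a(key)}`;
}

function makeRow(pageSlug: string, name: string, cells: Record<string, string>, provenance: Provenance): DatasetRow {
  return { id: stableRowId(pageSlug, name), cells, provenance };
}
```

Keying cells by header label rather than by position is one way to let new columns appear without disturbing pages that only read the older ones.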
3) How it works: end-to-end SSR (Request → Response)
TOPZLE ships static assets plus a server-side rendering (SSR) Worker that renders either the homepage/search grid or a dynamic wiki page. SSR gives a fast first paint and SEO-grade HTML, while still allowing charts to progressively enhance in the browser.
A. Homepage vs Search vs Page
- Home mode: shows pinned collages + trending buckets (deduped).
- Search mode: renders grid results for ?q=....
- Dynamic page: renders the selected Wikipedia-derived dataset and chart shell.
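The three modes can be pictured as a single decision on the request URL. This is a minimal sketch under assumptions: the function name, the rule that a non-empty q parameter wins over the path, and the slug-from-path convention are all hypothetical.

```typescript
// Hypothetical mode resolution for the SSR Worker.
type Mode =
  | { kind: "home" }
  | { kind: "search"; query: string }
  | { kind: "page"; slug: string };

function resolveMode(requestUrl: string): Mode {
  const url = new URL(requestUrl);
  const query = (url.searchParams.get("q") ?? "").trim();
  // Assumption: a non-empty ?q= always means search mode.
  if (query !== "") return { kind: "search", query };
  // Assumption: the page slug is the path without leading/trailing slashes.
  const slug = url.pathname.replace(/^\/+|\/+$/g, "");
  return slug !== "" ? { kind: "page", slug } : { kind: "home" };
}
```

Returning a tagged union keeps the rest of the render path exhaustive: each mode carries only the data it needs.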
4) How it works: Auto-Charts (client pipeline)
After SSR loads the page, TOPZLE conditionally enhances the content: it embeds a compact payload of headers/rows, then the client boot code infers columns and selects the right chart renderer. D3 loads only when needed.
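One common way to embed such a payload is an inert JSON script tag written during SSR. The sketch below assumes that approach; the element id, the ChartPayload shape, and both function names are invented for illustration. In a browser the client would normally read the element's textContent; the regex reader here only exists so the round trip can be checked outside a DOM.

```typescript
// Hypothetical payload shape: just the headers and rows the charts need.
interface ChartPayload {
  headers: string[];
  rows: string[][];
}

// Server side: serialize and escape "<" so cell text can never close the script tag.
function embedPayload(payload: ChartPayload): string {
  const json = JSON.stringify(payload).replace(/</g, "\\u003c");
  return `<script type="application/json" id="chart-payload">${json}</script>`;
}

// Stand-in for the client read step (a browser would use textContent instead).
function readPayload(html: string): ChartPayload | null {
  const match = html.match(
    /<script type="application\/json" id="chart-payload">([\s\S]*?)<\/script>/
  );
  return match ? (JSON.parse(match[1]) as ChartPayload) : null;
}
```

Because the tag is not executable JavaScript, the browser does not run it, and the boot code can decide whether any chart library is needed before loading one.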
A. Inference rules (why this works at scale)
- Detects nameCol (most text), yearCol (year/range/date-like), and strong numeric columns.
- Ignores non-metrics like “Rank/Peak” and other noise columns.
- Handles nested headers via breadcrumb labels (“Parent — Child”).
- Parses currencies, percents, and suffixes (K/M/B) conservatively.
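The rules above can be sketched as a single pass over the columns. The regular expressions, the 0.8 threshold, and the function names are assumptions for this example; only the nameCol/yearCol idea and the Rank/Peak exclusion come from the text.

```typescript
interface InferredColumns {
  nameCol: number; // -1 when no text column was found
  yearCol: number; // -1 when no year-like column was found
  metricCols: number[];
}

const THRESHOLD = 0.8; // assumed share of cells that must match
const YEAR_RE = /^(1\d{3}|20\d{2})(\s*[\u2013-]\s*\d{2,4})?$/;
const NUMERIC_RE = /^[$\u20ac\u00a3]?\s*-?\d[\d,]*(\.\d+)?\s*[KMB%]?$/i;
const NOISE_HEADER_RE = /^(rank|peak)\b/i;

// Fraction of non-empty cells in a column that pass the test.
function share(rows: string[][], col: number, test: (cell: string) => boolean): number {
  const cells = rows.map((r) => (r[col] ?? "").trim()).filter((c) => c !== "");
  return cells.length === 0 ? 0 : cells.filter(test).length / cells.length;
}

function inferColumns(headers: string[], rows: string[][]): InferredColumns {
  const isYear = (c: string) => YEAR_RE.test(c);
  const isNumeric = (c: string) => NUMERIC_RE.test(c);
  const isText = (c: string) => /[A-Za-z]/.test(c) && !isNumeric(c);

  let nameCol = -1;
  let yearCol = -1;
  let bestText = 0;
  const metricCols: number[] = [];

  headers.forEach((header, col) => {
    if (NOISE_HEADER_RE.test(header.trim())) return; // skip Rank/Peak style columns
    if (yearCol === -1 && share(rows, col, isYear) >= THRESHOLD) {
      yearCol = col;
      return;
    }
    if (share(rows, col, isNumeric) >= THRESHOLD) {
      metricCols.push(col);
      return;
    }
    const text = share(rows, col, isText);
    if (text > bestText) {
      bestText = text;
      nameCol = col; // the column with the most text wins
    }
  });
  return { nameCol, yearCol, metricCols };
}
```

A known limitation of this sketch: a metric column of four-digit values can be mistaken for a year column, so a real implementation would likely also weigh the header text.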
5) Chart modules (file-by-file)
Common modules: src/charts/bars.ts, src/charts/multiKeyBars.ts, src/charts/timeline.ts, src/charts/choropleth.ts, src/charts/rankList.ts
A. bars.ts
A deterministic single-metric ranked bar chart. Works when the dataset has a clear “name + value” shape. Designed to remain stable even when labels are long or values include formatting noise.
B. multiKeyBars.ts
Multi-metric comparison renderer (one entity, many numeric columns). Detects currency/percent per column and tolerates messy cells (including ranges) by taking the first numeric token consistently.
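A minimal sketch of the "first numeric token" rule, assuming a hypothetical parseMetric helper. The thousands-separator handling and the rule that a K/M/B suffix only counts when it is not the start of a longer word (so "12 km" stays 12) are assumptions chosen to keep parsing conservative.

```typescript
const SUFFIX_SCALE: Record<string, number> = { k: 1e3, m: 1e6, b: 1e9 };

// Returns the first numeric token in a cell, scaled by an immediate K/M/B suffix,
// or null when the cell holds no number at all.
function parseMetric(cell: string): number | null {
  // Drop commas only where they act as thousands separators ("1,200" -> "1200").
  const cleaned = cell.replace(/(\d),(?=\d{3}\b)/g, "$1");
  const match = cleaned.match(/-?\d+(?:\.\d+)?/);
  if (!match || match.index === undefined) return null;

  const value = Number(match[0]);
  const rest = cleaned.slice(match.index + match[0].length);
  // A suffix must directly follow the number and must not begin a word.
  const suffix = rest.match(/^\s?([KMB])(?![A-Za-z])/i);
  return suffix ? value * SUFFIX_SCALE[suffix[1].toLowerCase()] : value;
}
```

For a range such as "10–20%", this returns 10 every time, which is the point: the result may lose information, but it never changes between builds.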
C. timeline.ts
Timeline renderer for year-based datasets, including year ranges (e.g., “2018–19”). Groups cards by year and keeps ordering strict — avoids “pretty but wrong” interpolation.
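To make the year-range handling concrete, here is a sketch that parses cells like "2018–19" into a start and end year and groups cells by start year. The function names and the century-rollover rule for two-digit end years are assumptions.

```typescript
interface YearSpan {
  start: number;
  end: number;
}

// Accepts "2020", "2018-19", "2018–2019" (hyphen, en dash, or \u2014).
function parseYearSpan(cell: string): YearSpan | null {
  const match = cell.trim().match(/^(\d{4})(?:\s*[\u2013\u2014-]\s*(\d{2}|\d{4}))?$/);
  if (!match) return null;
  const start = Number(match[1]);
  if (!match[2]) return { start, end: start };

  let end = Number(match[2]);
  if (match[2].length === 2) {
    end += Math.floor(start / 100) * 100; // inherit the century from the start year
    if (end < start) end += 100; // e.g. "1999-00" means 1999 to 2000
  }
  return end >= start ? { start, end } : null; // reject backwards ranges
}

// Groups cells by start year in ascending order; unparseable cells are skipped,
// never guessed or interpolated.
function groupByStartYear(cells: string[]): Map<number, string[]> {
  const groups = new Map<number, string[]>();
  for (const cell of cells) {
    const span = parseYearSpan(cell);
    if (!span) continue;
    const bucket = groups.get(span.start) ?? [];
    bucket.push(cell);
    groups.set(span.start, bucket);
  }
  return new Map(Array.from(groups.entries()).sort((a, b) => a[0] - b[0]));
}
```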
D. choropleth.ts
Choropleth renderer that only activates when region matching is strong enough (more than one region matched). Normalizes region keys and uses aliasing to handle naming mismatches between datasets and GeoJSON keys.
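The matching step might look like the following sketch. The alias table entries, the normalization steps, and the function names are assumptions; only the "more than one region matched" gate and the idea of aliasing come from the description.

```typescript
// Hypothetical alias table mapping normalized dataset names to normalized geo keys.
const REGION_ALIASES = new Map<string, string>([
  ["usa", "united states"],
  ["united states of america", "united states"],
  ["uk", "united kingdom"],
]);

function normalizeRegion(name: string): string {
  const key = name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // punctuation and spacing noise
    .trim();
  return REGION_ALIASES.get(key) ?? key;
}

// Maps each dataset name to the geo key it matched, if any.
function matchRegions(names: string[], geoKeys: string[]): Map<string, string> {
  const byNormalized = new Map<string, string>();
  for (const key of geoKeys) byNormalized.set(normalizeRegion(key), key);

  const matched = new Map<string, string>();
  for (const name of names) {
    const hit = byNormalized.get(normalizeRegion(name));
    if (hit !== undefined) matched.set(name, hit);
  }
  return matched;
}

// Gate from the text: render only when more than one distinct region matched.
function shouldRenderChoropleth(names: string[], geoKeys: string[]): boolean {
  const distinct = new Set(Array.from(matchRegions(names, geoKeys).values()));
  return distinct.size > 1;
}
```

Counting distinct geo keys rather than matched names avoids treating "USA" and "United States" as two regions.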
E. rankList.ts
A clean ranked list module (Gold/Silver/Bronze) optimized for scannability. Deterministic ordering with stable tie rules reduces churn across builds.
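A sketch of deterministic ordering with a stable tie rule. The tie rule used here (equal values ordered by name, compared by code unit so the result does not depend on locale) and the positional ranks are assumptions; the source only states that tie rules are stable.

```typescript
interface RankedItem {
  name: string;
  value: number;
}

interface RankedRow extends RankedItem {
  rank: number;
  medal: "gold" | "silver" | "bronze" | null;
}

const MEDALS: Array<"gold" | "silver" | "bronze"> = ["gold", "silver", "bronze"];

function rankItems(items: RankedItem[]): RankedRow[] {
  const sorted = items.slice().sort((a, b) => {
    if (b.value !== a.value) return b.value - a.value; // higher value first
    // Tie rule (assumed): plain code-unit comparison of names, not localeCompare,
    // so every build machine orders ties the same way.
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
  return sorted.map((item, index) => ({
    name: item.name,
    value: item.value,
    rank: index + 1,
    medal: index < MEDALS.length ? MEDALS[index] : null,
  }));
}
```

Because the comparator never returns 0 for items with different names, the output does not depend on the input order or on the engine's sort implementation.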
6) Parsing & normalization stack
The real work is converting “Wikipedia table HTML” into clean rows with predictable types. TOPZLE’s parsing layer is designed to survive: weird encodings (mojibake), nested headers, mixed units, and partial data.
- Fixes common mojibake + HTML entity artifacts.
- Normalizes whitespace + punctuation noise.
- Handles row/col spans and nested header grids.
- Produces labels like “Parent — Child” for stability.
- Finds year/range/date-like columns.
- Finds strong numeric metrics; ignores “Rank/Peak”.
- Currency & percent hints per column.
- Ranges become “first numeric token” for determinism.
- Timeline if years exist.
- MultiKey bars if multiple metrics exist.
- Choropleth only if region match is strong.
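The row/col span and nested header steps in the list above can be sketched as a grid expansion followed by a per-column join. The HeaderCell shape and function name are hypothetical, and the separator is the dash used in the Parent/Child labels, written as the escape \u2014 in the code.

```typescript
// Hypothetical parsed header cell (text plus optional spans).
interface HeaderCell {
  text: string;
  colspan?: number;
  rowspan?: number;
}

function flattenHeaders(headerRows: HeaderCell[][]): string[] {
  // Step 1: expand spans into a rectangular grid of header text.
  const grid: string[][] = headerRows.map(() => []);
  headerRows.forEach((row, r) => {
    let c = 0;
    for (const cell of row) {
      while (grid[r][c] !== undefined) c++; // slot already filled by a rowspan above
      const colspan = cell.colspan ?? 1;
      const rowspan = cell.rowspan ?? 1;
      for (let dr = 0; dr < rowspan && r + dr < grid.length; dr++) {
        for (let dc = 0; dc < colspan; dc++) {
          grid[r + dr][c + dc] = cell.text.trim();
        }
      }
      c += colspan;
    }
  });

  // Step 2: join each column top to bottom, skipping repeats from rowspans.
  const width = Math.max(0, ...grid.map((row) => row.length));
  const labels: string[] = [];
  for (let c = 0; c < width; c++) {
    const parts: string[] = [];
    for (const row of grid) {
      const text = row[c];
      if (text && parts[parts.length - 1] !== text) parts.push(text);
    }
    labels.push(parts.join(" \u2014 "));
  }
  return labels;
}
```

A two-row header with "Country" spanning both rows and "Population" spanning the columns "2010" and "2020" yields three labels: one for Country and two Population labels qualified by year.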
7) Performance & SEO system
A. Performance
- Minimal JS on list pages; charts load only when needed.
- Lazy images + CLS-safe thumbnails with explicit dimensions.
- Bounded work: inference and rendering are constrained per page to avoid UI stalls.
B. SEO
- Semantic HTML + accessible landmarks.
- OpenGraph/Twitter tags per page.
- Canonical URLs, strong title/description rules.
- JSON-LD surfaces: WebSite + ItemList (featured/pinned).
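For the ItemList surface, a minimal sketch using the schema.org ItemList and ListItem types. The PinnedList shape and function name are hypothetical, and which lists count as featured or pinned is left to the caller.

```typescript
// Hypothetical shape for a featured/pinned list entry.
interface PinnedList {
  title: string;
  url: string;
}

// Builds the JSON-LD body for a script tag of type application/ld+json.
function itemListJsonLd(name: string, lists: PinnedList[]): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "ItemList",
    name,
    itemListElement: lists.map((list, index) => ({
      "@type": "ListItem",
      position: index + 1, // positions are 1-based
      name: list.title,
      url: list.url,
    })),
  };
  // Escape "<" so titles cannot terminate the surrounding script tag.
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```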
8) Engineering constraints solved
A. Determinism across builds
TOPZLE enforces deterministic transforms and stable serialization so rebuilds don’t reshuffle ranks or reorder rows due to minor parsing noise. This reduces SEO churn and makes debugging tractable.
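Stable serialization usually comes down to fixing the key order. The sketch below is one way to do it, assuming plain JSON values; the function name is hypothetical and this is not necessarily how TOPZLE serializes its data.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// JSON.stringify with object keys sorted at every level, so two objects with
// the same content always serialize to the same bytes.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    // Array order is data, so it is preserved as-is.
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  const keys = Object.keys(value).sort();
  const body = keys.map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
  return `{${body.join(",")}}`;
}
```

With byte-identical output for identical content, a rebuild can compare hashes of the serialized dataset and skip pages whose data did not change.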
B. Incremental updates without drift
Scraped sources evolve. The ingestion and parsing boundaries are designed for schema evolution and safer fallbacks so source changes degrade gracefully instead of breaking pages.
C. Expressive charts without per-page custom code
The module system makes it cheap to add new chart primitives and apply them broadly. Pages are “dataset + renderer decision”, not hand-built visualizations.
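The "dataset + renderer decision" idea can be sketched as a pure function from a dataset's inferred shape to a renderer name. The DatasetShape fields, the priority order, and the fallback to the ranked list are assumptions; the source only states the individual conditions (timeline if years exist, multi-key bars if several metrics exist, choropleth only on a strong region match).

```typescript
type Renderer = "timeline" | "choropleth" | "multiKeyBars" | "bars" | "rankList";

// Hypothetical summary of what inference found for one page.
interface DatasetShape {
  hasNameCol: boolean;
  hasYearCol: boolean;
  metricCount: number;
  matchedRegions: number;
}

function chooseRenderer(shape: DatasetShape): Renderer {
  // Assumed priority: the first matching rule wins.
  if (shape.hasYearCol) return "timeline";
  if (shape.matchedRegions > 1 && shape.metricCount >= 1) return "choropleth";
  if (shape.metricCount > 1) return "multiKeyBars";
  if (shape.hasNameCol && shape.metricCount === 1) return "bars";
  return "rankList"; // assumed fallback when no chart shape is clear
}
```

Keeping the decision pure means the same dataset always gets the same renderer, and adding a new chart primitive is one more rule plus one more module.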
9) Status & roadmap
TOPZLE is designed as a long-term platform. The next stages are focused on expanding chart primitives, improving provenance display (refresh time and source signals in-page), and strengthening “related lists” discovery without sacrificing performance.
For technical or partnership discussions related to TOPZLE, reach out via the Azonova contact form.