Case Study

TOPZLE — The fastest visual Wikipedia list engine

TOPZLE turns Wikipedia tables/lists into clean, fast, searchable pages with auto-charts, timelines, choropleths, and ranked lists — without writing custom chart logic per page. The hard part isn’t rendering one page — it’s keeping thousands of pages fresh, deterministic, and stable at scale.

Astro + SSR Worker · Auto-charts · Parsing & inference · SEO scale · Deterministic builds

Lists → Collages → Charts (automatically)

Wikipedia tables are messy: nested headers, mixed units, year ranges, currencies, and inconsistent naming. TOPZLE normalizes the data, infers chart intent, and renders the best visualization per page — fast, stable, and SEO-ready.

1) Problem statement

Most “list sites” fail in one of two ways: either they hardcode pages manually, or their ingestion becomes inconsistent, expensive, and impossible to reason about over time. TOPZLE’s goal is a repeatable publishing system: scrape/normalize structured data and render it using reusable chart primitives — not one-off chart code per page.

2) Data sources & ingestion

TOPZLE is primarily built on Wikipedia list & table data (structured HTML tables and list pages). The platform’s ingestion layer extracts titles, slugs, table headers, rows, thumbnail/collage images (when present), and link metadata, then converts the result into a stable internal model for rendering.

A. Source signals kept for trust

  • Provenance: preserve where a value came from and when it was last refreshed.
  • Stable identity: stable IDs per entity/page/row so re-fetching doesn’t drift.
  • Schema evolution: allow new metrics/columns without breaking older pages.

The product assumption is simple: “tables are a dataset, not a screenshot”. Once you treat Wikipedia tables as datasets, correctness, determinism, and normalization become the core engineering work.
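
A minimal sketch of what that internal model could look like, assuming a TypeScript ingestion layer; the field and type names below are illustrative, not TOPZLE’s actual schema:

```ts
// Illustrative normalized-dataset model; names are assumptions, not TOPZLE's schema.
interface Provenance {
  sourceUrl: string;      // Wikipedia page the table came from
  sectionAnchor?: string; // heading anchor of the table, if known
  fetchedAt: string;      // ISO timestamp of the last refresh
}

interface ColumnDef {
  id: string;             // stable column ID, survives header renames
  label: string;          // flattened breadcrumb label, e.g. “Parent — Child”
  kind: "name" | "year" | "numeric" | "other";
}

interface DatasetRow {
  rowId: string;                        // stable per-row identity across re-fetches
  cells: Record<string, string | null>; // keyed by ColumnDef.id
}

interface NormalizedDataset {
  pageSlug: string;
  title: string;
  columns: ColumnDef[];   // new metrics append here without breaking older pages
  rows: DatasetRow[];
  provenance: Provenance;
}
```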

3) How it works: end-to-end SSR (Request → Response)

TOPZLE ships static assets plus an SSR Worker that renders either the homepage/search grid or a dynamic wiki page. SSR gives fast first paint + SEO-grade HTML, while still allowing charts to progressively enhance in the browser.

Request flow (SSR): a user visits / or /{lang}/{slug} → SSR Worker (Astro SSR runtime) → page renderer (index or dynamic slug) → API fetch (search / pinned / page data) → HTML response (SEO-ready markup, fast first paint).

A. Homepage vs Search vs Page

  • Home mode: shows pinned collages + trending buckets (deduped).
  • Search mode: renders grid results for ?q=… queries.
  • Dynamic page: renders the selected Wikipedia-derived dataset and chart shell.
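
A rough sketch of how that mode decision might look inside the SSR Worker, assuming a single resolver keyed off the URL (type and function names are illustrative):

```ts
// Illustrative routing sketch for the SSR Worker; names are assumptions.
type PageMode =
  | { kind: "home" }
  | { kind: "search"; query: string }
  | { kind: "wiki"; lang: string; slug: string };

function resolveMode(url: URL): PageMode {
  const q = url.searchParams.get("q");
  if (url.pathname === "/") {
    return q ? { kind: "search", query: q } : { kind: "home" };
  }
  // Expecting /{lang}/{slug}, e.g. /en/some-wikipedia-derived-slug
  const [, lang, slug] = url.pathname.split("/");
  if (lang && slug) return { kind: "wiki", lang, slug };
  return { kind: "home" }; // fallback: render the home grid
}
```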

4) How it works: Auto-Charts (client pipeline)

After the SSR-rendered page loads, TOPZLE conditionally enhances the content: the markup embeds a compact payload of headers/rows, then the client boot code infers columns and selects the right chart renderer. D3 loads only when needed.

Client pipeline: AutoCharts shell (embeds the compact dataset) → boot script (loads only on pages with tables) → lazy D3 (downloaded only when a chart is needed) → inference (name / year / value + formats) → chart render (bars / multiKey / timeline / map).
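
The boot step could look roughly like this: a sketch assuming the dataset sits in a data attribute on the chart shell and D3 arrives via dynamic import (attribute and parameter names are illustrative):

```ts
// Illustrative boot sketch: only pages that rendered a chart shell pay for D3.
async function bootAutoCharts(
  renderChart: (d3: unknown, el: HTMLElement, payload: unknown) => void,
): Promise<void> {
  const shell = document.querySelector<HTMLElement>("[data-autochart]");
  if (!shell) return; // no table-backed chart on this page: load nothing extra

  // Compact dataset (headers + rows) embedded by SSR in a data attribute.
  const payload = JSON.parse(shell.dataset.autochart ?? "null");
  if (!payload) return;

  // D3 is downloaded only when a chart will actually render.
  const d3 = await import("d3");
  renderChart(d3, shell, payload);
}
```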

A. Inference rules (why this works at scale)

  • Detects nameCol (most text), yearCol (year/range/date-like), and strong numeric columns.
  • Ignores non-metrics like “Rank/Peak” and other noise columns.
  • Handles nested headers via breadcrumb labels (“Parent — Child”).
  • Parses currencies, percents, and suffixes (K/M/B) conservatively.
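
Those rules reduce to simple scoring passes over the columns. A rough sketch, with regexes and thresholds that are assumptions rather than TOPZLE’s exact values:

```ts
// Rough column-inference sketch; heuristics and thresholds are illustrative.
type ColumnRole = "name" | "year" | "numeric" | "ignored";

const NOISE_HEADER = /^(rank|peak|no\.?|#|ref\.?|notes?)$/i;
const YEAR_LIKE = /^\s*\d{4}(\s*[–-]\s*\d{2,4})?\s*$/;              // "2018", "2018–19"
const NUMBER_LIKE = /^[^0-9]*[\d.,\s]+\s*(%|[KMB](illion)?)?\s*$/i; // "US$1,234", "45%", "1.2B"

function inferRoles(headers: string[], rows: string[][]): ColumnRole[] {
  const share = (col: number, test: (c: string) => boolean) => {
    const cells = rows.map((r) => (r[col] ?? "").trim()).filter(Boolean);
    return cells.length ? cells.filter(test).length / cells.length : 0;
  };
  return headers.map((header, col) => {
    if (NOISE_HEADER.test(header.trim())) return "ignored";    // "Rank", "Peak", refs
    if (share(col, (c) => YEAR_LIKE.test(c)) > 0.6) return "year";
    if (share(col, (c) => NUMBER_LIKE.test(c)) > 0.6) return "numeric";
    return "name"; // a later pass picks the text-heaviest candidate as nameCol
  });
}
```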

5) Chart modules (file-by-file)

Common modules: src/charts/bars.ts, src/charts/multiKeyBars.ts, src/charts/timeline.ts, src/charts/choropleth.ts, src/charts/rankList.ts

A. bars.ts

A deterministic single-metric ranked bar chart. Works when the dataset has a clear “name + value” shape. Designed to remain stable even when labels are long or values include formatting noise.

B. multiKeyBars.ts

Multi-metric comparison renderer (one entity, many numeric columns). Detects currency/percent per column and tolerates messy cells (including ranges) by taking the first numeric token consistently.
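
A sketch of that first-numeric-token rule as it might be implemented; the regexes and suffix table are assumptions:

```ts
// Illustrative cell parser: currency/percent hints plus K/M/B suffixes,
// always taking the first numeric token so ranges parse deterministically.
interface ParsedCell {
  value: number | null;
  isCurrency: boolean;
  isPercent: boolean;
}

const MULT: Record<string, number> = {
  k: 1e3, thousand: 1e3, m: 1e6, million: 1e6, b: 1e9, bn: 1e9, billion: 1e9,
};

function parseCell(raw: string): ParsedCell {
  const text = raw.replace(/\[\d+\]/g, "").trim();  // drop footnote markers like "[3]"
  const isCurrency = /[$€£¥]/.test(text);
  const isPercent = /%/.test(text);

  // First numeric token only: "12,345–13,000" → 12345; "US$1.2B" → 1.2 (scaled below)
  const match = text.match(/-?\d[\d,]*(\.\d+)?/);
  if (!match) return { value: null, isCurrency, isPercent };

  let value = Number(match[0].replace(/,/g, ""));
  const rest = text.slice((match.index ?? 0) + match[0].length);
  const suffix = rest.match(/^\s*(k|m|bn?|billion|million|thousand)\b/i)?.[1]?.toLowerCase();
  if (suffix && MULT[suffix]) value *= MULT[suffix]; // "1.2B" → 1_200_000_000

  return { value, isCurrency, isPercent };
}
```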

C. timeline.ts

Timeline renderer for year-based datasets, including year ranges (e.g., “2018–19”). Groups cards by year and keeps ordering strict — avoids “pretty but wrong” interpolation.
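
The year handling assumed here is small but important: a range like “2018–19” should group and sort under its starting year. A minimal sketch:

```ts
// Illustrative year extractor: ranges collapse to their starting year
// so grouping and ordering stay strict.
function timelineYear(cell: string): number | null {
  const m = cell.trim().match(/^(\d{4})(?:\s*[–-]\s*(\d{2,4}))?$/);
  return m ? Number(m[1]) : null;
}

// timelineYear("2018–19") -> 2018
// timelineYear("2021")    -> 2021
// timelineYear("ongoing") -> null (card falls into an undated bucket)
```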

D. choropleth.ts

Choropleth renderer that only activates when region matching is strong enough (>1 region matched). Normalizes region keys and uses aliasing to handle naming mismatches between datasets and geojson keys.
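
Region matching could be approximated like this; the alias table and the >1 activation threshold mirror the description above, while the specific aliases and normalization steps are assumptions:

```ts
// Illustrative region matcher for the choropleth gate.
const ALIASES: Record<string, string> = {
  "usa": "united states",
  "uk": "united kingdom",
  "czech republic": "czechia",
  // extended as mismatches between table names and geojson keys show up
};

const normalizeRegion = (name: string) =>
  name.toLowerCase().replace(/\(.*?\)/g, "").replace(/[^a-z\s]/g, "").trim();

function matchedRegions(rowNames: string[], geoKeys: string[]): Map<string, string> {
  const geo = new Map(geoKeys.map((k) => [normalizeRegion(k), k] as [string, string]));
  const matches = new Map<string, string>();
  for (const raw of rowNames) {
    const key = normalizeRegion(raw);
    const canonical = geo.get(ALIASES[key] ?? key);
    if (canonical) matches.set(raw, canonical);
  }
  return matches;
}

// The choropleth only activates when matches.size > 1, per the rule above.
```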

E. rankList.ts

A clean ranked list module (Gold/Silver/Bronze) optimized for scannability. Deterministic ordering with stable tie rules reduces churn across builds.
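
The stable tie rules might reduce to a comparator along these lines (field names are illustrative):

```ts
// Illustrative deterministic ordering: value descending, then a locale-stable
// name comparison, then the stable row ID, so rebuilds never reshuffle ties.
interface RankedRow {
  rowId: string;
  name: string;
  value: number;
}

function rankCompare(a: RankedRow, b: RankedRow): number {
  if (a.value !== b.value) return b.value - a.value;
  const byName = a.name.localeCompare(b.name, "en");
  if (byName !== 0) return byName;
  return a.rowId < b.rowId ? -1 : a.rowId > b.rowId ? 1 : 0;
}

// rows.sort(rankCompare), then the top three get Gold/Silver/Bronze styling.
```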

6) Parsing & normalization stack

The real work is converting “Wikipedia table HTML” into clean rows with predictable types. TOPZLE’s parsing layer is designed to survive weird encodings (mojibake), nested headers, mixed units, and partial data.

Normalization pipeline

1) Decode + clean text (UTF-8 + entity cleanup)
  • Fixes common mojibake + HTML entity artifacts.
  • Normalizes whitespace + punctuation noise.
2) Flatten headers (spans → breadcrumbs)
  • Handles row/col spans and nested header grids.
  • Produces labels like “Parent — Child” for stability.
3) Infer columns (name / year / numeric)
  • Finds year/range/date-like columns.
  • Finds strong numeric metrics; ignores “Rank/Peak”.
4) Parse values (currency / % / KMB)
  • Currency & percent hints per column.
  • Ranges become “first numeric token” for determinism.
5) Render decision (best-fit chart)
  • Timeline if years exist.
  • MultiKey bars if multiple metrics exist.
  • Choropleth only if region match is strong.
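
Step 5 is essentially a priority cascade. A sketch of how that decision could be expressed in code (field names and thresholds are assumptions, not TOPZLE’s exact logic):

```ts
// Illustrative render decision, mirroring the cascade above.
type ChartKind = "timeline" | "multiKeyBars" | "choropleth" | "bars" | "rankList";

interface DatasetShape {
  hasYearColumn: boolean;
  numericColumnCount: number;
  matchedRegionCount: number; // from the region-matching sketch earlier
}

function pickChart(shape: DatasetShape): ChartKind {
  if (shape.hasYearColumn) return "timeline";
  if (shape.numericColumnCount > 1) return "multiKeyBars";
  if (shape.matchedRegionCount > 1) return "choropleth";
  if (shape.numericColumnCount === 1) return "bars";
  return "rankList"; // fallback: plain ranked list when no strong metric exists
}
```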

7) Performance & SEO system

A. Performance

  • Minimal JS on list pages; charts load only when needed.
  • Lazy images + CLS-safe thumbnails with explicit dimensions.
  • Bounded work: inference and rendering are constrained per page to avoid UI stalls.

B. SEO

  • Semantic HTML + accessible landmarks.
  • OpenGraph/Twitter tags per page.
  • Canonical URLs, strong title/description rules.
  • JSON-LD surfaces: WebSite + ItemList (featured/pinned).
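
For context, a sketch of what the WebSite + ItemList JSON-LD surfaces could look like; the schema.org types are standard, while the names and URLs below are placeholders, not TOPZLE’s real values:

```ts
// Illustrative JSON-LD payloads (schema.org types); values are placeholders.
const webSiteJsonLd = {
  "@context": "https://schema.org",
  "@type": "WebSite",
  name: "TOPZLE",
  url: "https://example.com/",
  potentialAction: {
    "@type": "SearchAction",
    target: "https://example.com/?q={search_term_string}",
    "query-input": "required name=search_term_string",
  },
};

const featuredListJsonLd = {
  "@context": "https://schema.org",
  "@type": "ItemList",
  itemListElement: [
    { "@type": "ListItem", position: 1, name: "Example pinned list", url: "https://example.com/en/example-slug" },
    { "@type": "ListItem", position: 2, name: "Another pinned list", url: "https://example.com/en/another-slug" },
  ],
};
```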

8) Engineering constraints solved

A. Determinism across builds

TOPZLE enforces deterministic transforms and stable serialization so rebuilds don’t reshuffle ranks or reorder rows due to minor parsing noise. This reduces SEO churn and makes debugging tractable.
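
“Stable serialization” typically comes down to sorted-key output plus deterministic comparators. A minimal sketch of the idea:

```ts
// Illustrative stable serializer: object keys are emitted in sorted order so
// two builds of the same dataset produce byte-identical output.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined -> "null" for determinism
}
```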

B. Incremental updates without drift

Scraped sources evolve. The ingestion and parsing boundaries are designed for schema evolution and safer fallbacks so source changes degrade gracefully instead of breaking pages.

C. Expressive charts without per-page custom code

The module system makes it cheap to add new chart primitives and apply them broadly. Pages are “dataset + renderer decision”, not hand-built visualizations.
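
In practice that pattern is a small registry keyed by the render decision. The sketch below assumes each chart module exports a render function, which is an assumption about the code layout rather than a documented fact:

```ts
// Illustrative renderer registry: adding a chart primitive is one entry here,
// not per-page chart code. Exported names and signatures are assumptions.
type Renderer = (el: HTMLElement, dataset: unknown) => Promise<void> | void;

const RENDERERS: Record<string, () => Promise<Renderer>> = {
  bars:         async () => (await import("./charts/bars")).render,
  multiKeyBars: async () => (await import("./charts/multiKeyBars")).render,
  timeline:     async () => (await import("./charts/timeline")).render,
  choropleth:   async () => (await import("./charts/choropleth")).render,
  rankList:     async () => (await import("./charts/rankList")).render,
};

async function renderPage(kind: string, el: HTMLElement, dataset: unknown): Promise<void> {
  const render = await (RENDERERS[kind] ?? RENDERERS.rankList)();
  await render(el, dataset); // the page is just "dataset + renderer decision"
}
```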

9) Status & roadmap

TOPZLE is designed as a long-term platform. The next stages are focused on expanding chart primitives, improving provenance display (refresh time and source signals in-page), and strengthening “related lists” discovery without sacrificing performance.

For technical or partnership discussions related to TOPZLE, reach out via the Azonova contact form.