The idea
A data engineer's portfolio should be queryable, not just readable. This project publishes the site's own metadata (articles, projects and activity aggregates) as a small, documented public dataset and lets anyone run SQL over it directly in the browser. There is no backend and no API key: the engine is DuckDB-WASM, DuckDB compiled to WebAssembly, running entirely client-side.
How it works
A build step exports the content registry to versioned JSON/CSV/Parquet tables, plus a catalog.json describing every table and column. On the page, DuckDB-WASM boots in a Web Worker, registers the published files and runs queries against them locally:
```sql
-- Count how often each tag appears across articles and projects.
-- Tags are stored as a single '|'-delimited string per row.
SELECT tag, COUNT(*) AS mentions
FROM (
    SELECT UNNEST(STRING_SPLIT(tags, '|')) AS tag FROM articles
    UNION ALL
    SELECT UNNEST(STRING_SPLIT(tags, '|')) AS tag FROM projects
) AS all_tags
GROUP BY tag
ORDER BY mentions DESC;
```

Results render as a table and a quick bar chart, EXPLAIN shows the query plan, and the current query is encoded into the URL hash, so any query can be shared as a link.
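A minimal sketch of the shareable-URL part, assuming the query is stored under a single `q` key in the fragment. The site's actual hash format is not described here, and `encodeQueryToHash` / `decodeQueryFromHash` are hypothetical names. `URLSearchParams` handles the percent-encoding, so characters that would otherwise break the fragment, such as `#`, `&` or the `|` used in the tag query, survive a round trip.

```typescript
// Hypothetical helpers: assumes the fragment looks like "#q=<url-encoded SQL>".

/** Encode a SQL string into a URL fragment such as "#q=SELECT+1". */
function encodeQueryToHash(sql: string): string {
  const params = new URLSearchParams();
  params.set("q", sql); // percent-encodes '#', '&', '|', newlines, non-ASCII
  return "#" + params.toString();
}

/** Recover the SQL string from a fragment; returns null when no query is present. */
function decodeQueryFromHash(hash: string): string | null {
  // URLSearchParams parses "a=b&c=d", so drop a leading "#" if there is one.
  const raw = hash.startsWith("#") ? hash.slice(1) : hash;
  return new URLSearchParams(raw).get("q");
}

// In the browser this would be wired up roughly as:
//   location.hash = encodeQueryToHash(editorText);
//   const initialQuery = decodeQueryFromHash(location.hash);
```

Keeping the query in the fragment rather than the query string means it is never sent to the server in the HTTP request, which fits the static, client-side design.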
Trade-offs
- WASM vs. a hosted query API. WASM keeps the whole thing static, free to host and private (queries never leave the browser) at the cost of a one-time engine download.
- Curated dataset vs. raw events. Only public, aggregated metadata is exposed — no private events or raw API payloads — so the dataset stays small, documented and reproducible.
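The build step and this curation rule meet in the export: only allow-listed public fields leave the content registry, and each published table gets a catalog entry. Below is a minimal sketch under stated assumptions. The `RegistryArticle` shape, the `draft` and `internalNotes` fields, the `articles-<version>.csv` naming scheme and the catalog layout are all hypothetical, since none of them are described above. It writes only the CSV flavour (Parquet output would need an extra library) and joins tags with `|` to match the `STRING_SPLIT(tags, '|')` query.

```typescript
// Hypothetical registry shape; every field name here is an assumption for illustration.
interface RegistryArticle {
  slug: string;
  title: string;
  tags: string[];
  published: string; // ISO 8601 date, e.g. "2024-05-01"
  draft: boolean;
  internalNotes?: string; // private: must never reach the public dataset
}

interface ColumnDoc {
  name: string;
  type: "VARCHAR" | "DATE";
  description: string;
}

interface TableDoc {
  name: string;
  version: string;
  file: string;
  columns: ColumnDoc[];
}

// The allow-list of public columns doubles as the column documentation.
const ARTICLE_COLUMNS: ColumnDoc[] = [
  { name: "slug", type: "VARCHAR", description: "Stable article identifier" },
  { name: "title", type: "VARCHAR", description: "Article title" },
  { name: "tags", type: "VARCHAR", description: "Tags joined with '|'" },
  { name: "published", type: "DATE", description: "Publication date (ISO 8601)" },
];

// Project a registry entry onto public fields only; everything else is dropped.
// Assumes no tag contains '|', since the SQL side splits on it.
function toPublicRow(a: RegistryArticle): Record<string, string> {
  return { slug: a.slug, title: a.title, tags: a.tags.join("|"), published: a.published };
}

// Quote a CSV field (RFC 4180 style) when it contains a comma, quote or line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function exportArticles(items: RegistryArticle[], version: string): { csv: string; table: TableDoc } {
  const header = ARTICLE_COLUMNS.map((c) => c.name);
  const rows = items
    .filter((a) => !a.draft) // unpublished drafts are not public
    .map(toPublicRow)
    .map((r) => header.map((h) => r[h] ?? ""));
  const csv = [header, ...rows].map((row) => row.map(csvField).join(",")).join("\n") + "\n";
  // Hypothetical versioned file name, e.g. "articles-v1.csv".
  const table: TableDoc = { name: "articles", version, file: `articles-${version}.csv`, columns: ARTICLE_COLUMNS };
  return { csv, table };
}

// catalog.json: one entry per published table, with its documented columns.
function buildCatalog(tables: TableDoc[]): string {
  return JSON.stringify({ tables }, null, 2);
}
```

In this sketch the allow-list and the catalog come from the same `ARTICLE_COLUMNS` array, so a column cannot be published without also being documented.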