About

Aleksandr Andreev

Lead Data Engineer with 9+ years building streaming, analytics and lakehouse platforms at production scale.

I work across the full data platform surface area: streaming pipelines, batch compute, table formats, orchestration, data quality and the tooling teams need to move faster without losing discipline.

In recent years I have also been building internal AI tooling, especially review and knowledge-assist systems grounded in real documentation rather than generic prompts.

This site is both a portfolio and a writing lab: essays, compact notes and case studies about systems that are interesting because they have trade-offs.

Highlights

Built and scaled data platforms for analytics and near real-time decisioning.

Hands-on with Kafka, Flink, Spark, Airflow, dbt, Iceberg, Trino and Python.

Experienced leading delivery, mentoring engineers and tightening engineering standards.

Experience

Lead Data Engineer

AlfaStrakhovanie

2021 — present

  • Led design of real-time claims pipelines on Kafka, Flink and Iceberg for high-throughput workloads.
  • Drove migration of analytical workloads toward open lakehouse patterns with Trino, dbt and Iceberg.
  • Built an LLM-assisted merge request review workflow adopted by multiple teams.

Senior Data Engineer

Large financial services company

2018 — 2021

  • Built and operated Spark-based ETL pipelines across multiple upstream systems.
  • Standardized orchestration patterns in Airflow and improved observability for data SLAs.
  • Improved query performance through better file layout, partitioning and columnar storage practices.

Data Engineer / Data Analyst

Earlier analytics and data roles

2015 — 2018

  • Moved from SQL-heavy analytics toward Python and distributed data engineering.
  • Built early event pipelines and learned how platform decisions affect downstream teams.

Selected skills

Streaming

Kafka, Kafka Streams, Flink

Batch

Spark, Airflow, dbt

Storage

Iceberg, Parquet, ClickHouse, PostgreSQL

Query

Trino, DuckDB, SQL optimization

Infra

Kubernetes, Docker, Terraform, GitLab CI

AI tooling

Claude API, RAG, Qdrant, internal developer tools

How I work

  • Prefer boring infrastructure when it scales operationally.
  • Make trade-offs explicit instead of hiding them in architecture diagrams.
  • Build platforms teams can actually adopt, not just admire.
