Your ML Logging Stack Should Be Boring
CSV files, SQL, and why simple beats shiny
TLDR: Log indiscriminately, analyze later.
Logging
When you write an ML pipeline or run a training session: log everything. Hyperparameters, per-epoch metrics, per-step losses, learning rates, optimizer stats, dataset sizes, random seeds, host machine name, timestamps: dump it all, anything that is not a checkpoint (or tensor). Don’t waste energy deciding what to log and what not to log; that decision always comes back to bite you after the experiment is over. Instead, treat logging as indiscriminate data capture.
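For instance, here is a minimal sketch of the kind of record worth capturing once per run; the helper name and field names are illustrative, not prescriptive:

```python
import socket
import time

def run_metadata(hparams: dict) -> dict:
    """Everything that is cheap to record now and painful to reconstruct later."""
    return {
        "timestamp": time.time(),
        "host": socket.gethostname(),
        **hparams,  # learning rate, batch size, seed, optimizer, dataset size, ...
    }

meta = run_metadata({"learning_rate": 3e-4, "batch_size": 64, "seed": 1234, "optimizer": "adamw"})
```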
The simplest format is also the most effective: dump everything into a CSV file. It is readable today, and it will still be readable in 50 years by 50 different tools, even when tools like MLflow and WandB change their APIs or pricing. Append a row each time you want to record state, and let the file grow as large as it needs to. Yes, it can balloon to hundreds of megabytes or sometimes a few gigabytes, but that’s a problem for DuckDB, not for you.
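A sketch of the append-a-row idea using only the standard library; the file name and columns are assumptions about your own setup:

```python
import csv
import os

def log_row(path: str, row: dict) -> None:
    """Append one row to a CSV log, writing the header only when the file is new."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

log_row("metrics.csv", {"epoch": 1, "step": 100, "train_loss": 0.42, "learning_rate": 3e-4})
```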
Analyzing
DuckDB is an extremely fast OLAP engine that can chew through tens of gigabytes of CSV in seconds, in place, without indexing or preprocessing. Unlike the default Pandas or Polars workflow, it doesn’t load the whole file into memory; it evaluates queries lazily and streams over the data. It speaks SQL, a skill that will remain useful even if Pandas, Polars, or DuckDB itself falls out of mainstream use (the Pandas API will not age as well). With battle-tested SQL at your disposal you can filter, aggregate, and group your data to pull the insights you need out of your experiments, and views, joins, and set operations on queries let you refine them incrementally. A query like “find all hyperparameters where learning_rate > 0.001 AND final_accuracy > 0.95” becomes a single SQL statement.
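As a sketch of that query with DuckDB’s Python API, assuming a runs.csv with one row per finished run and hypothetical columns run_id, learning_rate, and final_accuracy:

```python
import duckdb

best = duckdb.sql("""
    SELECT run_id, learning_rate, final_accuracy
    FROM 'runs.csv'                      -- DuckDB reads the CSV in place
    WHERE learning_rate > 0.001
      AND final_accuracy > 0.95
    ORDER BY final_accuracy DESC
""").df()                                # materialize as a DataFrame only at the end
print(best)
```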
And unlike a rigid database schema or a structured logging format, CSV doesn’t lock you into a schema upfront. As your pipeline evolves (maybe you add a new regularization technique, start tracking GPU memory usage, or switch optimizers), just add new columns. DuckDB handles this gracefully: no migrations, no schema updates, no version compatibility checks.
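If different runs logged different columns, DuckDB can still read them together; a sketch using the union_by_name option of read_csv_auto (the logs/*.csv layout is an assumption):

```python
import duckdb

# Older logs lack the newer columns (e.g. a hypothetical gpu_mem_mb);
# union_by_name aligns columns by header name and fills the gaps with NULLs.
all_runs = duckdb.sql("""
    SELECT *
    FROM read_csv_auto('logs/*.csv', union_by_name = true)
""").df()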
Storage
Storage is no longer the bottleneck. The bottleneck is realizing two weeks later that you didn’t log the learning rate at each epoch, or the random seed, or the augmentation setting that happened to produce a good run. If every metric and hyperparameter is recorded, you can recreate the exact conditions of a run later. No “I think I used lr=0.001.” or “I forgot which augmentations were on.” Storage is cheap; regret is not. If you run many experiments, save each one as its own CSV file; once an experiment is over you can compress or convert the data for archival. Since your log files are flat tables, they compress well with gzip, or you can convert them to a columnar format like Parquet.
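One option for archival is to let DuckDB rewrite a finished run’s CSV as compressed Parquet; a sketch with hypothetical file names:

```python
import duckdb

# One-off conversion after the experiment is over; the original CSV can be
# gzipped or deleted once the Parquet copy has been verified.
duckdb.sql("""
    COPY (SELECT * FROM 'experiment_042.csv')
    TO 'experiment_042.parquet' (FORMAT PARQUET, COMPRESSION ZSTD)
""")
```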
Visualization
Do not rely on stdout: progress bars and pretty console output (tqdm, rich, etc.) are great for immediate feedback, but they’re not a durable record. Hosted/SDK loggers (WandB, MLflow, TensorBoard, etc.) are not a replacement for a raw, append-only record you control. Hosted services add dependencies, and SDK integrations sometimes hide the data behind proprietary APIs (rate limits, custom query languages, slow JSON parsing) that make long-term archival difficult and keep your data tied to their service. If your team/lab already uses WandB, you can still keep your own CSV as the source of truth.
If you absolutely need a live graph while training, you don’t need to lock your valuable experiment data in WandB or MLflow. You can spin up a Streamlit dashboard that re-reads your CSV every few seconds, or use Grafana with a CSV plugin that refreshes when the file changes. If you just need a quick plot after each epoch, a simple matplotlib script works fine. Any of these takes about as much setup as WandB does, but the broader point stands: your data remains yours, independent of your visualization tool, readable by any tool you choose, not locked behind a vendor’s API.
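A rough sketch of the Streamlit option, assuming the metrics.csv from earlier with step and train_loss columns; the polling-and-rerun loop is deliberately crude:

```python
# live_dashboard.py  --  run with: streamlit run live_dashboard.py
import time

import pandas as pd
import streamlit as st

st.title("Training progress")
df = pd.read_csv("metrics.csv")              # re-read the log on every rerun
st.line_chart(df, x="step", y="train_loss")  # the CSV stays the source of truth
time.sleep(5)                                # wait a few seconds...
st.rerun()                                   # ...then refresh the whole page
```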
Your training loop stays clean, and visualization is just an external process watching the log. And because the CSV already tracks everything, you’re not locked into a handful of pre-baked charts. You can slice runs by hyperparameter groups, generate any plot, or answer whatever curveball questions your advisor (or a reviewer) throws at you without rerunning anything. The raw data is there, so any graph is possible after the fact.
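For example, slicing runs by hyperparameter group is one GROUP BY away (the optimizer, learning_rate, and final_accuracy columns are again assumptions about your log):

```python
import duckdb

summary = duckdb.sql("""
    SELECT optimizer,
           learning_rate,
           count(*)            AS n_runs,
           avg(final_accuracy) AS mean_accuracy
    FROM 'runs.csv'
    GROUP BY optimizer, learning_rate
    ORDER BY mean_accuracy DESC
""").df()
```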
Remember: When the experiment is over, metadata is science.

