Linklog
A curated collection of links and resources I have found over time.
April 2025
- A Visual Exploration of Gaussian Processes (distill.pub)
This is a very thorough and well-designed visual introduction to Gaussian Processes and Bayesian optimisation. The article features interactive visualisations, which I found great for truly getting a feel for what's happening.
- A feel for the data | Briefer (briefer.cloud)
This is a high-quality review of how visualisations shape our understanding of data. Its focus on the strengths of each visualisation type makes it a great learning resource to improve our storytelling skills.
- Managing friction (arslan.io)
This article shares an interesting viewpoint on the role of friction in our lives, both as a positive and negative influence.
- Getting Started with TDD: A Practical Guide to Beginning a Lasting Practice (8thlight.com)
TDD can feel daunting, and advocacy for strict adherence to TDD principles can be off-putting when you are starting out. This article does a good job of reminding all of us of the pragmatic take that some testing is better than no testing, and that TDD, just like any other practice, is something you learn with time.
March 2025
- Writing useful Documentation (www.blog.philodev.one)
A great write-up on how to write good documentation.
- Don't Be Afraid Of Types (lmika.org)
Adding new types to existing codebases can be daunting, but one shouldn't be shy about doing what's necessary. This is a good opinion piece on this topic!
- "Vibe Coding" vs Reality (cendyne.dev)
Read this if you want a good reality check on the current "vibe coding" trend.
- A Visual Guide to LLM Agents (newsletter.maartengrootendorst.com)
It is possibly one of the best summaries out there of how LLM agents function, broken down by high-level components (memory, tools), and well illustrated.
- Learning Word Embedding (lilianweng.github.io)
An old but fantastic reference on vector embeddings.
- On the Importance of Naming in Programming (wasp.sh)
Some musings on the importance of good naming conventions in programming.
- How To Boil the Mediterranean Sea (benbyfax.substack.com)
This is an extremely interesting take on recent global warming data and the role of sulfur in masking some of the effects.
- Algorithms Books (algorithmsbook.com)
A fantastic collection of free textbooks on algorithms for optimization, decision making and validation.
- Slidev (sli.dev)
During my PhD, I wrangled with beamer for important presentations, but I always yearned for a simpler markdown-based system for smaller, recurrent presentations. I just discovered Slidev, and it has every feature I wanted from such a system, and more.
- Succinct data structures (blog.startifact.com)
Succinct data structures are clever ways to pack a lot of information in lightweight structures like bit vectors. A very interesting read!
- patrick-kidger/jaxtyping (github.com)
I've been looking for a good numpy and pytorch typing system in Python. Initially written for Jax, this library looks like exactly what I wanted.
- Understanding Attention in LLMs (bartoszmilewski.com)
This is a good reminder that even if you understand the math behind a concept, there's nothing like good storytelling. I knew how attention worked, but this post brilliantly summarized it and clarified some steps for me. A great read!
- Death of Best Practices (korshakov.com)
An interesting take on the rigidity of best practices and how much more productive we can be once we let go of them.
- Markov Chains explained visually (setosa.io)
A very neat summary of what Markov chains are and how they work, with beautiful animations.
- Some Advanced Typing Concepts in Python (jellis18.github.io)
Another article about Python's type system. This one is addressed to a more advanced audience. I had been looking for such a resource for a while, and I wasn't disappointed.
- Abstract Base Classes and Protocols: What Are They? When To Use Them?? Lets Find Out! (jellis18.github.io)
Very cool breakdown of the difference between abstract classes and protocols in Python. Well written and with lots of clear examples.
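
The difference between abstract base classes and protocols can be sketched with the standard library alone. This is a minimal illustration rather than code from the linked article; the names `Shape`, `HasArea`, `Square`, `Circle`, and `total_area` are made up for the example.

```python
import math
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    """Nominal typing: implementers must inherit from Shape explicitly."""

    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


@runtime_checkable
class HasArea(Protocol):
    """Structural typing: any object with an area() method qualifies."""

    def area(self) -> float: ...


class Circle:  # deliberately inherits from neither Shape nor HasArea
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def total_area(shapes: list[HasArea]) -> float:
    # Accepts anything that structurally matches HasArea
    return sum(shape.area() for shape in shapes)
```

Here `Circle` satisfies `HasArea` without declaring anything, while it is not a `Shape` because it never inherited from it. The abstract class in turn refuses to be instantiated until `area` is implemented.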
February 2025
- Git Branching for Small Teams (victoria.dev)
A good git workflow for small teams. It turns out to be the one we use in my team, and I can confirm it's a very effective one!
- SolracHQ/bmath (github.com)
An interesting CLI math tool, with its own language and sane defaults.
- Do not log (sobolevn.me)
An interesting analysis of the cost of modern logging infrastructure and its usefulness (or lack thereof).
- Generating Mazes (healeycodes.com)
A great introduction to maze generation algorithms with informative visuals.
- Summary of Major Changes Between Python Versions (www.nicholashairs.com)
This is a very useful reference sheet containing the major changes added by every new Python version. Extremely handy if you have to update an old codebase across multiple versions at once.
- Prototyping in Rust (corrode.dev)
This article presents various pieces of advice and tips on how to efficiently write Rust code at the prototype stage. Most introductory material on the language focuses on "proper use of syntax." But prototyping is often a compromise between code quality and coding efficiency, and this article makes some great suggestions on how to do that.
- Deep dive into LLMs like ChatGPT by Andrej Karpathy (TL;DR) (anfalmushtaq.com)
A TL;DR version of Andrej Karpathy's "Deep dive into LLMs like ChatGPT" video. It manages to keep the essentials while presenting them in clear, digestible chunks.
- (Ab)using General Search Algorithms on Dynamic Optimization Problems (dubovik.eu)
An interesting analysis of various optimization algorithms applied to a simple dynamic programming problem. Features beautiful visualizations of those algorithms.
- uchū (uchu.style)
uchū is a minimalistic color palette based on the OKLCH color space. I personally find it very aesthetically pleasing.
- flywhl/logis (github.com)
An interesting library to record ML experiment metadata through commit messages. Even better, it supports a query language to find which commit satisfies a given criterion.
- jj init (v5.chriskrycho.com)
I've been more and more tempted by jujutsu as a drop-in replacement for git. Its default way to handle changes seems so sane compared to git. This article is a very thorough and accessible introduction to how it works, and it definitely nudged me further along the jj train.
- Luxa CSS (www.luxacss.com)
This is an interesting CSS framework which picks some parts out of Tailwind while also being more minimalistic. Bonus point: it was made by the creator of the fantastic Dracula theme.
- How I Use Git Worktrees (matklad.github.io)
Useful example of a git worktree workflow. Worktrees help avoid all the stashing and branch hopping a typical workflow would involve. You can check out the repository multiple times on different branches and work on different features, review pull requests, run automated tests, etc., without having to break your flow.
- Binary vector embeddings are so cool (emschwartz.me)
A description of the effect of binary quantization on embeddings. By restricting the dtype of embedding vectors, you can trade accuracy in latent space for embedding size. Using a binary dtype seems to preserve a surprisingly high amount of the original information content (about 97%) while yielding a gigantic saving in space (also about 97% here).
- efugier/smartcat (github.com)
An interesting CLI tool designed to call LLMs from the command line with isolated short prompts. Seems to adhere to the core Unix philosophy, unlike most AI tools out there. Handles both local and hosted LLMs.
- How I program with LLMs (crawshaw.io)
Some interesting reflections on how to use LLMs in daily development work. I personally adhere mostly to the "autocomplete" part with GitHub Copilot, and I'm getting used to the "search" part where the LLM helps me find information on some language or coding paradigm faster than I can search for it. I'm not yet on board with "Chat-driven programming".
- A Deep Dive into Memorization in Deep Learning (blog.kjamistan.com)
An interesting series of articles explaining how machine learning models memorize data.
- ben-nour/SQL-tips-and-tricks (github.com)
I'm not a great SQL user. I have experience (mainly from database management in web development and now as a data scientist), but I don't consider myself an SQL wizard. This list of opinionated "tips" was quite useful to me.
- How to fine-tune open LLMs in 2025 with Hugging Face (www.philschmid.de)
An in-depth example of how to fine-tune an LLM using the Hugging Face ecosystem.
- aneeshnaik/lintsampler (github.com)
A Python library to sample custom probability distributions. Looks especially handy if the PDF is expensive to compute.
- Skforecast (skforecast.org)
A Python library for timeseries forecasting with very extensive features. The documentation also features some in-depth pedagogical explanations of how to properly forecast data and what methods can be used to improve results.
- Bayesian Methods for Hackers (dataorigami.net)
An illustrated introduction to Bayesian statistics using Jupyter notebooks. I was always confused about the difference between Bayesian and Frequentist approaches until I read this.
- Helix (helix-editor.com)
A Rust-based alternative to Neovim with opinionated defaults. After setting up an LSP for Python, it immediately became my daily driver.
- LoRA (jaketae.github.io)
Explanation of LoRA methods for LLMs.
- Thinking About Recipe Formats More Than Anyone Should (rknight.me)
An interesting reflection on markup languages for recipes. I was surprised other people spent as much time as I did pondering on recipe formats.
- Rust for the Polyglot Programmer (www.chiark.greenend.org.uk)
A book introducing Rust for programmers with experience in other languages. I'm not polyglot enough, but some of it helped me better understand the design choices in the language.
- Effective Simulated Annealing with Python (nathan.fun)
Fantastic introduction to the simulated annealing metaheuristic in Python. This is a powerful method to build good approximate solutions to optimization problems.
- Taking a Look at Compression Algorithms (cefboud.com)
A short blog post summarizing the main compression algorithms. It's incredible how little I knew about something I use so much.
- Jujutsu VCS Introduction and Patterns (kubamartin.com)
Introduction to using Jujutsu and replacing the git frontend. I'm still on the fence about it, but the commit system looks really interesting.
- Linklog (ewintr.nl)
Example of what a linklog should look like, and what I aim for with my own.
- Solving differential equations using neural networks (labpresse.com)
Toy example of how to use neural networks to solve differential equations. This blew my mind when I first read it.
- Blogging in Djot instead of Markdown (www.jonashietala.se)
Interesting dive into how to handle multiple markup languages in a Rust-based static website generator. My Rust journey hasn't taken me there yet, but it probably will eventually!
- A Visual Guide to How Diffusion Models Work (towardsdatascience.com)
An interesting dive into what makes diffusion models work. The summary is that diffusion models are trained to recover the original data from noisy versions of it, at various levels of noise. They eventually learn the probability distribution of the images in the space of all possible pixel arrangements. You can then iteratively denoise a pure Gaussian noise picture until you generate a new image: this is like sampling the learned probability distribution.
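
The noising half of that description can be sketched in a few lines. This is a minimal illustration assuming the common DDPM-style formulation with a linear noise schedule. The schedule values and the function names `alpha_bar_schedule` and `add_noise` are assumptions for the example, not taken from the linked article. A denoising model would then be trained to predict the noise from the noisy sample and the step index.

```python
import math
import random


def alpha_bar_schedule(
    num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02
) -> list[float]:
    """Fraction of the original signal variance left after each noising step."""
    alpha_bars = []
    running = 1.0
    for step in range(num_steps):
        # Linear schedule (an assumption): noise per step grows from beta_start to beta_end
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(
    x0: list[float], alpha_bar: float, rng: random.Random
) -> tuple[list[float], list[float]]:
    """Return (noisy sample, noise used); the noise is what a model learns to predict."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


alpha_bars = alpha_bar_schedule()
rng = random.Random(0)
slightly_noisy, _ = add_noise([1.0, -2.0, 0.5], alpha_bars[0], rng)
almost_pure_noise, _ = add_noise([1.0, -2.0, 0.5], alpha_bars[-1], rng)
```

At the first step almost all of the signal survives, while at the last step the sample is close to pure Gaussian noise, which is the starting point for iterative denoising.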
January 2025
- Understanding LSTM Networks (colah.github.io)
In-depth explanation of LSTM Networks. The figures on this blog are incredible and truly help explain what happens inside the network.
- A Brief Introduction to Recurrent Neural Networks (jaketae.github.io)
Introduction and example of how to build a recurrent neural network from scratch.
- Data Contracts as Therapy (benrutter.github.io)
Musings about the use of data contracts to validate data sources. If you've ever been frustrated by a data source suddenly changing its schema or sending unexpected data, this is for you!
- Polars for initial data analysis, Polars for production (pythonspeed.com)
Article about the use of Polars for both production and development stages. When starting with Polars, I found it easy to write production code (usually a long pipeline of LazyFrames ending with a collect), but struggled with writing optimal development code.
- Modern Polars (kevinheavey.github.io)
Great online book about Polars targeted to Pandas users. If you haven't heard about Polars yet, do yourself a favor and read this.
- Einsum in Depth (einsum.joelburget.com)
A guide on how to use "einsum" in Python for tensor manipulation. Einstein notation made working with algebra a much nicer experience in physics, and it should do the same for anyone doing heavy tensor operations. But I always found the Python implementation a bit awkward and difficult to understand. This article really helped with that.
- Building effective agents (www.anthropic.com)
Advice on agentic workflow for practical applications from Anthropic. A good read to better understand what structure you should use when establishing your project.
- Hyperparameter Tuning LightGBM (macalusojeff.github.io)
A useful guide for hyperparameter tuning of LGBM models. Mostly, if like me you always forget what parameter range is sensible, you can find it in there.
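
For the "Einsum in Depth" entry above, it may help to spell out what an einsum string means as explicit loops. The sketch below uses plain Python lists rather than a tensor library; the function names are made up for the example. With NumPy, the same operations would be written `numpy.einsum("ij,jk->ik", a, b)` and `numpy.einsum("ii->", a)`.

```python
def einsum_ij_jk_ik(a, b):
    """Explicit-loop meaning of the einsum string "ij,jk->ik" (a matrix product)."""
    n_i, n_j, n_k = len(a), len(b), len(b[0])
    out = [[0.0] * n_k for _ in range(n_i)]
    for i in range(n_i):
        for k in range(n_k):
            # j appears in the inputs but not in the output, so it is summed over
            for j in range(n_j):
                out[i][k] += a[i][j] * b[j][k]
    return out


def einsum_ii(a):
    """Meaning of "ii->": a repeated index with an empty output sums the diagonal."""
    return sum(a[i][i] for i in range(len(a)))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = einsum_ij_jk_ik(a, b)  # [[19.0, 22.0], [43.0, 50.0]]
trace = einsum_ii(a)             # 5
```

The rule of thumb is that every index letter becomes a loop, and any letter missing from the right-hand side of the arrow is summed away.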
December 2024
- Software design principles for machine learning applications (github.com)
A series of examples of proper software design in data science beyond Jupyter notebooks. Very good examples of proper refactoring, step by step, from a messy script to a properly encapsulated program.
- Quick software tips for new ML researchers (www.eugenevinitsky.com)
A short list of best practices. Some are obvious from a software development perspective (VCS, package manager, linter), but others are good recommendations for ML-specific tools (Hydra for configs, Optuna for hyperparameter tuning).
- Hands-on Optimization with OR-Tools in Python (kunlei.github.io)
Detailed use cases of the OR-Tools library for optimization problems. Many problems can be solved in a more efficient way with linear programming, and this library makes it a breeze to do so.
- GitHub Actions by Example (www.actionsbyexample.com)
I always have to google the GitHub Actions format and snippets, or prompt an LLM for them. This is a collection of examples so you never have to google it again.
- Data Science at the Command Line (jeroenjanssens.com)
Online book on how to use command-line tools for quick data science results. This is for when your boss asks you about some statistics of your recent data output and you don't want to write a whole script for it.
November 2024
- Perspectives on diffusion (sander.ai)
Some interesting thoughts on diffusion models.
- Thoughts on Riemannian metrics and its connection with diffusion/score matching [Part I] (blog.christianperone.com)
An in-depth description of the connections between diffusion models and Riemannian geometry.
- shshemi/tabiew (github.com)
A handy Rust-based TUI application to view and manipulate data from CSV files and databases. Supports SQL syntax to query the data regardless of its source.
- Algorithm Afternoon (algorithmafternoon.com)
This is a collection of all the optimization metaheuristics you can possibly imagine, with comments on how to implement them and what parameters can be tuned. The aim is to take it one algorithm per afternoon.
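
To make "metaheuristic with tunable parameters" concrete, here is a minimal sketch of simulated annealing, the method featured in the "Effective Simulated Annealing with Python" entry above. It is an illustration only, not code taken from either site. The geometric cooling schedule, the default parameter values, and the toy cost function are assumptions.

```python
import math
import random


def simulated_annealing(
    cost, neighbor, start, t_start=1.0, t_end=1e-3, cooling=0.995, seed=0
):
    """Minimise cost(x) by random local moves, accepting some uphill steps."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_end:
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling: fewer uphill moves over time
    return best, best_cost


def bumpy(x):
    # Toy cost with several local minima (hypothetical example function)
    return (x - 3.0) ** 2 + 2.0 * math.sin(5.0 * x)


def step(x, rng):
    # Propose a nearby point
    return x + rng.uniform(-0.5, 0.5)


best_x, best_value = simulated_annealing(bumpy, step, start=-4.0)
```

The tunable parameters are the start and end temperatures, the cooling rate, and the size of the neighbourhood move. A higher temperature explores more, a lower one behaves like plain hill climbing.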
October 2024
- First aid for figures: all resources (helenajamborwrites.netlify.app)
A collection of resources to help make better data visualizations. Definitely useful as a refresher or reference before making a report or a presentation.
- Transformers From Scratch (blog.matdmiller.com)
Thorough explanation of the Transformer model. If, like me, you've been confused about what's so special about transformers compared to RNNs or LSTMs, this might help.
- dry-python/returns (github.com)
Bring some sanity to Python and remove null checks. Clearly inspired by Haskell's Maybe or Rust's Option type. I am mostly familiar with the latter, and I often wish it existed in Python, and now it does.
- Blog of Claudio Jolowicz (cjolowicz.github.io)
Series of articles on best practices around Python coding and tooling. Definitely worth checking it out if you're still building your workflow.
- shap/shap (github.com)
Useful library to estimate feature importance of machine learning models, based on game theory principles. The main idea is to estimate how much each feature contributes to moving a sample's prediction away from the mean prediction. These contributions can also be aggregated over samples to understand global feature importance, conditional on feature value.
- Modern Good Practices for Python Development (www.stuartellis.name)
A set of best practices in Python development. Given the permissiveness of Python in terms of syntax and design, I find that following community-accepted best practices is the best way to learn how to write good code too.
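
The game-theory idea behind the shap/shap entry above can be illustrated with an exact Shapley value computation on a toy model. This sketch does not use the shap library or its API. It replaces "missing" features by a single baseline value, which is a simplification of how SHAP averages over background data, and it enumerates every feature ordering, which is only feasible for a handful of features. The names `shapley_values` and `toy_model` are made up for the example.

```python
from itertools import permutations


def shapley_values(model, sample, baseline):
    """Average marginal contribution of each feature over all feature orderings."""
    n = len(sample)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        point = list(baseline)  # start from the baseline input
        previous = model(point)
        for feature in order:
            point[feature] = sample[feature]  # switch this feature to the sample's value
            current = model(point)
            contributions[feature] += current - previous
            previous = current
    return [c / len(orderings) for c in contributions]


def toy_model(x):
    # Hypothetical model: one additive term and one interaction term
    return 2.0 * x[0] + x[1] * x[2]


sample = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
values = shapley_values(toy_model, sample, baseline)  # [2.0, 3.0, 3.0]
```

The contributions sum to the gap between the prediction for the sample (8.0) and the prediction for the baseline (0.0), and the interaction term is split evenly between the two features involved.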
September 2024
- Introduction to Data Science (rafalab.dfci.harvard.edu)
An online book focusing on the fundamentals of data science (statistics, traditional machine learning). I don't know much about R (on which this book is based) but most of the theory in there is relevant for any junior data scientist.
- Visualizing Algorithms (bost.ocks.org)
A beautiful set of visualizations of common algorithms. Perfect to truly understand what happens in a quicksort algorithm, or to compare different sampling algorithms.
- Was Michael Scott the World’s Best Boss? (datacream.substack.com)
I always love when data scientists take it too far with their hobbies. This is a cool example of data science applied to "The Office", to figure out through sentiment analysis if Michael Scott was truly appreciated.
- dleemiller/WordLlama (github.com)
Natural language processing toolkit optimized for CPU hardware. I haven't tested it yet, but it looks really useful for quick clustering, deduplication, similarity search, etc.
- Pico CSS (picocss.com)
A minimalistic take on CSS frameworks which is simple and lightweight. Hopefully I will one day have the time to rewrite this blog with it. Update: it looks semi-abandoned, but some forks are keeping the torch alive.
- posit-dev/great-tables (github.com)
Library to make great-looking tables from Polars dataframes. It works with Pandas too, but there you can just generate HTML directly, while Polars currently does not have many other options.
August 2024
- sharkdp/hyperfine (github.com)
A very useful CLI tool to perform benchmarking tests. Handy for testing a bash script or a simple script file without any complicated profiling.
- REDOKU (padolsey.github.io)
A fun RegExp-based crossword. Not easy though, and you might see English differently afterwards.
- Modern SQL Style Guide (gist.github.com)
An interesting and opinionated take on SQL formatting. You might not be able to impose it at work, but you can always try!
- Column Names as Contracts (emilyriederer.netlify.app)
An interesting explanation of implicit data contracts through naming conventions.
July 2024
- A User’s Guide to Statistical Inference and Regression (mattblackwell.github.io)
Brief introductory book to essential statistics. This online book is very clear and helped me understand concepts I always found confusing.
March 2024
- Machine Learning Notebooks (sebastianraschka.com)
A collection of detailed Python notebooks written by Sebastian Raschka. It's like a big cheatsheet of machine learning methods.
February 2024
- Python Data Science Handbook (jakevdp.github.io)
A must-read for anyone beginning in data science. Chapter 5 features some great in-depth notebooks on classical machine learning methods like SVM, random forests, etc.
- faif/python-patterns (github.com)
A detailed list of design patterns in Python. While I don't believe you should always look to insert design patterns everywhere you can, knowing them is often the key to writing more robust code when relevant.
- Scientific Computing with Python (caam37830.github.io)
A reference on how to use Python for efficient computations in science. I wish I had read this before my PhD.