Linklog
A curated collection of links and resources I have found over time.
March 2025
- Succinct data structures (blog.startifact.com)
Succinct data structures are clever ways to pack a lot of information into lightweight structures like bit vectors. A very interesting read!
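To give a flavor of the idea, here is a minimal sketch of a bit vector with a small "rank" index, rank being one of the operations commonly associated with succinct bit vectors. The class name `RankBitVector` and the block size of 64 are my own choices for illustration and do not come from the linked post. The bits are kept in a plain Python list for readability, so this version is not actually space-efficient.

```python
class RankBitVector:
    """Bit vector with a precomputed rank index (hypothetical, illustrative name)."""

    BLOCK = 64  # assumed block size, chosen only for this example

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ranks[k] holds the number of 1 bits before block k.
        self.block_ranks = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones_in_block = sum(self.bits[start:start + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones_in_block)

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # Use the stored count for whole blocks, then scan at most one block.
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bits = [1, 0, 1, 1, 0] * 30  # 150 bits
vector = RankBitVector(bits)
ones_before_100 = vector.rank1(100)
```

The index costs only one counter per block, yet a rank query never has to scan more than one block.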
February 2025
- Binary vector embeddings are so cool (emschwartz.me)
A description of the effect of binary quantization on embeddings. Restricting the dtype of embedding vectors trades accuracy in latent space for a smaller embedding. A binary dtype seems to preserve a surprisingly large share of the original information content (about 97%) while saving a huge amount of space (also about 97% here).
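A minimal sketch of where the space saving comes from, assuming the original embedding is stored as float32 (4 bytes per dimension) and that quantization keeps only the sign of each dimension. Both are assumptions on my part, as are the helper names and the 1024-dimension random vector, which stands in for a real embedding.

```python
import random


def binarize(vector):
    """Keep only the sign of each dimension: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 dimensions per byte."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the differing bits between two packed vectors of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


rng = random.Random(0)
dims = 1024  # assumed embedding size
embedding = [rng.gauss(0.0, 1.0) for _ in range(dims)]  # stand-in for a real embedding
packed = pack_bits(binarize(embedding))

float32_bytes = dims * 4   # 4096 bytes if stored as float32
binary_bytes = len(packed)  # 128 bytes, one bit per dimension
saving = 1 - binary_bytes / float32_bytes  # 0.96875, i.e. about 97%
```

Going from 32 bits to 1 bit per dimension is a 32x reduction, which matches the roughly 97% saving. Similarity between two binary embeddings can then be approximated with `hamming_distance` instead of a float dot product.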
January 2025
- Data Contracts as Therapy (benrutter.github.io)
Musings about the use of data contracts to validate data sources. If you've ever been frustrated by a data source suddenly changing its schema or sending unexpected data, this is for you!
- Polars for initial data analysis, Polars for production (pythonspeed.com)
Article about the use of Polars for both production and development stages. When starting with Polars, I found it easy to write production code (usually a long pipeline of LazyFrames ending with a collect), but struggled with writing optimal development code.
- Modern Polars (kevinheavey.github.io)
Great online book about Polars aimed at Pandas users. If you haven't heard about Polars yet, do yourself a favor and read this.
October 2024
- First aid for figures: all resources (helenajamborwrites.netlify.app)
A collection of resources to help make better data visualizations. Definitely useful as a refresher or reference before making a report or a presentation.
September 2024
- Was Michael Scott the World’s Best Boss? (datacream.substack.com)
I always love it when data scientists take their hobbies too far. This is a cool example of data science applied to "The Office", using sentiment analysis to figure out whether Michael Scott was truly appreciated.
August 2024
- Column Names as Contracts (emilyriederer.netlify.app)
An interesting explanation of implicit data contracts through naming conventions.
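A minimal sketch of how a naming convention can act as a contract: each column prefix promises something about the values, and a small checker reports values that break the promise. The prefixes (`id_`, `n_`, `is_`), their rules, and the sample rows are hypothetical and chosen only for illustration. They are not the vocabulary used in the linked post.

```python
# Hypothetical prefix rules: each prefix maps to a check its values must pass.
PREFIX_RULES = {
    "id_": lambda v: v is not None,                  # identifiers are never missing
    "n_": lambda v: isinstance(v, int) and v >= 0,   # counts are non-negative integers
    "is_": lambda v: v in (0, 1),                    # flags are 0 or 1
}


def violations(rows):
    """Return (row_index, column, value) for every value that breaks its prefix rule."""
    found = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            for prefix, rule in PREFIX_RULES.items():
                if column.startswith(prefix) and not rule(value):
                    found.append((index, column, value))
    return found


rows = [
    {"id_user": "a1", "n_visits": 3, "is_active": 1},
    {"id_user": None, "n_visits": -2, "is_active": 1},
]
problems = violations(rows)
```

Because the rules hang off the column names, a new column such as `n_orders` is checked without touching the validation code.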