Linklog | Nicolas Chagnet

Week 21

Are you more likely to die on your birthday? (pudding.cool) - #data #statistics
A fun analysis of the birthday effect using actual data and thorough methodology.

Week 18

dataframely — A declarative, 🐻‍❄️-native data frame validation library (tech.quantco.com) - #data #library
I've been working a lot on our data pipelines at work, switching to polars mostly for performance and introducing rigorous checks and validations of data at various stages. I haven't yet used dataframely, but its principle really resonates with my use case, so I recommend checking it out.

Week 10

Succinct data structures (blog.startifact.com) - #data
Succinct data structures are clever ways to pack a lot of information into compact structures such as bit vectors, while still answering queries on them efficiently. A very interesting read!
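To give a flavor of the idea, here is a minimal sketch of a `rank` query on a bit vector, one of the basic building blocks of succinct data structures. The class name `RankBitVector` and the block size are my own choices, not taken from the article. For readability the bits are kept in a plain Python list, which is not succinct at all; a real implementation would pack them into machine words and use popcount.

```python
class RankBitVector:
    """Bit vector with precomputed per-block counts to speed up rank queries."""

    BLOCK = 64  # illustrative block size (an assumption, not from the article)

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ranks[k] = number of 1 bits strictly before block k
        self.block_ranks = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # Precomputed count for full blocks, then a short scan inside one block
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bits = [1, 0, 1, 1, 0] * 40  # 200 bits
bv = RankBitVector(bits)
print(bv.rank1(5), bv.rank1(200))
```

Each query only scans within a single block, at the cost of one extra counter per block.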

Week 7

Binary vector embeddings are so cool (emschwartz.me) - #llm #deep-learning #data
A description of the effect of binary quantization on embeddings. Restricting the dtype of embedding vectors trades accuracy in latent space against the size of the embedding. A binary dtype seems to preserve a surprisingly large share of the original information content (about 97%) while saving a huge amount of space (also about 97% here).
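A rough sketch of the idea, assuming the simplest scheme where each dimension is reduced to its sign bit and vectors are compared with Hamming distance. The helper names `binarize` and `hamming` are hypothetical, and the vectors are random stand-ins rather than real embeddings. If the originals are float32, going from 32 bits to 1 bit per dimension is a 32x reduction, which matches the roughly 97% saving in space.

```python
import random


def binarize(vec):
    """Pack the sign of each component into one Python int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed embeddings."""
    return bin(a ^ b).count("1")


random.seed(0)
dim = 1024
base = [random.gauss(0, 1) for _ in range(dim)]
near = [x + random.gauss(0, 0.3) for x in base]  # slightly perturbed copy of base
far = [random.gauss(0, 1) for _ in range(dim)]   # unrelated random vector

b_base, b_near, b_far = binarize(base), binarize(near), binarize(far)

# The perturbed vector should stay much closer in Hamming distance than the unrelated one
print(hamming(b_base, b_near), hamming(b_base, b_far))
```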

Week 5

Data Contracts as Therapy (benrutter.github.io) - #data
Musings about the use of data contracts to validate data sources. If you've ever been frustrated by a data source suddenly changing its schema or sending unexpected data, this is for you!

Week 4

Polars for initial data analysis, Polars for production (pythonspeed.com) - #python #data
Article about the use of Polars for both production and development stages. When starting with Polars, I found it easy to write production code (usually a long pipeline of LazyFrames ending with a collect), but struggled with writing optimal development code.
Modern Polars (kevinheavey.github.io) - #python #data
Great online book about Polars aimed at pandas users. If you haven't heard about Polars yet, do yourself a favor and read this.

Week 44

First aid for figures: all resources (helenajamborwrites.netlify.app) - #data
A collection of resources to help make better data visualizations. Definitely useful as a refresher or reference before making a report or a presentation.

Week 39

Was Michael Scott the World’s Best Boss? (datacream.substack.com) - #data
I always love it when data scientists take their hobbies too far. This is a cool example of data science applied to "The Office", using sentiment analysis to figure out whether Michael Scott was truly appreciated.

Week 31

Column Names as Contracts (emilyriederer.netlify.app) - #best-practices #data
An interesting explanation of implicit data contracts through naming conventions.
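As a toy illustration of the idea, here is a sketch where a column's prefix implies a promise about its values. The prefixes (`ID_`, `N_`, `IND_`), the rules attached to them and the `check_row` helper are hypothetical examples of mine, not necessarily the vocabulary used in the article.

```python
# Hypothetical prefix vocabulary: each prefix promises something about the values.
CONTRACTS = {
    "ID_": lambda v: v is not None,                  # identifiers are never null
    "N_": lambda v: isinstance(v, int) and v >= 0,   # counts are non-negative integers
    "IND_": lambda v: v in (0, 1),                   # indicators are binary
}


def check_row(row):
    """Return the column names that violate the contract implied by their prefix."""
    violations = []
    for name, value in row.items():
        for prefix, is_valid in CONTRACTS.items():
            if name.startswith(prefix) and not is_valid(value):
                violations.append(name)
    return violations


row = {"ID_USER": 42, "N_LOGINS": -3, "IND_ACTIVE": 1, "comment": "free text"}
print(check_row(row))  # prints ['N_LOGINS']
```

Columns without a known prefix, like `comment` here, carry no promise and are simply skipped.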