
Linklog

A curated collection of links and resources I have found over time.

Tags: #algorithms (8) #best-practices (9) #cli (4) #data (10) #data-science (7) #deep-learning (8) #diffusion (3) #library (11) #llm (10) #LLM (1) #markdown (2) #optimization (6) #physics (3) #python (27) #rust (7) #SQL (2) #statistics (4) #tools (7) #vcs (6) #web-dev (4)

May 2025

Week 21
  • Are you more likely to die on your birthday? (pudding.cool) - #data #statistics

    A fun analysis of the birthday effect using actual data and thorough methodology.

Week 18
  • dataframely — A declarative, 🐻‍❄️-native data frame validation library (tech.quantco.com) - #data #library

    I've been working a lot on our data pipelines at work, switching to Polars, mostly for performance, and introducing rigorous checks and validations of the data at various stages. I haven't used dataframely yet, but its principle really resonates with my use case, so I recommend checking it out.
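
    The core idea of declarative validation is that the expected shape of the data is written down once as plain data, and a generic checker enforces it. The following is not dataframely's API: it is a minimal standard-library sketch of the idea, assuming rows are stored as plain dicts, and `Column`, `ORDERS_SCHEMA` and `validate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical declarative rule for one column (not dataframely's API)."""
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None


# The schema itself is just data: column name -> rule.
ORDERS_SCHEMA = {
    "order_id": Column(int),
    "amount": Column(float, min_value=0.0),
    "coupon": Column(str, nullable=True),
}


def validate(rows: list[dict[str, Any]], schema: dict[str, Column]) -> tuple[list[dict[str, Any]], list[str]]:
    """Split rows into valid rows and human-readable failure messages."""
    valid, failures = [], []
    for i, row in enumerate(rows):
        errors = []
        for name, rule in schema.items():
            value = row.get(name)  # a missing column is treated as null
            if value is None:
                if not rule.nullable:
                    errors.append(f"{name} is null")
                continue
            if not isinstance(value, rule.dtype):
                errors.append(f"{name} has type {type(value).__name__}")
            elif rule.min_value is not None and value < rule.min_value:
                errors.append(f"{name} < {rule.min_value}")
        if errors:
            failures.append(f"row {i}: " + ", ".join(errors))
        else:
            valid.append(row)
    return valid, failures


rows = [
    {"order_id": 1, "amount": 9.5, "coupon": None},
    {"order_id": 2, "amount": -1.0, "coupon": "X"},
    {"order_id": None, "amount": 3.0},
]
valid, failures = validate(rows, ORDERS_SCHEMA)
print(valid)
print(failures)
```

    Keeping the rules as data rather than as scattered if-statements means the same schema can double as documentation for each pipeline stage.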

March 2025

Week 10
  • Succinct data structures (blog.startifact.com) - #data

    Succinct data structures are clever ways to pack a lot of information into compact structures such as bit vectors, while still supporting fast queries on them. A very interesting read!
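
    A typical query on such a bit vector is rank: how many 1s appear before a given position. The sketch below is a simplification of my own, not code from the linked post: it stores a cumulative count of 1s per fixed-size block, so a rank query only scans inside one block. Real succinct implementations pack the bits into machine words, use a two-level directory and popcount instructions; `RankBitVector` and `block_size` are illustrative names.

```python
class RankBitVector:
    """Bit vector with rank1 queries backed by per-block cumulative counts.

    Simplified illustration: bits are kept in a Python list rather than packed,
    so this shows the indexing idea, not the space bound.
    """

    def __init__(self, bits: list[int], block_size: int = 8):
        self.bits = bits
        self.block_size = block_size
        # cumulative[k] = number of 1s in bits[0 : k * block_size]
        self.cumulative = [0]
        for start in range(0, len(bits), block_size):
            block_ones = sum(bits[start:start + block_size])
            self.cumulative.append(self.cumulative[-1] + block_ones)

    def rank1(self, i: int) -> int:
        """Return the number of 1s in bits[0:i], i.e. strictly before position i."""
        block = i // self.block_size
        start = block * self.block_size
        # Precomputed count for all full blocks, plus a short scan inside the last one.
        return self.cumulative[block] + sum(self.bits[start:i])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
print(bv.rank1(5))   # 3
print(bv.rank1(10))  # 6
```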

February 2025

Week 7
  • Binary vector embeddings are so cool (emschwartz.me) - #llm #deep-learning #data

    A description of the effect of binary quantization on embeddings. By lowering the precision of the dtype used for embedding vectors, you trade some accuracy in latent space for a smaller embedding. Going all the way down to a binary dtype seems to retain a surprisingly large share of the original information content (about 97%) while cutting storage by a huge amount (about 97% here too).
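
    The roughly 97% space saving follows from simple arithmetic if the baseline is float32: one bit per dimension instead of 32 gives 1 - 1/32, about 96.9%. The sketch below assumes float32 as the starting dtype and thresholds each component at zero, a common choice that may differ in detail from the linked post; `binarize`, `hamming` and the 1024-dimension size are illustrative. Binary vectors are then compared with Hamming distance (XOR followed by a bit count).

```python
def binarize(embedding: list[float]) -> int:
    """Pack an embedding into an int: bit j is set when component j is positive."""
    packed = 0
    for j, x in enumerate(embedding):
        if x > 0:
            packed |= 1 << j
    return packed


def hamming(a: int, b: int) -> int:
    """Count differing bits, used as a cheap distance between binary embeddings."""
    return bin(a ^ b).count("1")


a = binarize([0.3, -1.2, 0.8, -0.1])  # bits 0 and 2 set
b = binarize([0.5, 0.2, -0.4, -0.3])  # bits 0 and 1 set
print(hamming(a, b))  # 2

# Storage for a hypothetical 1024-dimensional embedding.
dim = 1024
float32_bytes = dim * 4  # 32 bits per component -> 4096 bytes
binary_bytes = dim // 8  # 1 bit per component -> 128 bytes
saving = 1 - binary_bytes / float32_bytes
print(saving)  # 0.96875, i.e. about 97%
```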

January 2025

Week 5
  • Data Contracts as Therapy (benrutter.github.io) - #data

    Musings about the use of data contracts to validate data sources. If you've ever been frustrated by a data source suddenly changing its schema or sending unexpected data, this is for you!
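
    One concrete form of such a contract is a declared schema that every delivery from a source is checked against, so that a silent schema change fails loudly instead of propagating downstream. A minimal sketch, assuming schemas are represented as plain "column -> type name" dicts; this representation and `diff_schema` are hypothetical, not taken from the linked post.

```python
def diff_schema(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Compare a contract (column -> type name) with the schema a source actually sent."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(f"type changed: {name} {dtype} -> {actual[name]}")
    # Sort extra columns so the report is deterministic.
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column: {name}")
    return problems


contract = {"user_id": "int", "signup_date": "date", "country": "str"}
incoming = {"user_id": "str", "signup_date": "date", "region": "str"}
problems = diff_schema(contract, incoming)
for problem in problems:
    print(problem)
```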

Week 4
  • Polars for initial data analysis, Polars for production (pythonspeed.com) - #python #data

    An article about using Polars both for exploratory development and in production. When I started with Polars, I found production code easy to write (usually a long pipeline of LazyFrame operations ending with a collect()), but struggled to write good code during the development stage.

  • Modern Polars (kevinheavey.github.io) - #python #data

    A great online book about Polars aimed at pandas users. If you haven't heard of Polars yet, do yourself a favor and read this.

October 2024

Week 44
  • First aid for figures: all resources (helenajamborwrites.netlify.app) - #data

    A collection of resources to help make better data visualizations. Definitely useful as a refresher or reference before making a report or a presentation.

September 2024

Week 39
  • Was Michael Scott the World’s Best Boss? (datacream.substack.com) - #data

    I always love it when data scientists take their hobbies too far. This is a cool example of data science applied to "The Office", using sentiment analysis to figure out whether Michael Scott was truly appreciated.

August 2024

Week 31
  • Column Names as Contracts (emilyriederer.netlify.app) - #best-practices #data

    An interesting explanation of implicit data contracts through naming conventions.
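
    The idea is that a prefix in a column's name promises something about its contents, and that promise can then be checked mechanically. The prefixes below (ID_, IND_, N_, AMT_, DT_) and the `check_row` helper are a hypothetical convention for illustration, in the spirit of the post rather than copied from it.

```python
from datetime import date

# Hypothetical naming convention: each prefix promises a type or meaning.
PREFIX_RULES = {
    "ID_": lambda v: isinstance(v, int),                 # identifier
    "IND_": lambda v: v in (0, 1),                       # binary indicator
    "N_": lambda v: isinstance(v, int) and v >= 0,       # non-negative count
    "AMT_": lambda v: isinstance(v, (int, float)),       # numeric amount
    "DT_": lambda v: isinstance(v, date),                # date
}


def check_row(row: dict) -> list[str]:
    """Report every value that breaks the contract implied by its column prefix."""
    violations = []
    for column, value in row.items():
        for prefix, rule in PREFIX_RULES.items():
            if column.startswith(prefix) and not rule(value):
                violations.append(f"{column}={value!r} breaks the {prefix} contract")
    return violations


row = {
    "ID_USER": 42,
    "IND_ACTIVE": 2,
    "N_TRIPS": -1,
    "AMT_SPEND": 12.5,
    "DT_SIGNUP": date(2024, 8, 1),
}
for violation in check_row(row):
    print(violation)
```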

You can follow me via RSS.
© 2025 Nicolas Chagnet. All rights reserved.