Large language models are a huge deal. Unlike any earlier machine learning model, they encode a model of the world (for example, they know that 07/18 is a valid American date, but 13/12 has to be a European date), they capture structure in language that even native speakers are not consciously aware of, and they are a giant superposition of many skills and personalities (you only need 1,000 examples to elicit vastly different behavior from a model trained on many trillions of examples). When I read a paper on large language models, I take notes. Here are those notes (and here is the link to that document).