plugyawn's blog(?)

We can be more than Wizards

I have always been fascinated by those wizardly folks who seem to know how to talk to their PID controllers, hold them for a few seconds, and tune them nearly perfectly.

It's always nearly, though. Of course, the right way is to mathematically find an optimal set of parameters; the physical intuition isn't really necessary. The wizardry sounds a lot like how we do pre/post-training these days: we seem to be doing it by hand, and there are some people who do it rather well and tell others how to.

All I claim is that there should instead be a theory (surprise, surprise). But perhaps... a Newtonian, rather than statistical-mechanical, look at NNs, now that we have just humongously parameterized models. By that I mean we should perhaps strive to write deterministic statements about large-scale systems that have been stabilized by scale, rather than look at the system as an assemblage of random particles.

Then again, we're only at roughly 1-trillion-parameter BF16 models. That's what, 16 trillion bits? We are still some ten orders of magnitude away from having an Avogadro number of bits in our models, which is where we assume this "zoom-out" happens in physics. We've already noticed scaling laws, but perhaps our LLMs today are still "gaseous" intelligences: too small and amorphous to be consolidated by theory. As we scale more, the sea of randomness will recede, and we shall have more deterministic intelligences, hopefully more amenable to the harnesses of scientific inquiry.
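For the curious, the back-of-the-envelope arithmetic can be checked in a few lines of Python. This is only a rough sketch: the parameter count is the round "1 trillion" figure from above, taken as an assumption rather than a measurement of any particular model, and the variable names are mine.

```python
import math

# Avogadro constant (exact by SI definition), in particles per mole.
AVOGADRO = 6.02214076e23

# Assumption: a round one trillion parameters, as in the text.
params = 1e12
# BF16 stores each parameter in 16 bits.
bits_per_param = 16

total_bits = params * bits_per_param

# How many powers of ten separate us from an Avogadro number of bits.
gap = math.log10(AVOGADRO / total_bits)

print(f"total bits: {total_bits:.1e}")
print(f"orders of magnitude short of Avogadro: {gap:.1f}")
```

So the gap works out to a bit over ten orders of magnitude, which is a lot of doublings to go.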

But we may perhaps hope it will only get easier, not harder, over time.