Eight Things to Know about Large Language Models
A collection of observations that might have been surprising a year ago, but that by now should be common knowledge for anyone even half following developments in this field.
Some interesting bits in there, but otherwise it’s a bit rah-rah, since the author works at Anthropic.
In particular, models can misinterpret ambiguous prompts or incentives in unreasonable ways, including in situations that appear unambiguous to humans, leading them to behave unexpectedly.
Our techniques for controlling systems are weak and are likely to break down further when applied to highly capable models. Given all this, it is reasonable to expect a substantial increase and a substantial qualitative change in the range of misuse risks and model misbehaviors that emerge from the development and deployment of LLMs.
The recent trend toward limiting access to LLMs and treating the details of LLM training as proprietary information is also an obstacle to scientific study.