Research Paper
The Reversal Curse: LLMs Trained on “A is B” Fail to Learn “B is A”

Abstract - We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically general...

The content below consists of raw annotations with comments. I suggest reading the paper PDF first.

This is one of the most well-received papers in the area. It highlights a major drawback of how LLMs work: they fail at a kind of generalization that is trivial for humans.

The paper was originally released in 2023 and was later published as a conference paper at ICLR 2024.

The paper has had a large influence on mechanistic interpretability and on broader research into how LLMs store facts.

Many later papers have tried to solve this problem, but all we currently have are workarounds, which amount to including both directions of each statement ("A is B" and "B is A") in the training data.
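To make that workaround concrete, here is a minimal sketch of what "including both sentences" could look like as a data-augmentation step. It is not the method of any particular paper: the function name `augment_with_reversals`, the simple "X is Y." template, and the example fact are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def augment_with_reversals(facts: List[Tuple[str, str]]) -> List[str]:
    """Return training sentences covering both directions of each fact.

    Each fact is a hypothetical (name, description) pair. For every pair we
    emit "A is B." and the reversed "B is A.", so the model sees both orders.
    """
    sentences = []
    for name, description in facts:
        # Forward direction: "A is B."
        sentences.append(f"{name} is {description}.")
        # Reverse direction: "B is A." (capitalize the first letter of B)
        reversed_subject = description[:1].upper() + description[1:]
        sentences.append(f"{reversed_subject} is {name}.")
    return sentences


if __name__ == "__main__":
    # Made-up fact, used only to show the shape of the output.
    for line in augment_with_reversals(
        [("Alice Example", "the author of Example Book")]
    ):
        print(line)
```

Note that this only patches the data. It does not make the model itself generalize from one direction to the other, which is why it counts as a workaround and not a fix.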

I highly recommend checking out this discussion thread on Twitter [1]. Read the replies around the original post as well.

I may also write a blog post soon summarizing two or three connected papers.

Footnotes

  1. Neel Nanda On Twitter