Memory beats full context on LongMemEval — and the wins we don't get - DEV Community
Our first official benchmark runs: +14.2 points over a full-context baseline on LongMemEval while using ~39× fewer tokens, plus the LoCoMo case where full context still wins.