Video Presentation
Abstract
Navigating complex environments requires robots to effectively store observations as memories and leverage them to answer human queries about spatial locations—a critical yet underexplored research challenge. While prior work has made progress in constructing robotic memory, few have addressed the principled mechanisms needed for efficient memory retrieval and integration. To bridge this gap, we propose Meta-Memory, a large language model (LLM)-driven agent that constructs a high-density memory representation of the environment. The key innovation of Meta-Memory lies in its capacity to retrieve and integrate relevant memories through joint reasoning over semantic and spatial modalities in response to natural language location queries, thereby empowering robots with robust and accurate spatial reasoning capabilities. To evaluate its performance, we introduce SpaceLocQA, a large-scale dataset encompassing diverse real-world spatial question-answering scenarios. Experimental results show that Meta-Memory significantly outperforms state-of-the-art methods on both the SpaceLocQA and the public NaVQA benchmarks. Furthermore, we successfully deployed Meta-Memory on real-world robotic platforms, demonstrating its practical utility in complex environments.

During operation, Meta-Memory first converts observations into memories. When it receives a location query from a user, it uses two distinct tools to retrieve memories along both the semantic and spatial dimensions. Once sufficient information has been retrieved, Meta-Memory integrates the retrieved memories into a cognitive map, and finally infers the target location from this map.
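
The query-time flow can be pictured as two retrieval tools over a shared memory store, followed by an integration step. The sketch below is illustrative only: the MemoryRecord fields, the cosine-similarity semantic tool, and the radius-based spatial tool are assumptions for exposition, not Meta-Memory's actual implementations.

```python
# Minimal sketch of the query-time flow (illustrative assumptions, not the paper's code).
import math
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    description: str   # natural-language summary of one observation
    embedding: list    # semantic embedding of that description
    position: tuple    # (x, y, z) where the observation was made

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / (norm + 1e-9)

def semantic_retrieve(db, query_embedding, k=5):
    """Tool 1: the k memories whose descriptions best match the query."""
    return sorted(db, key=lambda m: _cosine(m.embedding, query_embedding), reverse=True)[:k]

def spatial_retrieve(db, anchor, radius=3.0):
    """Tool 2: memories recorded within `radius` meters of an anchor position."""
    return [m for m in db if math.dist(m.position, anchor) <= radius]

def build_cognitive_map(memories):
    """Integration step: fuse retrieved memories into a compact map for reasoning."""
    return [{"what": m.description, "where": m.position} for m in memories]
```

In this picture, the LLM would keep calling the two retrieval tools until it judges the gathered context sufficient, then reason over the cognitive map to output the target location.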

The Meta-Memory framework consists of two main modules. In the Memory Building phase, the Memory Database is constructed from the robot's observations. In the Memory Retrieval, Integration, and Inference phase, the LLM interacts with the Memory Database through three tools to answer the query.
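
To make the two-module split concrete, the fragment below (reusing MemoryRecord and the tool functions from the sketch above) separates memory building from query answering. The embed() and llm_call() helpers are hypothetical stand-ins for whatever embedding model and LLM the system actually uses, and no claim is made about which operations constitute the paper's three tools.

```python
# Hypothetical continuation of the sketch above; embed() and llm_call() are placeholders.
def build_memory(observations, embed):
    """Memory Building: turn (caption, pose) observations into Memory Database entries."""
    return [MemoryRecord(description=cap, embedding=embed(cap), position=pose)
            for cap, pose in observations]

def answer_location_query(query, db, embed, llm_call):
    """Memory Retrieval, Integration, and Inference: the LLM drives the tools."""
    # 1. Semantic retrieval keyed on the query text.
    candidates = semantic_retrieve(db, embed(query))
    # 2. Spatial retrieval around the best semantic hit, to pull in nearby context.
    if candidates:
        candidates = candidates + spatial_retrieve(db, candidates[0].position)
    # 3. Integrate into a cognitive map and ask the LLM to infer the target location.
    cognitive_map = build_cognitive_map(candidates)
    prompt = f"Query: {query}\nCognitive map: {cognitive_map}\nInfer the target location."
    return llm_call(prompt)
```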