Mind2Matter: Creating 3D Models from EEG Signals

Xia Deng*1, Shen Chen*1, Jiale Zhou†1, Lei Li†2,3
1East China University of Science and Technology
2University of Washington
3University of Copenhagen
*Indicates equal contribution. †Corresponding authors.

Code | arXiv: https://arxiv.org/abs/2504.11936

A subject receives a visual input (left), and EEG signals are recorded. These signals are processed and interpreted as text (middle), then used to generate a 3D model of the scene (right), translating brain activity into a visual representation.

Abstract

The reconstruction of 3D objects from brain signals has gained significant attention in brain-computer interface (BCI) research. Current research predominantly utilizes functional magnetic resonance imaging (fMRI) for 3D reconstruction tasks due to its excellent spatial resolution. Nevertheless, the clinical utility of fMRI is limited by its prohibitive cost and inability to support real-time operation. In comparison, electroencephalography (EEG) presents distinct advantages as an affordable, non-invasive, and mobile solution for real-time brain-computer interaction systems. While recent advances in deep learning have enabled remarkable progress in image generation from neural data, decoding EEG signals into structured 3D representations remains largely unexplored. In this paper, we propose a novel framework that translates EEG recordings into 3D object reconstructions by leveraging neural decoding techniques and generative models. Our approach involves training an EEG encoder to extract spatiotemporal visual features, fine-tuning a large language model to interpret these features into descriptive multimodal outputs, and leveraging generative 3D Gaussians with layout-guided control to synthesize the final 3D structures. Experiments demonstrate that our model captures salient geometric and semantic features, paving the way for applications in brain-computer interfaces, virtual reality, and neuroprosthetics.
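
As a concrete illustration of the first stage, the sketch below shows one plausible way to train the EEG encoder by contrastively aligning its embeddings with frozen CLIP image embeddings of the viewed pictures. It is a minimal PyTorch sketch under assumed shapes and hyperparameters (128 channels, 440 timesteps, 768-dimensional embeddings), not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    """Hypothetical spatiotemporal EEG encoder: temporal convolution followed by a transformer."""
    def __init__(self, n_channels=128, d_model=768):
        super().__init__()
        self.temporal = nn.Conv1d(n_channels, d_model, kernel_size=7, stride=2, padding=3)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, eeg):                                  # eeg: (B, channels, time)
        x = self.temporal(eeg).transpose(1, 2)               # (B, T', d_model)
        x = self.encoder(x).mean(dim=1)                      # temporal average pooling
        return F.normalize(self.proj(x), dim=-1)             # unit-norm EEG embedding

def clip_alignment_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling each EEG embedding toward its paired CLIP image embedding."""
    logits = eeg_emb @ img_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Dummy batch: 8 EEG trials paired with precomputed, normalized CLIP image embeddings.
eeg = torch.randn(8, 128, 440)
img_emb = F.normalize(torch.randn(8, 768), dim=-1)
loss = clip_alignment_loss(EEGEncoder()(eeg), img_emb)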


Method


Architecture of Mind2Matter. EEG signals are processed by a trainable EEG Encoder to extract spatiotemporal features, producing EEG embeddings aligned with image embeddings from a frozen CLIP Encoder. These embeddings are transformed by a trainable Mapping Network and, together with a prompt, fed into a frozen LLM, which generates a textual description (e.g., "A colorful butterfly is perched on a flower"). The text is then used by another LLM to create an initial 3D layout, followed by object-level and scene-level optimization with 3D Gaussian splatting and diffusion priors, producing a high-fidelity 3D scene.
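
To make the mapping step concrete, here is a minimal, self-contained PyTorch sketch (not the released code) of how a trainable Mapping Network could project an EEG embedding into a short sequence of prefix vectors in the frozen LLM's embedding space. The dimensions (768-d EEG embedding, 4096-d LLM embedding, prefix length 8), the dummy token table, and the prompt are illustrative assumptions.

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Hypothetical MLP mapping one EEG embedding to a sequence of LLM prefix vectors."""
    def __init__(self, d_eeg=768, d_llm=4096, prefix_len=8):
        super().__init__()
        self.prefix_len, self.d_llm = prefix_len, d_llm
        self.mlp = nn.Sequential(
            nn.Linear(d_eeg, 2048),
            nn.GELU(),
            nn.Linear(2048, prefix_len * d_llm),
        )

    def forward(self, eeg_emb):                              # eeg_emb: (B, d_eeg)
        prefix = self.mlp(eeg_emb)                           # (B, prefix_len * d_llm)
        return prefix.view(-1, self.prefix_len, self.d_llm)  # (B, prefix_len, d_llm)

# Stand-ins for the frozen LLM's token-embedding table and a tokenized prompt
# such as "Describe the image the subject is viewing:".
vocab, d_llm = 32000, 4096
token_embedding = nn.Embedding(vocab, d_llm)                 # frozen in the real pipeline
prompt_ids = torch.randint(0, vocab, (1, 12))                # dummy token ids for the prompt

eeg_emb = torch.randn(1, 768)                                # output of the trained EEG encoder
prefix = MappingNetwork()(eeg_emb)                           # (1, 8, 4096)
inputs_embeds = torch.cat([prefix, token_embedding(prompt_ids)], dim=1)
# `inputs_embeds` would then be passed to the frozen decoder-only LLM (e.g. via
# generate(inputs_embeds=...)) to produce the textual description of the viewed scene.
print(inputs_embeds.shape)                                   # torch.Size([1, 20, 4096])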

Text-to-3D Generation
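
The text-to-3D stage can be pictured with the schematic sketch below: per-object Gaussians are initialized inside the LLM-proposed layout boxes and refined with a score-distillation-style objective from a 2D diffusion prior, first object by object and then at the scene level. The renderer and diffusion prior here are stubs so the loop runs end to end; every name and number is an illustrative assumption, not the paper's implementation.

import torch
import torch.nn as nn

class GaussianObject(nn.Module):
    """Toy per-object 3D Gaussian set: positions, log-scales, colors, and opacities as parameters."""
    def __init__(self, box_center, box_size, n=1024):
        super().__init__()
        init = torch.tensor(box_center) + (torch.rand(n, 3) - 0.5) * torch.tensor(box_size)
        self.means = nn.Parameter(init)
        self.log_scales = nn.Parameter(torch.full((n, 3), -3.0))
        self.colors = nn.Parameter(torch.rand(n, 3))
        self.opacities = nn.Parameter(torch.zeros(n, 1))

def render_view(objects, camera):
    """Stub for a differentiable Gaussian-splatting renderer (dummy image that still carries gradients)."""
    means = torch.cat([o.means for o in objects])
    return torch.sigmoid(means.mean()) * torch.ones(3, 64, 64)

def sds_loss(image, prompt):
    """Stub for a score-distillation loss from a frozen text-to-image diffusion prior."""
    return ((image - torch.rand_like(image)) ** 2).mean()

# Layout proposed by an LLM from the decoded caption, e.g. "A colorful butterfly on a flower":
# each entry is (object prompt, (box center, box size)).
layout = [("a colorful butterfly", ([0.0, 0.3, 0.0], [0.2, 0.2, 0.2])),
          ("a flower",             ([0.0, 0.0, 0.0], [0.3, 0.3, 0.3]))]
objects = [GaussianObject(center, size) for _, (center, size) in layout]

opt = torch.optim.Adam([p for o in objects for p in o.parameters()], lr=1e-2)
for step in range(200):
    image = render_view(objects, camera=None)                # a random viewpoint would be sampled in practice
    # Object-level prompts first, then the full caption for scene-level refinement.
    prompt = layout[step % len(layout)][0] if step < 100 else "a colorful butterfly on a flower"
    loss = sds_loss(image, prompt)
    opt.zero_grad()
    loss.backward()
    opt.step()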

BibTeX

@misc{deng2025mind2mattercreating3dmodels,
      title={Mind2Matter: Creating 3D Models from EEG Signals}, 
      author={Xia Deng and Shen Chen and Jiale Zhou and Lei Li},
      year={2025},
      eprint={2504.11936},
      archivePrefix={arXiv},
      primaryClass={cs.GR},
      url={https://arxiv.org/abs/2504.11936}, 
}