Talk by Nicola Toschi, Medical Physics, University of Rome
Short abstract:
Studying the brain can be thought of as finding the common story that different measurements, tasks, and species are all telling. Large pretrained foundation models give us a potential language for that story: they supply semantic spaces in which perception, action, and cognition can meet. Such spaces can be used to align heterogeneous neural data, generate naturalistic readouts, and probe the structure of neural codes themselves. Across a set of studies, we pursue a single idea: map brain signals into a shared, semantically grounded space and use that space to test what is common, what is interpretable, and what composes.

In vision, we align EEG, MEG, and fMRI responses to a frozen embedding space with a single model that supports image decoding, image-to-brain encoding, and translation across neural modalities, providing evidence for modality-invariant codes. In the process, we show that on large heterogeneous fMRI corpora, contrastive alignment paired with a simple linear decoder is often sufficient, favoring reproducible pipelines over bespoke architectures. To open the black box, we insert an explicit semantic bottleneck using a VQA-derived concept space, yielding maps of human-interpretable categories while retaining reconstructive power. We then probe algebraic structure directly in brain space by adding concept-selective average patterns to held-out fMRI signals before any decoding; the resulting reconstructions behave as compositional blends, consistent with vector-like operations over high-level representations.

Beyond non-invasive human vision, we show how temporally aware mappings from invasive primate recordings into CLIP space separate semantic decoding from image generation, enabling retrieval and controlled synthesis. We extend the approach to audition, where fMRI-to-music decoding conditions a diffusion prior to reconstruct what the brain hears across individuals, while invasive (ECoG) encoding of naturalistic speech reveals a progression from acoustic tracking to phonemic structure. Taken together, these studies sketch a high-level storyline: a single semantic scaffold can align signals across methods, tasks, and species; support generative, human-interpretable readouts in multiple modalities; and reveal structure, in particular compositionality, within the code.
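To make the core recipe concrete, the following is a minimal sketch, assuming precomputed embeddings from a frozen pretrained model, synthetic fMRI vectors, and made-up dimensions and hyperparameters (none of the variable names or sizes come from the studies themselves), of contrastive alignment of brain responses to a shared semantic space, with decoding reduced to retrieval through a single linear map.

```python
# Sketch: contrastive alignment of brain responses to a frozen semantic space.
# All data here are synthetic placeholders; dimensions and settings are illustrative.

import torch
import torch.nn.functional as F

n_trials, n_voxels, clip_dim = 512, 4096, 768  # hypothetical sizes

# Stand-ins for real data: one fMRI pattern and one frozen image embedding
# (e.g. from a pretrained CLIP-style model) per stimulus presentation.
fmri = torch.randn(n_trials, n_voxels)
targets = F.normalize(torch.randn(n_trials, clip_dim), dim=-1)

# The only trainable component: a linear map from voxel space to the semantic space.
projector = torch.nn.Linear(n_voxels, clip_dim)
optimizer = torch.optim.Adam(projector.parameters(), lr=1e-4)
temperature = 0.07

for epoch in range(10):
    for batch in torch.randperm(n_trials).split(64):
        z = F.normalize(projector(fmri[batch]), dim=-1)  # brain -> semantic space
        logits = z @ targets[batch].T / temperature      # similarity matrix
        labels = torch.arange(len(batch))                 # matched pairs on the diagonal
        # Symmetric InfoNCE-style loss: pull each brain pattern toward its own
        # stimulus embedding, push it away from the other stimuli in the batch.
        loss = 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.T, labels))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Decoding then reduces to retrieval: project a held-out pattern and pick the
# nearest stimulus embedding in the frozen space.
with torch.no_grad():
    query = F.normalize(projector(fmri[:1]), dim=-1)
    best_match = (query @ targets.T).argmax().item()
```

The design choice the abstract emphasizes is visible here: the semantic targets stay frozen and only the projector is fitted, which is what makes the pipeline simple enough to reproduce across heterogeneous corpora.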
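The "algebra in brain space" probe can be sketched in the same spirit. This is again a toy illustration with synthetic data and a purely illustrative linear readout rather than any of the actual reconstruction pipelines: a concept-selective average pattern is estimated as a difference of trial means and added to a held-out fMRI pattern before any decoding.

```python
# Sketch: inject a concept-selective average pattern into a held-out fMRI signal
# before decoding, and check how the decoded readout shifts. Synthetic data only.

import numpy as np

rng = np.random.default_rng(0)
n_voxels, sem_dim = 2000, 64  # hypothetical sizes

# Stand-ins: a linear "decoder" from voxels to a semantic readout, and trials
# recorded with / without a target concept present in the stimulus.
decoder = rng.standard_normal((n_voxels, sem_dim)) / np.sqrt(n_voxels)
trials_with_concept = rng.standard_normal((80, n_voxels)) + 0.5
trials_without_concept = rng.standard_normal((80, n_voxels))

# Concept-selective average pattern: difference of trial means.
concept_vector = trials_with_concept.mean(axis=0) - trials_without_concept.mean(axis=0)

# Held-out trial, untouched by the averaging above; the edit happens in voxel
# space, before any decoding.
held_out = rng.standard_normal(n_voxels)
edited = held_out + concept_vector

original_readout = held_out @ decoder
edited_readout = edited @ decoder
concept_signature = concept_vector @ decoder

# Measure how the readout moved relative to the concept's own signature.
shift = edited_readout - original_readout
cosine = shift @ concept_signature / (np.linalg.norm(shift) * np.linalg.norm(concept_signature))
print(f"cosine(readout shift, concept signature) = {cosine:.3f}")
```

With a purely linear readout the shift matches the concept signature exactly by construction; the interesting question in the studies is whether full non-linear reconstruction pipelines behave the same way, producing compositional blends rather than arbitrary changes.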