Multimodal LLMs for Electromagnetic Waves
We propose a multimodal large language model for electromagnetic-wave reasoning and understanding. EM data (e.g., field maps and radiation patterns) are ingested as images and fused with text via a bridge model (BLIP) that aligns vision embeddings with a pretrained LLM. We evaluate LLMs such as Mistral and LLaMA as backbones for this BLIP-based fusion, and outline the architecture, dataset preparation, and prompting strategies. The talk will share results and ablations across backbones and fusion settings, and conclude with lessons for building trustworthy, physics-aware multimodal EM assistants.
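To make the fusion concrete, here is a minimal PyTorch sketch of the BLIP-style bridge idea described above: a vision encoder turns an EM field map into patch embeddings, a small bridge module projects them into the LLM's token-embedding space, and the LLM then attends over the concatenated image and text tokens. This is not the authors' implementation; all module names, dimensions, and the query-token design are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisionToLLMBridge(nn.Module):
    """Hypothetical Q-Former-like bridge: learned query tokens cross-attend
    to image patch embeddings, then get projected into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int, num_query_tokens: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_query_tokens, vision_dim))
        self.cross_attn = nn.MultiheadAttention(vision_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: (batch, num_patches, vision_dim), e.g. from a frozen ViT
        b = patch_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        fused, _ = self.cross_attn(q, patch_embeds, patch_embeds)
        return self.proj(fused)  # (batch, num_query_tokens, llm_dim)

# Fusion: prepend projected image tokens to the prompt's text embeddings,
# then feed the combined sequence to the (frozen or lightly tuned) LLM backbone.
bridge = VisionToLLMBridge(vision_dim=768, llm_dim=4096)
patches = torch.randn(1, 196, 768)       # assumed ViT patch embeddings of a field map
text_embeds = torch.randn(1, 24, 4096)   # assumed token embeddings of the text prompt
llm_inputs = torch.cat([bridge(patches), text_embeds], dim=1)
print(llm_inputs.shape)                  # torch.Size([1, 56, 4096])
```

Under these assumptions, only the bridge need be trained, which is one reason this style of fusion pairs well with off-the-shelf backbones such as Mistral or LLaMA.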