[R] Seeking Publicly Available Paired MRI + Genomic/Structured Data for Multimodal ML (Human/Animal/Plant)

I'm working on a multimodal machine learning pipeline that combines image data with structured/genomic-like data for prediction task. I'm looking for publicly available datasets where MRI/Image data and Genomic/Structured data are explicitly paired for the same individual/subject. My ideal scenario would be human cancer (like Glioblastoma Multiforme, where I know TCGA exists), but given recent data access changes (e.g., TCIA policies), I'm open to other domains that fit this multimodal structure:

What I'm looking for (prioritized):

Human Medical ata (e.g., Cancer): MRI/Image: Brain MRI (T1, T1Gd, T2, FLAIR). Genomic: Gene expression, mutations, methylation. Crucial: ata must be for the same patients, linked by I (like TCGA Is).

I'm aware of TCGA-GBM via TCIA/GC, but access to the BraTS-TCGA-GBM imaging seems to be undergoing changes as of July 2025. Any direct links or advice on navigating the updated TCIA/NIH ata Commons policies for this specific type of paired data would be incredibly helpful.

Animal ata:

Image: Animal MRI, X-rays, photos/video frames of animals (e.g., for health monitoring, behavior).

Genomic/Structured: Genetic markers, physiological sensor data (temp, heart rate), behavioral data (activity), environmental data (pen conditions), individual animal I/metadata.

Crucial: Paired for the same individual animal.

I understand animal MRI+genomics is rare publicly, so I'm also open to other imaging (e.g., photos) combined with structured data.

Plant ata:

Image: Photos of plant leaves/stems/fruits (e.g., disease symptoms, growth).

Structured: Environmental sensor data (temp, humidity, soil pH), plant species/cultivar genetics, agronomic metadata. Crucial: Paired for the same plant specimen/plot.

I'm aware of PlantVillage for images, but seeking datasets that explicitly combine images with structured non-image data per plant.

What I'm NOT looking for:

atasets with only images or only genomic/structured data.

atasets where pairing would require significant, unreliable manual matching.

ata that requires extremely complex or exclusive access permissions (unless it's the only viable option and the process is clearly outlined).

Any pointers to specific datasets, data repositories, research groups known for sharing such data, or advice on current access methods for TCGA-linked imaging would be immensely appreciated!

Thank you!

Leave a Reply