MedGemma 1.5: Open-Source Medical AI with 3D Imaging Support


Alejandro AO · 758 words · 4 mins

Introduction

Google just released MedGemma 1.5, an open-source multimodal medical AI model that brings state-of-the-art medical imaging capabilities to local hardware. With just 4 billion parameters, this model can interpret chest X-rays, CT scans, MRI volumes, and extract information from medical documents - all running on consumer GPUs.

What makes MedGemma 1.5 significant:

  • 3D volumetric imaging - First open-source model supporting CT/MRI volume interpretation
  • 4B parameters - Runs locally on consumer hardware (~8GB VRAM)
  • Free commercial use - No API costs, full data privacy
  • Near state-of-the-art - 80% accuracy vs GPT-4’s 70% on medical imaging tasks

What’s New in Version 1.5

MedGemma 1.5 brings major improvements over the previous version:

| Capability | v1.0 | v1.5 | Improvement |
|---|---|---|---|
| MRI classification | 51% | 65% | +14% |
| CT classification | 58% | 61% | +3% |
| Medical Q&A (MedQA) | 64% | 69% | +5% |
| EHR question-answering | 68% | 90% | +22% |
| Chest X-ray localization | – | +35% | New |

The headline feature is 3D volumetric image support. Previously, interpreting CT and MRI scans required proprietary models or cloud APIs. Now you can process full 3D medical volumes locally.
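The exact input format for 3D data is defined by the model card, but a common preprocessing step is to sample axial slices from the volume and convert them to images. A minimal sketch, assuming the volume is already a NumPy array with shape (depth, height, width), e.g. loaded from a NIfTI file with nibabel:

```python
import numpy as np
from PIL import Image

def volume_to_slices(volume: np.ndarray, num_slices: int = 8) -> list:
    """Sample evenly spaced axial slices from a (depth, height, width)
    volume and normalize each to an 8-bit grayscale PIL image."""
    idx = np.linspace(0, volume.shape[0] - 1, num_slices).astype(int)
    slices = []
    for i in idx:
        s = volume[i].astype(np.float32)
        # Rescale intensities to 0-255; the epsilon avoids a
        # divide-by-zero on constant slices.
        s = (s - s.min()) / (s.max() - s.min() + 1e-8) * 255.0
        slices.append(Image.fromarray(s.astype(np.uint8)))
    return slices
```

Each slice can then be passed to the pipeline like the 2D examples below; check the model card for the exact multi-image input format it expects.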

Model Sizes

MedGemma 1.5 comes in two variants:

  • 4B parameters - Efficient, runs on consumer GPUs (T4, RTX 3080+)
  • 27B parameters - More powerful, requires enterprise hardware

For most applications, the 4B model provides excellent results while fitting in 8GB of VRAM.
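The 8GB figure follows from simple arithmetic: model weights dominate memory, and bf16/fp16 stores two bytes per parameter. A quick back-of-the-envelope check (weights only; activations and KV cache add overhead on top):

```python
def approx_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM estimate for the weights alone (bf16/fp16 = 2 bytes).

    Treat the result as a lower bound: inference also needs memory
    for activations and the KV cache."""
    return num_params * bytes_per_param / 1024**3

print(approx_vram_gb(4e9))   # ~7.5 GB for the 4B model
print(approx_vram_gb(27e9))  # ~50 GB for the 27B model
```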

Setup

You’ll need a GPU with at least 8GB VRAM. In Google Colab, select Runtime > Change runtime type > T4 GPU.

Install dependencies:

!pip install transformers torch pillow accelerate

Authenticate with Hugging Face:

from huggingface_hub import notebook_login
notebook_login()

You’ll need a Hugging Face token with access to gated repos. Create one at huggingface.co/settings/tokens.

Load the model:

from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    device_map="auto",
)

The first download takes a few minutes as it fetches the model weights.

Demo: Chest X-Ray Analysis

Let’s analyze a chest X-ray using three different tasks: general description, disease classification, and anatomical localization.

Task 1: General Description

from PIL import Image

# Load your chest X-ray image
image = Image.open("chest_xray.jpg")

# Get a general description
response = pipe(
    images=image,
    text="Describe this chest X-ray. What do you see?"
)

# The pipeline returns a list of dicts; the answer is under "generated_text"
print(response[0]["generated_text"])

Example output:

There is a noticeable opacity or increased density in the right lung. This could indicate consolidation like pneumonia, fluid collection, or another abnormality. The left lung appears relatively clear. The heart size appears normal.

Task 2: Disease Classification

Ask targeted questions about specific conditions:

response = pipe(
    images=image,
    text="Are there any signs of pneumonia, cardiomegaly, or pleural effusion in this X-ray? Provide a detailed analysis."
)

The model returns structured analysis:

Pneumonia: There are patchy opacities in the right lung, particularly in the right lower lobe. This could suggest pneumonia. However, it’s difficult to definitively diagnose from a single image.

Cardiomegaly: The heart size appears within normal limits.

Pleural Effusion: No obvious signs of pleural effusion are visible.
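When the model follows this "Condition: finding" layout, the answer is easy to post-process into a dict. A small helper, assuming the layout shown above (real outputs vary with the prompt, so validate on your own data):

```python
import re

def parse_findings(report: str) -> dict:
    """Split a 'Condition: description' style answer into a dict.

    Assumes each finding sits on its own line in 'Name: text' form,
    as in the example output above."""
    findings = {}
    for line in report.splitlines():
        m = re.match(r"\s*([A-Z][\w ]+):\s*(.+)", line)
        if m:
            findings[m.group(1).strip()] = m.group(2).strip()
    return findings
```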

Task 3: Anatomical Localization

Identify structures and their positions:

response = pipe(
    images=image,
    text="Identify and describe the location of the heart, lungs, and any abnormalities in this X-ray."
)

Output includes spatial descriptions:

Heart: Located in the center of the chest, slightly to the left. Appears normal size.

Lungs: Occupy the majority of the chest cavity. Right lung shows increased opacity.

Abnormalities: The most notable abnormality is the right lung opacity, which could indicate pneumonia, pulmonary edema, or pleural effusion.

Model Capabilities

MedGemma 1.5 excels at:

| Task | Description |
|---|---|
| Medical image classification | Identify conditions from X-rays, CT, MRI |
| 3D volumetric analysis | Process full CT/MRI volumes (new in 1.5) |
| Anatomical localization | Identify and locate structures |
| Medical Q&A | Answer clinical questions |
| Document extraction | Pull structured data from medical records |
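For document extraction, a useful pattern is to ask the model for JSON and then parse the response defensively, since models often wrap JSON in prose or markdown fences. A minimal sketch (the prompt and field names are illustrative, not part of the model's API):

```python
import json
import re

# Illustrative prompt to send alongside a scanned record:
PROMPT = (
    "Extract the patient name, date, and diagnosis from this document. "
    "Respond with a single JSON object and nothing else."
)

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a model response.

    Searches for the outermost braces instead of parsing the whole
    string, so stray prose or code fences around the JSON are ignored."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```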

Performance vs Proprietary Models

The fine-tuned MedGemma 1.5 achieves 80.37% accuracy on medical imaging tasks compared to GPT-4’s 69.58%. This is remarkable for a 4B parameter model running locally.

Key advantages over cloud-based models:

  • Data privacy - Medical data never leaves your infrastructure
  • No API costs - Run unlimited inferences
  • Low latency - No network round-trips
  • Customizable - Fine-tune for specific conditions or modalities

Limitations

Important caveats:

  • Not for clinical diagnosis - Always have results reviewed by medical professionals
  • Single image context - Works best with individual images, not full patient histories
  • Prompt sensitivity - Results vary based on how you phrase questions
  • Validation required - Fine-tune and validate on your specific use case before deployment

Full Code

The complete tutorial notebook is available on GitHub:

huggingface/hub-tutorials - MedGemma 1.5 Notebook
