MedGemma 1.5: Open-Source Medical AI with 3D Imaging Support


Alejandro AO · 758 words · 4 mins

Introduction

Google just released MedGemma 1.5, an open-source multimodal medical AI model that brings state-of-the-art medical imaging capabilities to local hardware. With just 4 billion parameters, this model can interpret chest X-rays, CT scans, MRI volumes, and extract information from medical documents - all running on consumer GPUs.

What makes MedGemma 1.5 significant:

  • 3D volumetric imaging - First open-source model supporting CT/MRI volume interpretation
  • 4B parameters - Runs locally on consumer hardware (~8GB VRAM)
  • Free commercial use - No API costs, full data privacy
  • Near state-of-the-art - 80% accuracy vs GPT-4’s 70% on medical imaging tasks

What’s New in Version 1.5

MedGemma 1.5 brings major improvements over the previous version:

| Capability | v1.0 | v1.5 | Improvement |
|---|---|---|---|
| MRI classification | 51% | 65% | +14% |
| CT classification | 58% | 61% | +3% |
| Medical Q&A (MedQA) | 64% | 69% | +5% |
| EHR question-answering | 68% | 90% | +22% |
| Chest X-ray localization | – | +35% | New |

The headline feature is 3D volumetric image support. Previously, interpreting CT and MRI scans required proprietary models or cloud APIs. Now you can process full 3D medical volumes locally.
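The exact input format for 3D data is defined by the model card, but a common preprocessing step is to sample axial slices from the volume and convert them to images. A minimal sketch, assuming the volume is already a NumPy array with shape (depth, height, width), e.g. loaded from a NIfTI file with nibabel:

```python
import numpy as np
from PIL import Image

def volume_to_slices(volume: np.ndarray, num_slices: int = 8) -> list:
    """Sample evenly spaced axial slices from a (depth, height, width)
    volume and normalize each to an 8-bit grayscale PIL image."""
    idx = np.linspace(0, volume.shape[0] - 1, num_slices).astype(int)
    slices = []
    for i in idx:
        s = volume[i].astype(np.float32)
        # Rescale intensities to 0-255; the epsilon avoids a
        # divide-by-zero on constant slices.
        s = (s - s.min()) / (s.max() - s.min() + 1e-8) * 255.0
        slices.append(Image.fromarray(s.astype(np.uint8)))
    return slices
```

Each slice can then be passed to the pipeline like the 2D examples below; check the model card for the exact multi-image input format it expects.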

Model Sizes

MedGemma 1.5 comes in two variants:

  • 4B parameters - Efficient, runs on consumer GPUs (T4, RTX 3080+)
  • 27B parameters - More powerful, requires enterprise hardware

For most applications, the 4B model provides excellent results while fitting in 8GB of VRAM.
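The 8GB figure follows from simple arithmetic: model weights dominate memory, and bf16/fp16 stores two bytes per parameter. A quick back-of-the-envelope check (weights only; activations and KV cache add overhead on top):

```python
def approx_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM estimate for the weights alone (bf16/fp16 = 2 bytes).

    Treat the result as a lower bound: inference also needs memory
    for activations and the KV cache."""
    return num_params * bytes_per_param / 1024**3

print(approx_vram_gb(4e9))   # ~7.5 GB for the 4B model
print(approx_vram_gb(27e9))  # ~50 GB for the 27B model
```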

Setup

You’ll need a GPU with at least 8GB VRAM. In Google Colab, select Runtime > Change runtime type > T4 GPU.

Install dependencies:

!pip install transformers torch pillow accelerate

Authenticate with Hugging Face:

from huggingface_hub import notebook_login
notebook_login()

You’ll need a Hugging Face token with access to gated repos. Create one at huggingface.co/settings/tokens.

Load the model:

from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    device_map="auto",
)

The first download takes a few minutes as it fetches the model weights.

Demo: Chest X-Ray Analysis

Let’s analyze a chest X-ray using three different tasks: general description, disease classification, and anatomical localization.

Task 1: General Description

from PIL import Image

# Load your chest X-ray image
image = Image.open("chest_xray.jpg")

# Get a general description
response = pipe(
    images=image,
    text="Describe this chest X-ray. What do you see?"
)

# The pipeline returns a list of dicts; the answer is under "generated_text"
print(response[0]["generated_text"])

Example output:

There is a noticeable opacity or increased density in the right lung. This could indicate consolidation like pneumonia, fluid collection, or another abnormality. The left lung appears relatively clear. The heart size appears normal.

Task 2: Disease Classification

Ask targeted questions about specific conditions:

response = pipe(
    images=image,
    text="Are there any signs of pneumonia, cardiomegaly, or pleural effusion in this X-ray? Provide a detailed analysis."
)

The model returns structured analysis:

Pneumonia: There are patchy opacities in the right lung, particularly in the right lower lobe. This could suggest pneumonia. However, it’s difficult to definitively diagnose from a single image.

Cardiomegaly: The heart size appears within normal limits.

Pleural Effusion: No obvious signs of pleural effusion are visible.
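When the model follows this "Condition: finding" layout, the answer is easy to post-process into a dict. A small helper, assuming the layout shown above (real outputs vary with the prompt, so validate on your own data):

```python
import re

def parse_findings(report: str) -> dict:
    """Split a 'Condition: description' style answer into a dict.

    Assumes each finding sits on its own line in 'Name: text' form,
    as in the example output above."""
    findings = {}
    for line in report.splitlines():
        m = re.match(r"\s*([A-Z][\w ]+):\s*(.+)", line)
        if m:
            findings[m.group(1).strip()] = m.group(2).strip()
    return findings
```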

Task 3: Anatomical Localization

Identify structures and their positions:

response = pipe(
    images=image,
    text="Identify and describe the location of the heart, lungs, and any abnormalities in this X-ray."
)

Output includes spatial descriptions:

Heart: Located in the center of the chest, slightly to the left. Appears normal size.

Lungs: Occupy the majority of the chest cavity. Right lung shows increased opacity.

Abnormalities: The most notable abnormality is the right lung opacity, which could indicate pneumonia, pulmonary edema, or pleural effusion.

Model Capabilities

MedGemma 1.5 excels at:

| Task | Description |
|---|---|
| Medical image classification | Identify conditions from X-rays, CT, MRI |
| 3D volumetric analysis | Process full CT/MRI volumes (new in 1.5) |
| Anatomical localization | Identify and locate structures |
| Medical Q&A | Answer clinical questions |
| Document extraction | Pull structured data from medical records |
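For document extraction, a useful pattern is to ask the model for JSON and then parse the response defensively, since models often wrap JSON in prose or markdown fences. A minimal sketch (the prompt and field names are illustrative, not part of the model's API):

```python
import json
import re

# Illustrative prompt to send alongside a scanned record:
PROMPT = (
    "Extract the patient name, date, and diagnosis from this document. "
    "Respond with a single JSON object and nothing else."
)

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a model response.

    Searches for the outermost braces instead of parsing the whole
    string, so stray prose or code fences around the JSON are ignored."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```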

Performance vs Proprietary Models

The fine-tuned MedGemma 1.5 achieves 80.37% accuracy on medical imaging tasks compared to GPT-4’s 69.58%. This is remarkable for a 4B parameter model running locally.

Key advantages over cloud-based models:

  • Data privacy - Medical data never leaves your infrastructure
  • No API costs - Run unlimited inferences
  • Low latency - No network round-trips
  • Customizable - Fine-tune for specific conditions or modalities

Limitations

Important caveats:

  • Not for clinical diagnosis - Always have results reviewed by medical professionals
  • Single image context - Works best with individual images, not full patient histories
  • Prompt sensitivity - Results vary based on how you phrase questions
  • Validation required - Fine-tune and validate on your specific use case before deployment

Full Code

The complete tutorial notebook is available on GitHub:

huggingface/hub-tutorials - MedGemma 1.5 Notebook
