## Introduction
Google just released MedGemma 1.5, an open-source multimodal medical AI model that brings state-of-the-art medical imaging capabilities to local hardware. With just 4 billion parameters, this model can interpret chest X-rays, CT scans, MRI volumes, and extract information from medical documents - all running on consumer GPUs.
What makes MedGemma 1.5 significant:
- 3D volumetric imaging - First open-source model supporting CT/MRI volume interpretation
- 4B parameters - Runs locally on consumer hardware (~8GB VRAM)
- Free commercial use - No API costs, full data privacy
- Near state-of-the-art - 80% accuracy vs GPT-4’s 70% on medical imaging tasks
## What’s New in Version 1.5
MedGemma 1.5 brings major improvements over the previous version:
| Capability | v1.0 | v1.5 | Improvement |
|---|---|---|---|
| MRI classification | 51% | 65% | +14% |
| CT classification | 58% | 61% | +3% |
| Medical Q&A (MedQA) | 64% | 69% | +5% |
| EHR question-answering | 68% | 90% | +22% |
| Chest X-ray localization | – | new | +35% |
The headline feature is 3D volumetric image support. Previously, interpreting CT and MRI scans required proprietary models or cloud APIs. Now you can process full 3D medical volumes locally.
## Model Sizes
MedGemma 1.5 comes in two variants:
- 4B parameters - Efficient, runs on consumer GPUs (T4, RTX 3080+)
- 27B parameters - More powerful, requires enterprise hardware
For most applications, the 4B model provides excellent results while fitting in 8GB of VRAM.
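The ~8GB figure follows from simple arithmetic: at inference time, memory is dominated by the weights. A rough sketch (the helper is ours, and overhead for activations and the KV cache is ignored, so real usage runs somewhat higher):

```python
# Back-of-the-envelope VRAM estimate: parameters x bytes per parameter.
# Illustrative only; activations and KV cache add overhead on top.

def estimate_vram_gb(n_params: float, bytes_per_param: float = 2) -> float:
    """Weights-only memory in GiB for a given precision."""
    return n_params * bytes_per_param / 1024**3

weights_fp16 = estimate_vram_gb(4e9)        # bfloat16/float16: 2 bytes/param
weights_int4 = estimate_vram_gb(4e9, 0.5)   # hypothetical 4-bit quantization

print(f"4B weights @ 16-bit: ~{weights_fp16:.1f} GiB")
print(f"4B weights @ 4-bit:  ~{weights_int4:.1f} GiB")
```

At 16-bit precision the 4B weights alone come to roughly 7.5 GiB, which is why an 8GB card is the practical floor.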
## Setup
You’ll need a GPU with at least 8GB VRAM. In Google Colab, select Runtime > Change runtime type > T4 GPU.
Install dependencies:
```python
!pip install transformers torch pillow
```
Authenticate with Hugging Face:
```python
from huggingface_hub import notebook_login

notebook_login()
```
You’ll need a Hugging Face token with access to gated repos. Create one at huggingface.co/settings/tokens.
Load the model:
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,  # half precision keeps the 4B model under 8GB
    device_map="auto",
)
```
The first download takes a few minutes as it fetches the model weights.
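The pipeline also accepts the chat-message structure that Gemma-family instruction models are trained on, which becomes handy once prompts get longer. A minimal sketch: the `build_messages` helper is ours, and the commented-out call assumes the `pipe` object loaded above:

```python
# Build one user turn combining an image and a text prompt, following the
# transformers chat-template conventions. Helper name is our own.

def build_messages(prompt: str, image) -> list:
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Usage with the pipeline loaded above (uncomment once the model is cached):
# out = pipe(text=build_messages("Describe this chest X-ray.", image),
#            max_new_tokens=200)
# print(out[0]["generated_text"][-1]["content"])
```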
## Demo: Chest X-Ray Analysis
Let’s analyze a chest X-ray using three different tasks: general description, disease classification, and anatomical localization.
### Task 1: General Description
```python
from PIL import Image

# Load your chest X-ray image
image = Image.open("chest_xray.jpg")

# Get a general description
response = pipe(
    images=image,
    text="Describe this chest X-ray. What do you see?",
    max_new_tokens=200,
)
print(response[0]["generated_text"])
```
Example output:
```text
There is a noticeable opacity or increased density in the right lung. This could indicate consolidation like pneumonia, fluid collection, or another abnormality. The left lung appears relatively clear. The heart size appears normal.
```
### Task 2: Disease Classification
Ask targeted questions about specific conditions:
```python
response = pipe(
    images=image,
    text=(
        "Are there any signs of pneumonia, cardiomegaly, or pleural "
        "effusion in this X-ray? Provide a detailed analysis."
    ),
    max_new_tokens=300,
)
print(response[0]["generated_text"])
```
The model returns structured analysis:
```text
Pneumonia: There are patchy opacities in the right lung, particularly in the right lower lobe. This could suggest pneumonia. However, it’s difficult to definitively diagnose from a single image.

Cardiomegaly: The heart size appears within normal limits.

Pleural Effusion: No obvious signs of pleural effusion are visible.
```
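Output in this condition-by-condition shape is easy to post-process. A minimal sketch of a parser for "Condition: finding" lines — the helper and the sample string are our own, and real model output may need more robust handling:

```python
# Turn "Condition: finding" lines into a dict for logging or triage.
# Hypothetical helper; assumes one condition per line with a colon separator.

def parse_findings(report: str) -> dict:
    findings = {}
    for line in report.splitlines():
        if ":" in line:
            condition, _, detail = line.partition(":")
            findings[condition.strip()] = detail.strip()
    return findings

sample = (
    "Pneumonia: Patchy opacities in the right lower lobe.\n"
    "Cardiomegaly: Heart size within normal limits.\n"
    "Pleural Effusion: No obvious signs visible."
)
print(parse_findings(sample)["Cardiomegaly"])
# → Heart size within normal limits.
```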
### Task 3: Anatomical Localization
Identify structures and their positions:
```python
response = pipe(
    images=image,
    text=(
        "Identify and describe the location of the heart, lungs, and any "
        "abnormalities in this X-ray."
    ),
    max_new_tokens=300,
)
print(response[0]["generated_text"])
```
Output includes spatial descriptions:
```text
Heart: Located in the center of the chest, slightly to the left. Appears normal size.

Lungs: Occupy the majority of the chest cavity. Right lung shows increased opacity.

Abnormalities: The most notable abnormality is the right lung opacity, which could indicate pneumonia, pulmonary edema, or pleural effusion.
```
## Model Capabilities
MedGemma 1.5 excels at:
| Task | Description |
|---|---|
| Medical image classification | Identify conditions from X-rays, CT, MRI |
| 3D volumetric analysis | Process full CT/MRI volumes (new in 1.5) |
| Anatomical localization | Identify and locate structures |
| Medical Q&A | Answer clinical questions |
| Document extraction | Pull structured data from medical records |
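For document extraction in particular, asking the model to answer in JSON makes the reply machine-readable. A sketch under our own assumptions — the field names and the `parse_json_reply` helper are illustrative, not part of MedGemma's API:

```python
import json

# Illustrative extraction prompt: request JSON so the reply can be parsed.
# Field names are examples, not a fixed schema.
EXTRACTION_PROMPT = (
    "Extract the following fields from this medical document and reply "
    "with JSON only: patient_age, medications, diagnoses, follow_up_date. "
    "Use null for anything not present."
)

def parse_json_reply(reply: str) -> dict:
    """Strip an optional ```json fence and parse the payload."""
    cleaned = reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)
```

Pairing a JSON-only prompt with a tolerant parser like this is a common pattern, since instruction-tuned models often wrap their answer in a code fence.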
## Performance vs Proprietary Models
The fine-tuned MedGemma 1.5 achieves 80.37% accuracy on medical imaging tasks compared to GPT-4’s 69.58%. This is remarkable for a 4B parameter model running locally.
Key advantages over cloud-based models:
- Data privacy - Medical data never leaves your infrastructure
- No API costs - Run unlimited inferences
- Low latency - No network round-trips
- Customizable - Fine-tune for specific conditions or modalities
## Limitations
Important caveats:
- Not for clinical diagnosis - Always have results reviewed by medical professionals
- Single image context - Works best with individual images, not full patient histories
- Prompt sensitivity - Results vary based on how you phrase questions
- Validation required - Fine-tune and validate on your specific use case before deployment
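One practical way to manage prompt sensitivity is to ask the same question several ways and check whether the answers agree. A small sketch — the phrasings are our own examples, not prompts from the MedGemma documentation:

```python
# Generate a few rephrasings of the same clinical question so the model's
# consistency can be checked across them.

def prompt_variants(finding: str) -> list[str]:
    return [
        f"Is there evidence of {finding} in this X-ray?",
        f"Describe any signs of {finding}, or state that none are present.",
        f"As a radiology assistant, assess this image for {finding}.",
    ]

for p in prompt_variants("pleural effusion"):
    print(p)

# In practice: run each variant through the pipeline and only trust
# conclusions on which the answers agree.
```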
## Full Code
The complete tutorial notebook is available on GitHub:
huggingface/hub-tutorials - MedGemma 1.5 Notebook