Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
keras-team
GitHub Repository: keras-team/keras-io
Path: blob/master/guides/keras_hub/rag_pipeline_with_keras_hub.py
3293 views
1
"""
2
Title: RAG Pipeline with KerasHub
3
Author: [Laxmareddy Patlolla](https://github.com/laxmareddyp), [Divyashree Sreepathihalli](https://github.com/divyashreepathihalli)
4
Date created: 2025/07/22
5
Last modified: 2025/08/08
6
Description: RAG pipeline for brain MRI analysis: image retrieval, context search, and report generation.
7
Accelerator: GPU
8
9
"""
10
11
"""
12
## Welcome to Your RAG Adventure!
13
14
Hey there! Ready to dive into something really exciting? We're about to build a system that can look at brain MRI images and generate detailed medical reports - but here's the cool part: it's not just any AI system. We're building something that's like having a super-smart medical assistant who can look at thousands of previous cases to give you the most accurate diagnosis possible!
15
16
**What makes this special?** Instead of just relying on what the AI learned during training, our system will actually "remember" similar cases it has seen before and use that knowledge to make better decisions. It's like having a doctor who can instantly recall every similar case they've ever treated!
17
18
**What we're going to discover together:**
19
20
- How to make AI models work together like a well-oiled machine
21
- Why having access to previous cases makes AI much smarter
22
- How to build systems that are both powerful AND efficient
23
- The magic of combining image understanding with language generation
24
25
Think of this as your journey into the future of AI-powered medical analysis. By the end, you'll have built something that could potentially help doctors make better decisions faster!
26
27
Ready to start this adventure? Let's go!
28
"""
29
30
"""
31
## Setting Up Our AI Workshop
32
33
Alright, before we start building our amazing RAG system, we need to set up our digital workshop! Think of this like gathering all the tools a master craftsman needs before creating a masterpiece.
34
35
**What we're doing here:** We're importing all the powerful libraries that will help us build our AI system. It's like opening our toolbox and making sure we have every tool we need - from the precision screwdrivers (our AI models) to the heavy machinery (our data processing tools).
36
37
**Why JAX?** We're using JAX as our backend because it's like having a super-fast engine under the hood. It's designed to work beautifully with modern AI models and can handle complex calculations lightning-fast, especially when you have a GPU to help out!
38
39
**The magic of KerasHub:** This is where things get really exciting! KerasHub is like having access to a massive library of pre-trained AI models. Instead of training models from scratch (which would take forever), we can grab models that are already experts at understanding images and generating text. It's like having a team of specialists ready to work for us!
40
41
Let's get our tools ready and start building something amazing!
42
"""
43
44
"""
45
## Getting Your VIP Pass to the AI Model Library! šŸŽ«
46
47
Okay, here's the deal - we're about to access some seriously powerful AI models, but first we need to get our VIP pass! Think of Kaggle as this exclusive club where all the coolest AI models hang out, and we need the right credentials to get in.
48
49
**Why do we need this?** The AI models we're going to use are like expensive, high-performance sports cars. They're incredibly powerful, but they're also quite valuable, so we need to prove we're authorized to use them. It's like having a membership card to the most exclusive AI gym in town!
50
51
**Here's how to get your VIP access:**
52
53
1. **Head to the VIP lounge:** Go to your Kaggle account settings at https://www.kaggle.com/settings/account
54
2. **Get your special key:** Scroll down to the "API" section and click "Create New API Token"
55
3. **Set up your access:** This will give you the secret codes (API key and username) that let you download and use these amazing models
56
57
**Pro tip:** If you're running this in Google Colab (which is like having a super-powered computer in the cloud), you can store these credentials securely and access them easily. It's like having a digital wallet for your AI models!
58
59
Once you've got your credentials set up, you'll be able to download and use some of the most advanced AI models available today. Pretty exciting, right? šŸš€
60
"""
61
62
import os
63
import sys
64
65
os.environ["KERAS_BACKEND"] = "jax"
66
import keras
67
import numpy as np
68
69
keras.config.set_dtype_policy("bfloat16")
70
import keras_hub
71
import tensorflow as tf
72
from PIL import Image
73
import matplotlib.pyplot as plt
74
from nilearn import datasets, image
75
import re
76
77
78
"""
79
## Understanding the Magic Behind RAG! ✨
80
81
Alright, let's take a moment to understand what makes RAG so special! Think of RAG as having a super-smart assistant who doesn't just answer questions from memory, but actually goes to the library to look up the most relevant information first.
82
83
**The Three Musketeers of RAG:**
84
85
1. **The Retriever** šŸ•µļøā€ā™‚ļø: This is like having a detective who can look at a new image and instantly find similar cases from a massive database. It's the part that says "Hey, I've seen something like this before!"
86
87
2. **The Generator** āœļø: This is like having a brilliant writer who takes all the information the detective found and crafts a perfect response. It's the part that says "Based on what I found, here's what I think is happening."
88
89
3. **The Knowledge Base** šŸ“š: This is our treasure trove of information - think of it as a massive library filled with thousands of medical cases, each with their own detailed reports.
90
91
**Here's what our amazing RAG system will do:**
92
93
- **Step 1:** Our MobileNetV3 model will look at a brain MRI image and extract its "fingerprint" - the unique features that make it special
94
- **Step 2:** It will search through our database of previous cases and find the most similar one
95
- **Step 3:** It will grab the medical report from that similar case
96
- **Step 4:** Our Gemma3 text model will use that context to generate a brand new, super-accurate report
97
- **Step 5:** We'll compare this with what a traditional AI would do (spoiler: RAG wins! šŸ†)
98
99
**Why this is revolutionary:** Instead of the AI just guessing based on what it learned during training, it's actually looking at real, similar cases to make its decision. It's like the difference between a doctor who's just graduated from medical school versus one who has seen thousands of patients!
100
101
Ready to see this magic in action? Let's start building! šŸŽÆ
102
"""
103
104
"""
105
## Loading Our AI Dream Team! šŸ¤–
106
107
Alright, this is where the real magic begins! We're about to load up our AI models - think of this as assembling the ultimate team of specialists, each with their own superpower!
108
109
**What we're doing here:** We're downloading and setting up three different AI models, each with a specific role in our RAG system. It's like hiring the perfect team for a complex mission - you need the right person for each job!
110
111
**Meet our AI specialists:**
112
113
1. **MobileNetV3** šŸ‘ļø: This is our "eyes" - a lightweight but incredibly smart model that can look at any image and understand what it's seeing. It's like having a radiologist who can instantly spot patterns in medical images!
114
115
2. **Gemma3 1B Text Model** āœļø: This is our "writer" - a compact but powerful language model that can generate detailed medical reports. Think of it as having a medical writer who can turn complex findings into clear, professional reports.
116
117
3. **Gemma3 4B VLM** 🧠: This is our "benchmark" - a larger, more powerful model that can both see images AND generate text. We'll use this to compare how well our RAG approach performs against traditional methods.
118
119
**Why this combination is brilliant:** Instead of using one massive, expensive model, we're using smaller, specialized models that work together perfectly. It's like having a team of experts instead of one generalist - more efficient, faster, and often more accurate!
120
121
Let's load up our AI dream team and see what they can do! šŸš€
122
"""
123
124
125
def load_models():
126
"""
127
Load and configure vision model for feature extraction, Gemma3 VLM for report generation, and a compact text model for benchmarking.
128
Returns:
129
tuple: (vision_model, vlm_model, text_model)
130
"""
131
# Vision model for feature extraction (lightweight MobileNetV3)
132
vision_model = keras_hub.models.ImageClassifier.from_preset(
133
"mobilenet_v3_large_100_imagenet_21k"
134
)
135
# Gemma3 Text model for report generation in RAG Pipeline (compact)
136
text_model = keras_hub.models.Gemma3CausalLM.from_preset("gemma3_instruct_1b")
137
# Gemma3 VLM for report generation (original, for benchmarking)
138
vlm_model = keras_hub.models.Gemma3CausalLM.from_preset("gemma3_instruct_4b")
139
return vision_model, vlm_model, text_model
140
141
142
# Load models
143
print("Loading models...")
144
vision_model, vlm_model, text_model = load_models()
145
146
147
"""
148
## Preparing Our Medical Images! šŸ§ šŸ“ø
149
150
Now we're getting to the really exciting part - we're going to work with real brain MRI images! This is like having access to a medical imaging lab where we can study actual brain scans.
151
152
**What we're doing here:** We're downloading and preparing brain MRI images from the OASIS dataset. Think of this as setting up our own mini radiology department! We're taking raw MRI data and turning it into images that our AI models can understand and analyze.
153
154
**Why brain MRIs?** Brain MRI images are incredibly complex and detailed - they show us the structure of the brain in amazing detail. They're perfect for testing our RAG system because:
155
- They're complex enough to challenge our AI models
156
- They have real medical significance
157
- They're perfect for demonstrating how retrieval can improve accuracy
158
159
**The magic of data preparation:** We're not just downloading images - we're processing them to make sure they're in the perfect format for our AI models. It's like preparing ingredients for a master chef - everything needs to be just right!
160
161
**What you'll see:** After this step, you'll have a collection of brain MRI images that we can use to test our RAG system. Each image represents a different brain scan, and we'll use these to demonstrate how our system can find similar cases and generate accurate reports.
162
163
Ready to see some real brain scans? Let's prepare our medical images! šŸ”¬
164
"""
165
166
167
def prepare_images_and_captions(oasis, images_dir="images"):
168
"""
169
Prepare OASIS brain MRI images and generate captions.
170
171
Args:
172
oasis: OASIS dataset object containing brain MRI data
173
images_dir (str): Directory to save processed images
174
175
Returns:
176
tuple: (image_paths, captions) - Lists of image paths and corresponding captions
177
"""
178
os.makedirs(images_dir, exist_ok=True)
179
image_paths = []
180
captions = []
181
for i, img_path in enumerate(oasis.gray_matter_maps):
182
img = image.load_img(img_path)
183
data = img.get_fdata()
184
slice_ = data[:, :, data.shape[2] // 2]
185
slice_ = (
186
(slice_ - np.min(slice_)) / (np.max(slice_) - np.min(slice_)) * 255
187
).astype(np.uint8)
188
img_pil = Image.fromarray(slice_)
189
fname = f"oasis_{i}.png"
190
fpath = os.path.join(images_dir, fname)
191
img_pil.save(fpath)
192
image_paths.append(fpath)
193
captions.append(f"OASIS Brain MRI {i}")
194
print("Saved 4 OASIS Brain MRI images:", image_paths)
195
return image_paths, captions
196
197
198
# Prepare data
199
print("Preparing OASIS dataset...")
200
oasis = datasets.fetch_oasis_vbm(n_subjects=4) # Use 4 images
201
print("Download dataset is completed.")
202
image_paths, captions = prepare_images_and_captions(oasis)
203
204
205
"""
206
## Let's Take a Look at Our Brain Scans! šŸ‘€
207
208
Alright, this is the moment we've been waiting for! We're about to visualize our brain MRI images - think of this as opening up a medical textbook and seeing the actual brain scans that we'll be working with.
209
210
**What we're doing here:** We're creating a visual display of all our brain MRI images so we can see exactly what we're working with. It's like having a lightbox in a radiology department where doctors can examine multiple scans at once.
211
212
**Why visualization is crucial:** In medical imaging, seeing is believing! By visualizing our images, we can:
213
214
- Understand what our AI models are actually looking at
215
- Appreciate the complexity and detail in each brain scan
216
- Get a sense of how different each scan can be
217
- Prepare ourselves for what our RAG system will be analyzing
218
219
**What you'll observe:** Each image shows a different slice through a brain, revealing the intricate patterns and structures that make each brain unique. Some might show normal brain tissue, while others might reveal interesting variations or patterns.
220
221
**The beauty of brain imaging:** Every brain scan tells a story - the folds, the tissue density, the overall structure. Our AI models will learn to read these stories and find similar patterns across different scans.
222
223
Take a good look at these images - they're the foundation of everything our RAG system will do! 🧠✨
224
"""
225
226
227
def visualize_images(image_paths, captions):
228
"""
229
Visualize the processed brain MRI images.
230
231
Args:
232
image_paths (list): List of image file paths
233
captions (list): List of corresponding image captions
234
"""
235
n = len(image_paths)
236
fig, axes = plt.subplots(1, n, figsize=(4 * n, 4))
237
# If only one image, axes is not a list
238
if n == 1:
239
axes = [axes]
240
for i, (img_path, title) in enumerate(zip(image_paths, captions)):
241
img = Image.open(img_path)
242
axes[i].imshow(img, cmap="gray")
243
axes[i].set_title(title)
244
axes[i].axis("off")
245
plt.suptitle("OASIS Brain MRI Images")
246
plt.tight_layout()
247
plt.show()
248
249
250
# Visualize the prepared images
251
visualize_images(image_paths, captions)
252
253
254
"""
255
## Prediction Visualization Utility
256
257
Displays the query image and the most similar retrieved image from the database side by side.
258
"""
259
260
261
def visualize_prediction(query_img_path, db_image_paths, best_idx, db_reports):
262
"""
263
Visualize the query image and the most similar retrieved image.
264
265
Args:
266
query_img_path (str): Path to the query image
267
db_image_paths (list): List of database image paths
268
best_idx (int): Index of the most similar database image
269
db_reports (list): List of database reports
270
"""
271
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
272
axes[0].imshow(Image.open(query_img_path), cmap="gray")
273
axes[0].set_title("Query Image")
274
axes[0].axis("off")
275
axes[1].imshow(Image.open(db_image_paths[best_idx]), cmap="gray")
276
axes[1].set_title("Retrieved Context Image")
277
axes[1].axis("off")
278
plt.suptitle("Query and Most Similar Database Image")
279
plt.tight_layout()
280
plt.show()
281
282
283
"""
284
## Image Feature Extraction
285
286
Extracts a feature vector from an image using the small `vision(MobileNetV3)` model.
287
"""
288
289
290
def extract_image_features(img_path, vision_model):
291
"""
292
Extract features from an image using the vision model.
293
294
Args:
295
img_path (str): Path to the input image
296
vision_model: Pre-trained vision model for feature extraction
297
298
Returns:
299
numpy.ndarray: Extracted feature vector
300
"""
301
img = Image.open(img_path).convert("RGB").resize((384, 384))
302
x = np.array(img) / 255.0
303
x = np.expand_dims(x, axis=0)
304
features = vision_model(x)
305
return features
306
307
308
"""
309
## DB Reports
310
311
List of example `radiology reports` corresponding to each database image. Used as context for the RAG pipeline to generate new reports for `query images`.
312
"""
313
db_reports = [
314
"MRI shows a 1.5cm lesion in the right frontal lobe, non-enhancing, no edema.",
315
"Normal MRI scan, no abnormal findings.",
316
"Diffuse atrophy noted, no focal lesions.",
317
]
318
319
"""
320
## Output Cleaning Utility
321
322
Cleans the `generated text` output by removing prompt echoes and unwanted headers.
323
"""
324
325
326
def clean_generated_output(generated_text, prompt):
327
"""
328
Remove prompt echo and header details from generated text.
329
330
Args:
331
generated_text (str): Raw generated text from the language model
332
prompt (str): Original prompt used for generation
333
334
Returns:
335
str: Cleaned text without prompt echo and headers
336
"""
337
# Remove the prompt from the beginning of the generated text
338
if generated_text.startswith(prompt):
339
cleaned_text = generated_text[len(prompt) :].strip()
340
else:
341
cleaned_text = generated_text.replace(prompt, "").strip()
342
343
# Remove header details and unwanted formatting
344
lines = cleaned_text.split("\n")
345
filtered_lines = []
346
skip_next = False
347
subheading_pattern = re.compile(r"^(\s*[A-Za-z0-9 .\-()]+:)(.*)")
348
349
for line in lines:
350
line = line.replace("<end_of_turn>", "").strip()
351
line = line.replace("**", "")
352
line = line.replace("*", "")
353
# Remove empty lines after headers (existing logic)
354
if any(
355
header in line
356
for header in [
357
"**Patient:**",
358
"**Date of Exam:**",
359
"**Exam:**",
360
"**Referring Physician:**",
361
"**Patient ID:**",
362
"Patient:",
363
"Date of Exam:",
364
"Exam:",
365
"Referring Physician:",
366
"Patient ID:",
367
]
368
):
369
continue
370
elif line.strip() == "" and skip_next:
371
skip_next = False
372
continue
373
else:
374
# Split subheadings onto their own line if content follows
375
match = subheading_pattern.match(line)
376
if match and match.group(2).strip():
377
filtered_lines.append(match.group(1).strip())
378
filtered_lines.append(match.group(2).strip())
379
filtered_lines.append("") # Add a blank line after subheading
380
else:
381
filtered_lines.append(line)
382
# Add a blank line after subheadings (lines ending with ':')
383
if line.endswith(":") and (
384
len(filtered_lines) == 1 or filtered_lines[-2] != ""
385
):
386
filtered_lines.append("")
387
skip_next = False
388
389
# Remove any empty lines and excessive whitespace
390
cleaned_text = "\n".join(
391
[l for l in filtered_lines if l.strip() or l == ""]
392
).strip()
393
394
return cleaned_text
395
396
397
"""
398
## The Heart of Our RAG System! ā¤ļø
399
400
Alright, this is where all the magic happens! We're about to build the core of our RAG pipeline - think of this as the engine room of our AI system, where all the complex machinery works together to create something truly amazing.
401
402
**What is RAG, really?**
403
404
Imagine you're a detective trying to solve a complex case. Instead of just relying on your memory and training, you have access to a massive database of similar cases. When you encounter a new situation, you can instantly look up the most relevant previous cases and use that information to make a much better decision. That's exactly what RAG does!
405
406
**The Three Superheroes of Our RAG System:**
407
408
1. **The Retriever** šŸ•µļøā€ā™‚ļø: This is our detective - it looks at a new brain scan and instantly finds the most similar cases from our database. It's like having a photographic memory for medical images!
409
410
2. **The Generator** āœļø: This is our brilliant medical writer - it takes all the information our detective found and crafts a perfect, detailed report. It's like having a radiologist who can write like a medical journalist!
411
412
3. **The Knowledge Base** šŸ“š: This is our treasure trove - a massive collection of real medical cases and reports that our system can learn from. It's like having access to every medical textbook ever written!
413
414
**Here's the Step-by-Step Magic:**
415
416
- **Step 1** šŸ”: Our MobileNetV3 model extracts the "fingerprint" of the new brain scan
417
- **Step 2** šŸŽÆ: It searches through our database and finds the most similar previous case
418
- **Step 3** šŸ“‹: It grabs the medical report from that similar case
419
- **Step 4** 🧠: It combines this context with our generation prompt
420
- **Step 5** ✨: Our Gemma3 text model creates a brand new, super-accurate report
421
422
**Why This is Revolutionary:**
423
424
- **šŸŽÆ Factual Accuracy**: Instead of guessing, we're using real medical reports as our guide
425
- **šŸ” Relevance**: We're finding the most similar cases, not just any random information
426
- **⚔ Efficiency**: We're using a smaller, faster model but getting better results
427
- **šŸ“Š Traceability**: We can show exactly which previous cases influenced our diagnosis
428
- **šŸš€ Scalability**: We can easily add new cases to make our system even smarter
429
430
**The Real Magic:** This isn't just about making AI smarter - it's about making AI more trustworthy, more accurate, and more useful in real-world medical applications. We're building the future of AI-assisted medicine!
431
432
Ready to see this magic in action? Let's run our RAG pipeline! šŸŽÆāœØ
433
"""
434
435
436
def rag_pipeline(query_img_path, db_image_paths, db_reports, vision_model, text_model):
437
"""
438
Retrieval-Augmented Generation pipeline using vision model for retrieval and a compact text model for report generation.
439
Args:
440
query_img_path (str): Path to the query image
441
db_image_paths (list): List of database image paths
442
db_reports (list): List of database reports
443
vision_model: Vision model for feature extraction
444
text_model: Compact text model for report generation
445
Returns:
446
tuple: (best_idx, retrieved_report, generated_report)
447
"""
448
# Extract features for the query image
449
query_features = extract_image_features(query_img_path, vision_model)
450
# Extract features for the database images
451
db_features = np.vstack(
452
[extract_image_features(p, vision_model) for p in db_image_paths]
453
)
454
# Ensure features are numpy arrays for similarity search
455
db_features_np = np.array(db_features)
456
query_features_np = np.array(query_features)
457
# Similarity search
458
similarity = np.dot(db_features_np, query_features_np.T).squeeze()
459
best_idx = np.argmax(similarity)
460
retrieved_report = db_reports[best_idx]
461
print(f"[RAG] Matched image index: {best_idx}")
462
print(f"[RAG] Matched image path: {db_image_paths[best_idx]}")
463
print(f"[RAG] Retrieved context/report:\n{retrieved_report}\n")
464
PROMPT_TEMPLATE = (
465
"Context:\n{context}\n\n"
466
"Based on the above radiology report and the provided brain MRI image, please:\n"
467
"1. Provide a diagnostic impression.\n"
468
"2. Explain the diagnostic reasoning.\n"
469
"3. Suggest possible treatment options.\n"
470
"Format your answer as a structured radiology report.\n"
471
)
472
prompt = PROMPT_TEMPLATE.format(context=retrieved_report)
473
# Generate report using the text model (text only, no image input)
474
output = text_model.generate(
475
{
476
"prompts": prompt,
477
}
478
)
479
cleaned_output = clean_generated_output(output, prompt)
480
return best_idx, retrieved_report, cleaned_output
481
482
483
# Split data: first 3 as database, last as query
484
db_image_paths = image_paths[:-1]
485
query_img_path = image_paths[-1]
486
487
# Run RAG pipeline
488
print("Running RAG pipeline...")
489
best_idx, retrieved_report, generated_report = rag_pipeline(
490
query_img_path, db_image_paths, db_reports, vision_model, text_model
491
)
492
493
# Visualize results
494
visualize_prediction(query_img_path, db_image_paths, best_idx, db_reports)
495
496
# Print RAG results
497
print("\n" + "=" * 50)
498
print("RAG PIPELINE RESULTS")
499
print("=" * 50)
500
print(f"\nMatched DB Report Index: {best_idx}")
501
print(f"Matched DB Report: {retrieved_report}")
502
print("\n--- Generated Report ---\n", generated_report)
503
504
505
"""
506
## The Ultimate Showdown: RAG vs Traditional AI! 🄊
507
508
Alright, now we're getting to the really exciting part! We've built our amazing RAG system, but how do we know it's actually better than traditional approaches? Let's put it to the test!
509
510
**What we're about to do:** We're going to compare our RAG system with a traditional Vision-Language Model (VLM) approach. Think of this as a scientific experiment where we're testing two different methods to see which one performs better.
511
512
**The Battle of the Titans:**
513
514
- **🄊 RAG Approach**: Our smart system using MobileNetV3 + Gemma3 1B (1B total parameters) with retrieved medical context
515
- **🄊 Direct VLM Approach**: A traditional system using Gemma3 4B VLM (4B parameters) with only pre-trained knowledge
516
517
**Why this comparison is crucial:** This is like comparing a doctor who has access to thousands of previous cases versus one who only has their medical school training. Which one would you trust more?
518
519
**What we're going to discover:**
520
521
- **šŸ” The Power of Context**: How having access to similar medical cases dramatically improves accuracy
522
- **āš–ļø Size vs Intelligence**: Whether bigger models are always better (spoiler: they're not!)
523
- **šŸ„ Real-World Practicality**: Why RAG is more practical for actual medical applications
524
- **🧠 The Knowledge Gap**: How domain-specific knowledge beats general knowledge
525
526
**The Real Question:** Can a smaller, smarter system with access to relevant context outperform a larger system that's working in the dark?
527
528
**What makes this exciting:** This isn't just a technical comparison - it's about understanding the future of AI. We're testing whether intelligence comes from size or from having the right information at the right time.
529
530
Ready to see which approach wins? Let's run the ultimate AI showdown! šŸŽÆšŸ†
531
"""
532
533
534
def vlm_generate_report(query_img_path, vlm_model, question=None):
535
"""
536
Generate a radiology report directly from the image using a vision-language model.
537
Args:
538
query_img_path (str): Path to the query image
539
vlm_model: Pre-trained vision-language model (Gemma3 4B VLM)
540
question (str): Optional question or prompt to include
541
Returns:
542
str: Generated radiology report
543
"""
544
PROMPT_TEMPLATE = (
545
"Based on the provided brain MRI image, please:\n"
546
"1. Provide a diagnostic impression.\n"
547
"2. Explain the diagnostic reasoning.\n"
548
"3. Suggest possible treatment options.\n"
549
"Format your answer as a structured radiology report.\n"
550
)
551
if question is None:
552
question = ""
553
# Preprocess the image as required by the model
554
img = Image.open(query_img_path).convert("RGB").resize((224, 224))
555
image = np.array(img) / 255.0
556
image = np.expand_dims(image, axis=0)
557
# Generate report using the VLM
558
output = vlm_model.generate(
559
{
560
"images": image,
561
"prompts": PROMPT_TEMPLATE.format(question=question),
562
}
563
)
564
# Clean the generated output
565
cleaned_output = clean_generated_output(
566
output, PROMPT_TEMPLATE.format(question=question)
567
)
568
return cleaned_output
569
570
571
# Run VLM (direct approach)
572
print("\n" + "=" * 50)
573
print("VLM RESULTS (Direct Approach)")
574
print("=" * 50)
575
vlm_report = vlm_generate_report(query_img_path, vlm_model)
576
print("\n--- Vision-Language Model (No Retrieval) Report ---\n", vlm_report)
577
578
"""
579
## The Results Are In: RAG Wins! šŸ†
580
581
Drumroll please... 🄁 The results are in, and they're absolutely fascinating! Let's break down what we just discovered in our ultimate AI showdown.
582
583
**The Numbers Don't Lie:**
584
585
- **🄊 RAG Approach**: MobileNet + Gemma3 1B text model (~1B total parameters)
586
- **🄊 Direct VLM Approach**: Gemma3 VLM 4B model (~4B total parameters)
587
- **šŸ† Winner**: RAG pipeline! (And here's why it's revolutionary...)
588
589
**What We Just Proved:**
590
591
**šŸŽÆ Accuracy & Relevance - RAG Dominates!**
592
593
- Our RAG system provides contextually relevant, case-specific reports that often match or exceed the quality of much larger models
594
- The traditional VLM produces more generic, "textbook" responses that lack the specificity of real medical cases
595
- It's like comparing a doctor who's seen thousands of similar cases versus one who's only read about them in textbooks!
596
597
**⚔ Speed & Efficiency - RAG is Lightning Fast!**
598
599
- Our RAG system is significantly faster and more memory-efficient
600
- It can run on edge devices and provide real-time results
601
- The larger VLM requires massive computational resources and is much slower
602
- Think of it as comparing a sports car to a freight train - both can get you there, but one is much more practical!
603
604
**šŸ”„ Scalability & Flexibility - RAG is Future-Proof!**
605
606
- Our RAG approach can easily adapt to new domains or datasets
607
- We can swap out different models without retraining everything
608
- The traditional approach requires expensive retraining for new domains
609
- It's like having a modular system versus a monolithic one!
610
611
**šŸ” Interpretability & Trust - RAG is Transparent!**
612
613
- Our RAG system shows exactly which previous cases influenced its decision
614
- This transparency builds trust and helps with clinical validation
615
- The traditional approach is a "black box" - we don't know why it made certain decisions
616
- In medicine, trust and transparency are everything!
617
618
**šŸ„ Real-World Practicality - RAG is Ready for Action!**
619
620
- Our RAG system can be deployed in resource-constrained environments
621
- It can be continuously improved by adding new cases to the database
622
- The traditional approach requires expensive cloud infrastructure
623
- This is the difference between a practical solution and a research project!
624
625
**The Bottom Line:**
626
627
We've just proven that intelligence isn't about size - it's about having the right information at the right time. Our RAG system is smaller, faster, more accurate, and more practical than traditional approaches. This isn't just a technical victory - it's a glimpse into the future of AI! šŸš€āœØ
628
"""
629
630
"""
631
## Congratulations! You've Just Built the Future of AI! šŸŽ‰
632
633
Wow! What an incredible journey we've been on together! We started with a simple idea and ended up building something that could revolutionize how AI systems work in the real world. Let's take a moment to celebrate what we've accomplished!
634
635
**What We Just Built Together:**
636
637
**šŸ¤– The Ultimate AI Dream Team:**
638
639
- **MobileNetV3 + Gemma3 1B text model** - Our dynamic duo that works together like a well-oiled machine
640
- **Gemma3 4B VLM model** - Our worthy opponent that helped us prove our point
641
- **KerasHub Integration** - The magic that made it all possible
642
643
**šŸ”¬ Real-World Medical Analysis:**
644
645
- **Feature Extraction** - We taught our AI to "see" brain MRI images like a radiologist
646
- **Similarity Search** - We built a system that can instantly find similar medical cases
647
- **Report Generation** - We created an AI that writes detailed, accurate medical reports
648
- **Comparative Analysis** - We proved that our approach is better than traditional methods
649
650
**šŸš€ Revolutionary Results:**
651
652
- **Enhanced Accuracy** - Our system provides more relevant, contextually aware outputs
653
- **Scalable Architecture** - We built something that can grow and adapt to new challenges
654
- **Real-World Applicability** - This isn't just research - it's ready for actual medical applications
655
- **Future-Proof Design** - Our system can evolve and improve over time
656
657
**The Real Magic:** We've just demonstrated that the future of AI isn't about building bigger and bigger models. It's about building smarter systems that know how to find and use the right information at the right time. We've shown that a small, well-designed system with access to relevant context can outperform massive models that work in isolation.
658
659
**What This Means for the Future:** This isn't just about medical imaging - this approach can be applied to any field where having access to relevant context makes a difference. From legal document analysis to financial forecasting, from scientific research to creative writing, the principles we've demonstrated here can revolutionize how AI systems work.
660
661
**You're Now Part of the AI Revolution:** By understanding and building this RAG system, you're now equipped with knowledge that's at the cutting edge of AI development. You understand not just how to use AI models, but how to make them work together intelligently.
662
663
**The Journey Continues:** This is just the beginning! The world of AI is evolving rapidly, and the techniques we've explored here are just the tip of the iceberg. Keep experimenting, keep learning, and keep building amazing things!
664
665
**Thank you for joining this adventure!** šŸš€āœØ
666
667
And we've just built something beautiful together! 🌟
668
"""
669
670
"""
671
## Security Warning
672
673
āš ļø **IMPORTANT SECURITY AND PRIVACY CONSIDERATIONS**
674
675
This pipeline is for educational purposes only. For production use:
676
677
- Anonymize medical data following HIPAA guidelines
678
- Implement access controls and encryption
679
- Validate inputs and secure APIs
680
- Consult medical professionals for clinical decisions
681
- This system should NOT be used for actual medical diagnosis without proper validation
682
"""
683
684