GitHub Repository: aswintechguy/Deep-Learning-Projects
Path: blob/main/Image & Video Segmentation using SAM2.1/Image & Video Segmentation - SAM2.1.ipynb
Kernel: Python 3 (ipykernel)
!pip install -qU ultralytics

Initialize SAM Model

from ultralytics import SAM
import matplotlib.pyplot as plt

# load the model
model = SAM('sam2.1_b.pt')

# display model info
model.info()
Model summary: 403 layers, 80,850,178 parameters, 80,850,178 gradients
(403, 80850178, 80850178, 0.0)
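The tuple printed by `model.info()` appears to list layers, parameters, gradients, and GFLOPs, in that order. As a rough sanity check, the parameter count can be turned into an estimate of the memory the weights alone would need. This is a minimal sketch that assumes 32-bit floats (4 bytes per parameter); the helper name `weight_memory_mb` is hypothetical and not part of Ultralytics.

```python
def weight_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


# values copied from the model.info() output above
layers, params, gradients, gflops = (403, 80850178, 80850178, 0.0)

# float32 is an assumption; pass bytes_per_param=2 for a float16 estimate
print(f"{params:,} parameters ~ {weight_memory_mb(params):.1f} MB in float32")
print(f"{params:,} parameters ~ {weight_memory_mb(params, 2):.1f} MB in float16")
```

Activations and the image embedding add to this at inference time, so the figure is only a lower bound on GPU memory use.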
# url - https://ultralytics.com/images/bus.jpg

Segment Image

# define bounding box regions
bboxes = [[55, 400, 230, 900]]
image_path = 'test_image.jpg'
results = model(image_path, bboxes=bboxes)
image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 340.4ms
Speed: 34.1ms preprocess, 340.4ms inference, 13.1ms postprocess per image at shape (1, 3, 1024, 1024)
for result in results:
    result.show()
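The box prompt above appears to use pixel coordinates in `[x1, y1, x2, y2]` order, the same layout as the `result.boxes.xyxy` output shown later in the notebook. A box that is inverted or falls outside the image is an easy mistake to make, so a small check before calling the model can help. The sketch below is plain Python; `box_in_bounds` is a hypothetical helper, and the image size is an assumed value that should be replaced with the real dimensions of `test_image.jpg`.

```python
def box_in_bounds(box, width, height):
    """Return True if an [x1, y1, x2, y2] box is well-formed and inside the image."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height


# assumed image size (width, height); replace with the real size of test_image.jpg
width, height = 810, 1080

bboxes = [[55, 400, 230, 900]]
for box in bboxes:
    status = "ok" if box_in_bounds(box, width, height) else "out of bounds"
    print(box, status)
```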
# define single points
points = [[350, 370]]
results = model(image_path, points=points, labels=[1])
image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 388.6ms
Speed: 8.1ms preprocess, 388.6ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)
for result in results:
    result.show()
# define multiple points
points = [[350, 370], [100, 650]]
results = model(image_path, points=points, labels=[1, 0])
image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 1 1, 319.8ms
Speed: 12.7ms preprocess, 319.8ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)
for result in results:
    result.show()
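In the output above, the two points produce two detections (`1 0, 1 1`), which suggests that a flat list of points is treated as one prompt per object. The Ultralytics SAM documentation describes a nested form, `points=[[[x, y], ...]]` with `labels=[[...]]`, for several points that refine a single object, where label 1 marks a point to include and 0 a point to exclude. The helper below is a hypothetical sketch of building that nested form; it only reshapes lists and makes no call into the library.

```python
def as_single_object_prompt(points, labels):
    """Wrap flat points/labels into the nested one-object form.

    points: [[x, y], ...]  ->  [[[x, y], ...]]
    labels: [1, 0, ...]    ->  [[1, 0, ...]]
    """
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one label")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 1 (include) or 0 (exclude)")
    return [[list(p) for p in points]], [list(labels)]


points, labels = as_single_object_prompt([[350, 370], [100, 650]], [1, 0])
print(points)  # [[[350, 370], [100, 650]]]
print(labels)  # [[1, 0]]
```

The nested values could then be passed as `model(image_path, points=points, labels=labels)`; whether that yields a single mask depends on the installed Ultralytics version, so it is worth checking the printed detection count.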

Extract BBox Image from the Original Image

import cv2
import torch
import numpy as np
result.boxes.xyxy
tensor([[ 17., 232., 800., 726.], [ 57., 401., 205., 896.]], device='cuda:0')
image = cv2.imread(image_path)

for i, result in enumerate(results):
    if hasattr(result, 'boxes') and result.boxes is not None:
        boxes = result.boxes.xyxy.cpu().numpy() if isinstance(result.boxes.xyxy, torch.Tensor) else np.array(result.boxes.xyxy)

        # iterate through the bounding boxes
        for j, box in enumerate(boxes):
            x1, y1, x2, y2 = map(int, box[:4])
            cropped_img = image[y1:y2, x1:x2]

            # show the image
            cv2.imshow(f"Cropped Image {i}_{j}", cropped_img)
            cv2.waitKey(0)
            cv2.destroyAllWindows()
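Images loaded with `cv2.imread` are NumPy arrays indexed by row (y) first and column (x) second, which is why the crop is written `image[y1:y2, x1:x2]` rather than x first. The sketch below works out the crop sizes implied by the two boxes printed above, using plain Python so it runs without the image; `crop_shape` is a hypothetical helper and assumes the box lies inside the image.

```python
def crop_shape(box):
    """Return (rows, cols) of image[y1:y2, x1:x2] for an in-bounds xyxy box."""
    x1, y1, x2, y2 = map(int, box[:4])
    return (y2 - y1, x2 - x1)


# boxes copied from the result.boxes.xyxy output above
boxes = [[17.0, 232.0, 800.0, 726.0], [57.0, 401.0, 205.0, 896.0]]

for j, box in enumerate(boxes):
    rows, cols = crop_shape(box)
    print(f"crop {j}: {cols} px wide, {rows} px tall")
```

If a box extended past the image edge, NumPy slicing would silently return a smaller crop, so the real shape can be smaller than this estimate.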

Segment Video

from ultralytics.models.sam import SAM2VideoPredictor

# define model parameters
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model='sam2.1_b.pt')
predictor = SAM2VideoPredictor(overrides=overrides)
video_path = 'test_video.mp4'
results = predictor(source=video_path, points=[900, 820], labels=[1])
Ultralytics 8.3.91 Python-3.12.3 torch-2.5.1 CUDA:0 (NVIDIA GeForce RTX 4070, 12282MiB)

WARNING inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 178.3ms
video 1/1 (frame 2/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 133.1ms
video 1/1 (frame 3/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 120.7ms
video 1/1 (frame 4/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 125.5ms
video 1/1 (frame 5/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 132.5ms
video 1/1 (frame 6/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.0ms
video 1/1 (frame 7/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 151.5ms
video 1/1 (frame 8/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 160.3ms
video 1/1 (frame 9/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.5ms
video 1/1 (frame 10/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.2ms
video 1/1 (frame 11/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.5ms
video 1/1 (frame 12/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.5ms
video 1/1 (frame 13/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.4ms
video 1/1 (frame 14/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.1ms
video 1/1 (frame 15/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 16/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.2ms
video 1/1 (frame 17/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 18/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.9ms
video 1/1 (frame 19/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.6ms
video 1/1 (frame 20/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.8ms
video 1/1 (frame 21/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.8ms
video 1/1 (frame 22/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.1ms
video 1/1 (frame 23/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.2ms
video 1/1 (frame 24/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.0ms
video 1/1 (frame 25/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 149.0ms
video 1/1 (frame 26/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 149.9ms
video 1/1 (frame 27/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 174.6ms
video 1/1 (frame 28/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.3ms
video 1/1 (frame 29/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.7ms
video 1/1 (frame 30/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.7ms
video 1/1 (frame 31/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 32/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 164.3ms
video 1/1 (frame 33/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.2ms
video 1/1 (frame 34/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.1ms
video 1/1 (frame 35/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.8ms
video 1/1 (frame 36/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.6ms
video 1/1 (frame 37/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.5ms
video 1/1 (frame 38/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 39/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.7ms
video 1/1 (frame 40/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.5ms
video 1/1 (frame 41/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 151.6ms
video 1/1 (frame 42/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.0ms
video 1/1 (frame 43/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.5ms
video 1/1 (frame 44/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.3ms
video 1/1 (frame 45/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.6ms
video 1/1 (frame 46/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 47/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.7ms
video 1/1 (frame 48/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.7ms
video 1/1 (frame 49/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 150.7ms
video 1/1 (frame 50/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.0ms
video 1/1 (frame 51/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.8ms
video 1/1 (frame 52/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.0ms
video 1/1 (frame 53/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.5ms
video 1/1 (frame 54/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.0ms
video 1/1 (frame 55/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 143.9ms
video 1/1 (frame 56/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 57/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.3ms
video 1/1 (frame 58/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
Speed: 4.3ms preprocess, 146.9ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)
Results saved to runs\segment\predict
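The log reports one inference time per frame plus an overall average (146.9 ms in the `Speed:` line). To look at per-frame timing more closely, for example to spot the slower first frame, the frame lines can be parsed with a regular expression. This is a minimal sketch using only the standard library; `frame_times` is a hypothetical helper, and the sample holds just the first three frame lines, so its mean differs from the full-run average.

```python
import re
from statistics import mean

# matches "(frame 3/58) ... 120.7ms" and captures the frame index and the time
FRAME_RE = re.compile(r"\(frame (\d+)/\d+\).*?([\d.]+)ms")


def frame_times(log_text):
    """Return {frame_index: inference_ms} parsed from the predictor log."""
    return {int(i): float(ms) for i, ms in FRAME_RE.findall(log_text)}


# first three frame lines copied from the log above (raw string keeps the backslashes)
sample = r"""
video 1/1 (frame 1/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 178.3ms
video 1/1 (frame 2/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 133.1ms
video 1/1 (frame 3/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 120.7ms
"""

times = frame_times(sample)
slowest = max(times, key=times.get)
print(f"frames parsed: {len(times)}")
print(f"mean inference: {mean(times.values()):.1f} ms")
print(f"slowest frame: {slowest} ({times[slowest]} ms)")
```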