Image & Video Segmentation

GitHub Repository: aswintechguy/Deep-Learning-Projects
Path: blob/main/Image & Video Segmentation using SAM2.1/Image & Video Segmentation - SAM2.1.ipynb

Kernel: Python 3 (ipykernel)

In [ ]:

!pip install -qU ultralytics

Initialize SAM Model

In [1]:

from ultralytics import SAM
import matplotlib.pyplot as plt

# load the model
model = SAM('sam2.1_b.pt')

# display model info
model.info()

Out[1]:

Model summary: 403 layers, 80,850,178 parameters, 80,850,178 gradients

(403, 80850178, 80850178, 0.0)
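The summary reports about 80.9M parameters. As a rough sketch of what that means for memory, the snippet below converts the parameter count into the size of the raw weights, assuming 4 bytes per parameter in fp32 and 2 in fp16. It ignores activations and framework overhead, so real GPU usage during inference will be higher. `weight_mib` is a hypothetical helper, not part of Ultralytics.

```python
# Rough weight-memory estimate from the parameter count printed by model.info().
# Assumes dense weights at a fixed width per parameter; activations, feature
# maps and CUDA overhead are NOT included.
N_PARAMS = 80_850_178  # from "Model summary: ... 80,850,178 parameters"

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_mib(n_params: int, dtype: str) -> float:
    """Return the size of the raw weights in MiB (2**20 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**20

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_mib(N_PARAMS, dtype):.1f} MiB")
```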

In [2]:

# source of the test image used below: https://ultralytics.com/images/bus.jpg

Segment Image

In [2]:

# bounding-box prompt as [x1, y1, x2, y2] in original-image pixels
bboxes = [[55, 400, 230, 900]]

image_path = 'test_image.jpg'
results = model(image_path, bboxes=bboxes)

Out[2]:

image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 340.4ms
Speed: 34.1ms preprocess, 340.4ms inference, 13.1ms postprocess per image at shape (1, 3, 1024, 1024)

In [10]:

for result in results:
    result.show()
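Box prompts here are `[x1, y1, x2, y2]` pixel corners in the original image, the same layout as `bboxes` above. If your boxes come from a tool that exports `[x, y, width, height]`, a small conversion is needed first. The helpers below are hypothetical (not part of Ultralytics), and the image size in the example is made up for illustration.

```python
# Hypothetical helpers (not Ultralytics functions) for preparing box prompts.
# Assumption: SAM box prompts are [x1, y1, x2, y2] in original-image pixels,
# matching bboxes = [[55, 400, 230, 900]] used above.

def xywh_to_xyxy(box):
    """Convert [x, y, w, h] (top-left corner + size) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def clip_xyxy(box, width, height):
    """Order the corners so x1 <= x2, y1 <= y2, then clamp them to the image."""
    x1, y1, x2, y2 = box
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))
    return [
        min(max(x1, 0), width),
        min(max(y1, 0), height),
        min(max(x2, 0), width),
        min(max(y2, 0), height),
    ]

# The prompt above, written as top-left corner + size
print(xywh_to_xyxy([55, 400, 175, 500]))  # [55, 400, 230, 900]
# A box that overflows a hypothetical 800x1000 image
print(clip_xyxy([-10, 400, 900, 1200], width=800, height=1000))  # [0, 400, 800, 1000]
```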

In [11]:

# single point prompt as [x, y]; label 1 marks it as foreground
points = [[350, 370]]
results = model(image_path, points=points, labels=[1])

Out[11]:

image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 388.6ms
Speed: 8.1ms preprocess, 388.6ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)

In [12]:

for result in results:
    result.show()

In [3]:

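# multiple point prompts [x, y]; labels: 1 = foreground, 0 = background
# (in this flat form each point is prompted as a separate object, hence two masks below)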
points = [[350, 370], [100, 650]]
results = model(image_path, points=points, labels=[1, 0])

Out[3]:

image 1/1 D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_image.jpg: 1024x1024 1 0, 1 1, 319.8ms
Speed: 12.7ms preprocess, 319.8ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)

In [14]:

for result in results:
    result.show()
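The log for the multi-point cell reports two masks (`1 0, 1 1`), so with flat `points`/`labels` lists each point appears to be prompted as its own object rather than refining a single mask. The Ultralytics SAM documentation describes a nested layout, one list of points and one list of labels per object, for combining positive and negative clicks on the same object; confirm it against your installed version. The `build_point_prompts` helper below is hypothetical and only assembles that nested structure, which could then be passed as `model(image_path, points=points, labels=labels)`.

```python
# Hypothetical helper (not an Ultralytics function) that builds nested point
# prompts: one inner list of [x, y] points and one list of labels per object.
# Assumption: the installed Ultralytics version accepts this nested layout.
FOREGROUND, BACKGROUND = 1, 0

def build_point_prompts(objects):
    """objects: a list of objects, each a list of (x, y, label) clicks."""
    points, labels = [], []
    for clicks in objects:
        points.append([[x, y] for x, y, _ in clicks])
        labels.append([label for _, _, label in clicks])
    return points, labels

# One object: include the click at (350, 370), exclude the area around (100, 650)
points, labels = build_point_prompts([[(350, 370, FOREGROUND), (100, 650, BACKGROUND)]])
print(points)  # [[[350, 370], [100, 650]]]
print(labels)  # [[1, 0]]
```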

Extract Bounding-Box Crops from the Original Image

In [4]:

import cv2
import torch
import numpy as np

In [7]:

# boxes of the multi-point result: one [x1, y1, x2, y2] row per mask
result.boxes.xyxy

Out[7]:

tensor([[ 17., 232., 800., 726.],
        [ 57., 401., 205., 896.]], device='cuda:0')
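Each row above is one box per predicted mask. For SAM prompts, recent Ultralytics versions appear to compute these boxes from the mask extents rather than predicting them separately, which is why each box tightly wraps its mask. The sketch below shows that relationship on a toy NumPy mask. `mask_to_xyxy` is a hypothetical helper; with real results you would pass one mask, e.g. `result.masks.data[k].cpu().numpy()`. Treating `x2`/`y2` as inclusive is an assumption of this sketch.

```python
import numpy as np

def mask_to_xyxy(mask: np.ndarray) -> list:
    """Tight [x1, y1, x2, y2] around the True pixels of a 2-D boolean mask.

    x2/y2 are the last occupied column/row (inclusive), so slicing an image
    with [y1:y2 + 1, x1:x2 + 1] keeps every masked pixel.
    Returns [0, 0, 0, 0] for an empty mask.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return [0, 0, 0, 0]
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy 6x8 mask with a filled rectangle over rows 1-3 and columns 2-5
mask = np.zeros((6, 8), dtype=bool)
mask[1:4, 2:6] = True
print(mask_to_xyxy(mask))  # [2, 1, 5, 3]
```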

In [6]:

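image = cv2.imread(image_path)  # BGR array of shape (H, W, 3)
if image is None:
    raise FileNotFoundError(f"could not read {image_path}")

for i, result in enumerate(results):
    if result.boxes is None:
        continue

    # boxes as an (N, 4) NumPy array of [x1, y1, x2, y2] pixels, moved off the GPU if needed
    xyxy = result.boxes.xyxy
    boxes = xyxy.cpu().numpy() if isinstance(xyxy, torch.Tensor) else np.asarray(xyxy)

    for j, box in enumerate(boxes):
        x1, y1, x2, y2 = map(int, box[:4])

        # NumPy indexing is [rows (y), columns (x)]
        cropped_img = image[y1:y2, x1:x2]

        # cv2.imshow opens a desktop window (it does not render inline in Jupyter);
        # press any key to move on to the next crop
        cv2.imshow(f"Cropped Image {i}_{j}", cropped_img)
        cv2.waitKey(0)

cv2.destroyAllWindows()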
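The crops above are plain rectangles, so they still contain background around the object. To keep only the segmented object, the box can be combined with its mask. This is a minimal NumPy sketch that assumes the mask is a boolean `(H, W)` array at the same resolution as the image from `cv2.imread` (compare `result.masks.data.shape` with `image.shape` before relying on that); `masked_crop` is a hypothetical helper.

```python
import numpy as np

def masked_crop(image: np.ndarray, mask: np.ndarray, box) -> np.ndarray:
    """Crop image to box and set every pixel outside the mask to 0.

    image: (H, W, 3) array, e.g. from cv2.imread (BGR order is kept).
    mask:  (H, W) boolean array at the same resolution as image.
    box:   [x1, y1, x2, y2] pixels, with x2/y2 exclusive as in the slicing above.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = image[y1:y2, x1:x2].copy()  # copy so the original image is untouched
    crop[~mask[y1:y2, x1:x2]] = 0      # background pixels -> black (all channels)
    return crop

# Toy example: 4x4 image of ones, the mask keeps only the centre 2x2 block
image = np.ones((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
out = masked_crop(image, mask, [0, 0, 4, 4])
print(out[..., 0])
```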

Segment Video

In [8]:

from ultralytics.models.sam import SAM2VideoPredictor

# predictor overrides: confidence threshold, task, mode, input size and checkpoint
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model='sam2.1_b.pt')

predictor = SAM2VideoPredictor(overrides=overrides)
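Running the video cell prints an Ultralytics warning that results accumulate in RAM unless `stream=True` is passed, in which case a generator of per-frame results is returned. The pure-Python sketch here illustrates that pattern without the model: each frame's mask is reduced to a small summary (its pixel area) as soon as it arrives, instead of keeping every mask in memory. `fake_frame_masks` is a hypothetical stand-in for the result generator; with the real predictor you would iterate over the call made with `stream=True` and read `r.masks` per frame, as the warning's own example shows.

```python
from typing import Iterable, Iterator, List

Mask = List[List[bool]]

def fake_frame_masks(n_frames: int, size: int = 4) -> Iterator[Mask]:
    """Hypothetical stand-in for a streamed result generator: yields toy
    square masks whose foreground grows by one row per frame."""
    for t in range(n_frames):
        rows = min(t + 1, size)
        yield [[r < rows for _ in range(size)] for r in range(size)]

def mask_areas(masks: Iterable[Mask]) -> List[int]:
    """Reduce each mask to its pixel count as it arrives; only ints are kept."""
    return [sum(sum(row) for row in mask) for mask in masks]

print(mask_areas(fake_frame_masks(5)))  # [4, 8, 12, 16, 16]
```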

In [9]:

video_path = 'test_video.mp4'

results = predictor(source=video_path, points=[900, 820], labels=[1])

Out[9]:

Ultralytics 8.3.91  Python-3.12.3 torch-2.5.1 CUDA:0 (NVIDIA GeForce RTX 4070, 12282MiB)

WARNING  inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 178.3ms
video 1/1 (frame 2/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 133.1ms
video 1/1 (frame 3/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 120.7ms
video 1/1 (frame 4/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 125.5ms
video 1/1 (frame 5/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 132.5ms
video 1/1 (frame 6/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.0ms
video 1/1 (frame 7/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 151.5ms
video 1/1 (frame 8/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 160.3ms
video 1/1 (frame 9/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.5ms
video 1/1 (frame 10/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.2ms
video 1/1 (frame 11/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.5ms
video 1/1 (frame 12/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.5ms
video 1/1 (frame 13/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.4ms
video 1/1 (frame 14/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.1ms
video 1/1 (frame 15/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 16/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.2ms
video 1/1 (frame 17/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 18/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.9ms
video 1/1 (frame 19/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.6ms
video 1/1 (frame 20/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.8ms
video 1/1 (frame 21/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.8ms
video 1/1 (frame 22/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.1ms
video 1/1 (frame 23/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.2ms
video 1/1 (frame 24/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.0ms
video 1/1 (frame 25/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 149.0ms
video 1/1 (frame 26/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 149.9ms
video 1/1 (frame 27/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 174.6ms
video 1/1 (frame 28/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.3ms
video 1/1 (frame 29/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.7ms
video 1/1 (frame 30/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.7ms
video 1/1 (frame 31/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 32/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 164.3ms
video 1/1 (frame 33/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.2ms
video 1/1 (frame 34/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.1ms
video 1/1 (frame 35/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.8ms
video 1/1 (frame 36/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.6ms
video 1/1 (frame 37/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.5ms
video 1/1 (frame 38/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 39/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.7ms
video 1/1 (frame 40/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.5ms
video 1/1 (frame 41/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 151.6ms
video 1/1 (frame 42/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.0ms
video 1/1 (frame 43/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.5ms
video 1/1 (frame 44/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 148.3ms
video 1/1 (frame 45/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.6ms
video 1/1 (frame 46/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.3ms
video 1/1 (frame 47/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.7ms
video 1/1 (frame 48/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.7ms
video 1/1 (frame 49/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 150.7ms
video 1/1 (frame 50/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 146.0ms
video 1/1 (frame 51/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.8ms
video 1/1 (frame 52/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.0ms
video 1/1 (frame 53/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.5ms
video 1/1 (frame 54/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 147.0ms
video 1/1 (frame 55/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 143.9ms
video 1/1 (frame 56/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
video 1/1 (frame 57/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 144.3ms
video 1/1 (frame 58/58) D:\notebooks\temp projects\youtube\Image & Video Segmentation using SAM2.1\test_video.mp4: 1024x1024 1 0, 145.4ms
Speed: 4.3ms preprocess, 146.9ms inference, 0.4ms postprocess per image at shape (1, 3, 1024, 1024)
Results saved to runs\segment\predict
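The summary line gives average per-frame timings for this run (RTX 4070, 1024x1024 input). Adding the three stages gives an approximate per-frame latency and frame rate. This excludes video decoding and writing the output, and the first frame (178.3 ms) was slower than the rest, so treat the figures as rough.

```python
# Per-frame latency from the summary line:
# "Speed: 4.3ms preprocess, 146.9ms inference, 0.4ms postprocess per image"
preprocess_ms, inference_ms, postprocess_ms = 4.3, 146.9, 0.4
n_frames = 58  # "frame 58/58" in the log

per_frame_ms = preprocess_ms + inference_ms + postprocess_ms
fps = 1000.0 / per_frame_ms
total_s = n_frames * per_frame_ms / 1000.0

print(f"{per_frame_ms:.1f} ms/frame, {fps:.1f} FPS, ~{total_s:.1f} s for {n_frames} frames")
```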
