Your images,
intelligently sorted
ImageSieve uses CLIP to understand the semantic content of your photos and automatically sort them into categories you define with natural language. On-device inference via CoreML means your images never leave your machine.
Everything you need to
understand your images
A complete AI vision toolkit for categorization, search, analysis, and visualization.
AI Categorization
Define categories with natural language descriptions like "a photo of a cat" or "scenic mountain landscape." ImageSieve uses CLIP ViT-L-14 to match images semantically -- no manual tagging, no keyword matching. 33 pre-configured categories included out of the box.
Semantic Search
Search your library with text, images, or both. Type "sunset over the ocean" and find matching photos ranked by CLIP similarity. Drop a reference image to find visually similar ones. Adaptive filtering derives the result cutoff from the mean and standard deviation of similarity scores, so thresholds adjust to each query.
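A minimal sketch of that adaptive cutoff, assuming the filter keeps results scoring above the mean plus one standard deviation; the function and tuple shape are illustrative, not ImageSieve's actual API:

import Foundation

// Keep only results whose CLIP similarity clears an adaptive cutoff:
// the mean score plus one standard deviation of the current result set.
func adaptiveFilter(_ results: [(url: URL, similarity: Double)]) -> [(url: URL, similarity: Double)] {
    guard results.count > 1 else { return results }
    let scores = results.map { $0.similarity }
    let mean = scores.reduce(0, +) / Double(scores.count)
    let variance = scores.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(scores.count)
    let cutoff = mean + variance.squareRoot()
    return results
        .filter { $0.similarity >= cutoff }
        .sorted { $0.similarity > $1.similarity }
}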
Vision Tagging
Apple Vision framework automatically detects scenes, animals, faces, text, and objects in your images with 1000+ classification labels. Tags are searchable and visible in the image inspector alongside CLIP categories and EXIF data.
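As a rough sketch of the Vision classification step (the 0.5 confidence cutoff and function name are illustrative; the app's real pipeline may filter differently):

import Foundation
import Vision

// Classify an image with Apple's Vision framework and keep confident
// labels as searchable tags.
func visionTags(for imageURL: URL) throws -> [String] {
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])
    let observations = request.results as? [VNClassificationObservation] ?? []
    return observations
        .filter { $0.confidence > 0.5 }   // keep reasonably confident labels
        .map { $0.identifier }            // e.g. "cat", "beach", "document"
}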
EXIF Extraction
Camera make, model, lens, focal length, aperture, shutter speed, ISO, GPS coordinates, and more. All metadata is extracted from JPEG, HEIC, and RAW files. Browse by camera or lens, and explore shooting patterns in the EXIF Explorer visualization.
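A sketch of the kind of ImageIO call involved; the keys shown are a small subset, and error handling and RAW-specific dictionaries are omitted:

import Foundation
import ImageIO

// Pull camera, exposure, and GPS metadata out of an image file with ImageIO.
// Works for JPEG, HEIC, and the RAW formats the OS can decode.
func exifSummary(for url: URL) -> [String: Any] {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any]
    else { return [:] }

    let exif = properties[kCGImagePropertyExifDictionary] as? [CFString: Any] ?? [:]
    let tiff = properties[kCGImagePropertyTIFFDictionary] as? [CFString: Any] ?? [:]
    let gps  = properties[kCGImagePropertyGPSDictionary]  as? [CFString: Any] ?? [:]

    return [
        "make":        tiff[kCGImagePropertyTIFFMake] ?? "unknown",
        "model":       tiff[kCGImagePropertyTIFFModel] ?? "unknown",
        "focalLength": exif[kCGImagePropertyExifFocalLength] ?? "unknown",
        "aperture":    exif[kCGImagePropertyExifFNumber] ?? "unknown",
        "iso":         exif[kCGImagePropertyExifISOSpeedRatings] ?? "unknown",
        "latitude":    gps[kCGImagePropertyGPSLatitude] ?? "unknown",
    ]
}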
Duplicate Detection
Find duplicate and visually similar images using CLIP vector comparison. Three threshold presets -- exact, similar, and loose -- let you dial in how aggressively to flag matches. Review side-by-side and keep only the shots you want.
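The comparison itself can be sketched like this, using the angular-distance formula described in "How CLIP matching works" below; the preset values and names are illustrative, not ImageSieve's exact thresholds:

import Foundation

// Flag near-duplicate pairs by comparing CLIP embeddings.
enum DuplicatePreset: Double {
    case exact   = 0.45   // nearly identical embeddings
    case similar = 0.70   // same subject, different shot
    case loose   = 0.90   // broadly related images
}

func angularDistance(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return (2 * (1 - dot / (normA * normB))).squareRoot()
}

// Returns index pairs whose angular distance falls under the chosen preset.
func duplicatePairs(embeddings: [[Double]], preset: DuplicatePreset) -> [(Int, Int)] {
    var pairs: [(Int, Int)] = []
    for i in 0..<embeddings.count {
        for j in (i + 1)..<embeddings.count where angularDistance(embeddings[i], embeddings[j]) <= preset.rawValue {
            pairs.append((i, j))
        }
    }
    return pairs
}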
3D Visualizations
Four interactive views to explore your library in 3D. Cluster Space projects CLIP embeddings via t-SNE. Tag Nebula maps Vision labels. Photo Map plots GPS coordinates. EXIF Explorer visualizes camera settings. All rendered at 60fps with Metal.
On-Device Inference
CoreML runs CLIP ViT-L-14 entirely on your device using the Apple Neural Engine and Metal GPU. No cloud, no server, no API keys. Your photos never leave your machine. The CLI tool can also connect to a local or remote CLIP server for batch processing.
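Loading and running a CoreML-converted CLIP image encoder might look like the sketch below; the model path and the "image"/"embedding" feature names are assumptions for illustration, not the app's actual model interface:

import CoreML
import CoreVideo

// Run a CoreML-converted CLIP image encoder on the Neural Engine / GPU.
func imageEmbedding(for pixelBuffer: CVPixelBuffer) throws -> [Float] {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // let CoreML schedule across ANE, GPU, and CPU

    let modelURL = URL(fileURLWithPath: "CLIPImageEncoder.mlmodelc")  // hypothetical compiled model
    let model = try MLModel(contentsOf: modelURL, configuration: config)

    let input = try MLDictionaryFeatureProvider(
        dictionary: ["image": MLFeatureValue(pixelBuffer: pixelBuffer)]  // assumed input name
    )
    let output = try model.prediction(from: input)

    guard let embedding = output.featureValue(for: "embedding")?.multiArrayValue else {  // assumed output name
        return []
    }
    return (0..<embedding.count).map { Float(truncating: embedding[$0]) }
}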
Spotlight & Siri
Categorized images are indexed in Spotlight search -- find photos by category name from anywhere on your Mac. Siri Shortcuts for "Get Library Statistics" and "List Categories" give you voice access to your library metadata.
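Indexing with Core Spotlight is roughly this shape; the domain identifier and attribute choices here are hypothetical:

import CoreSpotlight
import UniformTypeIdentifiers

// Index a categorized image in Spotlight so it can be found by category
// name from anywhere on the Mac.
func indexImage(at url: URL, category: String) {
    let attributes = CSSearchableItemAttributeSet(contentType: .image)
    attributes.title = url.lastPathComponent
    attributes.keywords = [category]
    attributes.contentURL = url

    let item = CSSearchableItem(
        uniqueIdentifier: url.absoluteString,
        domainIdentifier: "imagesieve.categorized",   // hypothetical domain
        attributeSet: attributes
    )
    CSSearchableIndex.default().indexSearchableItems([item]) { error in
        if let error { print("Spotlight indexing failed: \(error)") }
    }
}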
CLI Tool
Batch-process thousands of images from the command line with the Go-based CLI. Connects to a CLIP server (Python or Swift) over HTTP for maximum throughput. Supports YAML configuration for categories, thresholds, and output formats.
How CLIP matching works
From natural language to semantic image understanding
Define categories in natural language
Create categories using plain English descriptions. Be as specific or broad as you like -- CLIP understands semantic meaning, not just keywords. Each category has a tuned distance threshold that controls how strict the matching is.
categories:
- name: landscapes
query: "a landscape photo of nature, mountains, or scenic view"
maxDistance: 1.30
- name: portraits
query: "a portrait photo of a person, headshot, or face"
maxDistance: 1.28
- name: cats
query: "a photo of a cat or kitten"
maxDistance: 1.25
- name: food
query: "a photo of food, a meal, or cooking"
maxDistance: 1.28
- name: architecture
query: "a photo of a building, architecture, or interior design"
maxDistance: 1.30
CLIP encodes images & text into vectors
Both your images and category descriptions are converted into 768-dimensional vectors by the CLIP ViT-L-14 model. Images and text share the same embedding space, so semantically similar content ends up near each other. In the native app, this runs entirely on-device via CoreML and the Apple Neural Engine.
angular_distance = sqrt(2 * (1 - cosine_similarity))
Angular distance determines the match
The angular distance between each image vector and every category vector is calculated. The closest category below its threshold wins; images with no close match go to "uncategorized." Lower thresholds (1.20-1.25) mean stricter matching for very specific categories, while higher thresholds (1.32-1.38) allow broader, more abstract categories.
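Putting these steps together, category assignment reduces to a nearest-neighbor check against per-category thresholds. A minimal sketch, with illustrative types and names:

import Foundation

// Assign an image embedding to the closest category whose angular distance
// falls under that category's threshold, else "uncategorized".
struct Category {
    let name: String
    let embedding: [Double]   // CLIP text embedding of the category query
    let maxDistance: Double
}

func angularDistance(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return (2 * (1 - dot / (normA * normB))).squareRoot()
}

func assignCategory(imageEmbedding: [Double], categories: [Category]) -> String {
    let scored = categories.map { ($0, angularDistance(imageEmbedding, $0.embedding)) }
    guard let (best, distance) = scored.min(by: { $0.1 < $1.1 }),
          distance <= best.maxDistance else {
        return "uncategorized"
    }
    return best.name
}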
Explore the vector space
CLIP encodes every image into a 768-dimensional vector. ImageSieve projects these into 3D using t-SNE, revealing how your photos cluster by semantic similarity. Drag to rotate. Scroll to zoom.
Each point represents a photo. Colors indicate the assigned category. Nearby points share semantic meaning -- even if they look nothing alike to the human eye. The native app renders this in real time using Metal shaders and Apple's Accelerate framework for vectorized t-SNE.
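As one concrete piece of that pipeline, the pairwise distance matrix t-SNE starts from can be computed with Accelerate's vectorized vDSP routines. The sketch below omits the perplexity search and gradient descent that produce the final 3D layout, and the function name is illustrative:

import Accelerate

// Pairwise squared-distance matrix over CLIP embeddings, the input to t-SNE.
func pairwiseSquaredDistances(_ embeddings: [[Float]]) -> [[Float]] {
    let n = embeddings.count
    var distances = [[Float]](repeating: [Float](repeating: 0, count: n), count: n)
    for i in 0..<n {
        for j in (i + 1)..<n {
            let d = vDSP.distanceSquared(embeddings[i], embeddings[j])
            distances[i][j] = d
            distances[j][i] = d
        }
    }
    return distances
}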
See it in action
33 pre-configured categories with tuned distance thresholds, plus semantic search

Ready to organize your photo library?
Stop manually sorting thousands of images. Let CLIP AI understand your photos and categorize them with natural language -- on-device, private, and fast.