Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

Conference on Neural Information Processing Systems (NeurIPS) 2025

¹OriginAI, ²The Hebrew University of Jerusalem, Israel, ³Bar-Ilan University, Israel
*Equal contribution

Top-1 Retrieval Results

Can you find the object?

[Interactive figure: five retrieval examples (Results 1 to 5), each showing a query image and its gallery candidates.]

Can you find the object? Each example shows a query image on the left, with the target instance marked in red, and four candidate gallery images on the right. One of the four is the top-1 result retrieved by our MaO method from a gallery of 1,580 images, and it contains the exact same instance. Can you spot which one? This interactive challenge shows why finding small instances in cluttered, real-world scenes is so difficult: objects appear at different scales, viewpoints, and contexts, making it a true search for a needle in a haystack.

Example retrieval results
Queries and the top-1 retrieved images using our Multi-Object Attention Optimization (MaO) method. Note the small size of the objects and the clutter in the gallery images.
Retrieval performance analysis on the VoxDet benchmark
The results highlight key challenges in small object retrieval by showing the impact of (a) object size and (b) image resolution on retrieval accuracy. Performance declines as object size decreases. Higher image resolution improves retrieval for MaO, whereas the alternative methods benefit only marginally from it. MaO consistently outperforms existing methods and remains robust across all conditions.

Abstract

We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object in a cluttered scene. The key challenge in this setting is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image.
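To illustrate why a single descriptor per image keeps search scalable, here is a minimal sketch of descriptor-based retrieval: every gallery image is reduced to one vector, and a query is answered by ranking the gallery by cosine similarity. The function names, the toy three-dimensional descriptors, and the file names are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def rank_gallery(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_k (image_id, cosine similarity) pairs, best match first.

    Assumption: each gallery image is represented by exactly one descriptor.
    """
    q = l2_normalize(query)
    scored = []
    for image_id, descriptor in gallery.items():
        g = l2_normalize(descriptor)
        # The dot product of unit vectors is their cosine similarity.
        scored.append((image_id, sum(a * b for a, b in zip(q, g))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy gallery: hypothetical image names and hand-written descriptors.
gallery = {
    "kitchen.jpg": [0.9, 0.1, 0.0],
    "desk.jpg": [0.1, 0.8, 0.3],
    "shelf.jpg": [0.0, 0.2, 0.9],
}
top = rank_gallery([0.2, 0.9, 0.2], gallery, top_k=2)
for image_id, score in top:
    print(f"{image_id}: {score:.3f}")
```

Because each image contributes only one vector, the gallery can be indexed once and searched with a single similarity computation per image, which is the property the single-descriptor constraint is meant to preserve.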

In this paper, we first analyze the limitations of existing methods on this challenging task and then introduce new benchmarks to support SoIR evaluation. Next, we introduce Multi-Object Attention Optimization (MaO), a novel retrieval framework that incorporates a dedicated multi-object pre-training phase. This phase is followed by a refinement process that uses object masks for attention-based feature extraction and integrates the resulting features into a single unified image descriptor.
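The idea of folding several masked objects into one descriptor can be sketched with a deliberately simpler stand-in. The code below uses plain mask-weighted average pooling followed by a mean over objects. It is not the attention-based refinement of MaO, and the names `masked_mean` and `fuse_objects`, as well as the toy patch features, are hypothetical. The point it illustrates is only that per-object pooling gives a small object the same weight in the final descriptor as a large one.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else [0.0 for _ in vec]


def masked_mean(patch_features: Sequence[Sequence[float]], mask: Sequence[float]) -> Vector:
    """Average patch features, weighting each patch by its mask value."""
    total = sum(mask)
    if total == 0:
        raise ValueError("object mask covers no patches")
    dim = len(patch_features[0])
    pooled = [0.0] * dim
    for feature, weight in zip(patch_features, mask):
        for i in range(dim):
            pooled[i] += weight * feature[i]
    return [x / total for x in pooled]


def fuse_objects(
    patch_features: Sequence[Sequence[float]],
    object_masks: Sequence[Sequence[float]],
) -> Vector:
    """Pool each object separately, then average the unit-length object vectors.

    Assumption: this simple averaging replaces the paper's attention-based
    refinement purely for illustration.
    """
    per_object = [_normalize(masked_mean(patch_features, m)) for m in object_masks]
    dim = len(per_object[0])
    mean = [sum(v[i] for v in per_object) / len(per_object) for i in range(dim)]
    return _normalize(mean)


# Four patches with 2-D features; object A covers three patches, object B one.
patches = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
masks = [[1, 1, 1, 0], [0, 0, 0, 1]]

fused = fuse_objects(patches, masks)
global_avg = _normalize(masked_mean(patches, [1, 1, 1, 1]))
print("per-object fusion:", [round(x, 3) for x in fused])
print("global average:   ", [round(x, 3) for x in global_avg])
```

In this toy case the global average is dominated by the larger object, while the per-object fusion represents both objects equally, which is the motivation for building the descriptor from object-level features.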

Our MaO approach significantly outperforms existing retrieval methods and strong baselines, achieving notable improvements in both the zero-shot and the lightweight multi-object fine-tuning settings. We hope this work will lay the groundwork for, and inspire, further research to enhance retrieval performance on this highly practical task.

VoxDet-SoIR Benchmark

Examples from our dataset containing multiple small objects. Each annotated object is shown with its relative size (with respect to the image dimensions) displayed above its bounding box.
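As a worked example of the relative size shown above each bounding box, the sketch below computes it as the box area divided by the image area. This area-ratio definition is an assumption, since the caption only says the size is relative to the image dimensions. The function name `relative_size` and the `(x_min, y_min, x_max, y_max)` pixel box format are also hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def relative_size(box: Box, image_width: float, image_height: float) -> float:
    """Return the fraction of the image area covered by the bounding box.

    Assumption: relative size means box area / image area. The box is clipped
    to the image bounds before its area is measured.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0.0, x_min), min(float(image_width), x_max)
    y_min, y_max = max(0.0, y_min), min(float(image_height), y_max)
    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / (image_width * image_height)


# A 64x48 pixel box inside a 640x480 image covers 3072 / 307200 of the area.
size = relative_size((100, 50, 164, 98), image_width=640, image_height=480)
print(f"relative size: {size:.1%}")
```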

BibTeX

@inproceedings{green2025findyourneedle,
  author    = {Green, Michael and Levy, Matan and Tzachor, Issar and Samuel, Dvir and Darshan, Nir and Ben-Ari, Rami},
  title     = {Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}