Yet Another Computer Vision Index To Datasets (YACVID)

This website provides a list of frequently used computer vision datasets. Wait, there is more!
Each dataset also has a description covering common problems, pitfalls and characteristics, and there is now a searchable tag cloud.
Plus, this is open for crowd editing (if you pass the ultimate Turing test)! - Questions? yacvid [at] hayko [dot] at

Content, Design and Idea © by Hayko Riemenschneider, 2011-2016. Texts and Images are subject to copyright by the respective authors.

Hey! If you're reading this, why not help and update the description of the dataset you're working on?




2d   3d   4d   aachen   abdomen   abrupt   accelerometer   action   actions   activities   activity   address   adhead   adjustment   aerial   aesthetics   age   aircraft   airplane   airport   alignment   amazon   ambiguous   analysis   and   anger   animal   animation   annotation   anomaly   apartment   api   appearance   applelogo   architecture   articulation   aspect   attention   attribute   attributes   authentication   automatic   autonomous   avoid   axis   babyface   background   balance   baseline   behavior   belgium   benchmark   benchmarking   bike   bilateral   binary   biology   biometric   biometry   blender   blur   boat   body   bone   bottle   boundingbox   brand   bremen   buffy   building   bullseye   bundle   bunny   byu   cad   calibration   california   caltech   camera   canada   captioning   captions   capture   car   cardinal   categorization   category   celebrity   cell   centered   chair   challenge   change   chemistry   chest   chromaticity   church   circle   cities   city   classification   clothing   clustering   clutter   cnn   co-segmentation   co-skeletonization   coco   code   codebook   coffee   color   community   comparison   computer   conditions   constancy   context   contour   cooking   copyright   cosegmentation   counting   cover   cow   crepe   cross-view   crowd   ct   cutting   daily   dance   data   dataset   day   daylight   decomposition   deep   defocus   deformation   dense   depth   description   descriptor   detail   detection   dichromatic   disgust   disparity   dogs   domain   dped   driving   drone   dubrovnik   duplicate   dynamic   ear   edge   egocentric   ellipse   emotion   endtoend   enhancement   estimation   evaluation   event   expression   eye   facade   face   facial   fashion   fear   feature   field   fine-grained   fingerprint   fingertip   first-person   fish   fisheye   fitting   flickr   flight   floorplan   flow   fly   flying   food   foot   footprint   foreground   fov   frames   
frontview   fundus   gait   game   gan   gaze   gender   genetic   genome   geography   geometry   geotag   geotagged   germany   gesture   getry   gif   giraffe   gis   global   google   gps   grammar   graphics   graz   ground   groundtruth   group   gsd   hand   handwritten   hd   head   heart   heat   hierarchy   high-definition   highlight   highway   holes   horse   house   human   humans   identification   illumination   image   imagenet   images   imdb   indoor   inertial   initialization   inserts   instance   intake   interaction   interactive   interest   internet   invariance   ir   isar   joy   kernels   keyframe   kimia   kinect   label   labeling   laboratory   land   landmark   lane   language   large   large-scale   laser   lattice   layout   learning   letter   leuven   lidar   light   lightfield   lighting   limited   line   lip   lisbon   liver   local   localization   location   logo   lowlevel   machine   manhattan   map   maritime   mask   match   matching   material   medial   medical   medicine   memorability   mesh   metadata   milling   mirror   mobile   model   modeling   modelling   monitoring   mono   montage   motion   motion-capture-data   motorbike   mouse   mouth   movement   movie   mpeg   mug   multi-camera   multi-class   multi-human   multi-mode   multi-sensor   multi-spectral   multi-view   multilabel   multimodal   multiple   multitarget   multiview   naming   natural   nature   navigation   network   neutral   newyork   night   nir   noise   normal   nude   number   object   objects   occlusion   ocr   odometry   omnidirection   omnidirectional   open-view   operation   optical   optimization   organ   original   osnabrueck   outdoor   overhead   overlap   oxford   pair   pairwise   pan   panorama   panoramio   parallel   paris   parsing   part   partial   pasadena   pascal   patch   path   pattern   pedestrian   people   person   perspective   phase   photo   photogrammetry   physics   pittsburgh   place   plane   planning  
 point   pointcloud   polygon   popularity   pornography   pose   presentation   pressure   primitive   privacy   procedural   profile   proposal   ptz   quality   question   radar   random   rank   ranking   ransac   rate   ratio   re-identification   reading   real   realism   recipe   recognition   reconstruction   rectification   rectified   reflection   registration   regression   regular   reidentification   remote   removal   rendering   repetition   resolution   retina   retinal   retrieval   rgb   rgb-d   rgbd   road   robot   robust   rome   room   ros   rotation   sad   saliency   sampling   sanfrancisco   satellite   scale   scan   scanner   scene   scenes   search   segmentation   selfdriving   semantic   sense   sensing   sequence   sfm   shadow   shadows   shape   sheffield   shoes   shots   shutter   sideview   sign   similarity   simultaneous   single   singleview   skeleton   skeletonization   sketch   skin   sky   slam   soccer   social   software   source   space   spain   spanish   speaker   speech   sphere   sport   stability   stabilization   static   stationary   stereo   stereovision   stochastic   street   structure   structured   study   stuff   stylization   subpixel   subtraction   summarization   summary   superpixel   superresolution   supervised   surface   surgery   surprise   surveillance   swan   switzerland   sydney   symmetry   synthetic   table   target   taxonomy   temporal   text   texture   texture-less   therapy   thermal   things   time   time-series   tiny   tool   tools   top-view   tracking   traffic   trajectory   transfer   transportation   triangulation   truth   tuberculosis   type   uas   uav   udacity   ultrasound   understanding   uneven   unmanned   unsupervised   urban   user   vanishing   variation   vehicle   vessel   video   view   viewpoint   visible   vision   visual   volleyball   vqa   vt   water   wavelength   weakly   wear   wearable   weather   webcam   white   wide   wikipedia   wild   workflow   
world   xray   year   zoom   zurich  
«showing 591 tags of 591 total tags for 420 datasets (1.41) »


classification
DID Name Description Tags URL Date Views
419 UC Merced Land Use Dataset The UC Merced dataset is a 21-class land use image dataset meant for research purposes. There are 100 images for each of the following classes: agricultural ... semantic segmentation classification aerial land building urban link 2017-11-28 35
418 Udacity Annotated Driving Datasets Udacity Annotated Driving Datasets have two datasets: Dataset 1 The dataset includes driving in Mountain View California and neighboring cities during dayli... classification segmentation urban street selfdriving autonomous udacity annotation california city daylight link 2017-11-08 64
402 GeoFaces A large dataset of geotagged face images collected from Flickr. The zip file contains text files containing urls of the images. Face2GPS: Estimating Geograph... face localization geotagged classification gender age human link 2017-09-06 104
397 MPI-I VISPR (Visual Privacy) We present a dataset to address the problem of visual privacy - where users unintentionally leak private information when sharing personal images online, such a... privacy multilabel classification flickr scene regression link 2017-08-08 75
395 AWS Public Datasets AWS hosts a variety of public datasets that anyone can access for free. Previously, large datasets such as satellite imagery or genomic data have required hour... amazon classification deep learning segmentation recognition satellite human biology space image resolution link 2017-07-28 166
388 Open Images Dataset Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried ... classification large-scale category real image deep annotation automatic link 2017-07-02 195
357 udacity self-driving-car At Udacity, we believe in democratizing education. How can we provide opportunity to everyone on the planet? We also believe in teaching really amazing and usef... car robot driving autonomous street urban video recognition detection classification segmentation time synthetic link 2017-03-15 573
356 The Oxford RobotCar Dataset The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured over a period of over a year. The dataset captures ... car robot driving autonomous street urban video recognition detection classification segmentation time year link 2017-01-04 504
354 Facial Expression Research Group Database (FERG-DB), University of Washington, Seattle FERG-DB is a database of stylized characters with annotated facial expressions. The database contains multiple face images of six stylized characters. The chara... Face, Facial expression, Animation, Stylization, annotation emotion, deep learning, anger, sad, joy, disgust, surprise, neutral, fear, cardinal classification, human transfer, image retrieval link 2017-02-27 514
342 ICS-FORTH + Modelling of 2D Shapes with Ellipses The dataset contains more than 4,536 2D shapes included in standard as well as in home-built datasets. Our goal is to represent a given 2D shape with an au... shape ellipse fitting modelling 2d object classification link 2017-11-28 282
325 Synthesized Inverse Synthetic Aperture Radar (ISAR) Images of Aircrafts The database contains synthesized inverse synthetic aperture radar images of seven aircraft models. Reference: Hari Kishan Kondaveeti, Valli Kumari Va... ISAR, image, classification link 2016-03-17 610
321 Webcam Interestingness The Webcam Interestingness dataset consists of 20 different webcam streams, with 159 images each. It is annotated with interestingness ground truth, acquired in... webcam interest classification retrieval ranking video weather link 2016-03-02 551
316 Extreme Classification Repository The Extreme Classification Repository: Multi-label Datasets & Code Kush Bhatia Himanshu Jain Prateek Jain Manik Varma The objective in extreme multi... machine learning multilabel classification benchmark evaluation link 2017-10-25 710
307 HandNet annotated hand dataset The HandNet dataset contains depth images of 10 participants' hands non-rigidly deforming in front of a RealSense RGB-D camera. This dataset includes 214971 a... hand articulation segmentation classification detection pose fingertip rgbd video link 2017-09-12 937
304 Ian Dworkin (McMaster University) This is the database of biological images (from the genetics model system, Drosophila melanogaster, a fruit fly) across multiple levels of variation. We have... biology genetic variation fly animal classification link 2016-02-11 613
278 Comprehensive Cars (CompCars) The Comprehensive Cars (CompCars) dataset contains data from two scenarios, including images from web-nature and surveillance-nature. The web-nature data contai... car vehicle recognition attribute classification fine-grained urban object link 2017-12-07 1528
274 UBO 2014 Materials The UBO 2014 consists of 7 semantic categories. Each of these 7 material categories contains measurements of 12 different material instances for being capable t... material light illumination texture classification recognition link 2015-03-27 598
258 Visual Attributes dataset The Visual Attributes dataset contains visual attribute annotations for over 500 object classes (animate and inanimate) which are all represented in ImageNet. E... classification recognition attribute imagenet object link 2016-10-02 943
253 Street View House Number (SVHN) SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatti... street number recognition classification urban detection text real world link 2017-11-28 996
251 ETHZ CVL RueMonge 2014 The ETHZ CVL RueMonge 2014 dataset is used for 3D reconstruction and semantic mesh labelling for urban scene understanding. It was first published in [1] and p... semantic segmentation 3d reconstruction architecture paris benchmark source code urban recognition classification outdoor pointcloud mesh link 2014-11-24 1365
248 VIDEO datasets overview Many different labeled video datasets have been collected over the past few years, but it is hard to compare them at a glance. So we have created a handy spread... video benchmark recognition classification detection object action link 2014-09-30 1138
242 Stanford Dogs Dataset The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for... classification, detection, fine-grained categorization, dogs link 2015-07-29 1343
234 UMD Dynamic Scene Recognition The UMD Dynamic Scene Recognition dataset consists of 13 classes and 10 videos per class and is used to classify dynamic scenes. The dataset has been describ... scene recognition classification dynamic video motion link 2017-01-05 982
230 FGVC-Aircraft Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) is a benchmark dataset for the fine grained visual categorization of aircraft. Data, annotatio... fine-grained classification recognition benchmark evaluation aircraft airplane link 2017-02-16 1587
229 Paris Rue Madame Paris-rue-Madame dataset contains 3D Mobile Laser Scanning (MLS) data from rue Madame, a street in the 6th Parisian district (France). The test zone contains ap... semantic segmentation pointcloud 3d laser classification link 2014-06-10 732
228 MPI VehicleScenes Abstract Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In thi... semantic segmentation scene understanding classification 3d car pedestrian link 2014-06-10 1134
226 Fish4Knowledge The Fish4Knowledge project (groups.inf.ed.ac.uk/f4k/) is pleased to announce the availability of 2 subsets of our tropical coral reef fish video and extracted... classification animal fish video motion nature recognition water camera link 2014-05-15 982
216 CVC Partial Occlusion Virtual Pedestrian The CVC Partial Occlusion Virtual Pedestrian datasets (CVC-01 to CVC-06) cover a range of scenarios of occluded pedestrians generated in a virtual and real envi... detection classification tracking pedestrian synthetic urban occlusion link 2016-03-15 1326
207 CASIA Gait Recognition Dataset Dataset A (former NLPR Gait Database) was created on Dec. 10, 2001, including 20 persons. Each person has 12 image sequences, 4 sequences for each of the three ... gait recognition biometry action classification motion human foot pressure link 2017-03-10 2259
206 GaTech VideoContext The GaTech VideoContext dataset consists of over 100 groundtruth annotated outdoor videos with over 20000 frames for the task of geometric context evaluation i... video geometry context classification semantic segmentation unsupervised supervised outdoor urban nature link 2014-04-06 933
201 50 Salads The dataset captures 25 people preparing 2 mixed salads each and contains over 4h of annotated accelerometer and RGB-D video data. Annotated activities correspo... action activity recognition classification detection tracking video link 2013-10-05 899
200 Landmark 3D This dataset provides a collection of web images and 3D models for research on landmark recognition (especially for methods based on 3D models). We hope it coul... landmark recognition classification retrieval 3d reconstruction codebook matching feature flickr link 2016-08-09 1078
197 Stanford Background Dataset The Stanford Background Dataset is a new dataset introduced in Gould et al. (ICCV 2009) for evaluating methods for geometric and semantic scene understanding. T... semantic segmentation urban classification nature geometry link 2016-01-21 1694
195 Yotta The Yotta dataset consists of 70 images for semantic labeling given in 11 classes. It also contains multiple videos and camera matrices for 14 km of driving. ... semantic segmentation urban video camera 3d reconstruction classification link 2013-09-30 930
191 Daimler Mono Pedestrian Classification Benchmark The Daimler Mono Pedestrian Classification Benchmark dataset consists of two parts: a base data set. The base data set contains a total of 4000 pedestrian- a... pedestrian classification outdoor urban object scale illumination link 2013-09-18 866
179 CMP Facades The CMP Facade dataset consists of facade images assembled at the Center for Machine Perception, which includes 600 rectified images of facades from various sou... facade rectification urban semantic classification recognition structure similarity segmentation link 2015-06-19 752
177 SIPI textures The Textures volume currently contains 154 images, all monochrome, 129 512x512 and 25 1024x1024. For the Brodatz texture images, the number in parenthesis (i... texture, segmentation, classification, benchmark, synthetic, evaluation link 2013-08-20 925
176 Brodatz Album The Brodatz dataset consists of 112 textures in grayscale images of various texture types. http://www.ee.oulu.fi/research/imag/texture/image_data/Brodatz32.h... texture, segmentation, classification, benchmark, synthetic link 2014-12-23 1125
175 Outex texture bench The Outex dataset is part of a framework for empirical evaluation of texture classification and segmentation algorithms. The framework is being constructed acc... texture, segmentation, classification, benchmark, synthetic link 2015-11-17 735
174 Pittsburgh Fast-food Image dataset The Pittsburgh Fast-food Image dataset (PFID) consists of 4545 still images, 606 stereo pairs, 303 360° videos for structure from motion, and 27 privacy-preservi... food recognition classification reconstruction video laboratory real link 2017-05-27 1692
167 Text and Vision (TVGraz) Dataset The Text and Vision (TVGraz) dataset is an annotated multi-modal dataset which currently contains 10 visual object categories, 4030 images and associated text. ... text appearance classification evaluation link 2017-01-10 1187
162 ICG PRID 2011 The Person Re-ID (PRID) 2011 dataset was created in co-operation with the Austrian Institute of Technology for the purpose of testing person re-identification a... pedestrian classification identification multiview trajectory illumination appearance change graz link 2017-06-29 1258
159 Caltech Game Covers Dataset The Caltech Game Covers dataset consists of CD/DVD covers of video games. The set was downloaded from freecovers.net during the summer of 2008. The set includes... classification retrieval game cover caltech hierarchy taxonomy link 2014-02-20 756
156 KUL Belgium Traffic Signs BelgiumTS is a large dataset with 10000+ traffic sign annotations, thousands of physically distinct traffic signs. 4 video sequences recorded with 8 high resolu... traffic sign classification urban road belgium camera calibration link 2017-11-28 1326
155 KUL Belgium Traffic Sign Classification The BelgiumTSC dataset is built for traffic sign classification purposes. It is a subset of the BelgiumTS dataset and contains cropped images around annotations for 62 ... traffic sign classification urban road belgium link 2017-03-27 1033
154 WordNet WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a di... language, hierarchy, imagenet, classification link 2013-08-07 697
147 FlickrLogos-32 The FlickrLogos-32 dataset contains photos showing brand logos and is meant for the evaluation of multi-class logo recognition as well as logo retrieval methods... flickr, logo, detection, retrieval, image, object recognition, machine learning, classification brand boundingbox link 2017-11-14 1228
146 Multiple Instance Learning dataset MIL data sets used in our 2002 NIPS paper for Elephant, Musk, TREC http://www.cs.cmu.edu/~juny/MILL/MIL-experiments.htm... machine learning, classification link 2013-05-30 806
145 KnapSack KNAPSACK_01 is a dataset directory which contains some examples of data for 01 Knapsack problems. In the 01 Knapsack problem, we are given a knapsack of fixe... machine learning, classification link 2013-05-31 841
144 MNIST hand-written letters The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of ... text, classification, letter link 2017-06-03 1697
141 Berkeley Multimodal Human Action Database (MHAD) The Berkeley Multimodal Human Action Database (MHAD) contains 11 actions performed by 7 male and 5 female subjects in the range 23-30 years of age except for on... action classification multiview motion recognition link 2014-02-03 945
140 RGB-D Person Re-identification The RGB-D Person Re-identification dataset is for person re-identification using depth information. The main motivation is that the standard techniques (such as... identification, classification, shape, depth, pedestrian, 3d link 2014-10-08 1032
126 ISPRS Urban Classification ISPRS Test Project on Urban Classification and 3D Building Reconstruction The ISPRS working group III/4 announces the release of the 2D semantic labeling ben... 3d, reconstruction, building, urban, city, semantic, classification, recognition link 2014-11-24 793
116 Sheffield Building Sheffield Building Image Dataset consists of over 3,000 low-resolution images of forty different buildings typically between 70 and 120 images per building. T... retrieval, classification, urban, sheffield link 2013-03-12 775
115 Pankrac Marseille Our repetitive pattern dataset with 106 images of app. 30 buildings from Pankrac, Prague and Marseille appearing in more than one image, number of appearances r... classification, retrieval, symmetry, repetition, urban link 2013-03-13 703
114 TUD Shapes 1+2 This material is supplementary to Michael Stark, Bernt Schiele. How Good are Local Features for Classes of Geometric Objects. Eleventh IEEE International C... shape object classification tool binary link 2013-08-08 769
103 COIL-100 The COIL-100 (Columbia University Image Library) consists of 100 objects. For formal documentation look at the corresponding compressed technical report, [gzipp... classification, retrieval link 2013-03-12 770
102 Tiny Images The Tiny Images dataset consists of 79,302,017 images, each being a 32x32 color image. This data is stored in the form of large binary files which can be accese... classification, tiny, color, retrieval link 2013-03-12 758
101 CIFAR-10 / 100 The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. ... classification, tiny, color, patch, scene, object link 2013-08-08 851
97 KU Leuven Facade The KU Leuven Facade dataset is used for architectural styles classification. M. Mathias, A. Martinovic, J. Weissenberg, S. Haegler, L. Van Gool: Automatic A... classification, architecture, urban link 2014-11-20 830
96 USPS Handwritten Digits Name: Classes Train. Ex. Test. Ex. Features USPS 10 7291 2007 256 8-bit grayscale images of "0" through "9"; handwritten digits; ... text, recognition, classification, handwritten link 2013-03-12 1158
95 Stroke Width Transform Text Stroke Width Transform Text dataset is by Boris Epstein and consists of 307 images and XXX text instances. Detecting Text in Natural Scenes with Stroke Wid... text, detection, recognition, classification link 2015-04-24 1020
94 Chars74K The Chars74K dataset consists of 64 classes (0-9, A-Z, a-z), 7705 characters obtained from natural images, 3410 hand drawn characters using a tablet PC, 62992 s... text, detection, recognition, classification link 2017-08-03 1532
93 Street View Text The Street View Text (SVT) dataset contains 647 words and 3796 letters in 249 images harvested from Google Street View. The dataset is more challenging becaus... text, detection, recognition, classification, outdoor, urban link 2014-01-13 1109
92 ICDAR 2011 This challenge is set up around three tasks: Text Localisation, Text Segmentation and Word Recognition. Participation in any or all tasks is welcome. Check the ... text, detection, recognition, classification link 2016-06-01 826
91 ICDAR 2003 The ICDAR 2003 datasets available for download on this site: Robust Reading , Robust Word Recognition , Robust OCR , Text Locating and Cursive Script . Pleas... text, detection, recognition, classification link 2017-08-15 1023
76 Daimler Pedestrian Classification Daimler Multi-Cue, Occluded Pedestrian Classification Benchmark Training and test samples have a resolution of 48 x 96 pixels with a 12-pixel border around t... detection, classification, pedestrian, urban link 2013-03-11 869
59 Near-Regular Textures The Near-Regular Textures dataset contains textures from completely regular to completely irregular patterns, with a focus on near-regular textures. It also inc... texture, segmentation, classification, symmetry, regular, stochastic link 2013-03-11 785
55 Prague Texture Segmentation The Prague Texture Segmentation Datagenerator and Benchmark is designed to mutually compare and rank different (dynamic/static) texture segmenters (supervised o... texture, segmentation, classification, benchmark, synthetic link 2013-08-08 753
42 Hollywood Videos The Hollywood-2 dataset contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video in t... action, classification, video, segmentation link 2013-03-12 1081
41 KTH Action The current video database containing six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 2... action, classification, video, segmentation link 2013-03-12 730
40 Weizmann Action The Weizmann actions dataset by Blank, Gorelick, Shechtman, Irani, and Basri consists of ten different types of actions: bending, jumping jack, jumping, jump in... video, segmentation, action, classification link 2015-07-14 808
21 ImageNET The ImageNET dataset is the latest dataset by Li Fei-Fei containing various datasets ranging from 1000 to 10000 categories.... retrieval, segmentation, classification link 2013-03-11 969
20 CALTECH 256 The CALTECH 256 dataset by Li Fei-Fei contains 30607 images for 256 categories.... classification centered object scene image link 2013-08-08 845
19 CALTECH 101 The CALTECH 101 dataset by Li Fei-Fei contains images for 101 categories with about 40 to 800 images per category. Most categories have about 50 images at rough... classification centered object scene image link 2013-08-08 864

