Depth perception from stereo vision poses a significant challenge for both biological and artificial vision systems, as it requires establishing correspondences between different views captured by visual sensors. While animals excel at this task, machine vision has historically struggled to develop algorithms that match the efficiency and robustness of biological systems. A key factor behind this performance gap is the nature of the vision sensors used. Biological systems rely on self-timed, continuous sensing, whereas machine vision predominantly relies on frame-based cameras that capture static images at regular intervals. Consequently, traditional machine stereo vision algorithms are designed to extract depth information from pairs of static images, typically at rates of a few tens of Hertz. This imposes a trade-off between latency and computational cost: reducing latency requires higher frame rates, yet consecutive frames are largely redundant, and processing that redundant data inflates computational effort.

Recently, a new class of event-based vision sensors, known as silicon retinas, has emerged. These sensors mimic the mammalian retina by producing continuous streams of spikes (or events) that encode only changes in the scene, yielding a sparse visual output free of redundancy. This innovation enables efficient, frame-less machine vision algorithms that align more closely with biological systems.
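To make the event representation concrete, the following minimal sketch in Python illustrates the idea. It is an assumption-laden illustration rather than any particular sensor's interface: the Event fields follow the common address-event (x, y, timestamp, polarity) convention, and the hypothetical events_from_frame_pair helper merely emulates, by differencing two frames, the change-driven output that a real silicon retina produces asynchronously in every pixel.

    import math
    from dataclasses import dataclass
    from typing import List, Sequence

    @dataclass
    class Event:
        """One address-event from a silicon retina (field names illustrative)."""
        x: int         # pixel column
        y: int         # pixel row
        t_us: int      # timestamp in microseconds
        polarity: int  # +1 = brightness increase, -1 = decrease

    def events_from_frame_pair(prev: Sequence[Sequence[float]],
                               curr: Sequence[Sequence[float]],
                               t_us: int,
                               threshold: float = 0.2) -> List[Event]:
        """Toy emulation of change-driven output: emit an event wherever the
        log-intensity change between two frames exceeds a contrast threshold.
        A real sensor does this continuously and asynchronously per pixel,
        with microsecond timestamps, rather than by differencing frames."""
        events: List[Event] = []
        for y, (row_p, row_c) in enumerate(zip(prev, curr)):
            for x, (ip, ic) in enumerate(zip(row_p, row_c)):
                delta = math.log(ic + 1e-6) - math.log(ip + 1e-6)
                if abs(delta) >= threshold:
                    events.append(Event(x, y, t_us, +1 if delta > 0 else -1))
        return events

    # Only the moving edge produces events; unchanged pixels stay silent.
    prev = [[0.1, 0.1, 0.9],
            [0.1, 0.1, 0.9]]
    curr = [[0.1, 0.9, 0.9],
            [0.1, 0.9, 0.9]]
    print(events_from_frame_pair(prev, curr, t_us=1000))

Running the snippet shows that only the pixels crossing the contrast threshold generate events, which is precisely the source of the sparsity and redundancy elimination discussed above.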
