Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction

Jianxiong Tang*, Jian-Huang Lai*, Lingxiao Yang, Xiaohua Xie

Abstract


"Event-to-Video (E2V) reconstruction aims to recover grayscale video from neuromorphic event streams, with Spiking Neural Networks (SNNs) being promising energy-efficient models for this task. Event voxels effectively compress event streams for E2V reconstruction, yet their temporal latent representation is rarely considered in SNN-based approaches. In this paper, we propose a spike-temporal latent representation (STLR) model for SNN-based E2V reconstruction. The STLR solves the temporal latent coding of event voxels for video frame reconstruction. It is composed of two cascaded SNNs: a) Spike-based Voxel Temporal Encoder (SVT) and b) U-shape SNN Decoder. The SVT is a spike-driven spatial unfolding network with a specially designed coding dynamic. It encodes the event voxel into the layer-wise spiking features for latent coding, approximating the fixed point of the Iterative Shrinkage-Thresholding Algorithm. Then, the U-shape SNN decoder reconstructs the video based on the encoded spikes. Experimental results demonstrate that the STLR achieves performance comparable to popular SNNs on IJRR, HQF, and MVSEC datasets while significantly enhancing energy efficiency."

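For readers unfamiliar with the fixed point that the SVT is said to approximate, the sketch below shows plain ISTA for LASSO-style sparse coding in NumPy. The dictionary `D`, step size, and toy data are illustrative assumptions, not the paper's spiking encoder; the paper replaces these explicit iterations with layer-wise spiking dynamics.

```python
# Minimal ISTA sketch (NumPy). Hypothetical names and toy data; not the paper's SVT code.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the L1 norm: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(x, D, lam=0.1, n_iters=100):
    """Iterate z <- soft_threshold(z - eta * D^T (D z - x), eta * lam),
    which converges to the fixed point of the sparse-coding problem
    min_z 0.5 * ||x - D z||^2 + lam * ||z||_1.
    """
    # Step size from the Lipschitz constant of the gradient: 1 / ||D||_2^2.
    eta = 1.0 / (np.linalg.norm(D, 2) ** 2)
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)                    # gradient of the data-fit term
        z = soft_threshold(z - eta * grad, eta * lam)
    return z

# Toy usage: recover a sparse code for a random vector x (stand-in for an event voxel).
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))                  # dictionary (stand-in for learned weights)
z_true = np.zeros(256)
z_true[rng.choice(256, 5, replace=False)] = 1.0
x = D @ z_true
z_hat = ista(x, D, lam=0.05, n_iters=300)
print("nonzeros in estimate:", int((np.abs(z_hat) > 1e-3).sum()))
```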