The presence of multiband amplitude and frequency modulations (AM-FM) in wideband signals, such as textured images or speech, has led to the development of efficient multicomponent modulation models for low-level image and sound analysis. Moreover, tracking the dominant model components across time, space or frequency with non-linear energy operators has yielded compact yet descriptive representations. In this paper, we propose a generalization of such approaches to the 3D spatio-temporal domain and explore the potential of the Dominant Component Analysis scheme for interest point detection and human action recognition in videos. Within this framework, actions are implicitly considered as manifestations of spatio-temporal oscillations in the dynamic visual stream. Multiband filtering and energy operators are applied to track the source energy in both spatial and temporal frequency bands. A new measure for extracting keypoint locations is formulated as the temporal dominant energy computed over the dominant spatial components, selected by their modulation energy, of the input video frames. The theoretical formulation is supported by evaluation and comparisons in human action classification, which demonstrate the potential of the proposed spatio-temporal detector.
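To make the idea of tracking a dominant component concrete, the following is a minimal one-dimensional sketch, not the proposed 3D detector. It rests on several assumptions that the abstract does not state: the non-linear energy operator is taken to be the discrete Teager-Kaiser operator, the multiband filtering is taken to be a small bank of real Gabor filters, and the function names (`teager_kaiser`, `gabor_bank`, `dominant_component`) and parameter values (`sigma`, `half_len`) are hypothetical choices made for illustration only.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two boundary samples are left at zero because they lack a neighbour.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def gabor_bank(center_freqs, sigma=12.0, half_len=48):
    """Real (cosine-phase) Gabor kernels.

    center_freqs are in radians per sample; sigma and half_len are
    illustrative values, not taken from the paper.
    """
    n = np.arange(-half_len, half_len + 1)
    envelope = np.exp(-(n ** 2) / (2.0 * sigma ** 2))
    envelope /= envelope.sum()
    return [envelope * np.cos(w * n) for w in center_freqs]


def dominant_component(x, center_freqs):
    """Return, per sample, the index and energy of the strongest band."""
    kernels = gabor_bank(center_freqs)
    # One row of Teager-Kaiser energy per filter-bank channel.
    energies = np.stack(
        [teager_kaiser(np.convolve(x, k, mode="same")) for k in kernels]
    )
    dominant_band = energies.argmax(axis=0)
    dominant_energy = energies.max(axis=0)
    return dominant_band, dominant_energy


if __name__ == "__main__":
    # Test signal: the oscillation frequency switches halfway through.
    n = np.arange(800)
    x = np.where(n < 400, np.cos(0.3 * n), np.cos(1.0 * n))
    band, energy = dominant_component(x, [0.3, 1.0])
    print(band[200], band[600])
```

For a pure sinusoid A cos(Ωn + φ) the operator returns the constant A² sin²(Ω), so the per-band energy reflects both amplitude and frequency. This is why taking the per-sample maximum over bands singles out the locally dominant oscillation. A spatio-temporal version would presumably replace the 1D kernels with 3D filters and apply the energy operator along each dimension, but those details are beyond what the abstract specifies.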