Introduction
With the rapid development and popularization of financial technology, a large number of related advertisements have appeared in daily life. More and more users gain a deeper understanding of financial technology by watching these advertisements, so the role of advertising is becoming increasingly significant and evaluating the quality of advertisements has become an important problem. Although user-engagement detection can be applied to evaluate the quality of advertisements, traditional engagement-detection methods often consume enormous computing resources, which limits their practical value. Designing an effective engagement-detection method for evaluating the quality of advertisements is therefore of great value and significance.
To address this issue, we divide engagement detection into multiple steps. First, we extract key-frame images through salient-object detection and perform image super-resolution reconstruction on the extracted key frames. We then obtain the final engagement-detection results through spatial pyramid matching. The research areas of engagement detection, salient-object detection, super-resolution reconstruction, and image-similarity matching are therefore all directly relevant to our method. Researchers have proposed many classic models in these fields, and quite a few of them are widely used, which has contributed directly to the high-quality development of the related fields.
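As a rough illustration of the final matching step, the following minimal sketch compares two grayscale frames with a spatial-pyramid histogram intersection. It is not the authors' implementation: the function names `pyramid_histogram` and `spm_similarity`, the use of quantized pixel intensities in place of learned visual words, and the parameters `levels` and `bins` are all assumptions made for illustration.

```python
import numpy as np

def pyramid_histogram(image, levels=2, bins=8):
    """Concatenate weighted per-cell histograms over a spatial pyramid."""
    h, w = image.shape
    # Assumption: quantized intensities in [0, 255] stand in for visual words.
    words = np.clip((image.astype(np.float64) / 256.0 * bins).astype(int), 0, bins - 1)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid at this level is cells x cells
        # Level weighting commonly used for spatial pyramid matching:
        # finer levels count more than coarser ones, and the weights sum to 1.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        rows = np.linspace(0, h, cells + 1).astype(int)
        cols = np.linspace(0, w, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                cell = words[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                hist = np.bincount(cell.ravel(), minlength=bins).astype(np.float64)
                parts.append(weight * hist)
    vec = np.concatenate(parts)
    return vec / vec.sum()  # normalize so images of different sizes are comparable

def spm_similarity(image_a, image_b, levels=2, bins=8):
    """Histogram-intersection similarity in [0, 1] between two grayscale images."""
    hist_a = pyramid_histogram(image_a, levels, bins)
    hist_b = pyramid_histogram(image_b, levels, bins)
    return float(np.minimum(hist_a, hist_b).sum())
```

Under these assumptions, identical frames score 1 and frames with no shared intensity bins score 0. A practical system would typically replace the intensity quantization with descriptors assigned to a learned visual vocabulary.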
Early engagement-detection methods applied machine learning to handcrafted features extracted from datasets of static, discrete images. In recent years, more and more researchers have paid attention to the concept of engagement (Shahbaznezhad et al., 2021; Nigam & Dewani, 2022). In particular, with the development of deep learning, many end-to-end engagement-detection models have been developed. For detecting students' engagement in the classroom, DERN (Huang et al., 2019) uses a bidirectional long short-term memory (LSTM) network with an added attention mechanism to analyze the features extracted by OpenFace through sequential convolution and obtains the degree of students' participation in online classes. Bhardwaj et al. (2021) focused on students' eye movements, using a deep-learning network to analyze students' eye focus and build a feature matrix for judging students' concentration in online classes. Some deep-learning models improve the performance of engagement detection by designing the network architecture and attention mechanisms in the temporal and spatial domains, respectively. DFSTN (Liao et al., 2021) uses an SE-ResNet-50 network to extract spatial features of faces and an LSTM network with a global attention mechanism to extract temporal features. By combining temporal and spatial features, the model captures the fine-grained features of face sequences better and improves detection performance. Shen et al. (2022) proposed an engagement-detection model for massive open online courses that monitors students' learning engagement through a regionally adaptive facial-expression recognition network. DenseNet (Mehta et al., 2022) improves the performance of concentration analysis based on facial expressions through temporal, spatial, and spatiotemporal self-attention mechanisms.
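To make the idea of a global attention mechanism over temporal features more concrete, here is a minimal sketch of softmax attention pooling over a sequence of per-frame feature vectors. It does not reproduce DERN, DFSTN, or any other cited model: the function name `attention_pool` and the single query vector (which would be learned in a real network) are hypothetical.

```python
import numpy as np

def attention_pool(frame_features, query):
    """Pool a (T, D) sequence into one (D,) vector with softmax attention.

    frame_features: array of shape (T, D), one feature vector per time step.
    query: array of shape (D,), a hypothetical learned query vector.
    """
    scores = frame_features @ query      # one relevance score per time step, shape (T,)
    scores = scores - scores.max()       # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()  # softmax: non-negative, sums to 1
    pooled = weights @ frame_features    # attention-weighted sum over time, shape (D,)
    return pooled, weights
```

With an all-zero query every time step receives the same weight, so the pooled vector reduces to the plain temporal average. A trained query shifts the weight toward the frames most informative for the engagement label.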
Salient-object detection separates salient objects from a natural scene. Initially, salient-object detection methods (Li et al., 2013; Wang et al., 2016) worked by calculating the contrast difference in the neighborhood around one or several pixels. Later, deep learning–based salient-object detection models were optimized by fully extracting multi-scale features. PAGE-Net (Wang et al., 2019) uses a pyramid-enhancement module and a salient edge-detection module to optimize the salient-object detection model. In the pyramid-enhancement module, the model obtains multi-scale saliency information and pays more attention to the content related to salient-object detection. In the salient edge-detection module, the authors stack multi-scale saliency information and introduce an attention mechanism to enlarge the receptive field of the extracted features and make them more expressive.
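The early contrast-based idea can be sketched as follows: each pixel's saliency is taken as the absolute difference between its intensity and the mean intensity of its surrounding window. This is a simplified stand-in rather than the method of any cited paper. The function name `local_contrast_saliency` and the `radius` parameter are assumptions.

```python
import numpy as np

def local_contrast_saliency(gray, radius=2):
    """Saliency as |pixel - mean of its (2r+1) x (2r+1) neighborhood|, scaled to [0, 1]."""
    img = gray.astype(np.float64)
    h, w = img.shape
    k = 2 * radius + 1
    # Replicate border pixels so every pixel has a full neighborhood.
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading zero row and column, giving O(1) box sums.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    box_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
               - integral[k:k + h, :w] + integral[:h, :w])
    contrast = np.abs(img - box_sum / (k * k))
    peak = contrast.max()
    # A perfectly uniform image has no contrast; return the all-zero map unchanged.
    return contrast / peak if peak > 0 else contrast
```

On an image with a single bright pixel on a dark background, the map peaks at that pixel. Thresholding such a map yields a crude salient-region mask, which is the kind of signal the multi-scale deep models described above refine.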