Research on Short Video Data Analysis Based on Multimodal Features

Authors

  • Yue Xie

DOI:

https://doi.org/10.54691/x5jv3760

Keywords:

Multimodal Features; Short Videos; Emotion Recognition; User Interest Prediction; Deep Learning.

Abstract

As an emerging media format, short videos have become an important means for people to obtain information and entertainment. However, the complexity and diversity of short video content pose significant challenges for data analysis. This paper contributes to short video data analysis research by investigating multimodal feature extraction, emotion recognition, and user interest prediction. Firstly, this study explores multimodal feature extraction methods, utilizing deep learning models to extract image, audio, and textual features. Secondly, an emotion recognition method based on a user feature-guided attention mechanism is proposed, which enhances the feature representation for emotion analysis by fusing multimodal features. Finally, a user interest prediction model is designed by integrating multimodal features with patterns of user interest evolution.
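The abstract does not specify the form of the user feature-guided attention mechanism. As a minimal illustrative sketch only (not the paper's implementation), the snippet below assumes scaled dot-product attention in which the user feature vector acts as the query and the per-modality feature vectors (image, audio, text) act as keys and values; the modality weights and the fusion step are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def user_guided_fusion(modal_feats, user_feat):
    """Fuse per-modality features into one vector, weighting each modality
    by its scaled dot-product similarity to the user feature (the query)."""
    feats = np.stack(modal_feats)                 # (n_modalities, d)
    scores = feats @ user_feat                    # similarity of each modality to the user
    weights = softmax(scores / np.sqrt(len(user_feat)))
    fused = weights @ feats                       # attention-weighted sum, shape (d,)
    return fused, weights

# Toy example with random 8-dimensional features for three modalities.
rng = np.random.default_rng(0)
d = 8
image_f, audio_f, text_f = (rng.standard_normal(d) for _ in range(3))
user_f = rng.standard_normal(d)
fused, w = user_guided_fusion([image_f, audio_f, text_f], user_f)
```

The fused vector would then feed a downstream emotion classifier; the attention weights indicate which modality the model treats as most informative for a given user.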


References

[1] REN Z Y, WANG Z C, KE Z W, et al. Survey of multimodal data fusion[J]. Computer Engineering and Applications, 2021, 57(18): 49-64.

[2] Yao T, Zhai Z, Gao B. Text Classification Model Based on fastText[C]. 2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS), 2020: 154-157.

[3] Atliha V, Sesok D. Comparison of VGG and ResNet used as Encoders for Image Captioning[C]. 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), 2020: 1-4.

[4] Henaff M, Bruna J, LeCun Y. Deep Convolutional Networks on Graph-Structured Data[J]. arXiv preprint arXiv:1506.05163, 2015: 1-10.

[5] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.

[6] Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 4700-4708.

[7] Fan Y, Zhou Q, Chen W, et al. User connection method based on multimodal information fusion [J]. Computer Engineering and Design, 2024, 45(9): 2641-2648.

[8] Xie X, Ding C, Wang X, et al. Multimodal emotion recognition integrating text, speech, and expression [J]. Journal of Qingdao University (Engineering & Technology Edition), 2024, 39(3): 20-30.

[9] Zadeh A, Zellers R, Pincus E, et al. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv preprint arXiv:1606.06259, 2016.

[10] Chen P, Fu X. Research on text sentiment polarity classification using SVM method [J]. Journal of Guangdong University of Technology, 2014, 31(3): 95-101.

[11] Ghosal D, Akhtar M, Chauhan D. Contextual inter-modal attention for multi-modal sentiment analysis [C]. Proceedings of the 2018 conference on empirical methods in natural language processing. 2018: 3454-3466.

[12] Li T. Multimodal discourse analysis of short video news driven by emotion [J]. News Lovers, 2024, (12): 28-30.

Published

2025-06-20

Issue

Section

Articles

How to Cite

Xie, Y. (2025). Research on Short Video Data Analysis Based on Multimodal Features. Scientific Journal of Technology, 7(6), 63-69. https://doi.org/10.54691/x5jv3760