DOI: 10.14489/vkit.2025.07.pp.055-064
Pashchenko D. E., Kotelnikov E. V. ADAPTATION OF THE LEXICON-BASED SENTIMENT ANALYSIS METHOD USING ATTENTION WEIGHTS OF A NEURAL LANGUAGE MODEL (pp. 55-64)
Abstract. The article presents the results of adapting the SO-CAL lexicon-based method for sentiment analysis of Russian texts by modifying its lexicon using aggregated attention weights of the ruRoBERTa-large neural language model. The proposed method includes four steps. At the first step, the ruRoBERTa-large model is fine-tuned on the first text corpus. At the second step, the sentiment predictions and attention weights of the fine-tuned model are obtained for each text of the second corpus, and the attention weights are aggregated according to a special algorithm. At the third step, new lexicons for the SO-CAL method are formed from the existing lexicon and the aggregated attention weights. Five formation strategies are considered: the first three add words with high aggregated attention weights and differ in how they handle ambiguous sentiment cases; the fourth removes words with low aggregated attention weights; the fifth is a hybrid that replaces low-weight entries with high-weight words for the target domain. At the last step, the optimal lexicon is selected from the list of all lexicons obtained; it is then evaluated, and the classification results for the initial and optimal lexicons are compared using the macro F1-score. The results show that the combined method of lexicon formation is superior to the others: on the validation data of the SentiRuEval-2016 corpus, the macro F1-score increased by 4.28%, and on the test data of the SentiRuEval-2015 corpus by 15.86%.
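The aggregation algorithm itself is not reproduced on this page, so the sketch below only illustrates the general idea of step 2: extracting per-word attention scores from the ruRoBERTa-large checkpoint. The averaging over the heads of the last layer, the use of the <s>-token attention row, and the lowercasing normalization are assumptions for illustration, not the authors' procedure; in the paper the model is also fine-tuned for sentiment before its attention weights are read.

```python
# A minimal sketch of attention-weight aggregation, assuming: attention flowing
# from the <s> token to each word's subtokens, averaged over the heads of the
# last layer and summed over the corpus. These choices are hypothetical.
from collections import defaultdict
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "ai-forever/ruRoberta-large"  # in the paper this model is fine-tuned first
tokenizer = AutoTokenizer.from_pretrained(MODEL)  # fast tokenizer assumed
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=3, output_attentions=True)  # e.g., positive/neutral/negative
model.eval()

def aggregate_attention(texts):
    """Accumulate a per-word attention score over a corpus of texts."""
    scores = defaultdict(float)
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        return_offsets_mapping=True)
        offsets = enc.pop("offset_mapping")[0].tolist()
        with torch.no_grad():
            out = model(**enc)
        # Last-layer attentions: (batch, heads, seq, seq); average over heads,
        # then take the row of the <s> token (position 0).
        att = out.attentions[-1][0].mean(dim=0)[0]
        spans = {}  # word index -> [start_char, end_char, summed weight]
        for pos, wid in enumerate(enc.word_ids(0)):
            if wid is None:  # skip <s> and </s>
                continue
            start, end = offsets[pos]
            if wid not in spans:
                spans[wid] = [start, end, 0.0]
            spans[wid][1] = end
            spans[wid][2] += att[pos].item()
        for start, end, weight in spans.values():
            # Naive normalization by lowercasing; the paper may lemmatize instead.
            scores[text[start:end].lower()] += weight
    return scores
```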
Keywords: natural language processing; sentiment analysis; lexicon-based methods; neural language models.
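Steps 3 and 4 can likewise be illustrated with a hedged sketch of the combined ("replace") strategy and macro F1-based lexicon selection. The thresholds `low` and `high`, the `polarity` mapping that supplies sentiment values for newly added words, and the `classify` callback standing in for a SO-CAL-style classifier are all hypothetical placeholders; the actual SO-CAL implementation and lexicon format are not reproduced on this page.

```python
# A hedged illustration of the combined lexicon-formation strategy (step 3)
# and selection of the optimal lexicon by macro F1-score (step 4).
from sklearn.metrics import f1_score

def combined_lexicon(base_lexicon, attn_scores, polarity, low=0.01, high=0.5):
    """Combined strategy: drop low-attention entries from the base lexicon and
    add high-attention domain words. Thresholds here are invented."""
    new_lex = {w: v for w, v in base_lexicon.items()
               if attn_scores.get(w, 0.0) >= low}
    for word, score in attn_scores.items():
        if score >= high and word not in new_lex and word in polarity:
            # A sentiment value for the new word must be assigned somehow;
            # here it is looked up in a hypothetical `polarity` mapping.
            new_lex[word] = polarity[word]
    return new_lex

def pick_best(candidate_lexicons, texts, labels, classify):
    """Return the lexicon with the highest macro F1 on validation data.
    `classify(text, lexicon)` stands in for a SO-CAL-style classifier."""
    def macro_f1(lex):
        preds = [classify(t, lex) for t in texts]
        return f1_score(labels, preds, average="macro")
    return max(candidate_lexicons, key=macro_f1)
```

In the paper, the candidate lexicons come from the five strategies described in the abstract, and selection is performed on the SentiRuEval-2016 validation data.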
D. E. Pashchenko (Vyatka State University, Kirov, Russia)
E. V. Kotelnikov (European University at St. Petersburg, St. Petersburg, Russia)