Development of an Automatic Summarization System based on Large Language Models for Annual Report Analysis

Muhammad Rizki; Yudi Wibisono; Eddy Prasetyo Nugroho

doi:10.47709/brilliance.v5i2.6772

Authors

Muhammad Rizki Universitas Pendidikan Indonesia, Indonesia
Yudi Wibisono Universitas Pendidikan Indonesia, Indonesia
Eddy Prasetyo Nugroho Universitas Pendidikan Indonesia, Indonesia

Keywords:

Annual Report, Automatic Text Summarization, Fine-Tuning, Large Language Models, Low-Rank Adaptation

Abstract

The increasing interest in stock market investment in Indonesia has highlighted a significant challenge for retail investors: the difficulty of analyzing lengthy and complex corporate annual reports. These documents, essential for fundamental analysis, are often hundreds of pages long and contain detailed narrative sections that require considerable time and effort to comprehend. This research addresses this issue by developing an automatic summarization system using a Large Language Model (LLM) to generate concise and insightful summaries of such reports. The primary objective was to develop and evaluate an LLM-based system specifically adapted for the structure and content of annual reports. The method involved creating a tailored dataset comprising 2,008 narrative text excerpts and their corresponding manual summaries sourced from the annual reports of companies listed on the Indonesia Stock Exchange (IDX). The open-source Llama-3.2-3B-Instruct model was then fine-tuned using the Parameter-Efficient Fine-Tuning (PEFT) technique, specifically Low-Rank Adaptation (LoRA). The research results demonstrated a significant improvement in the model's performance after fine-tuning. Quantitative evaluation using ROUGE metrics showed a relative increase of 18.63% in ROUGE-1, 44.45% in ROUGE-2, and 33.83% in ROUGE-L compared to the base model. Qualitative analysis confirmed that the fine-tuned model was capable of generating informative and relevant summaries aligned with the context of annual report analysis. In conclusion, this study demonstrates that fine-tuning LLMs with document-specific data is an effective approach for specialized tasks such as annual report summarization.
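The relative ROUGE gains quoted above can be illustrated with a small sketch. The abstract does not say which ROUGE implementation or tokenization the authors used, so what follows is an assumption-laden simplification (lowercasing, whitespace tokens, no stemming, ROUGE-N only) and not the paper's evaluation code. The function names `ngrams`, `rouge_n_f1`, and `relative_increase` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 (assumption: lowercase, whitespace tokens, no stemming)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def relative_increase(base_score, tuned_score):
    """Relative change of the fine-tuned score over the base score, in percent."""
    return (tuned_score - base_score) / base_score * 100.0


if __name__ == "__main__":
    # Hypothetical texts, not taken from the paper's dataset.
    reference = "net profit rose on higher coal sales"
    base_summary = "the company reported results for the year"
    tuned_summary = "net profit rose as coal sales increased"

    base = rouge_n_f1(reference, base_summary, n=1)
    tuned = rouge_n_f1(reference, tuned_summary, n=1)
    print(f"ROUGE-1 F1 base={base:.3f} tuned={tuned:.3f}")
    if base > 0:
        print(f"relative increase: {relative_increase(base, tuned):.2f}%")
```

Under this reading, a figure such as the reported 18.63% for ROUGE-1 would be the fine-tuned score minus the base score, divided by the base score, averaged or aggregated however the authors chose; the abstract does not give the underlying absolute scores.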
