This repository contains a collection of raw financial text data. The dataset encompasses a wide range of financial documents, including annual reports, news articles, and social media posts.
For now, please refer to our Hugging Face repository.
The dataset is organized into the following categories:
- Annual Reports: Financial reports issued by companies on an annual basis, providing insights into their financial performance and strategic outlook (Annual Registration Statement, Form 56-1).
- News Articles: Articles from reputable financial news sources.
- Social Media and Online Forums: Text posted by internet users.
Researchers, data scientists, and developers can use this dataset to train language models and to build corpora.
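As a rough illustration of the corpus-building use case, the sketch below cleans and de-duplicates category-labelled text before it is used for training. It is not part of this repository: the category keys, the `normalize` and `build_corpus` helpers, and the sample records are all hypothetical, and the actual file layout of the dataset may differ.

```python
import hashlib
import unicodedata
from typing import Iterable, Iterator

# Category keys mirror the three categories listed above; the exact strings are hypothetical.
CATEGORIES = {"annual_report", "news", "social_media"}


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def build_corpus(records: Iterable[tuple[str, str]]) -> Iterator[dict[str, str]]:
    """Yield cleaned, de-duplicated documents from (category, text) pairs."""
    seen: set[str] = set()
    for category, text in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        cleaned = normalize(text)
        if not cleaned:
            continue  # skip documents that are empty after cleaning
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # skip exact duplicates (compared after normalization)
        seen.add(digest)
        yield {"category": category, "text": cleaned}


# Made-up sample records, for illustration only.
sample = [
    ("news", "Shares  rose\n2% on Monday."),
    ("news", "Shares rose 2% on Monday."),
    ("social_media", "   "),
    ("annual_report", "Revenue increased year over year."),
]
corpus = list(build_corpus(sample))
```

Hashing the normalized text keeps memory use low on large collections, since only digests are stored rather than full documents. It catches exact duplicates only; near-duplicate detection would need a different technique.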
The data in this repository was collected from publicly available sources, including but not limited to:
- Financial news websites
- Regulatory agencies
- Social Media
The dataset is provided under the CC BY 4.0 License, which permits use, modification, and redistribution for both commercial and non-commercial purposes, provided that appropriate attribution is given. Please credit this repository if you use the data in your work.
Contributions to this repository are welcome. If you have additional financial text data that you would like to contribute, please submit a pull request.
We acknowledge the sources and providers of the data included in this repository, without whom this collection would not be possible.
For questions, suggestions, or inquiries regarding this dataset, please contact canu_pro@vistec.ac.th or open an issue in the repository. We appreciate your feedback and contributions.