Transformer, Foundation Models for Financial Time Series Forecasting (FTSF).
Of pre-training an LLM and fine-tuning on a custom dataset (e.g. the Financial Aid dataset) for downstream tasks. This short paper is published in the IEEE International Workshop on Large Language Models for Finance.
@article{islam2024large,
title={Large Language Models for Financial Aid in Financial Time-series Forecasting},
author={Islam, Md Khairul and Karmacharya, Ayush and Sue, Timothy and Fox, Judy},
journal={arXiv preprint arXiv:2410.19025},
year={2024}
}
Financial aid distributed to each US state by the government to support student education, collected for the years 2004 to 2020 from InformedStates.org. Details of the available features are in the following table. Aid is given based on financial need, academic merit, or both. The sub-categories are simplified, and each one describes multiple features.
| Category | Sub-category | Description |
|---|---|---|
| | Identifier | State id and name abbreviation. |
| | Number | Total students receiving the award. |
| | Public/Private | Whether the funds can be used for public or private sectors and how long (2 or 4 years). |
| Need, Merit, both | Flags | 0 or 1 based on whether the aid falls in a particular category. |
| | Program | Aid program with the most generous eligibility criteria. |
| | Notes | Related text. |
| | Threshold | GPA, SAT, income, and other academic or financial limits to qualify for the aid. |
| Time | Year | Fiscal or academic year. |
| Target | Amount | Aid amount received by the students. |
Representative rates of US dollar for the period August 01, 2014 - August 01, 2024.
Collected from the IMF rates database.
These rates, normally quoted as currency units per U.S. dollar, are reported daily to the Fund by the issuing central bank. (The IMF does not maintain exchange rates on weekends and some holidays.) The collected data covers the following currencies:
- Australian dollar (AUD)
- Canadian dollar (CAD)
- Chinese yuan (CNY)
- Euro (EUR)
- Indian rupee (INR)
- Japanese yen (JPY)
- U.K. pound (GBP)
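Converted to CSV using the following:
import pandas as pd

# Read the tab-separated report exported from the IMF rates database.
df = pd.read_csv('./data/Exchange_Rate_Report.tsv', sep='\t')
# Drop the unnamed columns (columns without a header in the source file).
df = df.drop(columns=['Unnamed: 0', 'Unnamed: 9'])
# Fill weekend and holiday gaps forward, then backward for any leading gaps.
df.ffill().bfill().to_csv(
    './data/Exchange_Rate_Report.csv',
    sep=',', index=False
)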
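To make the gap-filling step concrete, here is a minimal sketch on a toy frame. The column names and rate values are made up for illustration and are not taken from the actual report; only the forward-fill then backward-fill pattern mirrors the conversion above.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the exchange-rate report; values are invented.
rates = pd.DataFrame(
    {
        "Date": pd.date_range("2024-01-05", periods=5, freq="D"),
        "EUR": [np.nan, 0.91, np.nan, np.nan, 0.92],
    }
)

# Forward fill carries the last reported rate across gaps (e.g. weekends).
# Backward fill then covers leading gaps that have no earlier value.
filled = rates.ffill().bfill()
print(filled)
```

Forward filling first means a missing day inherits the most recent published rate, so no future information leaks into the gap; the backward fill is only a fallback for rows at the very start of the file.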
Daily stock prices (Close, Open, High, Low) and volumes for each stock for up to 10 years from the NASDAQ database.
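Forecasting models generally consume such a daily series as pairs of input and target windows. The sketch below shows one common way to build them; the function name `make_windows` and the `lookback` and `horizon` values are hypothetical and are not the settings used in this repository.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int, horizon: int):
    """Split a 1-D series into (input, target) windows of fixed length."""
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested windows")
    # Each input holds `lookback` past values; each target holds the next `horizon` values.
    x = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return x, y

# Stand-in for a column of closing prices (values are placeholders).
close = np.arange(10, dtype=float)
x, y = make_windows(close, lookback=4, horizon=2)
print(x.shape, y.shape)
```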
Time Series models implemented using the Time Series Library
- DLinear - Are Transformers Effective for Time Series Forecasting? [AAAI 2023]
- iTransformer - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [ICLR 2024].
- TimeMixer - TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting [ICLR 2024].
- PatchTST - A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [ICLR 2023].
- TimesNet - TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis [ICLR 2023].
Time Series LLM models
- GPT4TS - One Fits All (OFA) : Power General Time Series Analysis by Pretrained LM (NeurIPS 2023 Spotlight)
- CALF - CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning (Under review 2024)
- TimeLLM - Time-LLM: Time Series Forecasting by Reprogramming Large Language Models (ICLR 2024)
Few-shot learning performance with 10% training data. TimeLLM and PatchTST outperform the other models. The best and the second best results are in bold and underlined.
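As a rough illustration of the few-shot setting, the sketch below keeps only 10% of a chronologically ordered training split. Taking the earliest portion is an assumption made here for illustration, as is the helper name `few_shot_subset`; the repository's scripts may select the subset differently.

```python
import numpy as np

def few_shot_subset(train: np.ndarray, percent: int = 10) -> np.ndarray:
    """Return the earliest `percent`% of a time-ordered training split."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer arithmetic avoids float rounding; keep at least one row.
    n_keep = max(1, len(train) * percent // 100)
    return train[:n_keep]

# Placeholder training split: 100 time steps, 2 features.
train = np.arange(200, dtype=float).reshape(100, 2)
subset = few_shot_subset(train, percent=10)
print(subset.shape)
```

Validation and test splits stay unchanged in this setup, so the comparison measures how well each model learns from limited history.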
GPT4TS performs best in the zero-shot setting. The best and the second best results are in bold and underlined. The traditional models are excluded here since they are not pre-trained.
Install the required libraries using
pip install -r requirements.txt
Use the `run.py` script for the traditional models. The `run_CALF`, `run_OFA` and `run_TimeLLM` scripts are for `CALF`, `GPT4TS` and `TimeLLM` respectively. The sample scripts are available in the `scripts` folder.