There has been significant progress in automatic text summarization, yet financial document summarization is still an emerging field of study, particularly for public companies' earnings calls. Techniques for summarizing company earnings calls efficiently and effectively, in a comprehensive yet consumable manner, are still in their infancy. In this paper, we present ECTQA, a novel dataset of the Q&A sections of earnings call transcripts (ECTs) hosted by publicly traded companies, paired with reference summaries generated by a salient-sentence extractive model. These Q&A sections typically account for 90% of any given earnings call and are largely under-analyzed by academic research and financial news organizations. We also present methods to fine-tune the ECT-BPS modeling framework, which generates both extractive and abstractive paraphrased summaries of the Prepared Remarks of ECTs, with the intent of applying it to our novel ECTQA dataset.
The ECTQA dataset can be found in the data folder.
The extractive, abstractive, unsupervised, and long document summarization models can be found in the models folder.
The notebooks can be found in the code folder.