BigBirdFLY: Financial Long text You can read

The development of new architectures makes it possible to process long input windows of text at once, overcoming both memory and computational constraints. Recent work has pushed maximum input windows to 65k+ tokens, compared to BERT's 512-token limit. We aim to explore, compare, and improve state-of-the-art long-window architectures for summarizing long texts. We consider BERT (512 tokens), GPT-3 (2,048 tokens), and BigBird (4,096 tokens), and focus on the financial narrative domain, summarizing 100- to 200-page documents. Testing models with different maximum input sizes lets us examine their benefits and limitations. Long input windows allow wider context to be included in the summarization process, avoiding out-of-context sentence extraction that can change sentence-level semantics.

We compare extractive and abstractive methods on aspects that are key in the financial context, such as numerical accuracy and summary semantics. We show that extractive (BERT-based) methods retain sentence-by-sentence accuracy with respect to the source text; however, the extraction process can produce fragmented summaries that invite misleading interpretations. We also show that abstractive methods can produce fluent summaries, and we introduce BigBirdFLY, a wide-context summarization method based on BigBird. Through human evaluation, we find that BigBirdFLY produces summaries more similar to human-written ones and scores best on the human evaluation criteria, whereas extractive methods score high on automatic metrics (ROUGE). Finally, we explore how enhanced greedy sentence-selection methods that exploit the long input window in a single step compare to recursive solutions based on Reinforcement Learning.
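As a rough illustration of the abstractive setting described above (not the BigBirdFLY pipeline itself), the sketch below loads a public BigBird-Pegasus checkpoint through Hugging Face Transformers and summarizes a long plain-text report in a single pass. The checkpoint name, input file, and generation settings are illustrative assumptions.

```python
# Minimal sketch, assuming Hugging Face Transformers is installed and a
# plain-text financial report is available locally. This is NOT the
# BigBirdFLY implementation; it only illustrates long-window abstractive
# summarization with a BigBird-based model.
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

model_name = "google/bigbird-pegasus-large-arxiv"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(model_name)

# Hypothetical path to a 100- to 200-page report converted to plain text.
with open("annual_report.txt") as f:
    document = f.read()

# BigBird's sparse attention accepts up to 4,096 tokens in one pass, so far
# less of the document is truncated than with a 512-token BERT window.
inputs = tokenizer(document, return_tensors="pt",
                   truncation=True, max_length=4096)

summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```

A wider input window mainly changes what the model can attend to at generation time: sentences are summarized in the context of surrounding sections rather than in isolation, which is the property the comparison above focuses on.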