Evaluating Extractive Text Summarization with BERTSUM

In this paper we examine how effectively a pre-trained BERT model, trained on the CNN/DailyMail dataset, can summarize news content. The focus is on evaluating the BERTSUM algorithm using metrics such as ROUGE and the LEAD-3 baseline. We conclude that the quality of these summarization models is hard to quantify through ROUGE scores, or through related metrics such as BLEU for question answering. Additionally, our results show that the BERTSUM model achieves better ROUGE recall scores than LEAD-3 summaries.
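To make the two evaluation notions above concrete, here is a minimal sketch (function names are our own, not from the paper) of a LEAD-3 baseline, which simply takes the first three sentences of a document, and a unigram ROUGE-1 recall score, the fraction of reference-summary words recovered by a candidate summary:

```python
from collections import Counter

def lead3(document: str) -> str:
    """Baseline summary: the first three sentences of the document.

    Naive sentence splitting on '.' for illustration only; real
    evaluations use a proper sentence tokenizer.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return ". ".join(sentences[:3])

def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram ROUGE-1 recall: overlapping word count (clipped per word)
    divided by the total number of words in the reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

# Example: half of the reference unigrams appear in the candidate.
print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```

Published ROUGE numbers are computed with the official toolkit (stemming, ROUGE-2 and ROUGE-L variants, bootstrap resampling); this sketch only illustrates the recall idea being compared.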