Deep learning systems currently struggle to understand human emotion in real time, a limitation that affects a variety of real-world applications such as chatbots and virtual assistants. The goal of this project is to address this limitation by building a system that can understand human emotion in real-world dialogues. To tackle this problem, we take advantage of the EmotionLines corpus, which consists of dialogues labeled at the utterance level. We define our task as real-time utterance-level emotion recognition (ULER), where "real-time" means that our system can only see previous utterances within a dialogue. Ultimately, we built a series of multi-level models and fine-tuned BERT on several tasks, improving on the CNN baseline from the EmotionLines paper. Finally, in anticipation of future work, we collected a dataset of brief dialogues between users and virtual assistants, labeled with errors. Our hope is that, by using real-time ULER, future systems can learn to associate user emotions, such as surprise or anger, with virtual assistant errors.
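
The real-time constraint described above can be sketched as a simple iteration over a dialogue: when classifying utterance i, the model's context contains only utterances 0 through i-1, never future ones. This is a minimal illustrative sketch; the function name and the example dialogue are our own, not from the paper's code or the EmotionLines corpus.

```python
from typing import Iterator, List, Tuple

def realtime_contexts(dialogue: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, utterance) pairs for real-time ULER.

    For the utterance at position i, the context is restricted to
    utterances 0..i-1, so no future information is available.
    """
    for i, utterance in enumerate(dialogue):
        yield dialogue[:i], utterance

# Hypothetical example dialogue between a user and a virtual assistant.
dialogue = [
    "Set a timer for ten minutes.",
    "Playing music from your library.",
    "No, that's not what I asked for!",
]
for context, utterance in realtime_contexts(dialogue):
    print(len(context), utterance)
```

A model trained under this regime sees dialogues the way a deployed assistant would: incrementally, one utterance at a time.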