Dissecting Language's Effect on Bottleneck Models

Language has been shown to help models generalize to unseen abstract concepts, but it remains unclear why and how language helps. Inspired by the results in Learning with Latent Language (L3), we aim to answer several crucial questions about language's role in facilitating learning, and to improve L3's performance on few-shot classification, where a model must quickly adapt to unseen tasks by learning from a set of similar tasks. We first demonstrate that accurate descriptions of spatial relationships can massively improve few-shot classification performance by providing correct guidance encoded in natural language. To improve the model, we focus on two directions: 1) enhancing its visual reasoning by providing more informative language guidance on spatial relationships, and 2) enhancing its ability to fuse different modalities. Our results demonstrate that 1) a simple concept-retrieval mechanism achieves comparable classification performance, and 2) a larger image model dramatically improves classification accuracy.
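To make the concept-retrieval idea concrete, here is a minimal sketch: rather than generating a language description of a task, the model retrieves the nearest description from a fixed bank of pre-encoded concepts. All names, the toy encoder, and the random embeddings below are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bank of candidate concept embeddings (e.g. encoded descriptions
# such as "a square above a circle"): 8 concepts, 16-dim embeddings.
# Random vectors stand in for a real text encoder's output.
concept_bank = rng.normal(size=(8, 16))

def embed_support_set(images: np.ndarray) -> np.ndarray:
    """Stand-in image encoder: mean-pool per-image feature vectors."""
    return images.mean(axis=0)

def retrieve_concept(support_embedding: np.ndarray) -> int:
    """Return the index of the bank concept with highest cosine similarity."""
    sims = concept_bank @ support_embedding
    sims /= np.linalg.norm(concept_bank, axis=1) * np.linalg.norm(support_embedding)
    return int(np.argmax(sims))

# A toy "support set" of 5 images, each already a 16-dim feature vector.
support = rng.normal(size=(5, 16))
idx = retrieve_concept(embed_support_set(support))
print(idx)  # index of the retrieved concept description
```

The retrieved description can then condition the downstream classifier, replacing the generation step with a cheaper nearest-neighbor lookup over a fixed concept vocabulary.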