KIP: Knowledge Injection through Prompting for Image Captioning

Master's thesis project: an image captioning model that leverages a knowledge graph for Transformer prompting.

Due to an NDA, only a brief description of the project can be provided.

Abstract

With the rise of the Transformer architecture, research on language and multimodal models has largely focused on scaling datasets and models. Although this scaling has produced important achievements, research on efficient learning has remained limited. In this work, we take inspiration from priming in cognitive psychology and prompt our model with a novel, rich knowledge graph called the context graph. Our framework includes an improved algorithm for graph-to-prompt conversion and a novel Transformer decoder designed for sentence generation with rich prompts. Our model, named Knowledge Injection through Prompting (KIP), is shown to use words from the context graph for caption generation. Although performance stagnates after longer training, we demonstrate that the external knowledge in the context graph improves KIP’s image captioning performance.
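
To give a rough feel for what graph-to-prompt conversion can look like in general, here is a minimal illustrative sketch. It is not the algorithm used in KIP, which cannot be shared here. It assumes a toy graph stored as an adjacency dictionary and flattens it into a word prompt with a breadth-first walk; the function name `graph_to_prompt`, the `max_words` parameter, and the example graph are all hypothetical.

```python
from collections import deque


def graph_to_prompt(graph, seeds, max_words=8):
    """Flatten a toy graph into a prompt string (illustrative only).

    `graph` maps a word to its list of neighbouring words. Starting from
    the seed words, nodes are visited breadth-first and each unseen word
    is appended to the prompt until `max_words` is reached.
    """
    seen = set()
    prompt = []
    queue = deque(seeds)
    while queue and len(prompt) < max_words:
        word = queue.popleft()
        if word in seen:
            continue  # skip words already placed in the prompt
        seen.add(word)
        prompt.append(word)
        queue.extend(graph.get(word, []))  # enqueue neighbours, if any
    return " ".join(prompt)


# Hypothetical toy graph; a real context graph would be far richer.
context_graph = {
    "dog": ["leash", "park"],
    "park": ["grass", "bench"],
    "leash": ["owner"],
}

prompt = graph_to_prompt(context_graph, seeds=["dog"], max_words=5)
print(prompt)
```

In this sketch, words closer to the seed appear earlier in the prompt, which is one simple way to order graph content before handing it to a decoder.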