Hidden in the Layers. Interpretation of Neural Networks for Natural
Language Processing
Mareček David - Libovický Jindřich - Musil Tomáš
hardcover, 175 pp.
ISBN 9788088132103
In this book, we explore neural-network architectures and models that are used for Natural Language Processing (NLP). We analyze their internal representations (word embeddings, hidden states, attention mechanisms, and contextual embeddings) and review what properties these representations have and what kinds of linguistically interpretable features emerge in them. We use our own experimental results, as well as results published by other research teams, to present an overview of models and representations and their linguistic properties. First, we explain the basic concepts of deep learning and its use in NLP and discuss the details of the most prominent neural architectures and models. Then, we outline the concept of interpretability and the different views on it, and introduce basic supervised and unsupervised methods for interpreting trained neural-network models. The next part is devoted to static word embeddings. We show various methods for embedding space visualization, component analysis, and embedding space transformations for interpretation. Pretrained word embeddings contain information about both morphology and lexical semantics. When embeddings are trained for a specific task, they tend to be organized by the information that is important for that task (e.g. emotional polarity for sentiment analysis). We also analyze attention mechanisms, in which we can observe weighted links between representations of individual tokens. We show that cross-lingual attentions mostly connect mutually corresponding tokens; however, in some cases, they may differ considerably from traditional word alignments. We mainly focus on self-attentions in Transformers. Some heads connect tokens that stand in certain syntactic relations. This motivated researchers to infer syntactic trees from the self-attentions and compare them to linguistic annotations.
We summarize how much syntactic information the attentions capture across the layers of several NLP models. We also point out that attentions can sometimes be very misleading and may carry information quite different from what we would expect based on the attended tokens. In the last part, we look at contextual word embeddings and the linguistic features they capture. They constitute a clear improvement over static word embeddings, especially in terms of capturing morphological and syntactic features. However, some higher linguistic abstractions, such as semantics, seem to be reflected in current contextual embeddings only very weakly or not at all.