Is it really possible to teach a computer to learn like a human? Find out how transfer learning for NLP is challenging traditional machine learning methods!
What is transfer learning?
Transfer learning is the technique of adapting a pre-trained machine learning model to a new problem. In practice, it means taking a model that was trained to do one task and fine-tuning it to do another related yet different task, leveraging labeled data from related tasks or domains. A task here is simply the objective the model is trained to achieve.
Typically, we aim to transfer as much knowledge as possible from our source setting to our target setting. Depending on the data, this knowledge may take different forms: it may concern how objects are composed, making it easier to identify unfamiliar ones, or it may concern the general words people use when expressing their views.
Classical supervised machine learning involves learning a single predictive model from a single dataset for a particular task. It performs best for well-defined, narrow tasks with many training examples.
In transfer learning, we leverage data from additional domains or tasks to train a model with better generalization properties. Transfer learning has long been heavily used in computer vision, where outstanding pre-trained models trained on large amounts of data are readily available.
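As an illustration, here is a minimal sketch of the computer-vision case (assuming PyTorch and torchvision; the choice of ResNet-18 and the 5-class target task are illustrative, not from the original text):

```python
# Minimal sketch: reusing an ImageNet pre-trained model for a new task
# (the torchvision model and the 5-class head are illustrative placeholders).
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Load a backbone pre-trained on ImageNet.
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze the pre-trained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for a hypothetical 5-class target task.
model.fc = nn.Linear(model.fc.in_features, 5)
```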
Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) are both language models that rely on transfer learning.
How do we use transfer learning for NLP?
The growing use of transfer learning has led to rapid advances in natural language processing (NLP) in recent years. In addition, using transfer learning can reduce the time it takes to train each model. The most intriguing aspect of transfer learning in NLP is the discovery that if we train a model to predict the next word, we can take that model, chop off its prediction layer, replace it with a new one, and quickly train just that last layer to predict, say, sentiment.
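To make this concrete, here is a minimal sketch of swapping the prediction head of a pre-trained language model for a sentiment classifier. It assumes the Hugging Face transformers library; the checkpoint name, label count, and example sentence are illustrative, not taken from the original text.

```python
# Reuse a pre-trained language model for sentiment classification by
# replacing its prediction head (checkpoint and labels are illustrative).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"  # pre-trained with a language-modelling objective
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# The language-modelling head is dropped and a freshly initialized
# classification head is attached on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Optionally freeze the pre-trained encoder so that only the new head is trained.
for param in model.base_model.parameters():
    param.requires_grad = False

inputs = tokenizer("The movie was surprisingly good!", return_tensors="pt")
logits = model(**inputs).logits  # scores for the two sentiment classes
```

In practice, the whole network is often fine-tuned end to end as well; freezing the encoder is simply the cheapest variant of the idea.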
Transfer learning comes in a variety of forms in current NLP. It can be categorized along the following three dimensions:
– based on whether the source and target settings deal with the same task;
– based on the nature of the source and target domains;
– based on the order in which they are learned.
“A transductive transfer learning approach consists of the same source and target tasks. A distinction can also be made between domain adaptation (data from different domains) and cross-lingual learning (data in other languages)” (Ruder, 2019).
Alternatively, there is inductive transfer learning, where the source and target tasks differ. There are two types of inductive transfer learning: multi-task transfer learning and sequential transfer learning. In multi-task transfer learning, multiple tasks are learned simultaneously and common knowledge is shared between them.
“The general knowledge derived from source data is transferred to only one task in sequential transfer learning” (Ruder, 2019).
The most significant improvements have been achieved through inductive sequential transfer learning.
Let’s find out the methods of transfer learning for natural language processing (NLP). The main approaches are the following (a minimal sketch of all three follows the list):
- Parameter initialization (INIT): Using the INIT approach, the network is first trained on S. Then its tuned parameters are used directly to initialize it for T. This may be followed by fine-tuning the parameters in the target domain.
- Multi-task learning (MULT): By contrast, MULT trains on samples from both domains simultaneously.
- Combination (MULT+INIT): We first pre-train on the source domain S for parameter initialization and then train on S and T simultaneously.
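Here is a minimal PyTorch sketch of the three schemes; the encoder, task heads, toy data, and hyperparameters are hypothetical placeholders, not from the original text.

```python
# Minimal sketch of INIT, MULT and MULT+INIT with a shared encoder
# (all models, data and hyperparameters are hypothetical placeholders).
import torch
import torch.nn as nn

# Shared encoder plus one output head per task.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
head_s = nn.Linear(128, 4)  # head for source task S (e.g. 4 classes)
head_t = nn.Linear(128, 2)  # head for target task T (e.g. 2 classes)

# Toy stand-ins for the source-domain and target-domain datasets.
x_s, y_s = torch.randn(64, 300), torch.randint(0, 4, (64,))
x_t, y_t = torch.randn(64, 300), torch.randint(0, 2, (64,))

def train(params, loss_fn, steps=100, lr=0.1):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn().backward()
        opt.step()

loss_s = lambda: nn.functional.cross_entropy(head_s(encoder(x_s)), y_s)
loss_t = lambda: nn.functional.cross_entropy(head_t(encoder(x_t)), y_t)

# INIT: train on S first, then reuse the tuned encoder to initialize training on T.
train(list(encoder.parameters()) + list(head_s.parameters()), loss_s)
train(list(encoder.parameters()) + list(head_t.parameters()), loss_t)

# MULT: train on S and T simultaneously with a joint loss. Run this on a fresh
# encoder for pure MULT, or right after the S-only step above for MULT+INIT.
train(list(encoder.parameters())
      + list(head_s.parameters())
      + list(head_t.parameters()),
      lambda: loss_s() + loss_t())
```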
Conclusion
Transfer learning in NLP reduces the amount of time and task-specific data needed to train a model for a target task. ELMo adds contextualization to word embeddings. ULMFiT introduces many new ideas, such as its fine-tuning scheme, which lowers the error rate significantly, and GPT uses a transformer architecture similar to those of cutting-edge NLP models.
Pre-trained transformer models bring significant performance gains to clinical NLP; in some cases, these gains dwarf the progress made over the previous decade. However, recent studies have shown that pre-training on public-domain text does not always transfer adequately to the clinical domain because of its highly specialized language.
Learn about the deep learning applications that changed the world.