Rhythmic lyrics translation: Customizing a pre-trained language model using stacked fine-tuning

Neural machine translation (NMT) uses neural networks to translate text from one language to another. As NMT models have become widespread, most attention has gone to translating everyday sentences. However, domain-specific text such as lyrics or poetry also deserves attention. For example, even Google Translate, one of the best-known NMT systems, failed to give an accurate English translation of a famous Korean nursery rhyme, "Airplane" (비행기). To teach a model to retain information beyond semantics, we need data that contains exactly the information we want it to learn. For rhythmically accurate lyrics translation, meaning translated lyrics that can be sung to the original melody, we need parallel data that carries lyrical and rhythmical properties while remaining semantically accurate. Because not enough data meets these criteria, we propose a novel method that we call 'stacked fine-tuning'. We fine-tuned a pre-trained model first on a dataset from the lyrics domain and then on a smaller dataset containing the rhythmical properties, so that the model learns to produce rhythmically accurate lyrics translations. To evaluate the effectiveness of our approach, we translated two famous Korean nursery rhymes into English and matched them to the original melody. Our stacked fine-tuning method produced an NMT model that maintained the rhythmical characteristics of the lyrics during translation, whereas models fine-tuned in a single stage failed to do so.
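
The two-stage order described above can be illustrated with a deliberately tiny sketch. It is not the implementation used in this work: a single scalar weight stands in for the pre-trained NMT model, numeric pairs stand in for parallel lyrics, and the names `fine_tune` and `stacked_fine_tune`, the learning rates, and the epoch counts are all hypothetical. The only point it shows is that each stage starts from the weights left by the previous stage, with the larger domain dataset first and the smaller, more specific dataset second.

```python
from typing import List, Sequence, Tuple

# Toy stand-in for a (source, translation) pair: an input and its target value.
Example = Tuple[float, float]
# One fine-tuning stage: (dataset, learning rate, number of epochs).
Stage = Tuple[Sequence[Example], float, int]


def fine_tune(weight: float, data: Sequence[Example], lr: float, epochs: int) -> float:
    """Gradient descent on squared error for the toy model y = weight * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (weight * x - y) * x  # d/dw of (w*x - y)^2
            weight -= lr * grad
    return weight


def stacked_fine_tune(pretrained_weight: float, stages: Sequence[Stage]) -> Tuple[float, List[float]]:
    """Apply the stages in order; each stage resumes from the previous result."""
    weight = pretrained_weight
    history = [weight]  # weight before any stage, then after each stage
    for data, lr, epochs in stages:
        weight = fine_tune(weight, data, lr, epochs)
        history.append(weight)
    return weight, history


# Hypothetical data: a larger "lyrics domain" set and a smaller "rhythm" set.
lyrics_domain = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]
rhythm_subset = [(1.0, 2.5)]

final_weight, history = stacked_fine_tune(
    pretrained_weight=1.0,
    stages=[
        (lyrics_domain, 0.1, 50),  # stage 1: adapt to the broader domain
        (rhythm_subset, 0.05, 5),  # stage 2: brief pass on the scarce data
    ],
)
```

In this sketch the second stage is kept short with a lower learning rate so that it nudges the stage-1 weights rather than overwriting them; whether that choice matches the settings used in this work is an assumption.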