An Indonesian recipe generator Deep Learning model trained by fine-tuning pre-trained models such as T5, BART, and GPT-2
Hello everyone! Welcome to my first technical post on my personal blog! In this post, I will write about one of my fun projects, the Indonesia Recipe Generator. This is a continuation of my previous Medium post with a more modern approach.
This post describes the details of my experiment in creating an Indonesian recipe generator.
Repository and Demo
I provide the best model and the code to train it. Feel free to try and download it :)
- 🤗 Huggingface Space (demo): Space
- 🤗 Huggingface Model (download the model): Model
- ✨ Repository (to train the model): GitHub repository
Introduction
In the past, in my Medium post, I created an Indonesian recipe generator using seq2seq deep learning approaches such as the Gated Recurrent Unit (GRU) and the Transformer. I wanted to revisit that work and improve it, so this post presents some improvements over the previous one.
The model in my previous post was trained from scratch, without pre-training, so it didn't have any prior knowledge to use in the training process. One of the improvements I mentioned in my previous blog that can increase the quality of the model is using a pre-trained model. Many state-of-the-art research works use pre-training to improve model quality over non-pre-trained baselines. Moreover, this is the era of the pre-trained model in deep learning: many pre-trained models achieve outstanding results when trained again (we call this fine-tuning) on a target dataset. Therefore, it's intriguing to try this for my recipe generator project.
In this experiment, I use off-the-shelf, publicly available pre-trained models. Since the data is in Indonesian, I need models pre-trained on Indonesian data. They also need to handle sequence generation, since the problem I want to tackle is a text generation problem. I searched and only found T5, BART, and GPT models, so I decided to experiment with these.
Data, Preprocessing, and Exploratory Data Analysis
Since this is a continuation of my previous project, I use the same data as before. Feel free to read more details about it in my Medium post below (click the image).
Click the image to visit my previous post~
Method
In this section, I describe the models I tried, as mentioned above. All of the models I used are transformer-based. I will briefly describe each of them.
In the future, I plan to write an in-depth post about the current popular pre-trained models. I will try to highlight the differences between them, so it will be easier for you to understand them.
BART
BART[1] is a transformer-based model that is pre-trained by learning to reconstruct corrupted input. It has an encoder-decoder architecture like the Transformer, with a few modifications such as a different activation function. The BART authors tried several corruption schemes; the released model is trained with sentence shuffling and token masking. The idea is to make BART learn to undo these perturbations and gain the capability of causal language modeling. To apply it to my system, I fine-tuned the model on my data. Here is an illustration of how the pre-trained BART was built.
BART pre-training. It uses corrupted input (sentence shuffling and token masking) to predict the uncorrupted one.
In this experiment, I used IndoBART[2] as the pre-trained model, released in this paper. The model is pre-trained on the Indo4B dataset, Common Crawl, and Wikipedia. The data contains the Indonesian, Sundanese, and Javanese languages. It is publicly available on the Huggingface model hub.
T5
T5[3] is also a transformer-based model pre-trained on corrupted input. T5 is pre-trained with token masking. Unlike BART, which uses the data for causal language model training, T5 casts the training data as a seq2seq problem. The data may contain translation, question answering, and classification tasks. The model learns these tasks in addition to reconstructing corrupted input. Here is an illustration of how the model is pre-trained.
T5 pre-training. It uses several tasks with a prompting-style input to the model.
I used the T5 model available on HuggingFace[4]. It is pre-trained on the Indonesian portion of the mC4 dataset.
GPT
GPT-2[5] is an auto-regressive pre-trained model trained through causal language modeling with no perturbation of the input (unlike BART and T5). The pre-trained model is then fine-tuned on our data. I used the IndoGPT model, released together with IndoBART in the same paper and pre-trained on the same data as IndoBART.
Since the model does not have an encoder-decoder architecture, we need to reshape our input and cast the task as a language modeling problem.
Setup
I will split this section into code technical setup, model setup, and hyperparameter setup.
Code Technical Setup
For the training script, I used Pytorch as the deep learning framework and wrapped it with Pytorch Lightning[6]. I used the Model Checkpoint, Early Stopping, and 16-bit precision implementations from Pytorch Lightning.
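The trainer setup described above can be sketched roughly as follows. This is a configuration sketch only, not the exact code from my repository; in particular, the monitored metric name `val_loss` is an assumption.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Illustrative configuration only; the metric name "val_loss" is an assumption.
trainer = pl.Trainer(
    precision=16,  # 16-bit (mixed) precision training
    callbacks=[
        # Keep the checkpoint with the lowest validation loss
        ModelCheckpoint(monitor="val_loss", mode="min"),
        # Stop when the validation loss has not improved for 5 epochs
        EarlyStopping(monitor="val_loss", mode="min", patience=5),
    ],
)
```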
For metric calculation, I used the BLEU score, a popular metric for sequence-to-sequence problems. I used the off-the-shelf BLEU implementation from the sacrebleu Python package.
Model Setup
I applied several modifications to the input of the model. For the architecture, I used an off-the-shelf implementation that Huggingface provided.
For GPT, since it needs a single input, I concatenated the food name and the recipe into one sequence with the special symbol `>>>`:

```
Input: <FOOD> >>> <INGREDIENTS>
Output: the input shifted by one token (e.g., Input: `Apple Fruit end_of_token`, Output: `Fruit end_of_token`)
```
T5 has a seq2seq architecture, so I made a small modification to the input. From what I've read, T5 is pre-trained with a "prompting" style of input, for example: `summarize: <ARTICLE>`. So, I followed this and changed the data accordingly. Below is how I present the input and output of the model:

```
Input: resep: <FOOD>
Output: <INGREDIENTS>
```
I didn't make any changes for the BART model, so I provide the input and the output as-is:

```
Input: <FOOD>
Output: <INGREDIENTS>
```
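Putting the three formats together, the input-output construction can be sketched like this. This is a simplified illustration; the exact special tokens used in the repository may differ.

```python
def format_example(food: str, ingredients: str, model_type: str):
    """Build the (input, target) text pair for each architecture.

    The token conventions here are illustrative, not the exact ones from the repo.
    """
    if model_type == "gpt":
        # Decoder-only model: one concatenated sequence split by the `>>>` symbol;
        # the training target is simply this sequence shifted by one token.
        return (f"{food} >>> {ingredients}", None)
    if model_type == "t5":
        # T5-style "prompting" input, mirroring its pre-training format.
        return (f"resep: {food}", ingredients)
    # BART: plain seq2seq, input and target used as-is.
    return (food, ingredients)


print(format_example("pepes ayam", "1/2 kg ayam ...", "t5"))
# → ('resep: pepes ayam', '1/2 kg ayam ...')
```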
Hyperparameter Setup
I used AdamW as the optimizer. The learning rate varies depending on the architecture: I handpicked several candidate values based on several resources[2][3][7] and tried some of them, ending up with 1e-4, 1e-5, and 1e-4 as the learning rates for GPT, BART, and T5 respectively. I used an early stopping criterion to avoid overfitting: training stops if the validation loss doesn't improve for 5 epochs. As the best model, I picked the checkpoint with the lowest validation loss.
To make training faster, I used the Automatic Mixed Precision (AMP) that Pytorch provides. Unfortunately, T5 doesn't work with AMP, so I didn't use it when fine-tuning the T5 model.
Following my past article, to make a fair comparison I used greedy decoding as the decoding strategy to predict the output of each model. You can see the details of how greedy decoding works in my past blog.
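As a quick refresher, greedy decoding simply takes the highest-scoring token at every step. A toy sketch, where the scoring function is a made-up stand-in for a real model:

```python
def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """Generate a sequence by always picking the single best next token."""
    output = [start_token]
    for _ in range(max_len):
        scores = step_fn(output)  # token -> score for the next position
        next_token = max(scores, key=scores.get)
        output.append(next_token)
        if next_token == end_token:
            break
    return output


def step_fn(prefix):
    # Toy next-token scores keyed on the last token (not a real model).
    table = {
        "<s>": {"resep": 0.9, "bahan": 0.1},
        "resep": {"ayam": 0.8, "sapi": 0.2},
        "ayam": {"</s>": 1.0},
    }
    return table[prefix[-1]]


print(greedy_decode(step_fn, "<s>", "</s>"))
# → ['<s>', 'resep', 'ayam', '</s>']
```

Beam search or sampling could give more diverse recipes, but greedy decoding keeps the comparison with my previous models fair.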
Experiment Results
Below is the result of my experiment.
Model | BLEU Score | Pretrained? |
---|---|---|
GRU Seq2seq + Attention (Past Experiment) | 5.70 | ❌ |
Transformer (Past Experiment) | 16.24 | ❌ |
IndoGPT | 24.05 | ✅ |
T5 | 19.21 | ✅ |
IndoBART | 27.03 | ✅ |
With my setup, IndoBART outperforms the other models. T5, IndoBART, and IndoGPT all achieve a higher BLEU score than the vanilla Transformer, which indicates that a pre-trained seq2seq model may help increase performance. All the models trained on the Indobenchmark data outperform the model trained on the mC4 data (T5). It's interesting to see the potential of each pre-trained model.
Analysis
In this section, I provide several output samples across different basic ingredients in the data, plus out-of-domain inputs to test the transfer-learning capability. The samples are generated with the greedy search decoding strategy. The foods are taken from the test set.
Food | Scenario | IndoGPT | IndoBART | T5 | Ground Truth |
---|---|---|---|---|---|
mie goreng telur keju | telur (egg) | bahan utama 1 bungkus mie goreng 1 butir telur 1 / 2 sdt garam 1 sdt lada bubuk 1 sdm kecap manis 1 siung bawang putih 1 buah bawang merah 1 batang daun bawang 1 lembar daun salam 1 ruas jahe 1 sachet saus tiram 1 sendok teh gula 1 gelas air | 1 bungkus mie burung dara ( mie burung puyuh ) 1 butir telur 1 buah sosis 1 / 2 buah keju cheddar 1 siung bawang putih 1 sdm kecap manis 1 sdt saos tiram secukupnya garam secukupnya merica secukupnya penyedap rasa secukupnya air | 1 bungkus mie goreng ( saya pakai mie instan ) 1 butir telur 2 siung bawang putih secukupnya garam | 1 bungkus indomie goreng 1 buah telur 1 1 / 2 gelas belimbing air 100 gram keju |
soup iga sapi | sapi (beef) | an tulang iga sapi 1 buah wortel 1 / 2 buah kentang 1 batang daun bawang 1 ikat sawi hijau 1 bungkus soup 1 liter air 1 sdm minyak goreng bumbu halus : 1 siung bawang putih 1 sdt lada 1 ruas jahe 1 butir kemiri 1 cm jahe | 1 kg iga sapi 2 buah wortel 2 batang daun bawang 2 siung bawang putih 1 / 2 sdt merica bubuk 1 sdt garam 1 sdm gula pasir 1 buah tomat 1 batang daun seledri 1 ruas jahe 1 btg daun bawang ( iris ) 1 siung bawang bombay ( iris tipis ) 2 sdm minyak goreng | 1 kg iga sapi 1 / 2 kg kentang 2 buah wortel 3 buah kentang @ 5 siung bawang putih 4 siung bawah merah secukupnya merica garam penyedap rasa air | 500 gram iga sapi 2 bh kentang 2 bh wortel 1 bh tomat 1 batang daun bawang 2 batang daun seledri 6 bh bawang merah 4 bh bawang putih 1 bh kayu manis 2 bh kapulaga 1 / 2 bh pala 1 bh bunga lawang 1 bh kemiri 15 butir merica 4 gelas air putih secukupnya garam |
tempe daging rolade cabai hijau | tempe | 1 / 2 papan tempe 1 / 4 kg daging sapi 1 bungkus rolade kecil 1 butir telur 1 buah bawang bombay 3 siung bawang putih 3 buah cabai hijau 1 sdt garam 1 sdm gula 1 sachet kaldu bubuk 1 batang daun bawang 1 lembar daun salam 1 ruas lengkuas 1 sendok teh ketumbar 1 gelas air | 1 papan tempe 1 / 2 kg daging sapi 1 buah bawang bombay 3 siung bawang putih 1 sdt garam 1 sdm gula pasir 1 bungkus penyedap rasa 1 butir telur 1 batang daun bawang 1 sachet saos cabai hijau 1 siung bawang merah 1 biji cabai rawit 1 lembar daun salam 1 ruas lengkuas 1 tangkai daun bawang ( iris ) secukupnya minyak goreng | 1 papan tempe, potong dadu 1 / 2 kg daging ayam, cincang halus 2 buah wortel, iris korek api 3 buah kentang, kupas potong dagu 5 buah cabai hijau besar, buang bijinya, rajang 4 siung bawang putih, geprek cincang halus ( saya pakai bawang bombay ) secukupnya garam, gula, merica bubuk | 1 tempe panjang ( sesuai ukuran yg ada dipasar ) 4 rolade yg sudah di goreng kemudian dipotong dadu 20 buah cabai keriting hijau 1 bawang merah 1 bawang putih 2 sdm saus tiram 1 lembar daun salam 1 / 2 sdt garam 1 / 2 sdt gula ( klw suka manis boleh dksh ) 1 / 4 air matang gelas kecil 3 sdm minyak |
pepes ikan gembung | ikan (fish) | 1 / 2 kg ikan gembus 1 ikat kemangi 1 buah tomat 1 batang sereh 1 ruas jari lengkuas 1 lembar daun salam 1 sdt garam 1 sdm gula 1 bungkus masako rasa ayam 1 liter air bumbu halus : 8 siung bawang merah 4 siung bawang putih 5 buah cabe merah 5 butir kemiri 1 cm jahe 1 butir kemiri ( sangrai ) 1 sendok teh ketumbar 1, 5 sendok teh merica | 1 / 2 kg ikan gembung 1 / 4 kg kelapa parut 1 buah tomat 1 batang serai 2 lembar daun salam 1 ruas lengkuas 1 ikat kemangi bumbu halus : 10 buah cabe merah 10 siung bawang merah 5 siung bawang putih 1 cm kunyit 1 sdt ketumbar 1 butir kemiri 1 sdm garam 1 bks penyedap rasa | 1 kg ikan gembung 1 / 2 kg kelapa parut bumbu halus : 10 siung bawang merah 5 siung bawah putih 3 butir kemiri 2 ruas kunyit 4 butir kemiri sangrai secukupnya ketumbar garam daun pisang untuk membungkus | 1 / 2 kg ikan gembung 5 bh bawang merah 1 bh bawang putih 5 bh cabe merah 1 cm jahe jeruk nipis dan garam kunyit dan daun kemangi kelapa parut daun pisang , lidi y utk membungkus |
pepes ayam | ayam (chicken) | 1 / 2 kg ayam 1 ikat kemangi 1 buah tomat 1 batang sereh 2 lembar daun salam 1 ruas jari lengkuas 1 lembar daun jeruk 1 sdt garam 1 sdm gula 1 bungkus masako rasa ayam bumbu halus : 8 siung bawang merah 4 siung bawang putih 1 cm jahe 1 butir kemiri 1 sendok teh ketumbar 1 bks masako 1 liter air | 1 / 2 kg ayam 1 / 4 kg kelapa parut 1 buah tomat 1 batang daun bawang 1 ikat kemangi bumbu halus : 10 siung bawang merah 5 siung bawang putih 1 ruas kunyit 1 sdt ketumbar 1 butir kemiri 1 cm jahe 1 sdm garam 1 bungkus penyedap rasa 1 lembar daun salam 1 btg serai 1 lbr daun jeruk 1 bh tomat | 1 kg ayam 1 / 2 kg kentang 2 buah tomat 5 siung bawang merah 3 siung bawah putih 10 buah cabe rawit secukupnya garam daun pisang untuk membungkus | 500 gr ayam , cuci bersih , lumuri air jeruk nipis , sisihkan secukupnya daun pisang , lap sampai bersih , jemur sebentar 1 batang serai , potong2 1 cm jahe , potong2 1 cm lengkuas , potong2 beberapa lembar daun salam iris cabe merah ( boleh cabe keriting ) secukupnya daun kemangi , siangi ( kali ini sy ngga pake ) secukupnya irisan daun bawang bumbu halus : 1 siung bawang putih 2 cm kunyit 1 / 2 sdt lada secukupnya garam 5 butir kemiri sangray |
ramen babi | out of domain | nasi putih 1 / 2 kg daging kambing 1 buah bawang bombay 2 buah tomat 1 batang daun bawang 1 sdm kecap manis 1 sdt saos tiram 1 bungkus penyedap rasa 1 liter air | 1 kg daging kambing 1 buah wortel 1 batang daun bawang 1 siung bawang putih 1 / 2 sdt merica 1 sdt garam 1 sdm gula 1 butir telur 1 sachet kaldu bubuk 1 liter air | 1 kg daging sapi / kambing, potong dadu 1 / 2 kg daging kambing / sapi, rebus sampai empuk 2 butir telor ayam bumbu halus : 5 siung bawang merah 4 siung bawah putih 3 butir kemiri secukupnya ketumbar garam gula penyedap rasa | - |
meong meong | random goes brr | ik sapi 1 / 2 kg daging sapi bumbu halus : 5 siung bawang putih 1 ruas jahe 1 sdm ketumbar 1 sdt lada 1 buah kemiri 1 batang sereh 1 lembar daun salam 1 cm ruas lengkuas 1 butir telur 1 bungkus santan kara secukupnya garam secukupnya gula secukupnya penyedap rasa secukupnya air | 1 kg daging kambing 1 / 2 kg kentang 1 buah kelapa parut 1 ruas jahe 1 batang serai 1 lembar daun salam 1 sdm gula merah 1 sdt garam 1 bungkus penyedap rasa bumbu halus : 5 siung bawang merah 3 siung bawang putih 1 cm kunyit 1 butir kemiri 1 1 / 4 sdt ketumbar 1 biji kemiri | 1 / 2 kg daging kambing 1 ons cabe merah 5 siung bawang merah @ 2 siung bawah putih secukupnya garam | - |
There are several interesting points we can take from these samples. For `mie goreng telur keju`, T5 and IndoGPT miss an important ingredient, cheese (keju). The funny part is that IndoGPT outputs "gula" (sugar) as an ingredient, which is weird to me. IndoBART predicts it perfectly, even naming a branded cheese. Although the ground truth has few ingredients, in my opinion the ingredients that the models output could still be used to cook `mie goreng telur keju`.
For `tempe daging rolade cabai hijau`, only IndoGPT outputs `rolade` as part of the ingredients; the other models seem not to remember it. I explored the data and found only five foods containing `rolade` as an ingredient, so it's expected that the models didn't learn it. It's interesting that IndoGPT could. Overall, the food's ingredients are good, at least to me.
The other in-domain foods are also good, although their ingredients differ from the ground truth. There are some interesting outputs, such as IndoGPT producing `gembus` instead of `gembung`, and IndoGPT failing to output the quantity of the `tulang iga sapi` in the ingredients of `soup iga sapi`. From these samples, we can see that the ingredients may differ from the target yet still be a valid choice. This finding also suggests that a BLEU score with a single reference may not be suitable for measuring the quality of a recipe generator model. I think we also need expert evaluation, since there are multiple aspects to assess, such as the quantity and the correctness of the ingredients. As far as I know, there is currently no automatic metric that can measure those things.
For the out-of-domain input, unfortunately, the models cannot perform well. They cannot produce `ramen babi` (pork ramen) with its mandatory ingredients such as pork and noodles, which means they cannot do zero-shot generation for out-of-domain inputs. Instead, they output beef or lamb as the ingredients. The nonsense input `meong meong` also yields those ingredients, which indicates the models are biased toward foods containing them.
Anyone know any food that has `meong meong` and `ramen babi` ingredients? I'm curious XD
Conclusion
Random cat~ Photo by Manja Vitolic on Unsplash.
In this post, I experimented with building an Indonesian recipe generator using pre-trained models. IndoBART outperforms the other models based on the BLEU score. We can also conclude that fine-tuning a pre-trained model generally works better than training a non-pre-trained one. It is interesting to see that it really works!
Actually, there are many more things to explore here. For example, it would be interesting to compare the pre-trained and non-pre-trained versions of BART, T5, and GPT. I would also like to do a more rigorous analysis of the trained models. Sadly, because of my limited resources, I cannot do that for now.
In the future, I plan to write about the current progress of seq2seq models. There are many interesting new papers published at Machine Learning conferences in 2022. I will study them and write about them on my blog.
Feel free to ask questions or comment on anything below. You can write in Indonesian or English.
Cite
For anyone who wants to cite my blog:
@misc{wibowo_2022,
title={Create Indonesian Recipe Generator by Fine-tuning T5, BART, and GPT-2 },
url={https://haryoa.github.io/},
journal={Haryo blog},
author={Wibowo, Haryo Akbarianto},
year={2022},
month={May}}
Sources and Explanations
Preview Image is from Unsplash by Brooke Lark