
NLP

Implementing and deploying a mental health care chatbot (KoGPT2, KoBERT)

 

[1] KoGPT2-based mental health care chatbot

One-line summary

KoGPT2 is a model that "generates" sentences.

To build a chatbot for mental health care, I fine-tuned it to respond to the user's input with comforting, empathetic, or gently worded replies, generating the sentences itself.

Here is the link to the training code.

Datasets used

1. Wellness dialogue script dataset, provided by AI Hub

AI Hub > External data > KETI R&D data > Recognition technology (language intelligence) > Wellness dialogue script dataset

2. Chatbot dataset, provided by @songys (Youngsook Song)

I processed both datasets to match the data format below.

Training environment

Training is possible even in the default Colab environment, but it takes quite a while!

One epoch takes about 15-16 minutes, and since I consider roughly 5 epochs the sweet spot, training is entirely feasible as long as the runtime doesn't get disconnected.

I trained on a GPU server provided by my university.

Installing the libraries

Honestly, resolving dependency issues was the hardest part of training, so I'll describe the environment I trained in, in detail.

On Linux (more specifically, Ubuntu).

My Python version is 3.8.10.

I listed the required dependencies in requirements_linux.txt.

transformers==4.5.1
pytorch_lightning==1.2.10
pandas

Install them with the command below.

pip install -r requirements_linux.txt

์•„๋ž˜์˜ ๋ช…๋ น์–ด๋กœ torch์˜ ๋ฒ„์ „์œผ๋กœ ์žฌ์„ค์น˜ํ•ด์ฃผ์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

If your graphics card (an NVIDIA card; mine is an RTX 3080) is only supported by a handful of torch versions, a quick search shows plenty of people struggling with the same thing.

capability sm_86 is not compatible

The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75

You'll run into messages like these over and over.

https://pytorch.org/get-started/previous-versions/

This is the official PyTorch page that documents the install commands for previous versions in detail.

Just keep installing and running candidates until you find a torch build that's compatible.

Even if the CUDA version doesn't match exactly, it's worth trying anyway..!

Code walkthrough

Here is the reference link.

I created a GitHub repo to store the data I processed for the mental-health-care use case and the lightly modified training code.

Let me walk through the trainer_s.py code in my repo.

import argparse
import logging

import numpy as np
import pandas as pd
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.core.lightning import LightningModule
from torch.utils.data import DataLoader, Dataset
from transformers.optimization import AdamW, get_cosine_schedule_with_warmup
from transformers import PreTrainedTokenizerFast, GPT2LMHeadModel

pytorch lightning์„ ํ™œ์šฉํ•ด ๊ตฌํ˜„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

parser = argparse.ArgumentParser(description='Simsimi based on KoGPT-2')

parser.add_argument('--chat',
                    action='store_true',
                    default=False,
                    help='response generation on given user input')

parser.add_argument('--sentiment',
                    type=str,
                    default='0',
                    help='sentiment for system. 0 is neutral, 1 is negative, 2 is positive.')

parser.add_argument('--model_params',
                    type=str,
                    default='model_chp/model_-last.ckpt',
                    help='model binary for starting chat')

parser.add_argument('--train',
                    action='store_true',
                    default=False,
                    help='for training')

ArgumentParser, from the standard library, lets a program accept and process arguments passed on the command line.

python trainer_s.py --train

To start training, run the script with the --train option on the command line.

python trainer_s.py --chat

This command lets you chat with the trained model.

 

logger = logging.getLogger()
logger.setLevel(logging.INFO)

Python์—์„œ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ œ๊ณตํ•˜๋Š” logging ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•ด ๋กœ๊ทธ๋ฅผ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•ด ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.

INFO level๋กœ ์„ค์ •ํ•ด ์ž‘์—…์ด ์ •์ƒ์ ์œผ๋กœ ์ž‘๋™ํ•˜๊ณ  ์žˆ๋‹ค๋Š” ํ™•์ธ ๋ฉ”์‹œ์ง€๋„ ๋ณด์—ฌ๋‹ฌ๋ผ๊ณ  ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค.

๋กœ๊ทธ๊ฐ€ ๊ธธ๊ฒŒ ๋œจ๋Š” ๊ฒŒ ์‹ซ๋‹ค๋ฉด ์ด ๋ถ€๋ถ„์„ ์ง€์›Œ์ฃผ์‹œ๊ฑฐ๋‚˜ ๋กœ๊ฑฐ level์„ ์กฐ์ ˆํ•ด์ฃผ์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.
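For example, switching the root logger to WARNING suppresses the INFO-level progress messages (a minimal sketch using only the standard library):

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.WARNING)  # raise the level to hide INFO messages

# INFO records are now filtered out; WARNING ones still get through.
print(logger.isEnabledFor(logging.INFO))     # False
print(logger.isEnabledFor(logging.WARNING))  # True
```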

U_TKN = '<usr>'
S_TKN = '<sys>'
BOS = '</s>'
EOS = '</s>'
MASK = '<unused0>'
SENT = '<unused1>'
PAD = '<pad>'

TOKENIZER = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2",
            bos_token=BOS, eos_token=EOS, unk_token='<unk>',
            pad_token=PAD, mask_token=MASK)

 

์„ค์ •ํ•˜๊ณ  ์‹ถ์€ ํ† ํฐ์„ ์ƒ์ˆ˜๋กœ ์„ค์ •ํ•˜๊ณ  hugging face์— ์˜ฌ๋ผ์™€์žˆ๋Š” skt/kogpt2-base-v2 ๋ฒ„์ „์˜ ํ† ํฌ๋‚˜์ด์ €๋ฅผ ๋‹ค์šด๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

PreTrainedTokenizerFast ์˜ ์†์„ฑ๊ฐ’์€ hugging face์˜ ๊ณต์‹ ๋ฌธ์„œ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

class CharDataset(Dataset):
    def __init__(self, chats, max_len=32):
        self._data = chats
        self.first = True
        self.q_token = U_TKN
        self.a_token = S_TKN
        self.sent_token = SENT
        self.bos = BOS
        self.eos = EOS
        self.mask = MASK
        self.pad = PAD
        self.max_len = max_len
        self.tokenizer = TOKENIZER 

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        turn = self._data.iloc[idx]
        q = turn['Q']
        a = turn['A']
        sentiment = str(turn['label'])
        q_toked = self.tokenizer.tokenize(self.q_token + q + \
                                          self.sent_token + sentiment)   
        q_len = len(q_toked)
        a_toked = self.tokenizer.tokenize(self.a_token + a + self.eos)
        a_len = len(a_toked)
        if q_len + a_len > self.max_len:
            a_len = self.max_len - q_len
            if a_len <= 0:
                q_toked = q_toked[-(int(self.max_len/2)):]
                q_len = len(q_toked)
                a_len = self.max_len - q_len
                assert a_len > 0
            a_toked = a_toked[:a_len]
            a_len = len(a_toked)
            assert a_len == len(a_toked), f'{a_len} ==? {len(a_toked)}'
        # [mask, mask, ...., mask, ..., <bos>,..A.. <eos>, <pad>....]
        labels = [
            self.mask,
        ] * q_len + a_toked[1:]
        if self.first:
            logging.info("contexts : {}".format(q))
            logging.info("toked ctx: {}".format(q_toked))
            logging.info("response : {}".format(a))
            logging.info("toked response : {}".format(a_toked))
            logging.info('labels {}'.format(labels))
            self.first = False
        mask = [0] * q_len + [1] * a_len + [0] * (self.max_len - q_len - a_len)
        labels_ids = self.tokenizer.convert_tokens_to_ids(labels)
        while len(labels_ids) < self.max_len:
            labels_ids += [self.tokenizer.pad_token_id]
        token_ids = self.tokenizer.convert_tokens_to_ids(q_toked + a_toked)
        while len(token_ids) < self.max_len:
            token_ids += [self.tokenizer.pad_token_id]
        return(token_ids, np.array(mask),
               labels_ids)

CharDataset์€ Dataset์„ ์ƒ์†๋ฐ›์•˜์œผ๋ฏ€๋กœ init, len, getitem ๋ฉ”์„œ๋“œ๋ฅผ ์˜ค๋ฒ„๋ผ์ด๋”ฉํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.

Dataset์€ torch.utils.data.Dataset ์— ์œ„์น˜ํ•ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ์…‹์„ ๋‚˜ํƒ€๋‚ด๋Š” ์ถ”์ƒํด๋ž˜์Šค์ž…๋‹ˆ๋‹ค.

len์€ ๋ฐ์ดํ„ฐ์…‹์˜ ํฌ๊ธฐ๋ฅผ ๋ฆฌํ„ดํ•˜๊ณ  getitem์€ i๋ฒˆ์งธ ์ƒ˜ํ”Œ์„ ์ฐพ๋Š”๋ฐ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

class KoGPT2Chat(LightningModule):
    def __init__(self, hparams, **kwargs):
        super(KoGPT2Chat, self).__init__()
        self.hparams = hparams
        self.neg = -1e18
        self.kogpt2 = GPT2LMHeadModel.from_pretrained('skt/kogpt2-base-v2')
        self.loss_function = torch.nn.CrossEntropyLoss(reduction='none')

    @staticmethod
    def add_model_specific_args(parent_parser):
        # add model specific args
        parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
        parser.add_argument('--max-len',
                            type=int,
                            default=64,
                            help='max sentence length on input (default: 64)')

        parser.add_argument('--batch-size',
                            type=int,
                            default=96,
                            help='batch size for training (default: 96)')
        parser.add_argument('--lr',
                            type=float,
                            default=5e-5,
                            help='The initial learning rate')
        parser.add_argument('--warmup_ratio',
                            type=float,
                            default=0.1,
                            help='warmup ratio')
        return parser

    def forward(self, inputs):
        # (batch, seq_len, hiddens)
        output = self.kogpt2(inputs, return_dict=True)
        return output.logits

    def training_step(self, batch, batch_idx):
        token_ids, mask, label = batch
        out = self(token_ids)
        mask_3d = mask.unsqueeze(dim=2).repeat_interleave(repeats=out.shape[2], dim=2)
        mask_out = torch.where(mask_3d == 1, out, self.neg * torch.ones_like(out))
        loss = self.loss_function(mask_out.transpose(2, 1), label)
        loss_avg = loss.sum() / mask.sum()
        self.log('train_loss', loss_avg)
        return loss_avg

    def configure_optimizers(self):
        # Prepare optimizer
        param_optimizer = list(self.named_parameters())
        no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
        optimizer_grouped_parameters = [
            {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
            {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
        ]
        optimizer = AdamW(optimizer_grouped_parameters,
                          lr=self.hparams.lr, correct_bias=False)
        # warm up lr
        num_train_steps = len(self.train_dataloader()) * self.hparams.max_epochs
        num_warmup_steps = int(num_train_steps * self.hparams.warmup_ratio)
        scheduler = get_cosine_schedule_with_warmup(
            optimizer,
            num_warmup_steps=num_warmup_steps, num_training_steps=num_train_steps)
        lr_scheduler = {'scheduler': scheduler, 'name': 'cosine_schedule_with_warmup',
                        'monitor': 'loss', 'interval': 'step',
                        'frequency': 1}
        return [optimizer], [lr_scheduler]

    def _collate_fn(self, batch):
        data = [item[0] for item in batch]
        mask = [item[1] for item in batch]
        label = [item[2] for item in batch]
        return torch.LongTensor(data), torch.LongTensor(mask), torch.LongTensor(label)

    def train_dataloader(self):
        data = pd.read_csv('chatbot_dataset_s.csv')
        self.train_set = CharDataset(data, max_len=self.hparams.max_len)
        train_dataloader = DataLoader(
            self.train_set, batch_size=self.hparams.batch_size, num_workers=2,
            shuffle=True, collate_fn=self._collate_fn)
        return train_dataloader

    def chat(self, sent='0'):
        tok = TOKENIZER
        sent_tokens = tok.tokenize(sent)
        with torch.no_grad():
          p = input('user > ')
          q = p.strip()
          a = ''
          while 1:
            input_ids = torch.LongTensor(tok.encode(U_TKN + q + SENT + sent + S_TKN + a)).unsqueeze(dim=0)
            pred = self(input_ids)
            gen = tok.convert_ids_to_tokens(
              torch.argmax(
                pred,
                dim=-1).squeeze().numpy().tolist())[-1]
            if gen == EOS:
              break
            a += gen.replace('โ–', ' ')
          print("Chatbot > {}".format(a.strip()))
        return q

hyperparameter ๋“ค์„ ArgumentParser๋ฅผ ์‚ฌ์šฉํ•ด ์ปค๋งจ๋“œ ๋ผ์ธ์—์„œ ์˜ต์…˜์œผ๋กœ ์ง€์ •ํ•ด์ค„ ์ˆ˜ ์žˆ๋„๋ก ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

pytorch lightning์—์„œ๋Š” trainer์™€ ๋ชจ๋ธ์ด ์ƒํ˜ธ์ž‘์šฉ์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก pytorch์˜ nn.Module์˜ ์ƒ์œ„ ํด๋ž˜์Šค์ธ LightningModule์„ ๊ตฌํ˜„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. LightningModule์„ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด LightningModule ํด๋ž˜์Šค๋ฅผ ์ƒ์†๋ฐ›๊ณ  ๋ฉ”์†Œ๋“œ๋ฅผ ์˜ค๋ฒ„๋ผ์ด๋”ฉํ•˜์—ฌ ๊ตฌํ˜„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

forward๋Š” ๋ชจ๋ธ์˜ ์ถ”๋ก  ๊ฒฐ๊ณผ๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์‹ถ์„ ๋•Œ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ผญ ์ •์˜ํ•ด์•ผ ํ•˜๋Š” ๋ฉ”์„œ๋“œ๋Š” ์•„๋‹™๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ self(์ž…๋ ฅ)๊ณผ ๊ฐ™์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋งŒ๋“ค์–ด์ฃผ๋ฏ€๋กœ ๊ตฌํ˜„ํ•ด์ฃผ๋ฉด ํŽธ๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
training_step๊ณผ configure_optimizers ํ•„์ˆ˜์ ์œผ๋กœ ๊ตฌํ˜„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
training_step์€ ํ•™์Šต ๋ฃจํ”„์˜ body ๋ถ€๋ถ„์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ด ๋ฉ”์†Œ๋“œ์—์„œ๋Š” ์ธ์ž๋กœ training dataloader๊ฐ€ ์ œ๊ณตํ•˜๋Š” batch์™€ ํ•ด๋‹น batch์˜ index๊ฐ€ ์ฃผ์–ด์ง€๊ณ  train loss๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
configure_optimizers์—์„œ๋Š” ๋ชจ๋ธ์˜ ์ตœ์  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ฐพ์„ ๋•Œ ์‚ฌ์šฉํ•  optimizer์™€ scheduler๋ฅผ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ฝ”๋“œ์—์„œ๋Š” ํ•™์Šตํ•ด์•ผ ํ•  ๋ชจ๋ธ์ด ํ•˜๋‚˜์ด๋ฏ€๋กœ ํ•˜๋‚˜์˜ Adam optimzer๋งŒ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
train_dataloader๋Š” batch ๊ธฐ๋ฐ˜์œผ๋กœ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ์…‹์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ batch size๋กœ ์Šฌ๋ผ์ด์‹ฑํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. 
DataLoader์—๋Š” batchsize๋ฅผ ํฌํ•จํ•˜์—ฌ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์ด ์žˆ๋Š”๋ฐ, ์ด ์ค‘ ํ•˜๋‚˜๊ฐ€ ๋ฐ”๋กœ collate_fn์ž…๋‹ˆ๋‹ค.
_collate_fn ํ•จ์ˆ˜๋กœ ์ปค์Šคํ…€ ํ•ด์ค€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋Š”๋ฐ, ๊ฐ€๋ณ€ ๊ธธ์ด์˜ input์„ batch๋กœ ์ž˜ ๋ฌถ์–ด์„œ DataLoader๋กœ ๋„˜๊ฒจ์ฃผ๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.
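What _collate_fn does can be sketched with plain lists standing in for torch.LongTensor: it regroups a list of (token_ids, mask, labels) samples into three batch-level arrays (the numeric values here are made up).

```python
# Two already-padded samples, each a (token_ids, mask, labels) triple.
batch = [
    ([5, 6, 3, 3], [0, 1, 0, 0], [9, 9, 3, 3]),
    ([7, 8, 9, 3], [0, 1, 1, 0], [9, 9, 9, 3]),
]

# Regroup per-sample triples into per-field batches, as _collate_fn does
# before wrapping each list in a torch.LongTensor.
data = [item[0] for item in batch]
mask = [item[1] for item in batch]
label = [item[2] for item in batch]

print(data)  # [[5, 6, 3, 3], [7, 8, 9, 3]]
```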

chat ์€ ์‚ฌ์šฉ์ž๊ฐ€ ์ž…๋ ฅ์„ ํ•˜๋ฉด self(ํ† ํฐํ™”๋œ ์ž…๋ ฅ) ์œผ๋กœ forward๊ฐ€ ์‹คํ–‰๋˜์–ด ์ฑ—๋ด‡์˜ ์‘๋‹ต์„ ๋ฐ˜ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.

parser = KoGPT2Chat.add_model_specific_args(parser)
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args()
logging.info(args)

if __name__ == "__main__":
	# python trainer_s.py --train --gpus 1 --max_epochs 5
    if args.train:
        checkpoint_callback = ModelCheckpoint(
            dirpath='model_chp',
            filename='{epoch:02d}-{train_loss:.2f}',
            verbose=True,
            save_last=True,
            monitor='train_loss',
            mode='min',
            prefix='model_'
        )
        model = KoGPT2Chat(args)
        model.train()
        trainer = Trainer.from_argparse_args(
            args,
            checkpoint_callback=checkpoint_callback, gradient_clip_val=1.0)
        trainer.fit(model)
        logging.info('best model path {}'.format(checkpoint_callback.best_model_path))
    
    # python trainer_s.py --chat --gpus 1
    if args.chat:
        model = KoGPT2Chat.load_from_checkpoint(args.model_params)
        model.chat()

ํ•™์Šต์„ ์‹œํ‚ค๊ณ  ๋‚˜์„œ ์•„๋ž˜์™€ ๊ฐ™์€ ๊ฒฝ๋กœ๋กœ 2๊ฐœ์˜ ๋ชจ๋ธ์ด ์ €์žฅ์ด ๋ฉ๋‹ˆ๋‹ค.

model_chp/model_-epoch=04-train_loss=16.33.ckpt
model_chp/model_-last.ckpt

์ฒซ๋ฒˆ์งธ ๋ชจ๋ธ์€ train_loss๊ฐ€ ๊ฐ€์žฅ ์ž‘์€ ๋ชจ๋ธ ํŒŒ์ผ์ด๊ณ ,

๋‘๋ฒˆ์งธ ํŒŒ์ผ์€ ์ง€์ •ํ•œ epoch ์ˆ˜๋งŒํผ ํ•™์Šต์‹œํ‚ค๊ณ  ๋‚œ ๋’ค ์–ป์€ ๋ชจ๋ธ ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.

python trainer_s.py --chat ๋ช…๋ น์–ด๋กœ ์ž…๋ ฅ์— ๋Œ€ํ•œ ์ฑ—๋ด‡์˜ ์‘๋‹ต์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Output results

The results below are from running the model inside a Flask web application.

Inputs 1 through 7 are negative, and inputs 8 through 10 are positive.

Overall, the responses are short. Some go off topic, but at this level the bot responds quite well.

In actual use, feeding it a cheerful sentence sometimes gets you an amusingly off-key reply.

It very often answers with sentences that are not in the training dataset, which is to be expected, since KoGPT2 is a model that generates sentences.


[2] KoBERT-based mental health care chatbot

One-line summary

KoBERT is a pretrained language model (a BERT pretrained on Korean text).

KoBERT is widely used for multi-class classification.

The KoBERT-based mental health care chatbot classifies the input into one of 359 specific situations, then responds with one of the predefined answers for that class, chosen at random.

ํ•™์Šต ์ฝ”๋“œ ๋งํฌ์ž…๋‹ˆ๋‹ค.

์‚ฌ์šฉํ•œ ๋ฐ์ดํ„ฐ์…‹

359๊ฐ€์ง€๋กœ ๋ถ„๋ฅ˜๋˜๋Š” ์ƒํ™ฉ ์ค‘ ์ผ๋ถ€ ๋ฐœ์ทŒ

ai hub ์ œ๊ณต, ์›ฐ๋‹ˆ์Šค ๋Œ€ํ™” ์Šคํฌ๋ฆฝํŠธ ๋ฐ์ดํ„ฐ์…‹

๊ฐ•๋‚จ ์„ธ๋ธŒ๋ž€์Šค์—์„œ ์ „๋‹ฌ๋ฐ›์€ ์ƒ๋‹ด๋ฐ์ดํ„ฐ๊ฐ€ ๋Œ€ํ™” ์˜๋„์— ๋”ฐ๋ผ 359๊ฐœ ์ƒํ™ฉ์œผ๋กœ ๋ถ„๋ฅ˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

AI hub > ์™ธ๋ถ€๋ฐ์ดํ„ฐ > KETI R&D๋ฐ์ดํ„ฐ >์ธ์‹๊ธฐ์ˆ (์–ธ์–ด์ง€๋Šฅ) > ์›ฐ๋‹ˆ์Šค ๋Œ€ํ™” ์Šคํฌ๋ฆฝํŠธ ๋ฐ์ดํ„ฐ์…‹

Training environment

Training is feasible in Colab Pro or on a GPU server.

Because there are 359 classes, you need to train for 50 or more epochs. For reference, these hyperparameters were arrived at empirically.

In the default Colab environment:

batch size 1
1 epoch takes about 16 minutes

With a batch size of 1, one epoch takes about 16 minutes. If you raise the batch size to 2 or more, Colab will throw RuntimeError: CUDA out of memory, because there isn't enough GPU RAM. In practice this makes training there nearly infeasible.

In the Colab Pro environment:

It worked after reducing the batch size 16 → 8 → 4, at about 3 minutes 40-50 seconds per epoch.

On an RTX 3080 GPU server:

A batch size of 16 worked, at about 1 minute 50 seconds per epoch.

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜

colab์—์„œ๋Š”

!pip install kobert-transformers==0.4.1
!pip install transformers==3.0.2
!pip install torch
!pip install tokenizers==0.8.1rc1

linux์—์„œ๋Š”

pip install kobert-transformers==0.4.1
pip install transformers==3.0.2
pip install tokenizers==0.8.1rc1
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

Code walkthrough

Here is the reference link.
While training it myself, I ended up fixing several errors, so I created a GitHub repo to store the modified code.
I'll explain based on the code in my repo.

The code is not all in one file, so I'll show the directory structure first and then explain.

The text files in the data folder were produced by a simple preprocessing script.

Shown in order: the contents of the category, answer, and for_text_classification_all files.

Looking at the contents, you can probably guess the flow: the model is trained on the rightmost data to classify which class an input belongs to,

and afterwards the chatbot responds using the category information and answers for the predicted class number.

The preprocessing code is very simple, so I'll just link to it.

import torch
import torch.nn as nn
from kobert_transformers import get_kobert_model
from torch.nn import CrossEntropyLoss, MSELoss
from transformers import BertPreTrainedModel

from model.configuration import get_kobert_config


class KoBERTforSequenceClassfication(BertPreTrainedModel):
    def __init__(self,
                 num_labels=359,
                 hidden_size=768,
                 hidden_dropout_prob=0.1,
                 ):
        super().__init__(get_kobert_config())

        self.num_labels = num_labels
        self.kobert = get_kobert_model()
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(hidden_size, num_labels)

        self.init_weights()

    def forward(
            self,
            input_ids=None,
            attention_mask=None,
            token_type_ids=None,
            position_ids=None,
            head_mask=None,
            inputs_embeds=None,
            labels=None,
    ):
        outputs = self.kobert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here

        if labels is not None:
            if self.num_labels == 1:
                #  We are doing regression
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs

        return outputs  # (loss), logits, (hidden_states), (attentions)

๋‹ค์Œ์€ model ํด๋” ์•ˆ์˜ ํŒŒ์ผ์„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์œ„ ์ฝ”๋“œ๋Š” classifier.py ํŒŒ์ผ์˜ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

์ด๋ฒˆ ์ฑ—๋ด‡ ๋ชจ๋ธ์€ kobert ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด ๋จผ์ € ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ๋ถ„๋ฅ˜ํ•ด์•ผํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ž…๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ classifier (๋ถ„๋ฅ˜๊ธฐ)๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ถ€๋ถ„์€ train ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ testํ•  ๋•Œ์—๋„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

 

import logging

from transformers import BertConfig

logger = logging.getLogger(__name__)

#KoBERT
kobert_config = {
    'attention_probs_dropout_prob': 0.1,
    'hidden_act': 'gelu',
    'hidden_dropout_prob': 0.1,
    'hidden_size': 768,
    'initializer_range': 0.02,
    'intermediate_size': 3072,
    'max_position_embeddings': 512,
    'num_attention_heads': 12,
    'num_hidden_layers': 12,
    'type_vocab_size': 2,
    'vocab_size': 8002
}

def get_kobert_config():
    return BertConfig.from_dict(kobert_config)

configuration.py ์˜ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

configuration.py ๋ชจ๋ธ ๊ตฌ์กฐ ๋“ฑ์˜ ์„ค์ •๊ฐ’์„ ์ง€์ •ํ•ด๋‘” ํŒŒ์ผ์ด๊ณ , classifier.py ์ฝ”๋“œ์—์„œ ๋ถˆ๋Ÿฌ์™€ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

import torch
from kobert_transformers import get_tokenizer
from torch.utils.data import Dataset


class WellnessTextClassificationDataset(Dataset):
    def __init__(self,
                 file_path="./data/wellness_dialog_for_text_classification_all.txt",
                 num_label=359,
                 device='cpu',
                 max_seq_len=512,  # KoBERT max_length
                 tokenizer=None
                 ):
        self.file_path = file_path
        self.device = device
        self.data = []
        self.tokenizer = tokenizer if tokenizer is not None else get_tokenizer()

        file = open(self.file_path, 'r', encoding='utf-8')

        while True:
            line = file.readline()
            if not line:
                break
            datas = line.split("    ")
            index_of_words = self.tokenizer.encode(datas[0])
            token_type_ids = [0] * len(index_of_words)
            attention_mask = [1] * len(index_of_words)

            # Padding Length
            padding_length = max_seq_len - len(index_of_words)

            # Zero Padding
            index_of_words += [0] * padding_length
            token_type_ids += [0] * padding_length
            attention_mask += [0] * padding_length

            # Label
            label = int(datas[1][:-1])

            data = {
                'input_ids': torch.tensor(index_of_words).to(self.device),
                'token_type_ids': torch.tensor(token_type_ids).to(self.device),
                'attention_mask': torch.tensor(attention_mask).to(self.device),
                'labels': torch.tensor(label).to(self.device)
            }

            self.data.append(data)

        file.close()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        item = self.data[index]
        return item


if __name__ == "__main__":
    dataset = WellnessTextClassificationDataset()
    print(dataset)

The code above is dataloader.py.

dataloader.py subclasses Dataset so that CSV-like data can be fed to a PyTorch model for training.

Just like the KoGPT2 Dataset earlier, WellnessTextClassificationDataset inherits from Dataset, so it must override the __init__, __len__, and __getitem__ methods.

__len__ returns the size of the dataset, and __getitem__ is used to fetch the i-th sample.

import gc
import os

import numpy as np
import torch
from torch.utils.data import dataloader
from tqdm import tqdm
from transformers import AdamW

from model.classifier import KoBERTforSequenceClassfication
from model.dataloader import WellnessTextClassificationDataset


def train(device, epoch, model, optimizer, train_loader, save_step, save_ckpt_path, train_step=0):
    losses = []
    train_start_index = train_step + 1 if train_step != 0 else 0
    total_train_step = len(train_loader)
    model.train()

    with tqdm(total=total_train_step, desc=f"Train({epoch})") as pbar:
        pbar.update(train_step)
        for i, data in enumerate(train_loader, train_start_index):

            optimizer.zero_grad()
            outputs = model(**data)

            loss = outputs[0]

            losses.append(loss.item())

            loss.backward()
            optimizer.step()

            pbar.update(1)
            pbar.set_postfix_str(f"Loss: {loss.item():.3f} ({np.mean(losses):.3f})")

            if i >= total_train_step or i % save_step == 0:
                torch.save({
                    'epoch': epoch,  # current training epoch
                    'model_state_dict': model.state_dict(),  # save the model
                    'optimizer_state_dict': optimizer.state_dict(),  # save the optimizer
                    'loss': loss.item(),  # save the loss
                    'train_step': i,  # training steps completed so far
                    'total_train_step': len(train_loader)  # total train steps in the current epoch
                }, save_ckpt_path)

    return np.mean(losses)


if __name__ == '__main__':
    gc.collect()
    torch.cuda.empty_cache()

    root_path = "."
    data_path = f"{root_path}/data/wellness_dialog_for_text_classification_all.txt"
    checkpoint_path = f"{root_path}/checkpoint"
    save_ckpt_path = f"{checkpoint_path}/kobert-wellness-text-classification.pth"

    n_epoch = 60  # number of epochs
    batch_size = 4   # batch size; set to 4 because Colab couldn't handle more, fine to increase
    ctx = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(ctx)
    save_step = 100  # checkpoint save interval
    learning_rate = 5e-6  # learning rate

    # WellnessTextClassificationDataset Data Loader
    dataset = WellnessTextClassificationDataset(file_path=data_path, device=device)
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

    model = KoBERTforSequenceClassfication()
    model.to(device)

    # Prepare optimizer and schedule (linear warmup and decay)
    no_decay = ['bias', 'LayerNorm.weight']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
         'weight_decay': 0.01},
        {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=learning_rate)

    pre_epoch, pre_loss, train_step = 0, 0, 0
    if os.path.isfile(save_ckpt_path):
        checkpoint = torch.load(save_ckpt_path, map_location=device)
        pre_epoch = checkpoint['epoch']
        train_step = checkpoint['train_step']
        total_train_step = checkpoint['total_train_step']

        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

        print(f"load pretrain from: {save_ckpt_path}, epoch={pre_epoch}")

    losses = []
    offset = pre_epoch
    for step in range(n_epoch):
        epoch = step + offset
        loss = train(device, epoch, model, optimizer, train_loader, save_step, save_ckpt_path, train_step)
        losses.append(loss)

 

์œ„ ์ฝ”๋“œ๋Š” ํ”„๋กœ์ ํŠธ ํด๋” ์ตœ์ƒ๋‹จ์— ์œ„์น˜ํ•˜๋Š” train.py ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

ํ•™์Šต์‹œํ‚ค๋Š” ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

์ˆœ์ „ํŒŒ๋ฅผ ๊ฑฐ์ณ (์ด ๋•Œ KoBERTforSequenceClassfication์˜ forward๊ฐ€ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค) loss๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  backpropagation ์‹œํ‚ค๋ฉด์„œ ๋ชจ๋ธ ํŒŒ์ผ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.

import torch
import torch.nn as nn
import random

from model.classifier import KoBERTforSequenceClassfication
from kobert_transformers import get_tokenizer


def load_wellness_answer():
    root_path = "."
    category_path = f"{root_path}/data/wellness_dialog_category.txt"
    answer_path = f"{root_path}/data/wellness_dialog_answer.txt"

    c_f = open(category_path, 'r')
    a_f = open(answer_path, 'r')

    category_lines = c_f.readlines()
    answer_lines = a_f.readlines()

    category = {}
    answer = {}
    for line_num, line_data in enumerate(category_lines):
        data = line_data.split('    ')
        category[data[1][:-1]] = data[0]

    for line_num, line_data in enumerate(answer_lines):
        data = line_data.split('    ')
        keys = answer.keys()
        if (data[0] in keys):
            answer[data[0]] += [data[1][:-1]]
        else:
            answer[data[0]] = [data[1][:-1]]

    return category, answer


def kobert_input(tokenizer, str, device=None, max_seq_len=512):
    index_of_words = tokenizer.encode(str)
    token_type_ids = [0] * len(index_of_words)
    attention_mask = [1] * len(index_of_words)

    # Padding Length
    padding_length = max_seq_len - len(index_of_words)

    # Zero Padding
    index_of_words += [0] * padding_length
    token_type_ids += [0] * padding_length
    attention_mask += [0] * padding_length

    data = {
        'input_ids': torch.tensor([index_of_words]).to(device),
        'token_type_ids': torch.tensor([token_type_ids]).to(device),
        'attention_mask': torch.tensor([attention_mask]).to(device),
    }
    return data


if __name__ == "__main__":
    root_path = "."
    checkpoint_path = f"{root_path}/checkpoint"
    save_ckpt_path = f"{checkpoint_path}/kobert-wellness-text-classification.pth"

    # load the answers and categories
    category, answer = load_wellness_answer()

    ctx = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(ctx)

    # load the saved checkpoint
    checkpoint = torch.load(save_ckpt_path, map_location=device)

    model = KoBERTforSequenceClassfication()
    model.load_state_dict(checkpoint['model_state_dict'])

    model.to(device)
    model.eval()

    tokenizer = get_tokenizer()

    while True:
        sent = input('\nQuestion: ')  # e.g. '요즘 기분이 우울한 느낌이에요'

        if '종료' in sent:  # type '종료' (quit) to exit
            break

        data = kobert_input(tokenizer, sent, device, 512)

        output = model(**data)

        logit = output[0]
        softmax_logit = torch.softmax(logit, dim=-1)
        softmax_logit = softmax_logit.squeeze()

        max_index = torch.argmax(softmax_logit).item()
        max_index_value = softmax_logit[max_index].item()

        answer_list = answer[category[str(max_index)]]
        answer_len = len(answer_list) - 1
        answer_index = random.randint(0, answer_len)
        print(f'Answer: {answer_list[answer_index]}, index: {max_index}, softmax_value: {max_index_value}')
        print('-' * 50)

๋งŒ๋“ค์–ด์ง„ ๋ชจ๋ธ ํŒŒ์ผ์„ ์ฝ์–ด ์ฑ—๋ด‡๊ณผ ๋Œ€ํ™”ํ•˜๊ธฐ ์œ„ํ•œ test.py ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

kobert_input ํ•จ์ˆ˜๋กœ ์ž…๋ ฅ์„ ํ† ํฐํ™”ํ•ด์ฃผ์–ด ๋ชจ๋ธ์— input์œผ๋กœ ๋„ฃ์–ด์ฃผ๋ฉด ๋ชจ๋ธ์ด ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ๋ถ„๋ฅ˜ํ•ด๋ƒ…๋‹ˆ๋‹ค.

The load_wellness_answer function reads the category.txt and answer.txt files, so we can look up the description of the predicted category and the chatbot's responses.

์ถœ๋ ฅ ๊ฒฐ๊ณผโ€‹

์ถœ๋ ฅ๋œ ๊ฐ’์ด ์–ด๋–ค ๊ฒƒ์„ ์˜๋ฏธํ•˜๋Š”์ง€ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์‘๋‹ต ๋’ค์— ์ถœ๋ ฅ๋œ ์ •์ˆ˜ '22'๋Š” ๋ถ„๋ฅ˜๋œ ํด๋ž˜์Šค์˜ ๋ฒˆํ˜ธ์ด๊ณ , '๊ฐ์ •/๋ˆˆ๋ฌผ'์€ 22๋ฒˆ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ์„ค๋ช…์ด๊ณ , ๋งˆ์ง€๋ง‰ ์‹ค์ˆ˜๊ฐ’์€ softmax ๊ฐ’์ž…๋‹ˆ๋‹ค. 22๋ฒˆ ํด๋ž˜์Šค์ผ ํ™•๋ฅ ์ด 97ํ”„๋กœ์ด๊ณ  ๊ฐ€์žฅ ๋†’์€ ํ™•๋ฅ ์ด๊ธฐ ๋•Œ๋ฌธ์— 22๋ฒˆ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜๋œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

์•ž์„  kogpt2 ์ฑ—๋ด‡๊ณผ ๋™์ผํ•œ ์ž…๋ ฅ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์–ด๋ณด์•˜์Šต๋‹ˆ๋‹ค.

1๋ฒˆ๋ถ€ํ„ฐ 7๋ฒˆ๊นŒ์ง€๋Š” ๋ถ€์ •์ ์ธ ์ž…๋ ฅ์ด๊ณ  8๋ฒˆ๋ถ€ํ„ฐ 10๋ฒˆ๊นŒ์ง€๋Š” ๊ธ์ •์ ์ธ ์ž…๋ ฅ์— ๋Œ€ํ•ด ํ…Œ์ŠคํŠธํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

๋ถ€์ •์ ์ธ ์ƒํ™ฉ์— ๋Œ€ํ•ด์„œ๋Š” softmax๊ฐ’์ด ํ™•์—ฐํžˆ ๋‚ฎ์€ ๊ฒฐ๊ณผ๋„ ์žˆ์ง€๋งŒ ๊ต‰์žฅํžˆ ์ž˜ ๋ถ„๋ฆฌํ•ด๋‚ด๊ณ  ์ƒํ™ฉ์— ์ ์ ˆํ•œ ์‘๋‹ต์„ ํ•ด์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ธ์ •์ ์ธ ์ž…๋ ฅ์—๋Š” ์ „ํ˜€ ์ž…๋ ฅ๋œ ๋ฌธ์žฅ๊ณผ ํ˜ธ์‘์ด ๋˜๊ณ  ์žˆ์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

์ด๋Š” ๋‹น์—ฐํ•œ ๊ฒฐ๊ณผ ์ž…๋‹ˆ๋‹ค. ํ•™์Šต์‹œํ‚จ ๋ฐ์ดํ„ฐ์…‹์€ ์ƒ๋‹ด ๋ฐ์ดํ„ฐ์ด๊ธฐ ๋•Œ๋ฌธ์— ๊ธ์ •์ ์ธ ์ƒํ™ฉ์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๋Š” ๊ฑฐ์˜ ์—†๋Š” ์ˆ˜์ค€์ž…๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ KoBERT ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡์€ ์œ„๋กœํ˜• ์ฑ—๋ด‡์œผ๋กœ ์ด๋ฆ„์„ ์ •ํ–ˆ๊ณ , ์•ž์„  KoGPT2 ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡๊ณผ ํ•จ๊ป˜ ์„œ๋น„์Šคํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๊ฐ€ ์–ดํ”Œ์—์„œ ์›ํ•˜๋Š” ์ฑ—๋ด‡์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๊ตฌํ˜„ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์— ๋Œ€ํ•ด์„œ๋Š” ์ด ๊ธ€์˜ ํ›„๋ฐ˜๋ถ€์ธ [4] ํ™œ์šฉ ํ˜„ํ™ฉ์—์„œ ๋” ์ž์„ธํžˆ ๋ณด์—ฌ๋“œ๋ฆฌ๊ฒ ์Šต๋‹ˆ๋‹ค.


[3] API server

์ฝ”๋“œ ์ „๋ฌธ์— ๋Œ€ํ•œ ๋งํฌ์ž…๋‹ˆ๋‹ค.

Flask web framework

IDE ์„ค์น˜, ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ

์ €๋Š” IDE๋กœ PyCharm ์„ ์„ ํƒํ–ˆ์Šต๋‹ˆ๋‹ค. JetBrains ์‚ฌ์˜ ์ œํ’ˆ์ด ์ต์ˆ™ํ•ด์„œ ์„ ํƒํ–ˆ์Šต๋‹ˆ๋‹ค.

ํ•ด๋‹น IDE์—์„œ new project๋ฅผ ๋งŒ๋“ค ๋•Œ ์˜ต์…˜์„ ํด๋ฆญํ•ด ์‰ฝ๊ฒŒ ๊ฐ€์ƒ ํ™˜๊ฒฝ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

virtualenv๋ฅผ ์‚ฌ์šฉํ•ด ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๊ฐ์ • ๋ถ„๋ฅ˜ ๋ชจ๋ธ, kobert ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡, kogpt2 ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡ ๋ชจ๋ธ์ด ๋ชจ๋‘ ๊ฐ™์€ ํ™˜๊ฒฝ์—์„œ ์‹คํ–‰๋˜์–ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ถฉ๋Œ์ด ์ผ์–ด๋‚  ํ™•๋ฅ ์ด ํ›จ์”ฌ ๋†’์Šต๋‹ˆ๋‹ค.

๋จผ์ € ํ•™์Šตํ•  ๋•Œ ์‚ฌ์šฉํ–ˆ๋˜ requirements.txt ํŒŒ์ผ์„ ์ด์šฉํ•ด ์„ค์น˜๋ฅผ ํ•˜๋Š”๋ฐ, ์ถฉ๋Œ์ด ์ผ์–ด๋‚œ๋‹ค๋ฉด ์—๋Ÿฌ๋ฅผ ์ž˜ ์ฝ๊ณ  ์ตœ๋Œ€ํ•œ ๋ฒ„์ „ ์˜ค๋ฅ˜๊ฐ€ ์•ˆ๋‚˜๊ฒŒ ํ•˜๋ฉด ์ข‹์Šต๋‹ˆ๋‹ค.

 

pip install --no-deps numpy==<desired-version>

ํ•˜์ง€๋งŒ ์ €๋Š” ๊ทธ๋ ‡๊ฒŒ ํ•ด๊ฒฐํ•  ์ˆ˜๊ฐ€ ์—†์–ด์„œ, ๊ฒฐ๊ตญ ์˜์กด์„ฑ์„ ๋ฌด์‹œํ•˜๊ณ  ๊ฐ•์ œ๋กœ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•ด๋ณด๋ฉด์„œ ๊ฒฝํ—˜์ ์œผ๋กœ ๊ฐ€๋Šฅํ•œ ํ™˜๊ฒฝ์„ ์ฐพ์•˜์Šต๋‹ˆ๋‹ค. --no-deps ๊ฐ€ ์˜์กด์„ฑ์„ ๋ฌด์‹œํ•˜๊ณ  ๊ฐ•์ œ๋กœ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๋ผ๋Š” ์˜ต์…˜์ž…๋‹ˆ๋‹ค.

 

pip freeze > requirements.txt

ํ™˜๊ฒฝ์„ ๋ชจ๋‘ ๊ตฌ์ถ•ํ•œ ๋‹ค์Œ์—๋Š” ํ˜„์žฌ ์„ค์น˜๋œ ์ƒํ™ฉ์„ ๋ช…๋ น์–ด ๊ทธ๋Œ€๋กœ requirements.txt ํŒŒ์ผ์— ์ €์žฅํ•ด ์–ผ๋ ค๋‘ก๋‹ˆ๋‹ค. ํŒŒ์ผ ์ด๋ฆ„์€ ์ž์œ ๋กญ๊ฒŒ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

pip install --no-deps -r requirements.txt

์ดํ›„์— ์ƒˆ๋กœ์šด ํ™˜๊ฒฝ์—์„œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์žฌ์„ค์น˜ํ•ด์•ผํ•  ๋•Œ์—๋Š” ์œ„์˜ ๋ช…๋ น์–ด๋กœ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฝ”๋“œ ์ž‘์„ฑ

 

ํŠน์ • ๋ชจ๋ธ์— ํ•„์š”ํ•œ ํŒŒ์ผ์€ directory ๋ณ„๋กœ ๊ตฌ๋ถ„ํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

 

kobert ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡์„ ๊ตฌํ˜„ํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ํŒŒ์ผ์€ model.chatbot.kobert ํด๋”์—,

kogpt2 ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡์„ ๊ตฌํ˜„ํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ํŒŒ์ผ์€ model.chatbot.kogpt2 ํด๋”์—,

์ด ๊ธ€์—์„œ๋Š” ๋‹ค๋ฃจ์ง€ ์•Š์•˜์ง€๋งŒ ๊ฐ์ • ๋ถ„๋ฅ˜ํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ํŒŒ์ผ์€ model.emotion ํด๋”์— ๋„ฃ์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•™์Šต์„ ์‹œํ‚ค๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ์„œ

ํผ์งํ•˜๊ฒŒ ์ƒ๊ฐํ•˜๋ฉด ๋ถ„๋ฅ˜ ๋ชจ๋ธ์€ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜จ ๋‹ค์Œ ๋ถ„๋ฅ˜๊ธฐ๋งŒ ์žˆ์œผ๋ฉด ๋˜๊ณ ,

kogpt2๋Š” ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ถ€๋ถ„๋งŒ ์žˆ์œผ๋ฉด ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ„๋‹จํžˆ ์ •๋ฆฌํ•ด ๊ตฌ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.

 

์ด ์™ธ์— checkpoint์—๋Š” ์•ž์„œ trainํ•  ๋•Œ ์ €์žฅํ–ˆ๋˜ ๋ชจ๋ธ ํŒŒ์ผ์„ ์ €์žฅํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

data์—๋Š” .txt, .csv ์™€ ๊ฐ™์€ ๋ฌธ์„œ ํŒŒ์ผ์„ ์ €์žฅํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

preprocess ์—๋Š” ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌํ•  ๋•Œ ์‚ฌ์šฉํ•˜๋Š” ์ฝ”๋“œ๋ฅผ ์ €์žฅํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

util ์—๋Š” ๊ฐ์ • ๊ด€๋ จ class๋ฅผ ๋งŒ๋“ค์–ด ๋กœ์ง์—์„œ ์ค‘๋ณต๋˜๋Š” ๋ถ€๋ถ„์„ ์ง‘์–ด class์˜ method๋กœ ์ž‘์„ฑํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

 

 

 

 

 

์•„๋ž˜์˜ ์ฝ”๋“œ ๋‚ด์šฉ์€ ํ”„๋กœ์ ํŠธ root์— ์žˆ๋Š” app.py์˜ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.

import os
from model.chatbot.kogpt2 import chatbot as ch_kogpt2
from model.chatbot.kobert import chatbot as ch_kobert
from model.emotion import service as emotion
from util.emotion import Emotion
from util.depression import Depression
from flask import Flask, request, jsonify
from kss import split_sentences

 

์•ž์„œ directory ๊ตฌ์กฐ๋ฅผ ๋Œ€์ถฉ ์‚ดํŽด๋ณด์•˜๊ธฐ ๋•Œ๋ฌธ์— ์–ด๋–ค ๊ฒƒ์„ ์˜๋ฏธํ•˜๋Š” ๊ฒƒ์ธ์ง€ ๋ˆˆ์— ๋„๋Š” ๊ฒƒ์ด ๋งŽ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๋งจ ๋งˆ์ง€๋ง‰ ์ค„์€ NLP์—์„œ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š” kss๋ผ๋Š” ํ•œ๊ตญ์–ด ๋ฌธ์žฅ ๋ถ„๋ฅ˜ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด ๋ถˆ๋Ÿฌ์™”์Šต๋‹ˆ๋‹ค.

app = Flask(__name__)
Emotion = Emotion()
Depression = Depression()

@app.route('/')
def hello():
    return "deep learning server is running ๐Ÿ’—"

 

ํ†ต์‹ ์ด ๋  ์ˆ˜ ์žˆ๋Š” ์ƒํ™ฉ์ธ์ง€ ์•Œ๊ธฐ ์œ„ํ•ด root ๊ฒฝ๋กœ ํ˜ธ์ถœํ–ˆ์„ ๋•Œ ๊ฐ„๋‹จํ•œ string์„ ๋ฐ˜ํ™˜ํ•˜๋„๋ก api๋ฅผ ์ž‘์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.

@app.route('/emotion')
def classifyEmotion():
    sentence = request.args.get("s")
    if sentence is None or len(sentence) == 0:
        return jsonify({
            "emotion_no": 2,
            "emotion": "์ค‘๋ฆฝ"
        })

    result = emotion.predict(sentence)
    print("[*] ๊ฐ์ • ๋ถ„์„ ๊ฒฐ๊ณผ: " + Emotion.to_string(result))
    return jsonify({
        "emotion_no": int(result),
        "emotion": Emotion.to_string(result)
    })

ํ˜ธ์ถœ ๋ฐฉ๋ฒ•: /emotion?s=๋ถ„์„ํ•ด์ฃผ๊ธฐ๋ฅผ ์›ํ•˜๋Š” ๋ฌธ์žฅ

ํ•œ ๋‘ ๋ฌธ์žฅ์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ  ์–ด๋–ค ๊ฐ์ •์— ํ•ด๋‹นํ•˜๋Š”์ง€ ๋ถ„๋ฅ˜ํ•ด ์‘๋‹ตํ•ฉ๋‹ˆ๋‹ค.

The endpoint used to respond with 400 Bad Request when the request query was empty, but defaulting to neutral seemed better, so I set a default value instead.

 

@app.route('/diary')
def classifyEmotionDiary():
    sentence = request.args.get("s")
    if sentence is None or len(sentence) == 0:
        return jsonify({
            "joy": 0,
            "hope": 0,
            "neutrality": 0,
            "anger": 0,
            "sadness": 0,
            "anxiety": 0,
            "tiredness": 0,
            "regret": 0,
            "depression": 0
        })

    predict, dep_predict = predictDiary(sentence)
    return jsonify({
        "joy": predict[Emotion.JOY],
        "hope": predict[Emotion.HOPE],
        "neutrality": predict[Emotion.NEUTRALITY],
        "anger": predict[Emotion.ANGER],
        "sadness": predict[Emotion.SADNESS],
        "anxiety": predict[Emotion.ANXIETY],
        "tiredness": predict[Emotion.TIREDNESS],
        "regret": predict[Emotion.REGRET],
        "depression": dep_predict
    })
    

def predictDiary(s):
    total_cnt = 0.0
    dep_cnt = 0
    predict = [0.0 for _ in range(8)]
    for sent in split_sentences(s):
        total_cnt += 1
        predict[emotion.predict(sent)] += 1
        if emotion.predict_depression(sent) == Depression.DEPRESS:
            dep_cnt += 1

    for i in range(8):
        predict[i] = round(predict[i] / total_cnt, 2)
    dep_cnt = round(dep_cnt / total_cnt, 2)
    return predict, dep_cnt

ํ˜ธ์ถœ ๋ฐฉ๋ฒ•: /diary?s=๋ถ„์„ํ•ด์ฃผ๊ธฐ๋ฅผ ์›ํ•˜๋Š” ์ผ๊ธฐ

์ผ๊ธฐ ํ•œ ๊ฐœ์˜ ๋ชจ๋“  ๋‚ด์šฉ์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ  ๊ฐ ๊ฐ์ •์˜ ๋น„์œจ์„ ๊ณ„์‚ฐํ•˜๊ณ  ์šฐ์šธํ•œ ๋ฌธ์žฅ์˜ ๋น„์œจ๋„ ๊ณ„์‚ฐํ•˜์—ฌ ํ•จ๊ป˜ ์‘๋‹ตํ•ฉ๋‹ˆ๋‹ค.

 

@app.route('/chatbot/g')
def reactChatbotV1():
    sentence = request.args.get("s")
    if sentence is None or len(sentence) == 0:
        return jsonify({
            "answer": "๋“ฃ๊ณ  ์žˆ์–ด์š”. ๋” ๋ง์”€ํ•ด์ฃผ์„ธ์š”~ (๋„๋•๋„๋•)"
        })

    answer = ch_kogpt2.predict(sentence)
    return jsonify({
        "answer": answer
    })


@app.route('/chatbot/b')
def reactChatbotV2():
    sentence = request.args.get("s")
    if sentence is None or len(sentence) == 0:
        return jsonify({
            "answer": "๋“ฃ๊ณ  ์žˆ์–ด์š”. ๋” ๋ง์”€ํ•ด์ฃผ์„ธ์š”~ (๋„๋•๋„๋•)"
        })

    answer, category, desc, softmax = ch_kobert.chat(sentence)
    return jsonify({
        "answer": answer
    })

ํ˜ธ์ถœ ๋ฐฉ๋ฒ•: /chatbot/g?s=์ž…๋ ฅ, /chatbot/b?s=์ž…๋ ฅ

๋‹ค๋ฅธ ๋ฒ„์ „์˜ chatbot์€ ๋‹ค๋ฅธ url์„ ์‚ฌ์šฉํ•ด ํ˜ธ์ถœํ•ด์•ผ ํ•˜๋„๋ก ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ž…๋ ฅ์ด ๋น„์—ˆ๋”๋ผ๋„ ๊ธฐ๋ณธ์ ์œผ๋กœ "๋“ฃ๊ณ  ์žˆ์–ด์š”. ๋” ๋ง์”€ํ•ด์ฃผ์„ธ์š”~ (๋„๋•๋„๋•)"์œผ๋กœ ์‘๋‹ตํ•ฉ๋‹ˆ๋‹ค.

 

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))

ip์™€ port๋ฅผ ์ง€์ •ํ•˜์—ฌ flask ์–ดํ”Œ์„ ์‹คํ–‰์‹œํ‚ค๋Š” ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

docker image ๋นŒ๋“œ, ์—…๋กœ๋“œ

Dockerfile

dockerfile, .dockerignore

ํ”„๋กœ์ ํŠธ ์ตœ์ƒ๋‹จ ๊ฒฝ๋กœ์— Dockerfile์„ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

FROM python:3.8.5

WORKDIR /app
COPY . .

RUN pip install --upgrade pip
RUN pip install -r requirements.txt

EXPOSE 5000

CMD python ./app.py

python์„ ์„ค์น˜ํ•˜๊ณ  /app ๋ฐ‘์„ working directory๋กœ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. .dockerignore์— ์ž‘์„ฑ๋œ ํŒŒ์ผ์ด๋‚˜ ํด๋”๋ฅผ ๋ฌด์‹œํ•˜๊ณ  ๋‚˜๋จธ์ง€์˜ ํŒŒ์ผ๋“ค์„ ๋ณต์‚ฌํ•ฉ๋‹ˆ๋‹ค. pip์„ ์ตœ์‹  ๋ฒ„์ „์œผ๋กœ ์—…๊ทธ๋ ˆ์ด๋“œํ•œ ๋’ค requirements.txt์— ์ž‘์„ฑ๋œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค. 5000๋ฒˆ ํฌํŠธ๋ฅผ ์™ธ๋ถ€๋กœ ๊ฐœ๋ฐฉํ•  ๊ฒƒ์ด๋ผ๊ณ  ์„ค์ •ํ•œ ๋’ค ์‹คํ–‰ ๋ช…๋ น์–ด๋ฅผ ์ž‘์„ฑํ•ด์ค๋‹ˆ๋‹ค.

.dockerignore

๊ผญ ํ•„์š”ํ•œ ํŒŒ์ผ๋กœ๋งŒ image๋ฅผ buildํ•˜๋„๋ก ํ•˜์—ฌ, docker image๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค.

.git
.gitignore
.idea
.cache
.DS_Store

__pycache__
Scripts
Lib

*.md
*.cfg

preprocess
kss_example.py
*/__pycache__/*

์›ํ•˜๋Š” ํŒŒ์ผ, ํด๋”๋ช…, ํŠน์ • ํ™•์žฅ์ž๋กœ ๋๋‚˜๋Š” ํŒŒ์ผ ๋“ฑ๋“ฑ ์ž์œ ๋กญ๊ฒŒ ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ”„๋กœ์ ํŠธ ์ตœ์ƒ๋‹จ ๊ฒฝ๋กœ์— ์œ„์น˜ํ•ด์•ผํ•˜๋ฉฐ, Dockerfile์ด ์‹คํ–‰๋  ๋•Œ ์ž๋™์œผ๋กœ ํ•ด๋‹น ํŒŒ์ผ์„ ์ธ์‹ํ•˜์—ฌ ์ ํžŒ ํŒŒ์ผ์ด๋‚˜ ํด๋”๋Š” ๋ฌด์‹œํ•ฉ๋‹ˆ๋‹ค.

image build

docker build --tag attiary_model:2.0 .

--tag ์˜ต์…˜์— ์ด๋ฏธ์ง€ ์ด๋ฆ„๊ณผ ํƒœ๊ทธ๋ฅผ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฃผ์˜ํ•  ๋ถ€๋ถ„์€ ๋งจ ๋งˆ์ง€๋ง‰์— . ์„ ๋นผ๋จน์ง€ ์•Š๊ณ  ์ž‘์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ €๋Š” ์ด ๊ณผ์ •์—์„œ 7~8๋ถ„ ์ •๋„ ๊ฑธ๋ฆฝ๋‹ˆ๋‹ค.

image push

docker image tag attiary_model:2.0 hoit1302/attiary_model:latest
docker push hoit1302/attiary_model:latest

์ด๋ฏธ์ง€ ํƒœ๊ทธ์˜ ์ด๋ฆ„์„ dockerhub์— ์žˆ๋Š” username/reponame:์›ํ•˜๋Š”ํƒœ๊ทธ ๋กœ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  pushํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.์ด ๊ณผ์ •์—์„œ ๋กœ๊ทธ์ธ์„ ์š”๊ตฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

๋„คํŠธ์›Œํฌ ์ƒํ™ฉ์— ๋”ฐ๋ผ ์—…๋กœ๋“œ์— ์†Œ์š”๋˜๋Š” ์‹œ๊ฐ„์€ ๋‹ค์–‘ํ•ฉ๋‹ˆ๋‹ค. ์ข‹์„ ๋•Œ๋Š” 20๋ถ„ ์ •๋„์—์„œ ์—…๋กœ๋“œ๊ฐ€ ์™„๋ฃŒ๋  ๋•Œ๋„ ์žˆ๊ณ  ์•ˆ์ข‹์„ ๋•Œ๋Š” 4์‹œ๊ฐ„์ด ์†Œ์š”๋œ ์ ๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

pushํ•œ repository์—์„œ ์ž˜ push ๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํด๋ผ์šฐ๋“œ ์„œ๋ฒ„๋กœ ๋ฐฐํฌ

๋ฌธ์ œ์ 

๋ฐฑ์—”๋“œ ๊ฐœ๋ฐœ ํ›„์— ์„œ๋ฒ„ AWS๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์„œ๋ฒ„๋ฅผ ๋ช‡ ๋ฒˆ ์šด์˜ํ•ด๋ณด์•˜๋˜ ๊ฒฝํ—˜์— ๋น„์ถ”์–ด ์†์‰ฝ๊ฒŒ ํ”„๋ฆฌ ํ‹ฐ์–ด์˜ ec2 ์„œ๋ฒ„์— docker container๋ฅผ ์˜ฌ๋ ธ๋”๋‹ˆ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๊ตฌ๋™์‹œํ‚ค๋‹ค๊ฐ€ Killed๋ผ๋Š” ๊ฐ•๋ ฌํ•œ ๋ฌธ๊ตฌ๋ฅผ ๋‚จ๊ธฐ๊ณ  ์ฃฝ์–ด๋ฒ„๋ ธ์Šต๋‹ˆ๋‹ค. ์‹ฌ์ง€์–ด ์„œ๋ฒ„ ์ปดํ“จํ„ฐ์—์„œ ์ „ํ˜€ ์‘๋‹ต์„ ๋ฐ›์„ ์ˆ˜ ์—†์–ด์„œ ์žฌ๋ถ€ํŒ… ์‹œ์ผœ์•ผ ํ–ˆ์Šต๋‹ˆ๋‹ค.

memory๊ฐ€ ์ ˆ๋Œ€์ ์œผ๋กœ ๋ถ€์กฑํ–ˆ๋˜ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

ํ”„๋ฆฌํ‹ฐ์–ด์—์„œ๋Š” ๋””์Šคํฌ ๊ณต๊ฐ„์„ memory๋กœ ์“ธ ์ˆ˜ ์žˆ๋„๋ก swap ์‹œ์ผœ๋ดค์ž ์ตœ๋Œ€ 2GB์˜€๊ธฐ ๋•Œ๋ฌธ์— ๋”ฅ๋Ÿฌ๋‹ ์„œ๋ฒ„๋ฅผ ๊ตฌ๋™์‹œํ‚ค๊ธฐ์—๋Š” ์—ญ๋ถ€์กฑ์ด์—ˆ์Šต๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ํ˜„์‹ค์ ์ธ ๋ฒฝ์— ๋ถ€๋”ชํ˜”์„ ๋•Œ, ํ•™๊ต ์ธก์—์„œ 4์›” ๋ง์— tencent cloud๋ฅผ ์ง€์›ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

ํ…์„ผํŠธ ํด๋ผ์šฐ๋“œ

ํ™ˆํŽ˜์ด์ง€ ๋งํฌ

I spun up a GPU-based instance and worked there.

deploy.sh

๊ฐ„๋‹จํžˆ docker ๋ช…๋ น์–ด๋ฅผ ๋ชจ์•„๋‘” deploy shell script๋ฅผ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.

echo "[*] Stopping the running container"
sudo docker stop atti_model

echo "[*] Removing the stopped container"
sudo docker rm atti_model

echo "[*] Pulling the new image"
sudo docker pull hoit1302/attiary_model:latest

echo "[*] Checking the new image"
sudo docker images

echo "[*] Running the new image in the background"
sudo docker run --name atti_model -d -p 5000:5000 hoit1302/attiary_model:latest

echo "[*] Removing images that are no longer in use"
sudo docker image prune -a -f

echo "[*] Checking that the images were removed"
sudo docker images

atti_model ๋ถ€๋ถ„์—๋Š” ์›ํ•˜๋Š” container ์ด๋ฆ„์„ ์ ๊ณ 

hoit1302/attiary_model์€ ์ด๋ฏธ์ง€๋ฅผ ์˜ฌ๋ฆฐ docker์˜ ๊ณ„์ •๊ณผ repo๋ฅผ ์ง€์ •ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.

run ๋ช…๋ นํ•  ๋•Œ -d ์˜ต์…˜์ด background๋กœ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ์˜ต์…˜์ž…๋‹ˆ๋‹ค.

ํ•ด๋‹น ์˜ต์…˜์„ ๋นผ๊ณ  ์‹คํ–‰์‹œํ‚ค๋ฉด ์ œ ์ปดํ“จํ„ฐ์™€ ์„œ๋ฒ„ ์ปดํ“จํ„ฐ์˜ ์—ฐ๊ฒฐ ์„ธ์…˜์ด ๋Š๊ฒผ์„ ๋•Œ ์ปจํ…Œ์ด๋„ˆ๊ฐ€ ์ค‘๋‹จ๋ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ๋ณด๊ธฐ

๋„์ปค ์ปจํ…Œ์ด๋„ˆ์˜ ๋กœ๊ทธ๋ฅผ ๋ณด๋Š” ๋ช…๋ น์–ด์ž…๋‹ˆ๋‹ค.

docker logs atti_model

์•„๋ž˜ 10์ค„์˜ ๋กœ๊ทธ ๋ณด๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. --tail ์˜ต์…˜์— ์›ํ•˜๋Š” ์ˆซ์ž๋ฅผ ๋„ฃ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

docker logs --tail 10 atti_model

Here are the logs from running the docker image hoit1302/attiary_model:2.1 on the GPU server.

hoit1302/attiary_model:2.1 log 1
hoit1302/attiary_model:2.1 log 2

5์ผ ์ „์— ์˜ฌ๋ฆฐ ์ปจํ…Œ์ด๋„ˆ๊ฐ€ ์ž˜ ์‹คํ–‰๋˜๊ณ  ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


[4] ํ™œ์šฉ ํ˜„ํ™ฉ

์ €๋Š” ์กธ์—…ํ”„๋กœ์ ํŠธ๋กœ ์‹ฌ๋ฆฌ ์ผ€์–ด ๋ชฉ์ ์˜ ์ผ๊ธฐ ์–ดํ”Œ, "์•„๋ ์–ด๋ฆฌ"์„ ๊ฐœ๋ฐœํ–ˆ์Šต๋‹ˆ๋‹ค.

1. ์ฑ—๋ด‡

์ฑ—๋ด‡์€ ์ผ๊ธฐ๋ฅผ ์ž‘์„ฑํ•˜๊ณ  ์žˆ์„ ๋•Œ ์—”ํ„ฐํ‚ค๋ฅผ ๋ˆ„๋ฅด๋ฉด ์ค„๋ฐ”๊ฟˆ์ด ๋˜๊ณ  ์ž…๋ ฅ๋œ ๋‚ด์šฉ์— ๋Œ€ํ•œ ๋ฐ˜์‘์„ ์„œ๋ฒ„์— ์š”์ฒญํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋ฉด ๋…ธ๋ž€ ๋ณ‘์•„๋ฆฌ์˜ ์•„๋ ๋ผ๋Š” ์นœ๊ตฌ๊ฐ€ ์‚ฌ์šฉ์ž๊ฐ€ ์ž‘์„ฑํ•œ ๋‚ด์šฉ์— ๋Œ€ํ•ด ๊ณต๊ฐํ•˜๊ฑฐ๋‚˜ ์œ„๋กœํ•ด์ฃผ๋Š” ๋ฐ˜์‘์„ ๋ณด์ž…๋‹ˆ๋‹ค.

์˜ค๋Š˜์€ ๋„ˆ๋ฌด ์Šฌํ”ˆ ๋‚ ์ด์—ˆ์–ด.
์นœ๊ตฌ๊ฐ€ ๋‹น์ผ์— ์•ฝ์†์„ ์ทจ์†Œํ–ˆ๊ฑฐ๋“ .
์‚ฌ์‹ค ์š”์ฆ˜๋“ค์–ด ์—ฐ๋ฝ์ด ์ž˜ ์•ˆ๋œ๋‹ค๋Š” ๋Š๋‚Œ์ด ์žˆ์—ˆ๊ฑฐ๋“ .
์†์ƒํ•ด... ๋‚ด ์ž˜๋ชป๊ฐ™๊ณ ...
์˜ค๋Š˜์€ ์ผ์ฐ ์ž˜๊ฑฐ์•ผ.

๊ฐ™์€ ์ผ๊ธฐ ๋‚ด์šฉ์— ๋Œ€ํ•œ ์ฑ—๋ด‡์˜ ์‘๋‹ต์„ ๋น„๊ตํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๋จผ์ € kobert ์ฑ—๋ด‡์ž…๋‹ˆ๋‹ค.

kobert ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡

์ •๋ง ์œ„๋กœ๋ฅผ ์ž˜ํ•ด์ฃผ๊ณ  ์žˆ๋Š” ๋ชจ์Šต์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ kobert ์ฑ—๋ด‡์˜ ์ด๋ฆ„์€ ์œ„๋กœํ˜• ์•„๋ ๋กœ ์ง€์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋œปํ•œ ์„ฑ๊ฒฉ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด ๋ถ€์ •์ ์ธ ๊ฐ์ •์ด ๋Š๊ปด์งˆ ๋•Œ ์‚ฌ์šฉํ•˜๋ฉด ์ข‹๋‹ค๊ณ  ์‚ฌ์šฉ์ž์—๊ฒŒ ์–ดํ”Œ ๋‚ด์—์„œ ์•ˆ๋‚ดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

kogpt2 ๊ธฐ๋ฐ˜ ์ฑ—๋ด‡

kogpt2 ์ฑ—๋ด‡์€ ์–ด๋–ค ๋ฐ˜์‘์„ ํ•˜๋Š”์ง€ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์ „๋ฐ˜์ ์œผ๋กœ ๋‹ต๋ณ€์ด ์งง๋‹ค๋Š” ๊ฒƒ์„ ๋Š๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์กฐ๊ธˆ ๋” ์ผ์ƒ์ ์ธ ๋Œ€ํ™”๋ฅผ ํ•˜๊ณ  ์žˆ๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋Š๊ปด์ง‘๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ kogpt2 ์ฑ—๋ด‡์˜ ์ด๋ฆ„์€ ๊ณต๊ฐํ˜• ์•„๋ ๋กœ ์ง€์—ˆ์Šต๋‹ˆ๋‹ค.
๋‚ด์šฉ์— ๋”ฐ๋ผ ๋‹ค์–‘ํ•œ ๋ฐ˜์‘์„ ๋ณด์ด๊ธฐ ๋•Œ๋ฌธ์— ๋ฐœ๋ž„ํ•œ ์„ฑ๊ฒฉ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๊ณ  ์‚ฌ์šฉ์ž์—๊ฒŒ ์•ˆ๋‚ดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
๋‘ ์ฑ—๋ด‡ ๋ชจ๋‘ ์‘๋‹ต ์†๋„๋Š” ๋น ๋ฅธ ํŽธ์ด๊ณ  ์‚ฌ์šฉ์ž๋Š” ์‚ฌ์šฉํ•˜๊ณ  ์‹ถ์€ ์ฑ—๋ด‡์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๊ฒŒ ์–ดํ”Œ์„ ๊ตฌํ˜„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

2. ์ค‘๋ฆฝ/์Šฌํ””/๋ถ„๋…ธ/๋ถˆ์•ˆ/ํ”ผ๊ณค/ํ›„ํšŒ ๋ถ„๋ฅ˜

์•„๋ ์–ด๋ฆฌ๋Š” ์ผ๊ธฐ๋ฅผ ์ž‘์„ฑํ•˜๊ณ  ์žˆ์„ ๋•Œ ์—”ํ„ฐํ‚ค๋ฅผ ๋ˆ„๋ฅด๋ฉด ์ค„๋ฐ”๊ฟˆ์ด ๋˜๊ณ  ์ฑ—๋ด‡์ด ๋ฐ˜์‘์„ ํ•ด์ฃผ๊ธฐ๋„ ํ•˜๋Š”๋ฐ, ์ด ๋•Œ ๊ฐ์ง€๋œ ๊ฐ์ •๊ณผ ์•Œ๋งž๋Š” ๋ฐฐ๊ฒฝ์Œ์•…์œผ๋กœ ๋ณ€๊ฒฝํ•ด ๋“ค๋ ค์ฃผ๋Š” ๊ธฐ๋Šฅ๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ๊ธฐ๋Šฅ (์‹ค์‹œ๊ฐ„์œผ๋กœ ๊ฐ์ •์„ ๊ฐ์ง€ํ•ด ๋ฐฐ๊ฒฝ ์Œ์•…์„ ๋ณ€๊ฒฝํ•ด์ฃผ๋Š” ๊ธฐ๋Šฅ)๋„ ์‚ฌ์šฉ์ž๊ฐ€ ์‚ฌ์šฉ์„ ์›ํ•˜์ง€ ์•Š์œผ๋ฉด ๋„๋„๋ก ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋งŽ์ด ๊ฐ์ง€๋œ ๊ฐ์ • ์ˆœ์œผ๋กœ ๋ณด์—ฌ์ฃผ๊ณ  ์ฒซ๋ฒˆ์งธ ๊ฐ์ •์ด ๋Œ€ํ‘œ ๊ฐ์ • ์ด๋ฏธ์ง€์™€ ์—ฐ๊ฒฐ๋ฉ๋‹ˆ๋‹ค.

๊ฐ ๊ฐ์ • ๋ณ„๋กœ 3๋‹จ๊ณ„์”ฉ ๋‹ค๋ฅธ ์ด๋ฏธ์ง€๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ฒซ๋ฒˆ์งธ ๋‘๋ฒˆ์งธ์˜ ๋Œ€ํ‘œ ๊ฐ์ •์€ ๊ธฐ์จ์ด์ง€๋งŒ ์ •๋„์— ๋”ฐ๋ผ ๋‹ค๋ฅธ ์ด๋ฏธ์ง€๋ฅผ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

์•„๋ ๊ฐ€ ํ•˜๊ณ  ์‹ถ์€ ๋ง์ด ์žˆ๋Œ€์š”! ๋ถ€๋ถ„๋„ ๊ฐ์ง€๋œ ๋Œ€ํ‘œ ๊ฐ์ •์— ๋”ฐ๋ผ ๋‚ด์šฉ์ด ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.

 

์ด๋Ÿฐ ๊ธฐ๋Šฅ์— ๊ฐ์ • ๋ถ„๋ฅ˜๊ฐ€ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

์ฒ˜์Œ์—๋Š” ํ•œ๋ฒˆ์— ๊ธฐ์จ/ํฌ๋ง/์ค‘๋ฆฝ/์Šฌํ””/๋ถ„๋…ธ/๋ถˆ์•ˆ/ํ”ผ๊ณค/ํ›„ํšŒ 8๊ฐ€์ง€ ๊ฐ์ •์„ ๋ถ„๋ฅ˜ํ•ด๋‚ผ ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์„ ์ œ์ž‘ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ•œ ๋‹ฌ ๋™์•ˆ ๊ฑฐ์˜ ๋งค์ผ ๋ฐค๋งˆ๋‹ค ์˜จ๋ผ์ธ ํšŒ์˜๋กœ ๋ชจ์—ฌ ์ง์ ‘ ๋งŒ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ๊ฐ์ •์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ์ž‘์—…์„ ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ํ•ด๋‹น ๋ชจ๋ธ์€ ์ •ํ™•๋„๊ฐ€ 80%์— ๊ทธ์ณค์Šต๋‹ˆ๋‹ค. ํ‹€๋ฆด ํ™•๋ฅ ์ด 20% ์ •๋„๋‚˜ ๋˜๋Š” ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜๋Š” ์—†์—ˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ ๋ฐฉ๋ฒ•์„ ๋ฐ”๊พธ์–ด ์ด ๊ธ€์—์„œ ๊ธฐ์ˆ ํ•œ KoBERT ์ƒํ™ฉ ํŒ๋‹จ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์„ ํ™œ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. 359๊ฐœ์˜ ๊ฐ ํด๋ž˜์Šค๋ฅผ ํŠน์ • ๊ฐ์ •๊ณผ 1:1๋กœ ๋Œ€์‘์‹œ์ผฐ์Šต๋‹ˆ๋‹ค.

ํ•™์Šต์‹œํ‚จ ๋ฐ์ดํ„ฐ๋Š” ์ƒ๋‹ด ๋ฐ์ดํ„ฐ์ด๊ธฐ ๋•Œ๋ฌธ์— ๋ถ€์ •์ ์ธ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ๋ฅผ ์ด๋ฃจ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ๋จผ์ € ๊ธ์ •๊ณผ ์ค‘๋ฆฝ ๋ถ€์ •์œผ๋กœ 3์ค‘ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์„ ๊ฑฐ์นœ ํ›„ ๋ถ€์ •์œผ๋กœ ๋ถ„๋ฅ˜๋œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋Š” ๋‹ค์‹œ KoBERT ์ƒํ™ฉ ํŒ๋‹จ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์„ ๊ฑฐ์ณ 6๊ฐ€์ง€ ๊ฐ์ •(์ค‘๋ฆฝ/์Šฌํ””/๋ถ„๋…ธ/๋ถˆ์•ˆ/ํ”ผ๊ณค/ํ›„ํšŒ) ์ค‘ ํ•œ ๊ฐ€์ง€๋กœ ๋ถ„๋ฅ˜๋ฉ๋‹ˆ๋‹ค.

3. ์šฐ์šธ ๋ถ„๋ฅ˜

์ผ๊ธฐ๋ฅผ ๋‹ค ์“ฐ๊ณ  ๋‚˜๋ฉด ์บ˜๋ฆฐ๋” ํ™”๋ฉด์—์„œ ๊ฐ ๋‹ฌ์˜ ์ข…ํ•ฉ ์šฐ์šธ์ง€์ˆ˜์™€ ํ–‰๋ณต์ง€์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•ด ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

์ข…ํ•ฉ ์šฐ์šธ์ง€์ˆ˜์™€ ํ–‰๋ณต์ง€์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ถ€๋ถ„์„ ํด๋ฆญํ•˜๋ฉด ํ•ด๋‹นํ•˜๋Š” ๋‹ฌ์˜ ์ผ๊ธฐ๋ฅผ ๋‹ค๋ฅธ ํ˜•์‹์œผ๋กœ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๋Œ€ํ‘œ ๊ฐ์ •๋“ค์€ ํŒŒ์ด์ฐจํŠธ๋กœ ๋ณด์—ฌ์ฃผ๊ณ , ํ–‰๋ณต ์ง€์ˆ˜์™€ ์šฐ์šธ ์ง€์ˆ˜์˜ ์ถ”์ด๋ฅผ ๊ทธ๋ž˜ํ”„๋กœ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

์ข…ํ•ฉ ํ–‰๋ณต์ง€์ˆ˜๋Š” ํ–‰๋ณต/ํฌ๋ง์ด ๋‚˜ํƒ€๋‚œ ๋น„์œจ๋กœ ๊ฐ„๋‹จํ•˜๊ฒŒ ์‚ฐ์ˆ  ํ‰๊ท ์„ ๋ƒ…๋‹ˆ๋‹ค.์ข…ํ•ฉ ์šฐ์šธ์ง€์ˆ˜๋Š” ์Šฌํ””/๋ถ„๋…ธ/๋ถˆ์•ˆ/ํ”ผ๊ณค/ํ›„ํšŒ ๊ฐ์ •๊ณผ ๋”๋ถˆ์–ด ์šฐ์šธ์„ ์ถ”๊ฐ€๋กœ ๊ฐ์ง€ํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ๋‘ก๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ์šฐ์šธ๊ณผ ๋น„์šฐ์šธ์„ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ชจ๋ธ์ด ํ•„์š”ํ–ˆ์Šต๋‹ˆ๋‹ค.

์šฐ์šธ๋กœ ๋ถ„๋ฅ˜๋˜๋Š” ์ƒํ™ฉ

์ €ํฌ ํŒ€์€ ์•ž์„œ ์„ค๋ช…ํ•œ KoBERT ๊ธฐ๋ฐ˜ 359๊ฐ€์ง€์˜ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์„ ๋˜ ํ™œ์šฉํ•˜์—ฌ ์šฐ์šธ/๋น„์šฐ์šธ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์†์‰ฝ๊ฒŒ ๊ตฌํ˜„ํ–ˆ์Šต๋‹ˆ๋‹ค.โ€‹

ํŒ€์›๋“ค๊ณผ ํšŒ์˜ํ•˜์—ฌ ์œ„์˜ ์ƒํ™ฉ๋“ค๋กœ ๋ถ„๋ฅ˜๋˜์—ˆ์„ ๋•Œ ์šฐ์šธ๋กœ ํŒ๋‹จ๋˜๋„๋ก ๊ตฌํ˜„ํ–ˆ์Šต๋‹ˆ๋‹ค.

์šฐ์šธ ์ง€์ˆ˜๊ฐ€ ํŠน์ • ๊ฐ’ ์ด์ƒ์œผ๋กœ ์ง€์†๋˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์ƒ๋‹ด์„ ๊ถŒ์œ ํ•˜๊ฑฐ๋‚˜ ์ „๋ฌธ์˜์™€ ์—ฐ๊ฒฐ์ง€์–ด์ฃผ๋Š” ๋ฐฉ์‹์œผ๋กœ ์„œ๋น„์Šค๋ฅผ ์กฐ๊ธˆ ๋” ํ™•์žฅ์‹œํ‚ฌ ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

[5] ๋Š๋‚€ ์ 

๊ธฐ์ˆ ๊ณผ๋Š” ๊ด€๋ จ์ด ์—†๊ณ , ๊ฐœ์ธ์ ์œผ๋กœ ๋Š๋‚€ ์ ์„ ์ž‘์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.

๋”๋ณด๊ธฐ

1. Pay attention to hardware, too

ํ•™์Šต์„ ์‹œํ‚ฌ ๋•Œ์—๋„, classifier๋งŒ ์žˆ๋Š” ์›น ์„œ๋ฒ„๋ฅผ ์‹คํ–‰์‹œํ‚ฌ ๋•Œ์—๋„ ํ•˜๋“œ์›จ์–ด์˜ ํ•œ๊ณ„์— ์ฐธ ๋งŽ์ด ๋ถ€๋”ชํ˜”๋˜ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ฝ”๋“œ๋ฅผ ๋ณด๊ณ  ๋˜๋ด๋„ ์ด์ œ ๋” ์ด์ƒ ์˜ค๋ฅ˜๊ฐ€ ์—†์–ด๋ณด์ด๋Š”๋ฐ ๋Œ์•„๊ฐ€์ง€ ์•Š๋Š” ์ด์œ ๋Š” ๊ทผ๋ณธ์ ์œผ๋กœ ํ•˜๋“œ์›จ์–ด ์ž์›์ด ๋ถ€์กฑํ•ด์„œ ์˜€๋˜ ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์•˜์Šต๋‹ˆ๋‹ค. software ๊ฐœ๋ฐœ์ž์ด์ง€๋งŒ hardware์—๋„ ๊ด€์‹ฌ์„ ๊ฐ€์ง€๊ณ  ์ฃผ๋ชฉํ•ด์•ผํ•œ๋‹ค๊ณ  ํ–ˆ๋˜ ๊ต์ˆ˜๋‹˜์˜ ๋ง์”€์ด ๋งŽ์ด ๋– ์˜ฌ๋ž์Šต๋‹ˆ๋‹ค.

 

2. ํ•„์š”์—†๋Š” ๊ณต๋ถ€๋Š” ์—†๊ตฌ๋‚˜

GPU๊ฐ€ ์—ฌ๋Ÿฌ ๋Œ€์ธ ํ™˜๊ฒฝ์—์„œ ๋ณ‘๋ ฌ์ ์œผ๋กœ ์ฝ”๋“œ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ถ€๋ถ„๋„ ์‹ ๊ธฐํ–ˆ์Šต๋‹ˆ๋‹ค. ํด๋ผ์šฐ๋“œ ์ˆ˜์—… ์‹œ๊ฐ„์— ๋ฐฐ์› ๋Š”๋ฐ ์ด๋ ‡๊ฒŒ ๋นจ๋ฆฌ ๋ณ‘๋ ฌ ์ปดํ“จํŒ… ์ง€์‹์„ ํ™œ์šฉํ•ด ์‹ค์ „์—์„œ ์˜ค๋ฅ˜๋ฅผ ํ’€์–ด๋‚ผ๊ฑฐ๋ผ๊ณ ๋Š” ์ƒ๊ฐ๋„ ๋ชปํ–ˆ์Šต๋‹ˆ๋‹ค.

 

3. ์ƒˆ๋กœ์šด ๊ธฐ์ˆ ๋„ ์˜คํ”ˆ ๋งˆ์ธ๋“œ๋กœ

๋”ฅ๋Ÿฌ๋‹/์ธ๊ณต์ง€๋Šฅ์— ๋Œ€ํ•œ ๊ธฐ๋ฐ˜ ์ง€์‹์ด ์ „ํ˜€ ์—†์„ ๋•Œ ํ”„๋กœ์ ํŠธ๋ฅผ ์‹œ์ž‘ํ–ˆ๊ณ  ์ฃผ์–ธ์–ด๋„ python์ด ์•„๋‹ˆ๋ผ์„œ ์ •๋ง ๊ฐ„๋‹จํ•œ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•˜๊ฑฐ๋‚˜ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์˜์กด์„ฑ ๋ฌธ์ œ์—๋„ ์ •๋ง ๋งŽ์ด ํž˜๋“ค์–ด ํ–ˆ๋˜ ๊ธฐ์–ต์ด ๋‚ฉ๋‹ˆ๋‹ค. ๋ง๋กœ๋งŒ ๋“ค์–ด๋ณด์•˜๋˜ conda, jupyter ๋ชจ๋‘ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋‚˜๋งˆ ๋‹ค๋ค„๋ณด๊ฒŒ ๋˜์—ˆ๊ณ  ์ƒˆ๋กญ๊ฒŒ ์ ‘ํ•œ ๊ธฐ์ˆ ์ด ์ •๋ง ๋งŽ์€ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ด ๊ธ€์—์„œ๋Š” ๋‹ค๋ฃจ์ง€ ์•Š์•˜์ง€๋งŒ ์•ˆ๋“œ๋กœ์ด๋“œ ๊ฐœ๋ฐœ์—๋„ ๊ฝค ์ฐธ์—ฌํ–ˆ์Šต๋‹ˆ๋‹ค. ์•ˆ๋“œ๋กœ์ด๋“œ ์—ญ์‹œ ์ฒ˜์Œ์œผ๋กœ ๊ฐœ๋ฐœํ•ด๋ณด๊ฒŒ ๋˜์—ˆ๊ณ , ์–ธ์–ด๋„ ๋„ˆ๋ฌด๋‚˜ ์ƒ์†Œํ•œ ์ฝ”ํ‹€๋ฆฐ์œผ๋กœ ์‹œ์ž‘ํ•˜๊ฒŒ ๋˜์—ˆ๋Š”๋ฐ ๋„์„œ๊ด€์—์„œ ๋นŒ๋ฆฐ ์ฑ… ๋ช‡ ๊ถŒ์„ ๋ฐœ์ทŒํ•ด ๋ณด๊ณ  ๊ตฌ๊ธ€๋งํ•˜๋ฉด์„œ ๋˜ ๊ฐœ๋ฐœ์ด ๋˜์—ˆ๋„ค์š”. 

์ƒˆ๋กœ์šด ๋ถ„์•ผ์— ๋Œ€ํ•œ ๋‘๋ ค์›€์€ ํ•ญ์ƒ ์žˆ๋Š”๋ฐ ํ”„๋กœ์ ํŠธ๊ฐ€ ๋๋‚  ๋ฌด๋ ต์ด๋ฉด ๋งŽ์ด ์„ฑ์žฅํ•œ ๊ฒƒ ๊ฐ™์•„์„œ ๋„ˆ๋ฌด ๋ฟŒ๋“ฏํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฐ ์„ฑ์ทจ๊ฐ์ด ์ข‹์•„์„œ ๊ฐœ๋ฐœ์ž๋ฅผ ๊ฟˆ๊พธ๊ฒŒ ๋˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

4. ์ข‹์€ ํŒ€์›...! 

ํ”„๋กœ์ ํŠธ๊ฐ€ ์ˆœํƒ„ํ•˜๊ฒŒ ์ด์–ด์ ธ์™”๊ณ  ๊ฐœ๋ฐœ๋„ ์™„์„ฑ์ด  ํ•œ ๊ฒƒ์—๋Š” ํŒ€์›๋“ค์„ ์ž˜ ๋งŒ๋‚œ ์˜ํ–ฅ์ด ํฐ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๊ฐ์ž ๋งก์€ ๋ถ€๋ถ„์— ๋Œ€ํ•ด์„œ ๊ณ ํ†ต์„ ํ† ๋กœํ•˜๊ธด ํ–ˆ์ง€๋งŒ ๊ฒฐ๊ตญ ๊ฐ์ž๊ฐ€ ๋งก์€ ๋ถ€๋ถ„์€ ๋ฌต๋ฌตํžˆ ํ•ด๋ƒˆ์Šต๋‹ˆ๋‹ค. ๊ฐœ๋ฐœํ•˜๋‹ค๋ณด๋ฉด ๊ธฐ๋Šฅ์„ ์—†์• ๊ฑฐ๋‚˜ ์ถ•์†Œํ•˜๊ฒŒ ๋˜๋Š” ๊ฒŒ ๋‹ค๋ฐ˜์‚ฌ์ธ๋ฐ ์ˆ˜์ •ํ•œ ๋ถ€๋ถ„์ด ์—†๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ฐธ ๋ชจ๋‘๊ฐ€ ๋ฉ‹์ ธ์š”!!(ํŒ€๋ช… NICER๐Ÿฅฐ) ์ด๋Ÿฐ ์กฐ์ง์— ๋“ค์–ด๊ฐ€๊ธฐ ์œ„ํ•ด์„œ ๋งŽ์€ ๋…ธ๋ ฅ์„ ๋“ค์ผ ํ•„์š”๊ฐ€ ์žˆ๋‹ค๋Š” ๋™๊ธฐ๋ถ€์—ฌ๋„ ๋งŽ์ด ์–ป์€ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.