GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/tutorials/structured_data/imbalanced_data.ipynb
Kernel: Python 3
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Classification on imbalanced data

This tutorial demonstrates how to classify a highly imbalanced dataset, in which the examples of one class greatly outnumber the examples of another. You will work with the Credit Card Fraud Detection dataset hosted on Kaggle. The aim is to detect a mere 492 fraudulent transactions out of 284,807 transactions in total. You will use Keras to define the model and class weights to help the model learn from the imbalanced data.

This tutorial contains complete code to:

  • Load a CSV file using Pandas.

  • Create train, validation, and test sets.

  • Define and train a model using Keras (including setting class weights).

  • Evaluate the model using various metrics (including precision and recall).

  • Try common techniques for dealing with imbalanced data, such as:

    • Class weighting

    • Oversampling

Setup

import tensorflow as tf
from tensorflow import keras

import os
import tempfile

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import sklearn
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

mpl.rcParams['figure.figsize'] = (12, 10)
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๋ฐ ํƒ์ƒ‰

Kaggle ์‹ ์šฉ ์นด๋“œ ๋ถ€์ • ํ–‰์œ„ ๋ฐ์ดํ„ฐ ์„ธํŠธ

Pandas๋Š” ๊ตฌ์กฐ์  ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋“œํ•˜๊ณ  ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ ์œ ์šฉํ•œ ์—ฌ๋Ÿฌ ์œ ํ‹ธ๋ฆฌํ‹ฐ๊ฐ€ ํฌํ•จ๋œ Python ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค. CSV๋ฅผ Pandas ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ๋‹ค์šด๋กœ๋“œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฐธ๊ณ : ์ด ๋ฐ์ดํ„ฐ์„ธํŠธ๋Š” ๋น…๋ฐ์ดํ„ฐ ๋งˆ์ด๋‹ ๋ฐ ๋ถ€์ • ํ–‰์œ„ ๊ฐ์ง€์— ๋Œ€ํ•œ Worldline๊ณผ ULB(Universitรฉ Libre de Bruxelles) Machine Learning Group์˜ ์—ฐ๊ตฌ ํ˜‘์—…์„ ํ†ตํ•ด ์ˆ˜์ง‘ ๋ฐ ๋ถ„์„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ด€๋ จ ์ฃผ์ œ์— ๊ด€ํ•œ ํ˜„์žฌ ๋ฐ ๊ณผ๊ฑฐ ํ”„๋กœ์ ํŠธ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์—ฌ๊ธฐ๋ฅผ ์ฐธ์กฐํ•˜๊ฑฐ๋‚˜ DefeatFraud ํ”„๋กœ์ ํŠธ ํŽ˜์ด์ง€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')
raw_df.head()
raw_df[['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V26', 'V27', 'V28', 'Amount', 'Class']].describe()

Examine the class label imbalance

Let's look at the dataset imbalance:

neg, pos = np.bincount(raw_df['Class'])
total = neg + pos
print('Examples:\n    Total: {}\n    Positive: {} ({:.2f}% of total)\n'.format(
    total, pos, 100 * pos / total))

This shows how small the fraction of positive samples is.

Clean, split and normalize the data

The raw data has a few issues. First, the Time and Amount columns are too variable to use directly. Drop the Time column (since it's not clear what it means) and take the log of the Amount column to reduce its range.

cleaned_df = raw_df.copy()

# You don't want the `Time` column.
cleaned_df.pop('Time')

# The `Amount` column covers a huge range. Convert to log-space.
eps = 0.001  # 0 => 0.1¢
cleaned_df['Log Amount'] = np.log(cleaned_df.pop('Amount') + eps)
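The cell adds `eps` before taking the log for a simple reason: without it, a zero-dollar transaction would hit `log(0)`. The pure-Python illustration below shows this. The helper name `log_amount` and the sample amounts are hypothetical; the tutorial itself applies `np.log` to the whole column.

```python
import math

EPS = 0.001  # same epsilon as above: 0.001 dollars, a tenth of a cent

def log_amount(amount: float, eps: float = EPS) -> float:
    """Map a non-negative amount to log space; eps keeps log(0) finite."""
    return math.log(amount + eps)

# Hypothetical amounts spanning several orders of magnitude.
amounts = [0.0, 1.0, 100.0, 25_000.0]
logged = [log_amount(a) for a in amounts]

print("raw spread:", max(amounts) - min(amounts))
print("log spread:", round(max(logged) - min(logged), 2))
```

The raw spread is in the tens of thousands, while the log-space spread stays under 20. This is the range reduction the text refers to.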

๋ฐ์ดํ„ฐ์„ธํŠธ๋ฅผ ํ•™์Šต, ๊ฒ€์ฆ ๋ฐ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋ถ„ํ• ํ•ฉ๋‹ˆ๋‹ค. ๊ฒ€์ฆ ์„ธํŠธ๋Š” ๋ชจ๋ธ ํ”ผํŒ… ์ค‘์— ์‚ฌ์šฉ๋˜์–ด ์†์‹ค ๋ฐ ๋ฉ”ํŠธ๋ฆญ์„ ํ‰๊ฐ€ํ•˜์ง€๋งŒ ํ•ด๋‹น ๋ชจ๋ธ์€ ์ด ๋ฐ์ดํ„ฐ์— ์ ํ•ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ํ…Œ์ŠคํŠธ ์„ธํŠธ๋Š” ํ›ˆ๋ จ ๋‹จ๊ณ„์—์„œ๋Š” ์ „ํ˜€ ์‚ฌ์šฉ๋˜์ง€ ์•Š์œผ๋ฉฐ ๋งˆ์ง€๋ง‰์—๋งŒ ์‚ฌ์šฉ๋˜์–ด ๋ชจ๋ธ์ด ์ƒˆ ๋ฐ์ดํ„ฐ๋กœ ์ผ๋ฐ˜ํ™”๋˜๋Š” ์ •๋„๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•˜์—ฌ ๊ณผ๋Œ€์ ํ•ฉ์ด ํฌ๊ฒŒ ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” ๋ถˆ๊ท ํ˜• ๋ฐ์ดํ„ฐ์„ธํŠธ์—์„œ ํŠนํžˆ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

# Use a utility from sklearn to split and shuffle your dataset.
train_df, test_df = train_test_split(cleaned_df, test_size=0.2)
train_df, val_df = train_test_split(train_df, test_size=0.2)

# Form np arrays of labels and features.
train_labels = np.array(train_df.pop('Class'))
bool_train_labels = train_labels != 0
val_labels = np.array(val_df.pop('Class'))
test_labels = np.array(test_df.pop('Class'))

train_features = np.array(train_df)
val_features = np.array(val_df)
test_features = np.array(test_df)
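The two successive 80/20 splits leave roughly 64% of rows for training, 16% for validation and 20% for testing. The sketch below works through that arithmetic. It also estimates how many fraud cases each split should hold, assuming the 492 positives are spread evenly. The result shows why the evaluation sets contain so few positives.

```python
# Fractions produced by two successive 80/20 splits, as in the cell above.
TEST_FRACTION = 0.2
VAL_FRACTION = 0.2  # applied to what remains after the test split

test_share = TEST_FRACTION
val_share = (1 - TEST_FRACTION) * VAL_FRACTION
train_share = (1 - TEST_FRACTION) * (1 - VAL_FRACTION)

TOTAL_POSITIVES = 492  # fraudulent transactions in the full dataset

for name, share in [("train", train_share), ("val", val_share), ("test", test_share)]:
    # Expected count only; the real count varies with the random split.
    expected_pos = share * TOTAL_POSITIVES
    print(f"{name:>5}: {share:.0%} of rows, ~{expected_pos:.0f} positives")
```

The actual counts vary with the random shuffle. If that variance matters, `train_test_split` also accepts a `stratify=` argument that preserves the class ratio in each split. The tutorial does not use it.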

Normalize the input features using the sklearn StandardScaler. This sets the mean to 0 and the standard deviation to 1.

Note: The StandardScaler is fit using only the train_features, to make sure the model is not peeking at the validation or test sets.

scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)

val_features = scaler.transform(val_features)
test_features = scaler.transform(test_features)

train_features = np.clip(train_features, -5, 5)
val_features = np.clip(val_features, -5, 5)
test_features = np.clip(test_features, -5, 5)


print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', val_labels.shape)
print('Test labels shape:', test_labels.shape)

print('Training features shape:', train_features.shape)
print('Validation features shape:', val_features.shape)
print('Test features shape:', test_features.shape)

์ฃผ์˜: ๋ชจ๋ธ์„ ๋ฐฐํฌํ•˜๋ ค๋ฉด ์ „์ฒ˜๋ฆฌ ๊ณ„์‚ฐ์„ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ ˆ์ด์–ด๋กœ ๊ตฌํ˜„ํ•˜๊ณ  ๋‚ด๋ณด๋‚ด๊ธฐ ์ „์— ๋ชจ๋ธ์— ์—ฐ๊ฒฐํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ

๋‹ค์Œ์œผ๋กœ ๋ช‡ ๊ฐ€์ง€ ํŠน์„ฑ์— ๋Œ€ํ•œ ์–‘ ๋ฐ ์Œ์˜ ์˜ˆ์‹œ ๋ถ„ํฌ๋ฅผ ๋น„๊ตํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด ๋•Œ ์Šค์Šค๋กœ ๊ฒ€ํ† ํ•  ์‚ฌํ•ญ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • ์ด์™€ ๊ฐ™์€ ๋ถ„ํฌ๊ฐ€ ํ•ฉ๋ฆฌ์ ์ธ๊ฐ€?

    • ์˜ˆ, ์ด๋ฏธ ์ž…๋ ฅ์„ ์ •๊ทœํ™”ํ–ˆ์œผ๋ฉฐ ๋Œ€๋ถ€๋ถ„ +/- 2 ๋ฒ”์œ„์— ์ง‘์ค‘๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๋ถ„ํฌ ๊ฐ„ ์ฐจ์ด๋ฅผ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

    • ์˜ˆ, ์–‘์˜ ์˜ˆ์—๋Š” ๊ทน๋‹จ์  ๊ฐ’์˜ ๋น„์œจ์ด ํ›จ์”ฌ ๋†’์Šต๋‹ˆ๋‹ค.

pos_df = pd.DataFrame(train_features[ bool_train_labels], columns=train_df.columns)
neg_df = pd.DataFrame(train_features[~bool_train_labels], columns=train_df.columns)

sns.jointplot(x=pos_df['V5'], y=pos_df['V6'],
              kind='hex', xlim=(-5,5), ylim=(-5,5))
plt.suptitle("Positive distribution")

sns.jointplot(x=neg_df['V5'], y=neg_df['V6'],
              kind='hex', xlim=(-5,5), ylim=(-5,5))
_ = plt.suptitle("Negative distribution")

๋ชจ๋ธ ๋ฐ ๋ฉ”ํŠธ๋ฆญ ์ •์˜

์กฐ๋ฐ€ํ•˜๊ฒŒ ์—ฐ๊ฒฐ๋œ ์ˆจ๊ฒจ์ง„ ๋ ˆ์ด์–ด, ๊ณผ๋Œ€์ ํ•ฉ์„ ์ค„์ด๊ธฐ ์œ„ํ•œ ๋“œ๋กญ์•„์›ƒ ๋ ˆ์ด์–ด, ๊ฑฐ๋ž˜ ์‚ฌ๊ธฐ ๊ฐ€๋Šฅ์„ฑ์„ ๋ฐ˜ํ™˜ํ•˜๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ์ถœ๋ ฅ ๋ ˆ์ด์–ด๋กœ ๊ฐ„๋‹จํ•œ ์‹ ๊ฒฝ๋ง์„ ์ƒ์„ฑํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

METRICS = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'),
      keras.metrics.BinaryAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
      keras.metrics.AUC(name='auc'),
      keras.metrics.AUC(name='prc', curve='PR'), # precision-recall curve
]

def make_model(metrics=METRICS, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  else:
    # Use an explicit zero bias; passing None makes Keras fall back to a
    # random (glorot_uniform) initializer for the bias.
    output_bias = 'zeros'
  model = keras.Sequential([
      keras.layers.Dense(
          16, activation='relu',
          input_shape=(train_features.shape[-1],)),
      keras.layers.Dropout(0.5),
      keras.layers.Dense(1, activation='sigmoid',
                         bias_initializer=output_bias),
  ])

  model.compile(
      optimizer=keras.optimizers.Adam(learning_rate=1e-3),
      loss=keras.losses.BinaryCrossentropy(),
      metrics=metrics)

  return model

Understanding useful metrics

Several of the metrics defined above are computed by the model during training and evaluation. They are helpful when evaluating its performance.

  • False negatives and false positives are samples that were incorrectly classified.

  • True negatives and true positives are samples that were correctly classified.

  • Accuracy is the percentage of examples correctly classified:

$$\frac{\text{true positives + true negatives}}{\text{total samples}}$$

  • Precision is the percentage of predicted positives that were correctly classified:

$$\frac{\text{true positives}}{\text{true positives + false positives}}$$

  • Recall is the percentage of actual positives that were correctly classified:

$$\frac{\text{true positives}}{\text{true positives + false negatives}}$$

  • AUC refers to the area under the ROC (Receiver Operating Characteristic) curve, also called ROC-AUC. This metric is equal to the probability that the classifier will rank a random positive sample higher than a random negative sample.

  • AUPRC refers to the area under the precision-recall (PR) curve. This metric is computed from the precision-recall pairs obtained at different probability thresholds.
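The definitions above can be checked by hand. This sketch computes accuracy, precision and recall from confusion-matrix counts. It computes ROC-AUC directly from its probabilistic definition, by comparing every (positive, negative) score pair. The counts and scores are made up for illustration. `keras.metrics.AUC` approximates the same area using a fixed grid of thresholds rather than exhaustive pairs.

```python
from itertools import product

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def pairwise_auc(pos_scores, neg_scores):
    """ROC-AUC as P(random positive is scored above random negative).

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts and scores, for illustration only.
acc, prec, rec = classification_metrics(tp=8, fp=2, tn=85, fn=5)
auc = pairwise_auc(pos_scores=[0.9, 0.6, 0.3], neg_scores=[0.1, 0.4, 0.2, 0.7])
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} auc={auc:.3f}")
```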

์ฐธ๊ณ : ์ •ํ™•๋„๋Š” ์ด ์ž‘์—…์— ์œ ์šฉํ•œ ๋ฉ”ํŠธ๋ฆญ์ด ์•„๋‹™๋‹ˆ๋‹ค. ํ•ญ์ƒ False๋ฅผ ์˜ˆ์ธกํ•ด์•ผ ์ด ์ž‘์—…์—์„œ 99.8% ์ด์ƒ์˜ ์ •ํ™•๋„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


Baseline model

Build the model

Now create and train your model using the function defined earlier. The model is fit with a batch size of 2048, which is larger than the Keras default of 32. The larger batch gives each batch a decent chance of containing a few positive samples. If the batch size were too small, most batches would likely contain no fraudulent transactions to learn from.

Note: This model will not handle the class imbalance well. You will improve it later in this tutorial.

EPOCHS = 100
BATCH_SIZE = 2048

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_prc',
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)

model = make_model()
model.summary()

๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜์—ฌ ํ…Œ์ŠคํŠธํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

model.predict(train_features[:10])

์„ ํƒ์‚ฌํ•ญ: ์ดˆ๊ธฐ ๋ฐ”์ด์–ด์Šค๋ฅผ ์˜ฌ๋ฐ”๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

์ด์™€ ๊ฐ™์€ ์ดˆ๊ธฐ ์ถ”์ธก์€ ์ ์ ˆํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์„ธํŠธ๊ฐ€ ๋ถˆ๊ท ํ˜•ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ์œผ๋‹ˆ๊นŒ์š”. ์ถœ๋ ฅ ๋ ˆ์ด์–ด์˜ ๋ฐ”์ด์–ด์Šค๋ฅผ ์„ค์ •ํ•˜์—ฌ ํ•ด๋‹น ๋ฐ์ดํ„ฐ์„ธํŠธ๋ฅผ ๋ฐ˜์˜ํ•˜๋ฉด(์ฐธ์กฐ: ์‹ ๊ฒฝ๋ง ํ›ˆ๋ จ ๋ฐฉ๋ฒ•: "init well") ์ดˆ๊ธฐ ์ˆ˜๋ ด์— ์œ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ธฐ๋ณธ ๋ฐ”์ด์–ด์Šค ์ดˆ๊ธฐํ™”๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์†์‹ค์€ ์•ฝ math.log(2) = 0.69314

results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)
print("Loss: {:0.4f}".format(results[0]))

The correct bias to set can be derived from:

$$p_0 = \frac{pos}{pos + neg} = \frac{1}{1+e^{-b_0}}$$

Solving for $b_0$:

$$b_0 = -\log_e\left(\frac{1}{p_0} - 1\right) = \log_e\left(\frac{pos}{neg}\right)$$
initial_bias = np.log([pos/neg])
initial_bias

Set that as the initial bias, and the model will give much more reasonable initial guesses.

They should be near pos/total = 0.0018.
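A numerical sanity check of the derivation above. With $b_0 = \log(pos/neg)$, the sigmoid of the bias reproduces the base rate exactly. The check uses only the standard library and the full-dataset counts; the tutorial computes the same quantity with `np.log`.

```python
import math

POS, NEG = 492, 284_315  # full-dataset class counts

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

b0 = math.log(POS / NEG)   # initial bias, as in the cell above
p0 = POS / (POS + NEG)     # base rate the output should start at

print(f"b0={b0:.4f}  sigmoid(b0)={sigmoid(b0):.6f}  pos/total={p0:.6f}")
```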

model = make_model(output_bias=initial_bias)
model.predict(train_features[:10])

With this initialization, the initial loss should be approximately:

$$-p_0\log(p_0)-(1-p_0)\log(1-p_0) \approx 0.01317 \quad (p_0 \approx 0.0018)$$
results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)
print("Loss: {:0.4f}".format(results[0]))

์ด ์ดˆ๊ธฐ ์†์‹ค์€ ๋‹จ์ˆœํ•œ ์ƒํƒœ์˜ ์ดˆ๊ธฐํ™”์—์„œ ๋ฐœ์ƒํ–ˆ์„ ๋•Œ ๋ณด๋‹ค ์•ฝ 50๋ฐฐ ์ ์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฐ ์‹์œผ๋กœ ๋ชจ๋ธ์€ ์ฒ˜์Œ ๋ช‡ epoch๋ฅผ ์“ฐ๋ฉฐ ์–‘์„ฑ ์˜ˆ์‹œ๊ฐ€ ๊ฑฐ์˜ ์—†๋‹ค๋Š” ๊ฒƒ์„ ํ•™์Šตํ•  ํ•„์š”๋Š” ์—†์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ•™์Šต์„ ํ•˜๋ฉด์„œ ์†์‹ค๋œ ํ”Œ๋กฏ์„ ๋” ์‰ฝ๊ฒŒ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Checkpoint the initial weights

To make the various training runs more comparable, keep this initial model's weights in a checkpoint file, and load them into each model before training:

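# Keras 3 requires the `.weights.h5` suffix for save_weights(); older tf.keras
# also accepts it and writes HDF5 for any `.h5` path.
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial.weights.h5')
model.save_weights(initial_weights)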

๋ฐ”์ด์–ด์Šค ์ˆ˜์ •์ด ๋„์›€์ด ๋˜๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ

๊ณ„์† ์ง„ํ–‰ํ•˜๊ธฐ ์ „์— ์กฐ์‹ฌ์Šค๋Ÿฌ์šด ๋ฐ”์ด์–ด์Šค ์ดˆ๊ธฐํ™”๊ฐ€ ์‹ค์ œ๋กœ ๋„์›€์ด ๋˜์—ˆ๋Š”์ง€ ๋น ๋ฅด๊ฒŒ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค

์ •๊ตํ•œ ์ดˆ๊ธฐํ™”๋ฅผ ํ•œ ๋ชจ๋ธ๊ณผ ํ•˜์ง€ ์•Š์€ ๋ชจ๋ธ์„ 20 epoch ํ•™์Šต์‹œํ‚ค๊ณ  ์†์‹ค์„ ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค.

model = make_model()
model.load_weights(initial_weights)
model.layers[-1].bias.assign([0.0])
zero_bias_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=20,
    validation_data=(val_features, val_labels),
    verbose=0)

model = make_model()
model.load_weights(initial_weights)
careful_bias_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=20,
    validation_data=(val_features, val_labels),
    verbose=0)
def plot_loss(history, label, n):
  # Use a log scale on y-axis to show the wide range of values.
  plt.semilogy(history.epoch, history.history['loss'],
               color=colors[n], label='Train ' + label)
  plt.semilogy(history.epoch, history.history['val_loss'],
               color=colors[n], label='Val ' + label,
               linestyle="--")
  plt.xlabel('Epoch')
  plt.ylabel('Loss')
plot_loss(zero_bias_history, "Zero Bias", 0)
plot_loss(careful_bias_history, "Careful Bias", 1)
plt.legend()

์œ„์˜ ๊ทธ๋ฆผ์—์„œ ๋ช…ํ™•ํžˆ ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด, ๊ฒ€์ฆ ์†์‹ค ์ธก๋ฉด์—์„œ ์ด์™€ ๊ฐ™์€ ์ •๊ตํ•œ ์ดˆ๊ธฐํ™”์—๋Š” ๋ถ„๋ช…ํ•œ ์ด์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ ํ•™์Šต

model = make_model()
model.load_weights(initial_weights)
baseline_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_features, val_labels))

ํ•™์Šต ์ด๋ ฅ ํ™•์ธ

์ด ์„น์…˜์—์„œ๋Š” ํ›ˆ๋ จ ๋ฐ ๊ฒ€์ฆ ์„ธํŠธ์—์„œ ๋ชจ๋ธ์˜ ์ •ํ™•๋„ ๋ฐ ์†์‹ค์— ๋Œ€ํ•œ ํ”Œ๋กฏ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๊ณผ๋Œ€์ ํ•ฉ ํ™•์ธ์— ์œ ์šฉํ•˜๋ฉฐ ๊ณผ๋Œ€์ ํ•ฉ ๋ฐ ๊ณผ์†Œ์ ํ•ฉ ํŠœํ† ๋ฆฌ์–ผ์—์„œ ์ž์„ธํžˆ ์•Œ์•„๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋˜ํ•œ, ์œ„์—์„œ ๋งŒ๋“  ๋ชจ๋“  ๋ฉ”ํŠธ๋ฆญ์— ๋Œ€ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ”Œ๋กฏ์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฑฐ์ง“ ์Œ์„ฑ์ด ์˜ˆ์‹œ์— ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

def plot_metrics(history):
  metrics = ['loss', 'prc', 'precision', 'recall']
  for n, metric in enumerate(metrics):
    name = metric.replace("_"," ").capitalize()
    plt.subplot(2,2,n+1)
    plt.plot(history.epoch, history.history[metric], color=colors[0], label='Train')
    plt.plot(history.epoch, history.history['val_'+metric],
             color=colors[0], linestyle="--", label='Val')
    plt.xlabel('Epoch')
    plt.ylabel(name)
    if metric == 'loss':
      plt.ylim([0, plt.ylim()[1]])
    elif metric == 'auc':
      plt.ylim([0.8,1])
    else:
      plt.ylim([0,1])

    plt.legend()
plot_metrics(baseline_history)

Note: The validation curve generally performs better than the training curve. This is mainly because the dropout layer is not active when evaluating the model.
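A minimal sketch of why dropout makes the training curve look worse. During training, Keras' `Dropout` layer zeroes a random fraction `rate` of its inputs and scales the survivors by `1 / (1 - rate)`. At evaluation time it passes inputs through unchanged. The pure-Python `dropout` function below is a hypothetical re-implementation for illustration, not the Keras code.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout.

    Training: zero each value with probability `rate` and scale the
    survivors by 1 / (1 - rate), keeping the expected value unchanged.
    Inference: return the input untouched.
    """
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10

print("train:", dropout(activations, 0.5, training=True, rng=rng))
print("eval: ", dropout(activations, 0.5, training=False, rng=rng))
```

The training-mode output is a noisy, half-zeroed version of the input, while the evaluation-mode output is the input itself. That is why metrics computed during `fit` tend to trail the validation metrics.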

Evaluate metrics

You can use a confusion matrix to summarize the actual vs. predicted labels, where the X axis is the predicted label and the Y axis is the actual label:

train_predictions_baseline = model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_baseline = model.predict(test_features, batch_size=BATCH_SIZE)

def plot_cm(labels, predictions, p=0.5):
  cm = confusion_matrix(labels, predictions > p)
  plt.figure(figsize=(5,5))
  sns.heatmap(cm, annot=True, fmt="d")
  plt.title('Confusion matrix @{:.2f}'.format(p))
  plt.ylabel('Actual label')
  plt.xlabel('Predicted label')

  print('Legitimate Transactions Detected (True Negatives): ', cm[0][0])
  print('Legitimate Transactions Incorrectly Detected (False Positives): ', cm[0][1])
  print('Fraudulent Transactions Missed (False Negatives): ', cm[1][0])
  print('Fraudulent Transactions Detected (True Positives): ', cm[1][1])
  print('Total Fraudulent Transactions: ', np.sum(cm[1]))

Evaluate your model on the test dataset and display the results for the metrics you created above:

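# return_dict=True pairs each value with its metric name (model.metrics_names
# no longer lists the individual metrics in Keras 3).
baseline_results = model.evaluate(test_features, test_labels,
                                  batch_size=BATCH_SIZE, verbose=0,
                                  return_dict=True)
for name, value in baseline_results.items():
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_baseline)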

If the model had predicted everything perfectly, this would be a diagonal matrix: the values off the main diagonal, which indicate incorrect predictions, would be zero. In this case, the matrix shows relatively few false positives, meaning that relatively few legitimate transactions were incorrectly flagged. However, you would likely want even fewer false negatives, despite the cost of more false positives. This trade-off may be preferable because false negatives let fraudulent transactions go through, whereas a false positive may only cause an email to be sent to a customer asking them to verify their card activity.
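The trade-off described above is controlled by the decision threshold `p` in `plot_cm`. This sketch uses hypothetical labels and scores and counts the confusion-matrix cells at two thresholds. Lowering the threshold removes the false negative at the cost of extra false positives.

```python
def confusion_at(labels, scores, threshold):
    """Return (tn, fp, fn, tp) for predictions `score > threshold`,
    matching the `predictions > p` rule in plot_cm."""
    tn = fp = fn = tp = 0
    for y, s in zip(labels, scores):
        pred = s > threshold
        if y and pred:
            tp += 1
        elif y:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Hypothetical labels (1 = fraud) and model scores.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.30, 0.55, 0.90]

for t in (0.5, 0.25):
    tn, fp, fn, tp = confusion_at(labels, scores, t)
    print(f"threshold={t}: FP={fp}, FN={fn}")
```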

Plot the ROC

Now plot the ROC. This plot is useful because it shows, at a glance, the range of performance the model can reach just by tuning the output threshold.

def plot_roc(name, labels, predictions, **kwargs):
  fp, tp, _ = sklearn.metrics.roc_curve(labels, predictions)

  plt.plot(100*fp, 100*tp, label=name, linewidth=2, **kwargs)
  plt.xlabel('False positives [%]')
  plt.ylabel('True positives [%]')
  plt.xlim([-0.5,20])
  plt.ylim([80,100.5])
  plt.grid(True)
  ax = plt.gca()
  ax.set_aspect('equal')

plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')
plt.legend(loc='lower right');
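To see what the ROC curve is made of, this sketch sweeps the threshold over every distinct score. At each step it records the (false positive rate, true positive rate) pair, then integrates with the trapezoid rule. It is a simplified, hypothetical re-implementation of the idea behind `sklearn.metrics.roc_curve` (which also drops some collinear points by default), run on made-up data.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, sweeping the threshold over every distinct score
    from high to low; a sample counts as positive when score >= t."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]
pts = roc_points(labels, scores)
print(pts)
print("AUC:", trapezoid_area(pts))
```

For this data the area is 8/9. That matches the pairwise-ranking definition of AUC: 8 of the 9 (positive, negative) pairs are ordered correctly.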

Plot the AUPRC

Now plot the AUPRC: the area under the interpolated precision-recall curve, obtained by plotting (recall, precision) points for different values of the classification threshold. Depending on how it's calculated, PR AUC may be equivalent to the average precision of the model.

def plot_prc(name, labels, predictions, **kwargs):
    precision, recall, _ = sklearn.metrics.precision_recall_curve(labels, predictions)

    # Recall on the x-axis and precision on the y-axis, i.e. (recall, precision) points.
    plt.plot(recall, precision, label=name, linewidth=2, **kwargs)
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.grid(True)
    ax = plt.gca()
    ax.set_aspect('equal')
plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')
plt.legend(loc='lower right');
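The remark that PR AUC may equal average precision refers to the step-wise sum $\text{AP} = \sum_n (R_n - R_{n-1}) P_n$. This is how `sklearn.metrics.average_precision_score` defines it. Here is a pure-Python sketch of that sum on made-up data. `average_precision` is a hypothetical helper.

```python
def average_precision(labels, scores):
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n.

    Thresholds are the distinct scores from high to low; a sample is
    predicted positive when its score >= threshold.
    """
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= t)
        precision = tp / (tp + fp)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Hypothetical labels and scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]
print("average precision:", average_precision(labels, scores))
```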

์ •๋ฐ€๋„๊ฐ€ ๋น„๊ต์  ๋†’์€ ๊ฒƒ ๊ฐ™์ง€๋งŒ ์žฌํ˜„์œจ๊ณผ ROC ๊ณก์„ (AUC) ์•„๋ž˜ ๋ฉด์ ์ด ๋†’์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋ถ„๋ฅ˜์ž๊ฐ€ ์ •๋ฐ€๋„์™€ ์žฌํ˜„์œจ ๋ชจ๋‘๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋ ค๊ณ  ํ•˜๋ฉด ์ข…์ข… ์–ด๋ ค์›€์— ์ง๋ฉดํ•˜๋Š”๋ฐ, ๋ถˆ๊ท ํ˜• ๋ฐ์ดํ„ฐ์„ธํŠธ๋กœ ์ž‘์—…ํ•  ๋•Œ ํŠนํžˆ ๊ทธ๋ ‡์Šต๋‹ˆ๋‹ค. ๊ด€์‹ฌ์žˆ๋Š” ๋ฌธ์ œ์˜ ๋งฅ๋ฝ์—์„œ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ์˜ค๋ฅ˜ ๋น„์šฉ์„ ๊ณ ๋ คํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด ์˜ˆ์‹œ์—์„œ ๊ฑฐ์ง“ ์Œ์„ฑ(๋ถ€์ • ๊ฑฐ๋ž˜๋ฅผ ๋†“์นœ ๊ฒฝ์šฐ)์€ ๊ธˆ์ „์  ๋น„์šฉ์ด ๋“ค ์ˆ˜ ์žˆ์ง€๋งŒ , ๊ฑฐ์ง“ ์–‘์„ฑ(๊ฑฐ๋ž˜๊ฐ€ ์‚ฌ๊ธฐ ํ–‰์œ„๋กœ ์ž˜๋ชป ํ‘œ์‹œ๋จ)์€ ์‚ฌ์šฉ์ž ๋งŒ์กฑ๋„๋ฅผ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํด๋ž˜์Šค ๊ฐ€์ค‘์น˜

ํด๋ž˜์Šค ๊ฐ€์ค‘์น˜ ๊ณ„์‚ฐ

๋ชฉํ‘œ๋Š” ๋ถ€์ • ๊ฑฐ๋ž˜๋ฅผ ์‹๋ณ„ํ•˜๋Š” ๊ฒƒ์ด์ง€๋งŒ, ์ž‘์—…ํ•  ์ˆ˜ ์žˆ๋Š” ์–‘์„ฑ ์ƒ˜ํ”Œ์ด ๋งŽ์ง€ ์•Š์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ๋ถ„๋ฅ˜์ž๋Š” ์ด์šฉ ๊ฐ€๋Šฅํ•œ ๋ช‡ ๊ฐ€์ง€ ์˜ˆ์— ๊ฐ€์ค‘์น˜๋ฅผ ๋‘๊ณ ์ž ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ํ†ตํ•ด ๊ฐ ํด๋ž˜์Šค์— ๋Œ€ํ•œ Keras ๊ฐ€์ค‘์น˜๋ฅผ ์ „๋‹ฌํ•œ๋‹ค๋ฉด ์ด ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ๋ชจ๋ธ์€ ๋” ์ ์€ ํด๋ž˜์Šค ์˜ˆ์‹œ์— "๋” ๋งŽ์€ ๊ด€์‹ฌ์„ ๊ธฐ์šธ์ผ" ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

# Scaling by total/2 helps keep the loss to a similar magnitude.
# The sum of the weights of all examples stays the same.
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)

class_weight = {0: weight_for_0, 1: weight_for_1}

print('Weight for class 0: {:.2f}'.format(weight_for_0))
print('Weight for class 1: {:.2f}'.format(weight_for_1))
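A sketch of the arithmetic behind the comment in the cell above, using the full-dataset counts. Each class ends up contributing `total / 2` of weight, so the summed weight over all examples equals `total`, as with unweighted training. The `weighted_bce` helper is hypothetical. It only illustrates that a class weight multiplies each example's loss, which is roughly how `Model.fit(class_weight=...)` applies it.

```python
import math

NEG, POS = 284_315, 492          # full-dataset class counts
TOTAL = NEG + POS

# Same formulas as the cell above.
weight_for_0 = (1 / NEG) * (TOTAL / 2.0)
weight_for_1 = (1 / POS) * (TOTAL / 2.0)

# Each class contributes TOTAL / 2 of weight, so the sum over all
# examples equals TOTAL, exactly as with unit weights.
total_weight = NEG * weight_for_0 + POS * weight_for_1

def weighted_bce(y: int, p: float) -> float:
    """Binary cross-entropy for one example, scaled by its class weight."""
    w = weight_for_1 if y == 1 else weight_for_0
    return -w * (y * math.log(p) + (1 - y) * math.log(1 - p))

print(f"w0={weight_for_0:.2f}  w1={weight_for_1:.2f}  sum={total_weight:.1f}")
# A confidently missed fraud now costs far more than a confident false alarm.
print("missed fraud (y=1, p=0.1):", round(weighted_bce(1, 0.1), 2))
print("false alarm  (y=0, p=0.9):", round(weighted_bce(0, 0.9), 2))
```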

Train a model with class weights

Now try re-training and evaluating the model with class weights to see how that affects the predictions.

Note: Using class_weight changes the range of the loss. Depending on the optimizer, this may affect training stability. Optimizers whose step size depends on the magnitude of the gradient, like tf.keras.optimizers.SGD, may fail. The optimizer used here, tf.keras.optimizers.Adam, is largely unaffected by the scaling change. Also note that, because of the weighting, the total losses are not comparable between the two models.
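Why the optimizer matters: this sketch compares the first update of plain SGD with the first update of the textbook Adam rule when the gradient is multiplied by 100. The Adam rule here uses zero-initialised moments, bias correction, and Keras' default `epsilon=1e-7`. SGD's step grows 100-fold. Adam's step stays essentially the same, because it divides by the gradient's own magnitude. Later Adam steps behave similarly under a constant rescaling, apart from the small `epsilon` term. The helper functions are illustrative, not the Keras implementation.

```python
import math

def sgd_step(grad: float, lr: float = 1e-3) -> float:
    """Plain SGD update: proportional to the gradient."""
    return -lr * grad

def adam_first_step(grad: float, lr: float = 1e-3, beta1: float = 0.9,
                    beta2: float = 0.999, eps: float = 1e-7) -> float:
    """First update of the textbook Adam rule from zero-initialised moments."""
    m = (1 - beta1) * grad            # first-moment estimate
    v = (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1)           # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

g = 0.5
for scale in (1, 100):  # e.g. a loss multiplied by a large class weight
    print(f"scale={scale:3d}  SGD step={sgd_step(scale * g):+.6f}  "
          f"Adam step={adam_first_step(scale * g):+.6f}")
```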

weighted_model = make_model()
weighted_model.load_weights(initial_weights)

weighted_history = weighted_model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_features, val_labels),
    # The class weights go here
    class_weight=class_weight)

Check training history

plot_metrics(weighted_history)

Evaluate metrics

train_predictions_weighted = weighted_model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_weighted = weighted_model.predict(test_features, batch_size=BATCH_SIZE)
# return_dict=True pairs each value with its metric name.
weighted_results = weighted_model.evaluate(test_features, test_labels,
                                           batch_size=BATCH_SIZE, verbose=0,
                                           return_dict=True)
for name, value in weighted_results.items():
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_weighted)

With class weights, the accuracy and precision are lower because there are more false positives. Conversely, the recall and AUC are higher because the model also found more true positives. Despite its lower accuracy, this model has higher recall: it identifies more fraudulent transactions. Of course, both types of error have a cost; you wouldn't want to bug users by flagging too many legitimate transactions as fraudulent, either. Carefully consider the trade-offs between these different types of errors for your application.

Plot the ROC

plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_roc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_roc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plt.legend(loc='lower right');

Plot the AUPRC

plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_prc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_prc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plt.legend(loc='lower right');

Oversampling

Oversample the minority class

A related approach is to resample the dataset by oversampling the minority class.

pos_features = train_features[bool_train_labels]
neg_features = train_features[~bool_train_labels]

pos_labels = train_labels[bool_train_labels]
neg_labels = train_labels[~bool_train_labels]

Using NumPy

You can balance the dataset manually by choosing the right number of random indices from the positive examples:

ids = np.arange(len(pos_features))
choices = np.random.choice(ids, len(neg_features))

res_pos_features = pos_features[choices]
res_pos_labels = pos_labels[choices]

res_pos_features.shape

resampled_features = np.concatenate([res_pos_features, neg_features], axis=0)
resampled_labels = np.concatenate([res_pos_labels, neg_labels], axis=0)

order = np.arange(len(resampled_labels))
np.random.shuffle(order)
resampled_features = resampled_features[order]
resampled_labels = resampled_labels[order]

resampled_features.shape

Using tf.data

If you're using tf.data, the easiest way to produce balanced examples is to start with a positive and a negative dataset and merge them. See the tf.data guide for more examples.

BUFFER_SIZE = 100000

def make_ds(features, labels):
  ds = tf.data.Dataset.from_tensor_slices((features, labels))#.cache()
  ds = ds.shuffle(BUFFER_SIZE).repeat()
  return ds

pos_ds = make_ds(pos_features, pos_labels)
neg_ds = make_ds(neg_features, neg_labels)

Each dataset provides (feature, label) pairs:

for features, label in pos_ds.take(1):
  print("Features:\n", features.numpy())
  print()
  print("Label: ", label.numpy())

Merge the two together using tf.data.Dataset.sample_from_datasets:

resampled_ds = tf.data.Dataset.sample_from_datasets([pos_ds, neg_ds], weights=[0.5, 0.5])
resampled_ds = resampled_ds.batch(BATCH_SIZE).prefetch(2)

for features, label in resampled_ds.take(1):
  print(label.numpy().mean())
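A pure-Python analogue of what the cell above checks. Two infinite label streams stand in for `pos_ds` and `neg_ds`. Each draw picks a stream at random with the given weights, so a batch of 2048 ends up roughly half positive. `sample_from_streams` is a hypothetical helper, not the `tf.data` implementation.

```python
import itertools
import random

def sample_from_streams(streams, weights, rng):
    """Yield items forever, choosing the source stream at random per draw."""
    while True:
        stream = rng.choices(streams, weights=weights, k=1)[0]
        yield next(stream)

# Infinite label streams standing in for pos_ds and neg_ds (1 = fraud).
pos_stream = itertools.repeat(1)
neg_stream = itertools.repeat(0)

rng = random.Random(42)
merged = sample_from_streams([pos_stream, neg_stream], [0.5, 0.5], rng)
batch = list(itertools.islice(merged, 2048))  # one "batch" of labels
print("fraction positive:", sum(batch) / len(batch))
```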

์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด epoch๋‹น ์Šคํ… ์ˆ˜๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

์ด ๊ฒฝ์šฐ "epoch"์˜ ์ •์˜๋Š” ๋ช…ํ™•ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ฐ ์Œ์„ฑ ์˜ˆ์‹œ๋ฅผ ํ•œ ๋ฒˆ ๋ณผ ๋•Œ ํ•„์š”ํ•œ ๋ฐฐ์น˜ ์ˆ˜๋ผ๊ณ  ํ•ด๋ด…์‹œ๋‹ค.

# Cast to int: Keras expects an integer step count.
resampled_steps_per_epoch = int(np.ceil(2.0*neg/BATCH_SIZE))
resampled_steps_per_epoch
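The arithmetic behind the cell above: with a 50/50 mix, each batch carries about `BATCH_SIZE / 2` negatives. Covering `neg` negatives once therefore takes `2 * neg / BATCH_SIZE` batches.

```python
import math

BATCH_SIZE = 2048
neg = 284_315  # negatives in the full dataset, as `neg` is defined above

# About BATCH_SIZE / 2 negatives per balanced batch.
steps = math.ceil(2.0 * neg / BATCH_SIZE)
print(steps)
```

Note that `neg` was computed from the full dataset, while `neg_features` holds only the training split. An "epoch" defined this way is therefore somewhat longer than one pass over the training negatives.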

Train on the oversampled data

Now try training the model with the resampled dataset instead of using class weights, to see how these methods compare.

Note: Because the data was balanced by replicating the positive examples, the total dataset size is larger, and each epoch runs for more training steps.

resampled_model = make_model()
resampled_model.load_weights(initial_weights)

# Reset the bias to zero, since this dataset is balanced.
output_layer = resampled_model.layers[-1]
output_layer.bias.assign([0])

val_ds = tf.data.Dataset.from_tensor_slices((val_features, val_labels)).cache()
val_ds = val_ds.batch(BATCH_SIZE).prefetch(2)

resampled_history = resampled_model.fit(
    resampled_ds,
    epochs=EPOCHS,
    steps_per_epoch=resampled_steps_per_epoch,
    callbacks=[early_stopping],
    validation_data=val_ds)

If the training process considered the whole dataset on each gradient update, this oversampling would be basically identical to class weighting.

But when the model is trained batch by batch, as here, the oversampled data provides a smoother gradient signal. Instead of appearing in one batch with a large weight, each positive example appears in many different batches, each time with a small weight.

This smoother gradient signal makes the model easier to train.
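A tiny check of the full-batch equivalence claimed above. For binary cross-entropy with a sigmoid output, the gradient with respect to an example's logit is `sigmoid(z) - y`. Summing it over a dataset in which each positive is repeated `k` times gives exactly the same total as weighting each positive by `k`. The data and `k` are hypothetical.

```python
import math

def bce_grad_wrt_logit(y: int, logit: float) -> float:
    """d(loss)/d(logit) for binary cross-entropy on a sigmoid output."""
    return 1.0 / (1.0 + math.exp(-logit)) - y

# Hypothetical tiny dataset of (label, current logit) pairs.
data = [(1, -2.0), (0, -1.0), (0, 0.5), (0, -3.0)]
k = 3  # replicate each positive k times, or weight it by k

# Summed full-batch gradient with every positive repeated k times.
oversampled = [ex for ex in data for _ in range(k if ex[0] == 1 else 1)]
grad_oversampled = sum(bce_grad_wrt_logit(y, z) for y, z in oversampled)

# Summed full-batch gradient with a class weight of k on positives.
grad_weighted = sum((k if y == 1 else 1) * bce_grad_wrt_logit(y, z)
                    for y, z in data)

print(grad_oversampled, grad_weighted)
```

With a mean reduction instead of a sum, the two totals differ only by a constant factor (the dataset sizes), so the gradient direction is the same. The difference appears only when training batch by batch, as described above.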

๊ต์œก ์ด๋ ฅ ํ™•์ธ

ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๊ฐ€ ๊ฒ€์ฆ ๋ฐ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์™€ ์™„์ „ํžˆ ๋‹ค๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๊ธฐ์„œ ์ธก์ • ํ•ญ๋ชฉ์˜ ๋ถ„ํฌ๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

plot_metrics(resampled_history)

Re-train

Because training is easier on the balanced data, the above training procedure may overfit quickly.

So break up the epochs to give the tf.keras.callbacks.EarlyStopping callback finer control over when to stop training.

resampled_model = make_model()
resampled_model.load_weights(initial_weights)

# Reset the bias to zero, since this dataset is balanced.
output_layer = resampled_model.layers[-1]
output_layer.bias.assign([0])

resampled_history = resampled_model.fit(
    resampled_ds,
    # These are not real epochs
    steps_per_epoch=20,
    epochs=10*EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_ds))

Re-check training history

plot_metrics(resampled_history)

Evaluate metrics

train_predictions_resampled = resampled_model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_resampled = resampled_model.predict(test_features, batch_size=BATCH_SIZE)
# return_dict=True pairs each value with its metric name.
resampled_results = resampled_model.evaluate(test_features, test_labels,
                                             batch_size=BATCH_SIZE, verbose=0,
                                             return_dict=True)
for name, value in resampled_results.items():
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_resampled)

Plot the ROC

plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_roc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_roc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plot_roc("Train Resampled", train_labels, train_predictions_resampled, color=colors[2])
plot_roc("Test Resampled", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')
plt.legend(loc='lower right');

Plot the AUPRC

plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_prc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_prc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plot_prc("Train Resampled", train_labels, train_predictions_resampled, color=colors[2])
plot_prc("Test Resampled", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')
plt.legend(loc='lower right');

Applying this tutorial to your problem

Imbalanced data classification is an inherently difficult task, since there are so few samples to learn from. You should always start with the data. Do your best to collect as many samples as possible, and give substantial thought to which features may be relevant, so the model can get the most out of your minority class. At some point your model may struggle to improve and yield the results you want. It is therefore important to keep in mind the context of your problem and the trade-offs between different types of errors.