1. hotel dataset

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

hotel_df = pd.read_csv('/content/drive/MyDrive/KDT/6. 머신러닝과 딥러닝/Data/hotel.csv')
hotel_df

hotel_df.info()
Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal                            119390 non-null  object 
 13  country                         118902 non-null  object 
 14  distribution_channel            119390 non-null  object 
 15  is_repeated_guest               119390 non-null  int64  
 16  previous_cancellations          119390 non-null  int64  
 17  previous_bookings_not_canceled  119390 non-null  int64  
 18  reserved_room_type              119390 non-null  object 
 19  assigned_room_type              119390 non-null  object 
 20  booking_changes                 119390 non-null  int64  
 21  deposit_type                    119390 non-null  object 
 22  days_in_waiting_list            119390 non-null  int64  
 23  customer_type                   119390 non-null  object 
 24  adr                             119390 non-null  float64
 25  required_car_parking_spaces     119390 non-null  int64  
 26  total_of_special_requests       119390 non-null  int64  
 27  reservation_status_date         119390 non-null  object 
 28  name                            119390 non-null  object 
 29  email                           119390 non-null  object 
 30  phone-number                    119390 non-null  object 
 31  credit_card                     119390 non-null  object 
dtypes: float64(2), int64(16), object(14)
memory usage: 29.1+ MB

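hotel: hotel type
is_canceled: whether the booking was canceled (1) or not (0)
lead_time: number of days between the booking and the arrival date (how far in advance the booking was made)
arrival_date_year: year of arrival
arrival_date_month: month of arrival
arrival_date_week_number: week number of arrival
arrival_date_day_of_month: day of the month of arrival
stays_in_weekend_nights: number of weekend nights (Saturday or Sunday) in the stay
stays_in_week_nights: number of weeknights (Monday to Friday) in the stay
adults: number of adults
children: number of children
babies: number of babies
meal: type of meal plan booked
country: country of origin
distribution_channel: channel through which the booking was made (e.g. TA/TO = travel agents / tour operators)
is_repeated_guest: whether the guest has booked before (1) or not (0)
previous_cancellations: number of previous bookings the guest canceled
previous_bookings_not_canceled: number of previous bookings the guest did not cancel (stayed as booked)
reserved_room_type: room type requested
assigned_room_type: room type actually assigned
booking_changes: number of changes made to the booking after it was created
deposit_type: type of deposit
days_in_waiting_list: number of days the booking was on the waiting list before being confirmed
customer_type: customer type
adr: average daily rate (average price per night)
required_car_parking_spaces: number of car parking spaces requested
total_of_special_requests: number of special requests
reservation_status_date: date on which the reservation status was last updated
name: name
email: email address
phone-number: phone number
credit_card: credit card number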

hotel_df.drop(['name', 'email', 'phone-number', 'credit_card', 'reservation_status_date'], axis=1, inplace=True)
hotel_df.head()

hotel_df.describe()

sns.displot(hotel_df['lead_time'])

sns.boxplot(hotel_df['lead_time'])

sns.barplot(x=hotel_df['distribution_channel'], y=hotel_df['is_canceled'])

hotel_df['distribution_channel'].value_counts()
Output:
distribution_channel
TA/TO        97870
Direct       14645
Corporate     6677
GDS            193
Undefined        5
Name: count, dtype: int64
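With its default estimator, `sns.barplot` draws the mean of the y variable for each x category, so each bar in the distribution_channel plot is that channel's cancellation rate (the black line is a confidence interval). A minimal sketch on a small made-up DataFrame (the values are invented for illustration, not taken from hotel.csv) computes the same quantity directly with `groupby`:

```python
import pandas as pd

# Toy data (hypothetical values) with the same two columns used in the plot
toy = pd.DataFrame({
    'distribution_channel': ['TA/TO', 'TA/TO', 'TA/TO', 'Direct', 'Direct', 'Corporate'],
    'is_canceled':          [1,       1,       0,       0,        1,        0],
})

# Mean of a 0/1 column per group = cancellation rate per channel,
# which is what each bar height in sns.barplot represents
cancel_rate_by_channel = toy.groupby('distribution_channel')['is_canceled'].mean()
print(cancel_rate_by_channel)
```

Channels with very few bookings, such as GDS (193) and Undefined (5) in the counts above, get wide confidence intervals, so their bars say little on their own.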

sns.barplot(x=hotel_df['hotel'], y=hotel_df['is_canceled'])

sns.barplot(x=hotel_df['arrival_date_year'], y=hotel_df['is_canceled'])

plt.figure(figsize=(15, 5))
sns.barplot(x=hotel_df['arrival_date_month'], y=hotel_df['is_canceled'])

import calendar

print(calendar.month_name[1])
print(calendar.month_name[2])
print(calendar.month_name[3])
print(calendar.month_name[4])
Output:
January
February
March
April

months = []
for i in range(1, 13):
    months.append(calendar.month_name[i])

months
Output:
['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

plt.figure(figsize=(15, 5))
sns.barplot(x=hotel_df['arrival_date_month'], y=hotel_df['is_canceled'], order=months)

sns.barplot(x=hotel_df['is_repeated_guest'], y=hotel_df['is_canceled'])

sns.barplot(x=hotel_df['deposit_type'], y=hotel_df['is_canceled'])

hotel_df['deposit_type'].value_counts()
Output:
deposit_type
No Deposit    104641
Non Refund     14587
Refundable       162
Name: count, dtype: int64



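# corr(): computes the correlation between columns (Pearson correlation coefficient by default)
# Ranges from -1 to 1; the closer to 0, the weaker (or absent) the linear relationship between the two variables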
plt.figure(figsize=(15, 15))
sns.heatmap(hotel_df.corr(numeric_only=True), cmap='coolwarm', vmax=1, vmin=-1, annot=True)
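The Pearson coefficient is the covariance of two columns divided by the product of their standard deviations. A small sketch with made-up numbers (not from hotel.csv) checks that a manual computation matches `DataFrame.corr()`, whose default `method` is `'pearson'`:

```python
import numpy as np
import pandas as pd

# Hypothetical example columns
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'y': [2.0, 4.1, 5.9, 8.2, 9.9]})

x, y = df['x'].to_numpy(), df['y'].to_numpy()
# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy = deviations from the mean
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = df.corr().loc['x', 'y']
print(r_manual, r_pandas)
```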

hotel_df.isna().mean()
Output:
hotel                             0.000000
is_canceled                       0.000000
lead_time                         0.000000
arrival_date_year                 0.000000
arrival_date_month                0.000000
arrival_date_week_number          0.000000
arrival_date_day_of_month         0.000000
stays_in_weekend_nights           0.000000
stays_in_week_nights              0.000000
adults                            0.000000
children                          0.000034
babies                            0.000000
meal                              0.000000
country                           0.004087
distribution_channel              0.000000
is_repeated_guest                 0.000000
previous_cancellations            0.000000
previous_bookings_not_canceled    0.000000
reserved_room_type                0.000000
assigned_room_type                0.000000
booking_changes                   0.000000
deposit_type                      0.000000
days_in_waiting_list              0.000000
customer_type                     0.000000
adr                               0.000000
required_car_parking_spaces       0.000000
total_of_special_requests         0.000000
dtype: float64


hotel_df = hotel_df.dropna()
hotel_df

hotel_df[hotel_df['adults'] == 0]

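# derived variable people: total number of guests
hotel_df['people'] = hotel_df['adults'] + hotel_df['children'] + hotel_df['babies']
hotel_df.head()

hotel_df[hotel_df['people'] == 0]

# .copy() makes the filtered frame independent, avoiding SettingWithCopyWarning when new columns are added below
hotel_df = hotel_df[hotel_df['people'] != 0].copy()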
hotel_df

hotel_df['total_nights'] = hotel_df['stays_in_week_nights'] + hotel_df['stays_in_weekend_nights']
hotel_df.head()

hotel_df[hotel_df['total_nights'] == 0]

# derived variable season
# created from arrival_date_month as follows
# 12, 1, 2: winter
# 3, 4, 5: spring
# 6, 7, 8: summer
# 9, 10, 11: fall
season_dic = {'spring':[3, 4, 5], 'summer':[6, 7, 8], 'fall':[9, 10, 11], 'winter':[12, 1, 2]}

new_season_dic = {}

for i in season_dic:
    for j in season_dic[i]:
        new_season_dic[calendar.month_name[j]] = i

new_season_dic
Output:
{'March': 'spring',
 'April': 'spring',
 'May': 'spring',
 'June': 'summer',
 'July': 'summer',
 'August': 'summer',
 'September': 'fall',
 'October': 'fall',
 'November': 'fall',
 'December': 'winter',
 'January': 'winter',
 'February': 'winter'}

hotel_df['season'] = hotel_df['arrival_date_month'].map(new_season_dic)
hotel_df.head()

hotel_df['season'].value_counts()
Output:
season
summer    37364
spring    32519
fall      28272
winter    20573
Name: count, dtype: int64

hotel_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Index: 118728 entries, 0 to 119389
Data columns (total 30 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           118728 non-null  object 
 1   is_canceled                     118728 non-null  int64  
 2   lead_time                       118728 non-null  int64  
 3   arrival_date_year               118728 non-null  int64  
 4   arrival_date_month              118728 non-null  object 
 5   arrival_date_week_number        118728 non-null  int64  
 6   arrival_date_day_of_month       118728 non-null  int64  
 7   stays_in_weekend_nights         118728 non-null  int64  
 8   stays_in_week_nights            118728 non-null  int64  
 9   adults                          118728 non-null  int64  
 10  children                        118728 non-null  float64
 11  babies                          118728 non-null  int64  
 12  meal                            118728 non-null  object 
 13  country                         118728 non-null  object 
 14  distribution_channel            118728 non-null  object 
 15  is_repeated_guest               118728 non-null  int64  
 16  previous_cancellations          118728 non-null  int64  
 17  previous_bookings_not_canceled  118728 non-null  int64  
 18  reserved_room_type              118728 non-null  object 
 19  assigned_room_type              118728 non-null  object 
 20  booking_changes                 118728 non-null  int64  
 21  deposit_type                    118728 non-null  object 
 22  days_in_waiting_list            118728 non-null  int64  
 23  customer_type                   118728 non-null  object 
 24  adr                             118728 non-null  float64
 25  required_car_parking_spaces     118728 non-null  int64  
 26  total_of_special_requests       118728 non-null  int64  
 27  people                          118728 non-null  float64
 28  total_nights                    118728 non-null  int64  
 29  season                          118728 non-null  object 
dtypes: float64(3), int64(17), object(10)
memory usage: 28.1+ MB


# 1 if the guest was assigned the room type they reserved, otherwise 0
hotel_df['expected_room_type'] = (hotel_df['reserved_room_type'] == hotel_df['assigned_room_type']).astype(int)
hotel_df.head()

# share of the guest's previous bookings that were canceled
hotel_df['cancel_rate'] = hotel_df['previous_cancellations'] / (hotel_df['previous_cancellations'] + hotel_df['previous_bookings_not_canceled'])
hotel_df.head()

hotel_df[hotel_df['cancel_rate'].isna()]
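The NaN rows appear because `cancel_rate` is 0 / 0 for guests with no booking history (both `previous_cancellations` and `previous_bookings_not_canceled` are 0), and pandas returns NaN for 0 / 0 instead of raising an error. A minimal sketch with three hypothetical guests:

```python
import pandas as pd

# Hypothetical booking histories for three guests
toy = pd.DataFrame({
    'previous_cancellations':         [0, 1, 2],
    'previous_bookings_not_canceled': [0, 3, 0],
})

toy['cancel_rate'] = toy['previous_cancellations'] / (
    toy['previous_cancellations'] + toy['previous_bookings_not_canceled'])
# Row 0 is 0 / 0 -> NaN; rows 1 and 2 are 1/4 and 2/2
print(toy['cancel_rate'].tolist())

# -1 marks "no history", a value that cannot occur as a real ratio in [0, 1]
toy['cancel_rate'] = toy['cancel_rate'].fillna(-1)
print(toy['cancel_rate'].tolist())  # [-1.0, 0.25, 1.0]
```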

hotel_df['cancel_rate'] = hotel_df['cancel_rate'].fillna(-1)
hotel_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Index: 118728 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           118728 non-null  object 
 1   is_canceled                     118728 non-null  int64  
 2   lead_time                       118728 non-null  int64  
 3   arrival_date_year               118728 non-null  int64  
 4   arrival_date_month              118728 non-null  object 
 5   arrival_date_week_number        118728 non-null  int64  
 6   arrival_date_day_of_month       118728 non-null  int64  
 7   stays_in_weekend_nights         118728 non-null  int64  
 8   stays_in_week_nights            118728 non-null  int64  
 9   adults                          118728 non-null  int64  
 10  children                        118728 non-null  float64
 11  babies                          118728 non-null  int64  
 12  meal                            118728 non-null  object 
 13  country                         118728 non-null  object 
 14  distribution_channel            118728 non-null  object 
 15  is_repeated_guest               118728 non-null  int64  
 16  previous_cancellations          118728 non-null  int64  
 17  previous_bookings_not_canceled  118728 non-null  int64  
 18  reserved_room_type              118728 non-null  object 
 19  assigned_room_type              118728 non-null  object 
 20  booking_changes                 118728 non-null  int64  
 21  deposit_type                    118728 non-null  object 
 22  days_in_waiting_list            118728 non-null  int64  
 23  customer_type                   118728 non-null  object 
 24  adr                             118728 non-null  float64
 25  required_car_parking_spaces     118728 non-null  int64  
 26  total_of_special_requests       118728 non-null  int64  
 27  people                          118728 non-null  float64
 28  total_nights                    118728 non-null  int64  
 29  season                          118728 non-null  object 
 30  expected_room_type              118728 non-null  int64  
 31  cancel_rate                     118728 non-null  float64
dtypes: float64(4), int64(18), object(10)
memory usage: 29.9+ MB

hotel_df['hotel'].dtype
# dtype('O')

hotel_df['is_canceled'].dtype
# dtype('int64')

hotel_df['children'].dtype
# dtype('float64')

obj_list = []

for i in hotel_df.columns:
    if hotel_df[i].dtype == 'O':
        obj_list.append(i)
obj_list
Output:
['hotel',
 'arrival_date_month',
 'meal',
 'country',
 'distribution_channel',
 'reserved_room_type',
 'assigned_room_type',
 'deposit_type',
 'customer_type',
 'season']
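The same list of object (string) columns can be obtained without an explicit loop via `DataFrame.select_dtypes`. A sketch on a tiny stand-in DataFrame with made-up values; the string columns are created with `dtype=object` explicitly so the example does not depend on the pandas version's default string dtype:

```python
import pandas as pd

# Hypothetical mini version of hotel_df with mixed dtypes
toy = pd.DataFrame({
    'hotel': pd.Series(['Resort Hotel', 'City Hotel'], dtype=object),
    'is_canceled': [0, 1],
    'children': [0.0, 1.0],
    'season': pd.Series(['summer', 'winter'], dtype=object),
})

# Loop version, as in the cell above
obj_list_loop = [c for c in toy.columns if toy[c].dtype == 'O']
# Vectorized version: keep only object-dtype columns
obj_list_select = toy.select_dtypes(include='object').columns.tolist()
print(obj_list_select)  # ['hotel', 'season']
```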

for i in obj_list:
    print(i, hotel_df[i].nunique())
Output:
hotel 2
arrival_date_month 12
meal 5
country 177
distribution_channel 5
reserved_room_type 9
assigned_room_type 11
deposit_type 3
customer_type 4
season 4

hotel_df.drop(['country', 'arrival_date_month'], axis=1, inplace=True)
obj_list.remove('country')
obj_list.remove('arrival_date_month')
hotel_df = pd.get_dummies(hotel_df, columns=obj_list)
hotel_df.head()
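`pd.get_dummies` replaces each listed categorical column with one 0/1 indicator column per distinct value, named `<column>_<value>`. A sketch on a hypothetical two-row sample:

```python
import pandas as pd

# Hypothetical sample with one numeric and two categorical columns
toy = pd.DataFrame({
    'lead_time': [10, 200],
    'hotel': ['Resort Hotel', 'City Hotel'],
    'deposit_type': ['No Deposit', 'Non Refund'],
})

encoded = pd.get_dummies(toy, columns=['hotel', 'deposit_type'])
print(encoded.columns.tolist())
# Numeric columns are kept; each category value becomes its own column
# (bool dtype in pandas >= 2.0, shown here as 0/1)
print(encoded.astype(int))
```

Applied to this notebook: 30 columns remain after dropping country and arrival_date_month, and the 8 remaining object columns expand into 2 + 5 + 5 + 9 + 11 + 3 + 4 + 4 = 43 indicator columns (using the nunique counts above). That gives 30 - 8 + 43 = 65 columns, i.e. 64 features once is_canceled is separated, consistent with the shapes printed after train_test_split.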

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(hotel_df.drop('is_canceled', axis=1), hotel_df['is_canceled'], test_size=0.3, random_state=2024)

X_train.shape, y_train.shape
((83109, 64), (83109,))

X_test.shape, y_test.shape
((35619, 64), (35619,))

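2. Ensemble models

A technique that combines several machine learning models to reach a better answer than any single model would
Voting
- Combines models built with different algorithms
- The final result is decided by a vote among the models (majority of predicted labels, or average of predicted probabilities)
Bagging
- Uses a single algorithm trained on different samples of the data
- Each sample is drawn with replacement (bootstrap), so rows can be duplicated; the models' results are then aggregated
Boosting
- Trains weak learners sequentially to build a strong learner
- Each new learner gives more weight to the errors left by the previous ones
- Performance is often excellent, but it can be overly sensitive to mislabeled data and outliers
- AdaBoost, Gradient Boosting, XGBoost, LightGBM
Stacking
- Combines various individual models into a new model
- Several models are trained first, and their predictions are then used as input to train a new meta-model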
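One possible mapping of the four approaches onto scikit-learn's ensemble classes, sketched on synthetic data from `make_classification` (not the hotel data; the base models and sizes are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Three different algorithms, reused by voting and stacking
base = [('lr', LogisticRegression(max_iter=1000)),
        ('nb', GaussianNB()),
        ('dt', DecisionTreeClassifier(max_depth=5, random_state=0))]

models = {
    # Voting: different algorithms, majority vote on predicted labels
    'voting': VotingClassifier(estimators=base, voting='hard'),
    # Bagging: one algorithm, many bootstrap samples (drawn with replacement)
    'bagging': BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: weak learners (depth-1 trees by default) trained sequentially
    'boosting': AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: base-model predictions become the inputs of a meta-model
    'stacking': StackingClassifier(estimators=base, final_estimator=LogisticRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # accuracy on the held-out split
    print(f'{name}: {scores[name]:.3f}')
```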

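3. Random Forest

One of the most widely used ensemble techniques in machine learning, built on decision trees (bagging applied to decision trees)
It combines the classification results of many trained decision trees to reach a conclusion
Performance is generally quite good. Individual, fully grown trees tend to overfit; averaging many of them reduces this, although the forest can still overfit noisy data if the trees are left unconstrained
Each tree is trained on a sample drawn at random (with replacement) from the original data, and each split considers only a random subset of the features
Because every tree sees a different dataset, the trees differ from one another, which increases the diversity of the model
The final prediction is made by majority vote (classification) or averaging (regression) over the trees' predictions; scikit-learn's RandomForestClassifier averages the trees' predicted probabilities
It can be used for both classification and regression, and works especially well on large, complex datasets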
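The two sources of randomness described above (bootstrap samples and a random subset of features at each split) can be sketched by hand with `DecisionTreeClassifier`. This is a simplified illustration on synthetic data, not scikit-learn's actual RandomForestClassifier code; like that class, it averages the trees' predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_trees = 50
trees = []
for _ in range(n_trees):
    # Bootstrap: draw len(X_tr) row indices with replacement, so some rows repeat
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # max_features='sqrt': each split only considers a random subset of features
    tree = DecisionTreeClassifier(max_features='sqrt',
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X_tr[idx], y_tr[idx])
    trees.append(tree)

# Average class probabilities over all trees, then pick the most likely class
proba = np.mean([t.predict_proba(X_te) for t in trees], axis=0)
pred = proba.argmax(axis=1)
single_acc = (trees[0].predict(X_te) == y_te).mean()
forest_acc = (pred == y_te).mean()
print(f'single tree: {single_acc:.3f}, forest of {n_trees}: {forest_acc:.3f}')
```

On data like this the averaged forest usually scores higher than a single unpruned tree, which is the variance reduction bagging is meant to provide; the exact numbers depend on the random seeds.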

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=2024)
rf.fit(X_train, y_train)

pred1 = rf.predict(X_test)
pred1
Output: array([1, 0, 0, ..., 1, 1, 1])

proba1 = rf.predict_proba(X_test)
proba1
Output:
array([[0.06      , 0.94      ],
       [0.54833333, 0.45166667],
       [0.98      , 0.02      ],
       ...,
       [0.49      , 0.51      ],
       [0.06      , 0.94      ],
       [0.        , 1.        ]])

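# predicted class probabilities [P(not canceled), P(canceled)] for the first test sample
proba1[0] # array([0.06, 0.94])

# probability of cancellation (class 1) only, for every test sample
proba1[:, 1]
Output: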
array([0.94      , 0.45166667, 0.02      , ..., 0.51      , 0.94      ,
       1.        ])
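For a binary problem, `predict` returns the class with the highest averaged probability, which amounts to thresholding `predict_proba(...)[:, 1]` at 0.5 (a tie at exactly 0.5 goes to class 0). A sketch on synthetic data checks this; the same comparison could be run on `proba1` and `pred1`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
# An odd number of trees avoids exact 0.5 ties in this example
rf_demo = RandomForestClassifier(n_estimators=25, random_state=1).fit(X, y)

proba = rf_demo.predict_proba(X)
pred = rf_demo.predict(X)
# Column 1 is the probability of class 1; > 0.5 reproduces predict()
pred_from_proba = (proba[:, 1] > 0.5).astype(int)
print(np.array_equal(pred, pred_from_proba))  # True
```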

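4. Metrics used to evaluate model performance in machine learning / deep learning

Accuracy: the proportion of all predictions that are correct
Precision: the proportion of true positives among all positive predictions made by the model
Recall: the proportion of true positives among all actual positive cases
F1 Score: the harmonic mean of precision and recall, used as a single metric that balances the two
AUC-ROC Curve: measures the trade-off between the true positive rate and the false positive rate
- AUC: the area under the ROC curve. It ranges from 0 to 1; 0.5 corresponds to random guessing (the diagonal line), and the closer the value is to 1, the better the model separates the two classes
- ROC Curve: a tool for measuring binary classification performance; it plots sensitivity (true positive rate) against 1 - specificity (false positive rate) across classification thresholds
- https://bioinformaticsandme.tistory.com/328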
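In terms of the confusion-matrix counts TP, FP, TN, FN (true/false positives and negatives), the definitions above are:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} = \text{TPR} &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{FPR} &= \frac{FP}{FP + TN} = 1 - \text{Specificity}
\end{aligned}
```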

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score
accuracy_score(y_test, pred1)
# 0.8643420646284287

confusion_matrix(y_test, pred1)
Output:
array([[20709,  1659],
       [ 3173, 10078]])

print(classification_report(y_test, pred1))
Output:
              precision    recall  f1-score   support

           0       0.87      0.93      0.90     22368
           1       0.86      0.76      0.81     13251

    accuracy                           0.86     35619
   macro avg       0.86      0.84      0.85     35619
weighted avg       0.86      0.86      0.86     35619
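The class-1 row of the report can be recomputed by hand from the confusion matrix above (rows = actual, columns = predicted, so TN=20709, FP=1659, FN=3173, TP=10078):

```python
# Counts copied from confusion_matrix(y_test, pred1) above
tn, fp, fn, tp = 20709, 1659, 3173, 10078

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted cancellations, how many were real
recall = tp / (tp + fn)      # of real cancellations, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f'accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}')
# accuracy=0.8643 precision=0.86 recall=0.76 f1=0.81
```

Recall for class 1 (0.76) is noticeably lower than for class 0 (0.93): the model misses about a quarter of the actual cancellations (3173 of 13251).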

roc_auc_score(y_test, proba1[:, 1]) # 0.9315576511541386

# Hyperparameter tuning (apply max_depth=30)
rf2 = RandomForestClassifier(max_depth=30, random_state=2024)
rf2.fit(X_train, y_train)
proba2 = rf2.predict_proba(X_test)
roc_auc_score(y_test, proba2[:, 1])
# 0.9319781899069026

# Change in AUC after tuning (rf2 minus rf)
0.9319781899069026 - 0.9315576511541386

# 0.0004205387527640436

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve  # public import path; sklearn.metrics._plot is a private module

fpr, tpr, thr = roc_curve(y_test, proba2[:, 1])
print(fpr, tpr, thr)
Output:
[0.00000000e+00 4.47067239e-05 4.47067239e-05 ... 9.30525751e-01
 9.31419886e-01 1.00000000e+00] [0.         0.36465172 0.36517999 ... 0.99894348 0.99894348 1.        ] [2.00000000e+00 1.00000000e+00 9.99767442e-01 ... 6.17283951e-05
 3.15457413e-05 0.00000000e+00]

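plt.plot(fpr, tpr, label='ROC Curve')
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guess')  # diagonal: AUC = 0.5
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()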
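`roc_auc_score` is the area under exactly this kind of curve, and `sklearn.metrics.auc` integrates the (fpr, tpr) points with the trapezoidal rule. A sketch with a handful of made-up labels and scores confirms the two agree; with the `fpr` and `tpr` computed above, `auc(fpr, tpr)` should likewise reproduce the rf2 score.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities of class 1
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

fpr_demo, tpr_demo, thr_demo = roc_curve(y_true, scores)
area = auc(fpr_demo, tpr_demo)          # trapezoidal area under the ROC curve
direct = roc_auc_score(y_true, scores)
print(area, direct)
```

Here 14 of the 16 (positive, negative) pairs are ranked correctly, so both values come out to 14/16 = 0.875; this pairwise-ranking reading is another way to interpret AUC.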

# Further hyperparameter tuning
# apply max_depth=30
# apply min_samples_split=5
# apply n_estimators=70
rf3 = RandomForestClassifier(min_samples_split=5, random_state=2024,
                             max_depth=30, n_estimators=70)

rf3.fit(X_train, y_train)
proba3 = rf3.predict_proba(X_test)
roc_auc_score(y_test, proba3[:, 1])
# 0.9304821256640659

# After tuning with max_depth=30 only (rf2): 0.9319781899069026
0.9304821256640659 - 0.9319781899069026
# -0.0014960642428367699

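5. Finding optimal hyperparameter values

GridSearchCV: tries every combination of the hyperparameter values you specify (scoring each with cross-validation) and picks the best one
RandomizedSearchCV: takes the hyperparameter values you specify and evaluates only n_iter randomly sampled combinations, picking the best of those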
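For the `params` dictionary defined in the next cell, GridSearchCV evaluates every combination, 3 × 3 × 3 = 27, and with the default 5-fold cross-validation that means 135 model fits (plus one final refit on the whole training set). RandomizedSearchCV with `n_iter=4` fits only 4 sampled combinations. The candidate lists can be inspected with `ParameterGrid` and `ParameterSampler`; the latter is, as far as I know, the sampler RandomizedSearchCV uses internally, so with the same `random_state` it should yield the same 4 candidates.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {
    'max_depth': [30, 40, 50],
    'min_samples_split': [3, 5, 7],
    'n_estimators': [70, 120, 150],
}

grid = list(ParameterGrid(params))  # every combination
sampled = list(ParameterSampler(params, n_iter=4, random_state=2024))  # 4 random ones

print(len(grid))     # 27
print(len(sampled))  # 4
print(sampled[0])
```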

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
params = {
    'max_depth': [30, 40, 50],
    'min_samples_split': [3, 5, 7],
    'n_estimators': [70, 120, 150]
}
rf4 = RandomForestClassifier(random_state=2024)
grid_df = GridSearchCV(rf4, params)  # cv (number of cross-validation folds) defaults to 5

grid_df.fit(X_train, y_train)

grid_df.best_params_
Output:
{'max_depth': 40, 'min_samples_split': 3, 'n_estimators': 150}


rf5 = RandomForestClassifier(random_state=2024)
rand_df = RandomizedSearchCV(rf5, params, n_iter=4, random_state=2024)
rand_df.fit(X_train, y_train)

rand_df.best_params_
Output:
{'n_estimators': 150, 'min_samples_split': 3, 'max_depth': 50}

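6. Feature importances

A measure of how much each feature contributed to separating the classes when the decision trees split their nodes
A value close to 0 means the feature had almost no influence on distinguishing the classes, while a value close to 1 means it contributed a lot; in scikit-learn the importances of all features sum to 1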
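In scikit-learn, a forest's `feature_importances_` is the impurity-based (mean decrease in impurity) importance averaged over its trees and normalized so the values sum to 1. A sketch on synthetic data checks both properties:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y)

imp = forest.feature_importances_
# Average the per-tree importances, then renormalize so they sum to 1
per_tree = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)
per_tree /= per_tree.sum()

print(imp.round(3), imp.sum())
print(np.allclose(imp, per_tree))  # True
```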

rf6 = RandomForestClassifier(random_state=2024, max_depth=40, min_samples_split=3, n_estimators=150)
rf6.fit(X_train, y_train)
proba6 = rf6.predict_proba(X_test)
roc_auc_score(y_test, proba6[:, 1])
Output:
0.9316574459006468

rf6.feature_importances_
Output:
array([1.27823051e-01, 2.15106897e-02, 6.08530338e-02, 6.56231673e-02,
       2.18457194e-02, 3.26658278e-02, 9.84420098e-03, 5.24362470e-03,
       7.92731535e-04, 1.64528429e-03, 2.29997289e-02, 2.88230691e-03,
       2.02572330e-02, 2.26772762e-03, 9.57062769e-02, 2.07068228e-02,
       5.55073952e-02, 1.23728476e-02, 3.63467496e-02, 2.90236557e-02,
       3.11376863e-02, 6.76495945e-03, 6.07191041e-03, 6.89702660e-03,
       1.37997322e-03, 5.36402653e-03, 4.61283329e-03, 1.08787793e-03,
       2.79929352e-03, 7.58261231e-03, 2.29578249e-04, 1.05126162e-02,
       0.00000000e+00, 5.24637263e-03, 7.64748681e-04, 6.23406908e-04,
       3.49206823e-03, 2.06972837e-03, 1.07386134e-03, 9.79190777e-04,
       3.08828771e-04, 2.58304355e-05, 9.22132027e-03, 1.13887810e-03,
       9.73523759e-04, 4.70736878e-03, 2.39773229e-03, 1.41322456e-03,
       1.04562217e-03, 3.91421067e-04, 6.72654170e-05, 6.61483390e-05,
       8.57559071e-06, 8.96005631e-02, 9.00179693e-02, 5.85978513e-04,
       2.63186282e-03, 3.93763157e-04, 1.53764252e-02, 1.13568115e-02,
       5.18292064e-03, 6.11591793e-03, 6.96445996e-03, 5.39974249e-03])

feature_imp = pd.DataFrame({
    'feature': X_train.columns,
    'importances': rf6.feature_importances_
})
feature_imp

top10 = feature_imp.sort_values('importances', ascending=False).head(10)
top10

plt.figure(figsize=(5, 10))
sns.barplot(x='importances', y='feature', data=top10, palette='Set2')
