Right Now, Polars | 위니북스

다이아몬드 데이터 분석

1. 데이터 불러오기

# seaborn의 diamonds 데이터셋 로드
diamonds_df = pl.from_pandas(sns.load_dataset('diamonds'))
diamonds_df

# seaborn의 diamonds 데이터셋 로드
diamonds_df = pl.from_pandas(sns.load_dataset('diamonds'))
diamonds_df

2. 데이터 정보 확인

print("데이터 기본 정보:")
print(diamonds_df.glimpse())
print(diamonds_df.describe())

print("데이터 기본 정보:")
print(diamonds_df.glimpse())
print(diamonds_df.describe())

데이터 기본 정보:
Rows: 53940
Columns: 10
$ carat   <f64> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23
$ cut     <cat> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Very Good, Fair, Very Good
$ color   <cat> E, E, E, I, J, J, I, H, E, H
$ clarity <cat> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1
$ depth   <f64> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4
$ table   <f64> 55.0, 61.0, 65.0, 58.0, 58.0, 57.0, 57.0, 55.0, 61.0, 61.0
$ price   <i64> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338
$ x       <f64> 3.95, 3.89, 4.05, 4.2, 4.34, 3.94, 3.95, 4.07, 3.87, 4.0
$ y       <f64> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05
$ z       <f64> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39

None
shape: (9, 11)
┌─────────┬─────────┬───────┬───────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│ statist ┆ carat   ┆ cut   ┆ color ┆ clarit ┆ depth  ┆ table  ┆ price  ┆ x      ┆ y      ┆ z      │
│ ic      ┆ ---     ┆ ---   ┆ ---   ┆ y      ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---    │
│ ---     ┆ f64     ┆ str   ┆ str   ┆ ---    ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ f64    │
│ str     ┆         ┆       ┆       ┆ str    ┆        ┆        ┆        ┆        ┆        ┆        │
╞═════════╪═════════╪═══════╪═══════╪════════╪════════╪════════╪════════╪════════╪════════╪════════╡
│ count   ┆ 53940.0 ┆ 53940 ┆ 53940 ┆ 53940  ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. │
│         ┆         ┆       ┆       ┆        ┆ 0      ┆ 0      ┆ 0      ┆ 0      ┆ 0      ┆ 0      │
│ null_co ┆ 0.0     ┆ 0     ┆ 0     ┆ 0      ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    │
│ unt     ┆         ┆       ┆       ┆        ┆        ┆        ┆        ┆        ┆        ┆        │
│ mean    ┆ 0.79794 ┆ null  ┆ null  ┆ null   ┆ 61.749 ┆ 57.457 ┆ 3932.7 ┆ 5.7311 ┆ 5.7345 ┆ 3.5387 │
│         ┆         ┆       ┆       ┆        ┆ 405    ┆ 184    ┆ 99722  ┆ 57     ┆ 26     ┆ 34     │
│ std     ┆ 0.47401 ┆ null  ┆ null  ┆ null   ┆ 1.4326 ┆ 2.2344 ┆ 3989.4 ┆ 1.1217 ┆ 1.1421 ┆ 0.7056 │
│         ┆ 1       ┆       ┆       ┆        ┆ 21     ┆ 91     ┆ 39738  ┆ 61     ┆ 35     ┆ 99     │
│ min     ┆ 0.2     ┆ null  ┆ null  ┆ null   ┆ 43.0   ┆ 43.0   ┆ 326.0  ┆ 0.0    ┆ 0.0    ┆ 0.0    │
│ 25%     ┆ 0.4     ┆ null  ┆ null  ┆ null   ┆ 61.0   ┆ 56.0   ┆ 950.0  ┆ 4.71   ┆ 4.72   ┆ 2.91   │
│ 50%     ┆ 0.7     ┆ null  ┆ null  ┆ null   ┆ 61.8   ┆ 57.0   ┆ 2401.0 ┆ 5.7    ┆ 5.71   ┆ 3.53   │
│ 75%     ┆ 1.04    ┆ null  ┆ null  ┆ null   ┆ 62.5   ┆ 59.0   ┆ 5324.0 ┆ 6.54   ┆ 6.54   ┆ 4.04   │
│ max     ┆ 5.01    ┆ null  ┆ null  ┆ null   ┆ 79.0   ┆ 95.0   ┆ 18823. ┆ 10.74  ┆ 58.9   ┆ 31.8   │
│         ┆         ┆       ┆       ┆        ┆        ┆        ┆ 0      ┆        ┆        ┆        │
└─────────┴─────────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘

데이터 기본 정보:
Rows: 53940
Columns: 10
$ carat   <f64> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23
$ cut     <cat> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Very Good, Fair, Very Good
$ color   <cat> E, E, E, I, J, J, I, H, E, H
$ clarity <cat> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1
$ depth   <f64> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4
$ table   <f64> 55.0, 61.0, 65.0, 58.0, 58.0, 57.0, 57.0, 55.0, 61.0, 61.0
$ price   <i64> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338
$ x       <f64> 3.95, 3.89, 4.05, 4.2, 4.34, 3.94, 3.95, 4.07, 3.87, 4.0
$ y       <f64> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05
$ z       <f64> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39

None
shape: (9, 11)
┌─────────┬─────────┬───────┬───────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│ statist ┆ carat   ┆ cut   ┆ color ┆ clarit ┆ depth  ┆ table  ┆ price  ┆ x      ┆ y      ┆ z      │
│ ic      ┆ ---     ┆ ---   ┆ ---   ┆ y      ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---    │
│ ---     ┆ f64     ┆ str   ┆ str   ┆ ---    ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ f64    │
│ str     ┆         ┆       ┆       ┆ str    ┆        ┆        ┆        ┆        ┆        ┆        │
╞═════════╪═════════╪═══════╪═══════╪════════╪════════╪════════╪════════╪════════╪════════╪════════╡
│ count   ┆ 53940.0 ┆ 53940 ┆ 53940 ┆ 53940  ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. ┆ 53940. │
│         ┆         ┆       ┆       ┆        ┆ 0      ┆ 0      ┆ 0      ┆ 0      ┆ 0      ┆ 0      │
│ null_co ┆ 0.0     ┆ 0     ┆ 0     ┆ 0      ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    │
│ unt     ┆         ┆       ┆       ┆        ┆        ┆        ┆        ┆        ┆        ┆        │
│ mean    ┆ 0.79794 ┆ null  ┆ null  ┆ null   ┆ 61.749 ┆ 57.457 ┆ 3932.7 ┆ 5.7311 ┆ 5.7345 ┆ 3.5387 │
│         ┆         ┆       ┆       ┆        ┆ 405    ┆ 184    ┆ 99722  ┆ 57     ┆ 26     ┆ 34     │
│ std     ┆ 0.47401 ┆ null  ┆ null  ┆ null   ┆ 1.4326 ┆ 2.2344 ┆ 3989.4 ┆ 1.1217 ┆ 1.1421 ┆ 0.7056 │
│         ┆ 1       ┆       ┆       ┆        ┆ 21     ┆ 91     ┆ 39738  ┆ 61     ┆ 35     ┆ 99     │
│ min     ┆ 0.2     ┆ null  ┆ null  ┆ null   ┆ 43.0   ┆ 43.0   ┆ 326.0  ┆ 0.0    ┆ 0.0    ┆ 0.0    │
│ 25%     ┆ 0.4     ┆ null  ┆ null  ┆ null   ┆ 61.0   ┆ 56.0   ┆ 950.0  ┆ 4.71   ┆ 4.72   ┆ 2.91   │
│ 50%     ┆ 0.7     ┆ null  ┆ null  ┆ null   ┆ 61.8   ┆ 57.0   ┆ 2401.0 ┆ 5.7    ┆ 5.71   ┆ 3.53   │
│ 75%     ┆ 1.04    ┆ null  ┆ null  ┆ null   ┆ 62.5   ┆ 59.0   ┆ 5324.0 ┆ 6.54   ┆ 6.54   ┆ 4.04   │
│ max     ┆ 5.01    ┆ null  ┆ null  ┆ null   ┆ 79.0   ┆ 95.0   ┆ 18823. ┆ 10.74  ┆ 58.9   ┆ 31.8   │
│         ┆         ┆       ┆       ┆        ┆        ┆        ┆ 0      ┆        ┆        ┆        │
└─────────┴─────────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘

print(diamonds_df.shape)

print(diamonds_df.shape)

(150, 5)

(150, 5)

3. 데이터 전처리

3.1 결측값 확인

print("결측치 개수:")
print(diamonds_df.null_count())

print("결측치 개수:")
print(diamonds_df.null_count())

결측치 개수:
shape: (1, 10)
┌───────┬─────┬───────┬─────────┬───────┬───────┬───────┬─────┬─────┬─────┐
│ carat ┆ cut ┆ color ┆ clarity ┆ depth ┆ table ┆ price ┆ x   ┆ y   ┆ z   │
│ ---   ┆ --- ┆ ---   ┆ ---     ┆ ---   ┆ ---   ┆ ---   ┆ --- ┆ --- ┆ --- │
│ u32   ┆ u32 ┆ u32   ┆ u32     ┆ u32   ┆ u32   ┆ u32   ┆ u32 ┆ u32 ┆ u32 │
╞═══════╪═════╪═══════╪═════════╪═══════╪═══════╪═══════╪═════╪═════╪═════╡
│ 0     ┆ 0   ┆ 0     ┆ 0       ┆ 0     ┆ 0     ┆ 0     ┆ 0   ┆ 0   ┆ 0   │
└───────┴─────┴───────┴─────────┴───────┴───────┴───────┴─────┴─────┴─────┘

결측치 개수:
shape: (1, 10)
┌───────┬─────┬───────┬─────────┬───────┬───────┬───────┬─────┬─────┬─────┐
│ carat ┆ cut ┆ color ┆ clarity ┆ depth ┆ table ┆ price ┆ x   ┆ y   ┆ z   │
│ ---   ┆ --- ┆ ---   ┆ ---     ┆ ---   ┆ ---   ┆ ---   ┆ --- ┆ --- ┆ --- │
│ u32   ┆ u32 ┆ u32   ┆ u32     ┆ u32   ┆ u32   ┆ u32   ┆ u32 ┆ u32 ┆ u32 │
╞═══════╪═════╪═══════╪═════════╪═══════╪═══════╪═══════╪═════╪═════╪═════╡
│ 0     ┆ 0   ┆ 0     ┆ 0       ┆ 0     ┆ 0     ┆ 0     ┆ 0   ┆ 0   ┆ 0   │
└───────┴─────┴───────┴─────────┴───────┴───────┴───────┴─────┴─────┴─────┘

4. 컷팅 등급별 가격 분석

cut_analysis = (
    diamonds_df.group_by('cut')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('price').median().alias('중앙값_가격'),
        pl.col('price').std().alias('가격_표준편차'),
        pl.count().alias('개수')
    ])
    .sort('평균_가격', descending=True)
)
print("컷팅 등급별 가격 분석:")
print(cut_analysis)

cut_analysis = (
    diamonds_df.group_by('cut')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('price').median().alias('중앙값_가격'),
        pl.col('price').std().alias('가격_표준편차'),
        pl.count().alias('개수')
    ])
    .sort('평균_가격', descending=True)
)
print("컷팅 등급별 가격 분석:")
print(cut_analysis)

컷팅 등급별 가격 분석:
shape: (5, 5)
┌───────────┬─────────────┬─────────────┬───────────────┬───────┐
│ cut       ┆ 평균_가격   ┆ 중앙값_가격 ┆ 가격_표준편차 ┆ 개수  │
│ ---       ┆ ---         ┆ ---         ┆ ---           ┆ ---   │
│ cat       ┆ f64         ┆ f64         ┆ f64           ┆ u32   │
╞═══════════╪═════════════╪═════════════╪═══════════════╪═══════╡
│ Premium   ┆ 4584.257704 ┆ 3185.0      ┆ 4349.204961   ┆ 13791 │
│ Fair      ┆ 4358.757764 ┆ 3282.0      ┆ 3560.386612   ┆ 1610  │
│ Very Good ┆ 3981.759891 ┆ 2648.0      ┆ 3935.862161   ┆ 12082 │
│ Good      ┆ 3928.864452 ┆ 3050.5      ┆ 3681.589584   ┆ 4906  │
│ Ideal     ┆ 3457.54197  ┆ 1810.0      ┆ 3808.401172   ┆ 21551 │
└───────────┴─────────────┴─────────────┴───────────────┴───────┘

컷팅 등급별 가격 분석:
shape: (5, 5)
┌───────────┬─────────────┬─────────────┬───────────────┬───────┐
│ cut       ┆ 평균_가격   ┆ 중앙값_가격 ┆ 가격_표준편차 ┆ 개수  │
│ ---       ┆ ---         ┆ ---         ┆ ---           ┆ ---   │
│ cat       ┆ f64         ┆ f64         ┆ f64           ┆ u32   │
╞═══════════╪═════════════╪═════════════╪═══════════════╪═══════╡
│ Premium   ┆ 4584.257704 ┆ 3185.0      ┆ 4349.204961   ┆ 13791 │
│ Fair      ┆ 4358.757764 ┆ 3282.0      ┆ 3560.386612   ┆ 1610  │
│ Very Good ┆ 3981.759891 ┆ 2648.0      ┆ 3935.862161   ┆ 12082 │
│ Good      ┆ 3928.864452 ┆ 3050.5      ┆ 3681.589584   ┆ 4906  │
│ Ideal     ┆ 3457.54197  ┆ 1810.0      ┆ 3808.401172   ┆ 21551 │
└───────────┴─────────────┴─────────────┴───────────────┴───────┘

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(cut_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = cut_analysis['평균_가격'].to_numpy()
median_prices = cut_analysis['중앙값_가격'].to_numpy()
counts = cut_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 중앙값 가격
bars1 = ax1.bar(x - width/2, avg_prices,
                width, label='평균 가격', color='lightblue')
bars2 = ax1.bar(x + width/2, median_prices,
                width, label='중앙값 가격', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'r-o',
                label='다이아몬드 개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('다이아몬드 컷팅 등급별 가격 분석', pad=20, size=15)
ax1.set_xlabel('컷팅 등급')
ax1.set_ylabel('가격 ($)')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, cut_analysis['cut'])
 
# 데이터 레이블 추가
for i in x:
    # 평균 가격
    ax1.text(i - width/2, avg_prices[i],
             f'${avg_prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 중앙값 가격
    ax1.text(i + width/2, median_prices[i],
             f'${median_prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 개수
    ax2.text(i, counts[i],
             f'{counts[i]:,}개',
             ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(cut_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = cut_analysis['평균_가격'].to_numpy()
median_prices = cut_analysis['중앙값_가격'].to_numpy()
counts = cut_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 중앙값 가격
bars1 = ax1.bar(x - width/2, avg_prices,
                width, label='평균 가격', color='lightblue')
bars2 = ax1.bar(x + width/2, median_prices,
                width, label='중앙값 가격', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'r-o',
                label='다이아몬드 개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('다이아몬드 컷팅 등급별 가격 분석', pad=20, size=15)
ax1.set_xlabel('컷팅 등급')
ax1.set_ylabel('가격 ($)')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, cut_analysis['cut'])
 
# 데이터 레이블 추가
for i in x:
    # 평균 가격
    ax1.text(i - width/2, avg_prices[i],
             f'${avg_prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 중앙값 가격
    ax1.text(i + width/2, median_prices[i],
             f'${median_prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 개수
    ax2.text(i, counts[i],
             f'{counts[i]:,}개',
             ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

5. 캐럿별 가격 분석

carat_analysis = (
    diamonds_df.with_columns([
        pl.col('carat')
        .cut(breaks=[0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5],
             labels=['0', '0-0.5', '0.5-1', '1-1.5', '1.5-2', '2-2.5', '2.5-3', '3-3.5', '3.5-4', '4-5', '5+'])
        .alias('carat_group')
    ])
    .group_by('carat_group')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('price').median().alias('중앙값_가격'),
        pl.count().alias('개수')
    ])
    .sort('carat_group')
)
print("캐럿 구간별 가격 분석:")
print(carat_analysis)

carat_analysis = (
    diamonds_df.with_columns([
        pl.col('carat')
        .cut(breaks=[0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5],
             labels=['0', '0-0.5', '0.5-1', '1-1.5', '1.5-2', '2-2.5', '2.5-3', '3-3.5', '3.5-4', '4-5', '5+'])
        .alias('carat_group')
    ])
    .group_by('carat_group')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('price').median().alias('중앙값_가격'),
        pl.count().alias('개수')
    ])
    .sort('carat_group')
)
print("캐럿 구간별 가격 분석:")
print(carat_analysis)

캐럿 구간별 가격 분석:
shape: (10, 4)
┌─────────────┬──────────────┬─────────────┬───────┐
│ carat_group ┆ 평균_가격    ┆ 중앙값_가격 ┆ 개수  │
│ ---         ┆ ---          ┆ ---         ┆ ---   │
│ cat         ┆ f64          ┆ f64         ┆ u32   │
╞═════════════╪══════════════╪═════════════╪═══════╡
│ 0-0.5       ┆ 839.718149   ┆ 788.0       ┆ 18932 │
│ 0.5-1       ┆ 2811.342683  ┆ 2528.0      ┆ 17506 │
│ 1-1.5       ┆ 6513.526534  ┆ 5846.0      ┆ 12060 │
│ 1.5-2       ┆ 11321.774838 ┆ 11040.0     ┆ 3553  │
│ 2-2.5       ┆ 14918.141237 ┆ 15320.0     ┆ 1763  │
│ 2.5-3       ┆ 15472.904255 ┆ 16390.0     ┆ 94    │
│ 3-3.5       ┆ 14822.0      ┆ 15964.0     ┆ 23    │
│ 3.5-4       ┆ 15636.5      ┆ 16088.5     ┆ 4     │
│ 4-5         ┆ 16576.5      ┆ 16276.0     ┆ 4     │
│ 5+          ┆ 18018.0      ┆ 18018.0     ┆ 1     │
└─────────────┴──────────────┴─────────────┴───────┘

캐럿 구간별 가격 분석:
shape: (10, 4)
┌─────────────┬──────────────┬─────────────┬───────┐
│ carat_group ┆ 평균_가격    ┆ 중앙값_가격 ┆ 개수  │
│ ---         ┆ ---          ┆ ---         ┆ ---   │
│ cat         ┆ f64          ┆ f64         ┆ u32   │
╞═════════════╪══════════════╪═════════════╪═══════╡
│ 0-0.5       ┆ 839.718149   ┆ 788.0       ┆ 18932 │
│ 0.5-1       ┆ 2811.342683  ┆ 2528.0      ┆ 17506 │
│ 1-1.5       ┆ 6513.526534  ┆ 5846.0      ┆ 12060 │
│ 1.5-2       ┆ 11321.774838 ┆ 11040.0     ┆ 3553  │
│ 2-2.5       ┆ 14918.141237 ┆ 15320.0     ┆ 1763  │
│ 2.5-3       ┆ 15472.904255 ┆ 16390.0     ┆ 94    │
│ 3-3.5       ┆ 14822.0      ┆ 15964.0     ┆ 23    │
│ 3.5-4       ┆ 15636.5      ┆ 16088.5     ┆ 4     │
│ 4-5         ┆ 16576.5      ┆ 16276.0     ┆ 4     │
│ 5+          ┆ 18018.0      ┆ 18018.0     ┆ 1     │
└─────────────┴──────────────┴─────────────┴───────┘

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(carat_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = carat_analysis['평균_가격'].to_numpy()
median_prices = carat_analysis['중앙값_가격'].to_numpy()
counts = carat_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 중앙값 가격
bars1 = ax1.bar(x - width/2, avg_prices,
               width, label='평균 가격', color='lightblue')
bars2 = ax1.bar(x + width/2, median_prices,
               width, label='중앙값 가격', color='lightgreen')
 
# 선 그래프: 개수 (로그 스케일)
ax2.set_yscale('log')
line = ax2.plot(x, counts, 'r-o', label='다이아몬드 개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('캐럿 구간별 가격과 개수 분석', pad=20, size=15)
ax1.set_xlabel('캐럿 구간')
ax1.set_ylabel('가격 ($)')
ax2.set_ylabel('개수 (로그 스케일)')
 
# x축 레이블 설정
plt.xticks(x, carat_analysis['carat_group'], rotation=45)
 
# 데이터 레이블 추가
for i in x:
   # 평균 가격
   ax1.text(i - width/2, avg_prices[i],
            f'${avg_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 중앙값 가격
   ax1.text(i + width/2, median_prices[i],
            f'${median_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 개수
   ax2.text(i, counts[i],
            f'{counts[i]:,}개',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(carat_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = carat_analysis['평균_가격'].to_numpy()
median_prices = carat_analysis['중앙값_가격'].to_numpy()
counts = carat_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 중앙값 가격
bars1 = ax1.bar(x - width/2, avg_prices,
               width, label='평균 가격', color='lightblue')
bars2 = ax1.bar(x + width/2, median_prices,
               width, label='중앙값 가격', color='lightgreen')
 
# 선 그래프: 개수 (로그 스케일)
ax2.set_yscale('log')
line = ax2.plot(x, counts, 'r-o', label='다이아몬드 개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('캐럿 구간별 가격과 개수 분석', pad=20, size=15)
ax1.set_xlabel('캐럿 구간')
ax1.set_ylabel('가격 ($)')
ax2.set_ylabel('개수 (로그 스케일)')
 
# x축 레이블 설정
plt.xticks(x, carat_analysis['carat_group'], rotation=45)
 
# 데이터 레이블 추가
for i in x:
   # 평균 가격
   ax1.text(i - width/2, avg_prices[i],
            f'${avg_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 중앙값 가격
   ax1.text(i + width/2, median_prices[i],
            f'${median_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 개수
   ax2.text(i, counts[i],
            f'{counts[i]:,}개',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
 
plt.tight_layout()
plt.show()

6. 색상별 분석

color_analysis = (
    diamonds_df.group_by('color')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.col('depth').mean().alias('평균_깊이'),
        pl.col('table').mean().alias('평균_테이블'),
        pl.count().alias('개수')
    ])
    .sort('color')
)
print("색상별 분석:")
print(color_analysis)

color_analysis = (
    diamonds_df.group_by('color')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.col('depth').mean().alias('평균_깊이'),
        pl.col('table').mean().alias('평균_테이블'),
        pl.count().alias('개수')
    ])
    .sort('color')
)
print("색상별 분석:")
print(color_analysis)

색상별 분석:
shape: (7, 6)
┌───────┬─────────────┬───────────┬───────────┬─────────────┬───────┐
│ color ┆ 평균_가격   ┆ 평균_캐럿 ┆ 평균_깊이 ┆ 평균_테이블 ┆ 개수  │
│ ---   ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---   │
│ cat   ┆ f64         ┆ f64       ┆ f64       ┆ f64         ┆ u32   │
╞═══════╪═════════════╪═══════════╪═══════════╪═════════════╪═══════╡
│ D     ┆ 3169.954096 ┆ 0.657795  ┆ 61.698125 ┆ 57.40459    ┆ 6775  │
│ E     ┆ 3076.752475 ┆ 0.657867  ┆ 61.66209  ┆ 57.491201   ┆ 9797  │
│ F     ┆ 3724.886397 ┆ 0.736538  ┆ 61.694582 ┆ 57.433536   ┆ 9542  │
│ G     ┆ 3999.135671 ┆ 0.77119   ┆ 61.757111 ┆ 57.288629   ┆ 11292 │
│ H     ┆ 4486.669196 ┆ 0.911799  ┆ 61.83685  ┆ 57.517811   ┆ 8304  │
│ I     ┆ 5091.874954 ┆ 1.026927  ┆ 61.846385 ┆ 57.577278   ┆ 5422  │
│ J     ┆ 5323.81802  ┆ 1.162137  ┆ 61.887215 ┆ 57.812393   ┆ 2808  │
└───────┴─────────────┴───────────┴───────────┴─────────────┴───────┘

색상별 분석:
shape: (7, 6)
┌───────┬─────────────┬───────────┬───────────┬─────────────┬───────┐
│ color ┆ 평균_가격   ┆ 평균_캐럿 ┆ 평균_깊이 ┆ 평균_테이블 ┆ 개수  │
│ ---   ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---   │
│ cat   ┆ f64         ┆ f64       ┆ f64       ┆ f64         ┆ u32   │
╞═══════╪═════════════╪═══════════╪═══════════╪═════════════╪═══════╡
│ D     ┆ 3169.954096 ┆ 0.657795  ┆ 61.698125 ┆ 57.40459    ┆ 6775  │
│ E     ┆ 3076.752475 ┆ 0.657867  ┆ 61.66209  ┆ 57.491201   ┆ 9797  │
│ F     ┆ 3724.886397 ┆ 0.736538  ┆ 61.694582 ┆ 57.433536   ┆ 9542  │
│ G     ┆ 3999.135671 ┆ 0.77119   ┆ 61.757111 ┆ 57.288629   ┆ 11292 │
│ H     ┆ 4486.669196 ┆ 0.911799  ┆ 61.83685  ┆ 57.517811   ┆ 8304  │
│ I     ┆ 5091.874954 ┆ 1.026927  ┆ 61.846385 ┆ 57.577278   ┆ 5422  │
│ J     ┆ 5323.81802  ┆ 1.162137  ┆ 61.887215 ┆ 57.812393   ┆ 2808  │
└───────┴─────────────┴───────────┴───────────┴─────────────┴───────┘

# 그래프 크기 설정
plt.figure(figsize=(15, 10))
 
# 2x2 서브플롯 생성
# 1. 가격과 개수
plt.subplot(2, 2, 1)
x = np.arange(len(color_analysis))
width = 0.35
 
# numpy 배열로 변환
prices = color_analysis['평균_가격'].to_numpy()
counts = color_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격
bars = ax1.bar(x, prices, width, color='lightblue', label='평균 가격')
line = ax2.plot(x, counts, 'r-o', label='개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('색상별 평균 가격과 개수')
ax1.set_xlabel('색상')
ax1.set_ylabel('평균 가격 ($)')
ax2.set_ylabel('개수')
plt.xticks(x, color_analysis['color'])
 
# 데이터 레이블
for i, v in enumerate(prices):
   ax1.text(i, v, f'${v:,.0f}', ha='center', va='bottom')
for i, v in enumerate(counts):
   ax2.text(i, v, f'{v:,}', ha='center', va='bottom')
 
# 범례
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
 
# 2. 평균 캐럿
plt.subplot(2, 2, 2)
plt.bar(color_analysis['color'], color_analysis['평균_캐럿'], color='lightgreen')
plt.title('색상별 평균 캐럿')
plt.xlabel('색상')
plt.ylabel('평균 캐럿')
for i, v in enumerate(color_analysis['평균_캐럿']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
# 3. 평균 깊이
plt.subplot(2, 2, 3)
plt.bar(color_analysis['color'], color_analysis['평균_깊이'], color='lightcoral')
plt.title('색상별 평균 깊이')
plt.xlabel('색상')
plt.ylabel('평균 깊이')
for i, v in enumerate(color_analysis['평균_깊이']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
# 4. 평균 테이블
plt.subplot(2, 2, 4)
plt.bar(color_analysis['color'], color_analysis['평균_테이블'], color='lightskyblue')
plt.title('색상별 평균 테이블')
plt.xlabel('색상')
plt.ylabel('평균 테이블')
for i, v in enumerate(color_analysis['평균_테이블']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(15, 10))
 
# 2x2 서브플롯 생성
# 1. 가격과 개수
plt.subplot(2, 2, 1)
x = np.arange(len(color_analysis))
width = 0.35
 
# numpy 배열로 변환
prices = color_analysis['평균_가격'].to_numpy()
counts = color_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격
bars = ax1.bar(x, prices, width, color='lightblue', label='평균 가격')
line = ax2.plot(x, counts, 'r-o', label='개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('색상별 평균 가격과 개수')
ax1.set_xlabel('색상')
ax1.set_ylabel('평균 가격 ($)')
ax2.set_ylabel('개수')
plt.xticks(x, color_analysis['color'])
 
# 데이터 레이블
for i, v in enumerate(prices):
   ax1.text(i, v, f'${v:,.0f}', ha='center', va='bottom')
for i, v in enumerate(counts):
   ax2.text(i, v, f'{v:,}', ha='center', va='bottom')
 
# 범례
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
 
# 2. 평균 캐럿
plt.subplot(2, 2, 2)
plt.bar(color_analysis['color'], color_analysis['평균_캐럿'], color='lightgreen')
plt.title('색상별 평균 캐럿')
plt.xlabel('색상')
plt.ylabel('평균 캐럿')
for i, v in enumerate(color_analysis['평균_캐럿']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
# 3. 평균 깊이
plt.subplot(2, 2, 3)
plt.bar(color_analysis['color'], color_analysis['평균_깊이'], color='lightcoral')
plt.title('색상별 평균 깊이')
plt.xlabel('색상')
plt.ylabel('평균 깊이')
for i, v in enumerate(color_analysis['평균_깊이']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
# 4. 평균 테이블
plt.subplot(2, 2, 4)
plt.bar(color_analysis['color'], color_analysis['평균_테이블'], color='lightskyblue')
plt.title('색상별 평균 테이블')
plt.xlabel('색상')
plt.ylabel('평균 테이블')
for i, v in enumerate(color_analysis['평균_테이블']):
   plt.text(i, v, f'{v:.2f}', ha='center', va='bottom')
 
plt.tight_layout()
plt.show()

7. 선명도별 분석

clarity_analysis = (
    diamonds_df.group_by('clarity')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.count().alias('개수')
    ])
    .sort('평균_가격', descending=True)
)
print("선명도별 분석:")
print(clarity_analysis)

clarity_analysis = (
    diamonds_df.group_by('clarity')
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.count().alias('개수')
    ])
    .sort('평균_가격', descending=True)
)
print("선명도별 분석:")
print(clarity_analysis)

선명도별 분석:
shape: (8, 4)
┌─────────┬─────────────┬───────────┬───────┐
│ clarity ┆ 평균_가격   ┆ 평균_캐럿 ┆ 개수  │
│ ---     ┆ ---         ┆ ---       ┆ ---   │
│ cat     ┆ f64         ┆ f64       ┆ u32   │
╞═════════╪═════════════╪═══════════╪═══════╡
│ SI2     ┆ 5063.028606 ┆ 1.077648  ┆ 9194  │
│ SI1     ┆ 3996.001148 ┆ 0.850482  ┆ 13065 │
│ VS2     ┆ 3924.989395 ┆ 0.763935  ┆ 12258 │
│ I1      ┆ 3924.168691 ┆ 1.283846  ┆ 741   │
│ VS1     ┆ 3839.455391 ┆ 0.727158  ┆ 8171  │
│ VVS2    ┆ 3283.737071 ┆ 0.596202  ┆ 5066  │
│ IF      ┆ 2864.839106 ┆ 0.505123  ┆ 1790  │
│ VVS1    ┆ 2523.114637 ┆ 0.503321  ┆ 3655  │
└─────────┴─────────────┴───────────┴───────┘

선명도별 분석:
shape: (8, 4)
┌─────────┬─────────────┬───────────┬───────┐
│ clarity ┆ 평균_가격   ┆ 평균_캐럿 ┆ 개수  │
│ ---     ┆ ---         ┆ ---       ┆ ---   │
│ cat     ┆ f64         ┆ f64       ┆ u32   │
╞═════════╪═════════════╪═══════════╪═══════╡
│ SI2     ┆ 5063.028606 ┆ 1.077648  ┆ 9194  │
│ SI1     ┆ 3996.001148 ┆ 0.850482  ┆ 13065 │
│ VS2     ┆ 3924.989395 ┆ 0.763935  ┆ 12258 │
│ I1      ┆ 3924.168691 ┆ 1.283846  ┆ 741   │
│ VS1     ┆ 3839.455391 ┆ 0.727158  ┆ 8171  │
│ VVS2    ┆ 3283.737071 ┆ 0.596202  ┆ 5066  │
│ IF      ┆ 2864.839106 ┆ 0.505123  ┆ 1790  │
│ VVS1    ┆ 2523.114637 ┆ 0.503321  ┆ 3655  │
└─────────┴─────────────┴───────────┴───────┘

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(clarity_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = clarity_analysis['평균_가격'].to_numpy()
avg_carats = clarity_analysis['평균_캐럿'].to_numpy()
counts = clarity_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 평균 캐럿
bars1 = ax1.bar(x - width/2, avg_prices,
               width, label='평균 가격($)', color='lightblue')
bars2 = ax1.bar(x + width/2, avg_carats * 1000, # 캐럿값을 1000배 스케일링하여 가격과 비교 가능하게
               width, label='평균 캐럿(×1000)', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'r-o', label='개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('다이아몬드 선명도별 가격, 캐럿, 개수 분석', pad=20, size=15)
ax1.set_xlabel('선명도')
ax1.set_ylabel('가격($) / 캐럿×1000')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, clarity_analysis['clarity'])
 
# 데이터 레이블 추가
for i in x:
   # 평균 가격
   ax1.text(i - width/2, avg_prices[i],
            f'${avg_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 평균 캐럿
   ax1.text(i + width/2, avg_carats[i] * 1000,
            f'{avg_carats[i]:.2f}ct',
            ha='center', va='bottom', rotation=90)
 
   # 개수
   ax2.text(i, counts[i],
            f'{counts[i]:,}개',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(12, 6))
 
# 데이터 준비
x = np.arange(len(clarity_analysis))
width = 0.35
 
# numpy 배열로 변환
avg_prices = clarity_analysis['평균_가격'].to_numpy()
avg_carats = clarity_analysis['평균_캐럿'].to_numpy()
counts = clarity_analysis['개수'].to_numpy()
 
# 주 축 설정
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 막대 그래프: 평균 가격과 평균 캐럿
bars1 = ax1.bar(x - width/2, avg_prices,
               width, label='평균 가격($)', color='lightblue')
bars2 = ax1.bar(x + width/2, avg_carats * 1000, # 캐럿값을 1000배 스케일링하여 가격과 비교 가능하게
               width, label='평균 캐럿(×1000)', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'r-o', label='개수', linewidth=2)
 
# 그래프 꾸미기
plt.title('다이아몬드 선명도별 가격, 캐럿, 개수 분석', pad=20, size=15)
ax1.set_xlabel('선명도')
ax1.set_ylabel('가격($) / 캐럿×1000')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, clarity_analysis['clarity'])
 
# 데이터 레이블 추가
for i in x:
   # 평균 가격
   ax1.text(i - width/2, avg_prices[i],
            f'${avg_prices[i]:,.0f}',
            ha='center', va='bottom', rotation=90)
 
   # 평균 캐럿
   ax1.text(i + width/2, avg_carats[i] * 1000,
            f'{avg_carats[i]:.2f}ct',
            ha='center', va='bottom', rotation=90)
 
   # 개수
   ax2.text(i, counts[i],
            f'{counts[i]:,}개',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

8. 가격대별 분포

price_distribution = (
    diamonds_df.with_columns([
        pl.col('price')
        .cut(breaks=[0, 1000, 2000, 3000, 4000, 5000, 10000, 20000],
             labels=['0', '0-1k', '1k-2k', '2k-3k', '3k-4k', '4k-5k', '5k-10k', '10k-20k', '20k+'])
        .alias('price_range')
    ])
    .group_by('price_range')
    .agg([
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.col('cut').mode().alias('주요_컷팅'),
        pl.col('color').mode().alias('주요_색상'),
        pl.count().alias('개수')
    ])
    .sort('price_range')
)
print("가격대별 분포:")
print(price_distribution)

price_distribution = (
    diamonds_df.with_columns([
        pl.col('price')
        .cut(breaks=[0, 1000, 2000, 3000, 4000, 5000, 10000, 20000],
             labels=['0', '0-1k', '1k-2k', '2k-3k', '3k-4k', '4k-5k', '5k-10k', '10k-20k', '20k+'])
        .alias('price_range')
    ])
    .group_by('price_range')
    .agg([
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.col('cut').mode().alias('주요_컷팅'),
        pl.col('color').mode().alias('주요_색상'),
        pl.count().alias('개수')
    ])
    .sort('price_range')
)
print("가격대별 분포:")
print(price_distribution)

가격대별 분포:
shape: (7, 5)
┌─────────────┬───────────┬─────────────┬───────────┬───────┐
│ price_range ┆ 평균_캐럿 ┆ 주요_컷팅   ┆ 주요_색상 ┆ 개수  │
│ ---         ┆ ---       ┆ ---         ┆ ---       ┆ ---   │
│ cat         ┆ f64       ┆ list[cat]   ┆ list[cat] ┆ u32   │
╞═════════════╪═══════════╪═════════════╪═══════════╪═══════╡
│ 0-1k        ┆ 0.334937  ┆ ["Ideal"]   ┆ ["G"]     ┆ 14524 │
│ 1k-2k       ┆ 0.500942  ┆ ["Ideal"]   ┆ ["E"]     ┆ 9683  │
│ 2k-3k       ┆ 0.706221  ┆ ["Ideal"]   ┆ ["F"]     ┆ 6129  │
│ 3k-4k       ┆ 0.871524  ┆ ["Ideal"]   ┆ ["F"]     ┆ 4225  │
│ 4k-5k       ┆ 1.017509  ┆ ["Premium"] ┆ ["H"]     ┆ 4665  │
│ 5k-10k      ┆ 1.209044  ┆ ["Ideal"]   ┆ ["G"]     ┆ 9492  │
│ 10k-20k     ┆ 1.741111  ┆ ["Premium"] ┆ ["G"]     ┆ 5222  │
└─────────────┴───────────┴─────────────┴───────────┴───────┘

가격대별 분포:
shape: (7, 5)
┌─────────────┬───────────┬─────────────┬───────────┬───────┐
│ price_range ┆ 평균_캐럿 ┆ 주요_컷팅   ┆ 주요_색상 ┆ 개수  │
│ ---         ┆ ---       ┆ ---         ┆ ---       ┆ ---   │
│ cat         ┆ f64       ┆ list[cat]   ┆ list[cat] ┆ u32   │
╞═════════════╪═══════════╪═════════════╪═══════════╪═══════╡
│ 0-1k        ┆ 0.334937  ┆ ["Ideal"]   ┆ ["G"]     ┆ 14524 │
│ 1k-2k       ┆ 0.500942  ┆ ["Ideal"]   ┆ ["E"]     ┆ 9683  │
│ 2k-3k       ┆ 0.706221  ┆ ["Ideal"]   ┆ ["F"]     ┆ 6129  │
│ 3k-4k       ┆ 0.871524  ┆ ["Ideal"]   ┆ ["F"]     ┆ 4225  │
│ 4k-5k       ┆ 1.017509  ┆ ["Premium"] ┆ ["H"]     ┆ 4665  │
│ 5k-10k      ┆ 1.209044  ┆ ["Ideal"]   ┆ ["G"]     ┆ 9492  │
│ 10k-20k     ┆ 1.741111  ┆ ["Premium"] ┆ ["G"]     ┆ 5222  │
└─────────────┴───────────┴─────────────┴───────────┴───────┘

# 그래프 크기 설정
plt.figure(figsize=(15, 6))
 
# 막대 그래프와 선 그래프를 결합할 주축과 보조축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 데이터 준비
x = np.arange(len(price_distribution))
width = 0.35
 
# numpy 배열로 변환
carats = price_distribution['평균_캐럿'].to_numpy()
counts = price_distribution['개수'].to_numpy()
 
# 막대 그래프: 개수
bars = ax1.bar(x - width/2, counts, width,
              label='다이아몬드 개수', color='lightblue')
 
# 선 그래프: 평균 캐럿
line = ax2.plot(x, carats, 'ro-', linewidth=2, label='평균 캐럿')
 
# 주요 컷팅과 색상 정보 추출
cuts = [c[0] for c in price_distribution['주요_컷팅']]
colors = [c[0] for c in price_distribution['주요_색상']]
 
# 그래프 꾸미기
plt.title('가격대별 분포 분석', pad=20, size=15)
ax1.set_xlabel('가격 범위')
ax1.set_ylabel('개수')
ax2.set_ylabel('평균 캐럿')
 
# x축 레이블 설정 및 회전
plt.xticks(x, price_distribution['price_range'], rotation=45)
 
# 데이터 레이블 추가
for i in x:
   # 개수
   ax1.text(i - width/2, counts[i],
            f'{counts[i]:,}개\n({cuts[i]},{colors[i]})',
            ha='center', va='bottom')
 
   # 평균 캐럿
   ax2.text(i, carats[i],
            f'{carats[i]:.2f}ct',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(15, 6))
 
# 막대 그래프와 선 그래프를 결합할 주축과 보조축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 데이터 준비
x = np.arange(len(price_distribution))
width = 0.35
 
# numpy 배열로 변환
carats = price_distribution['평균_캐럿'].to_numpy()
counts = price_distribution['개수'].to_numpy()
 
# 막대 그래프: 개수
bars = ax1.bar(x - width/2, counts, width,
              label='다이아몬드 개수', color='lightblue')
 
# 선 그래프: 평균 캐럿
line = ax2.plot(x, carats, 'ro-', linewidth=2, label='평균 캐럿')
 
# 주요 컷팅과 색상 정보 추출
cuts = [c[0] for c in price_distribution['주요_컷팅']]
colors = [c[0] for c in price_distribution['주요_색상']]
 
# 그래프 꾸미기
plt.title('가격대별 분포 분석', pad=20, size=15)
ax1.set_xlabel('가격 범위')
ax1.set_ylabel('개수')
ax2.set_ylabel('평균 캐럿')
 
# x축 레이블 설정 및 회전
plt.xticks(x, price_distribution['price_range'], rotation=45)
 
# 데이터 레이블 추가
for i in x:
   # 개수
   ax1.text(i - width/2, counts[i],
            f'{counts[i]:,}개\n({cuts[i]},{colors[i]})',
            ha='center', va='bottom')
 
   # 평균 캐럿
   ax2.text(i, carats[i],
            f'{carats[i]:.2f}ct',
            ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

9. 상관관계 분석

correlations = diamonds_df.select([
    pl.corr('price', 'carat').alias('가격_캐럿_상관계수'),
    pl.corr('price', 'depth').alias('가격_깊이_상관계수'),
    pl.corr('price', 'table').alias('가격_테이블_상관계수'),
    pl.corr('carat', 'depth').alias('캐럿_깊이_상관계수')
])
print("특성간 상관관계:")
print(correlations)

correlations = diamonds_df.select([
    pl.corr('price', 'carat').alias('가격_캐럿_상관계수'),
    pl.corr('price', 'depth').alias('가격_깊이_상관계수'),
    pl.corr('price', 'table').alias('가격_테이블_상관계수'),
    pl.corr('carat', 'depth').alias('캐럿_깊이_상관계수')
])
print("특성간 상관관계:")
print(correlations)

특성간 상관관계:
shape: (1, 4)
┌────────────────────┬────────────────────┬──────────────────────┬────────────────────┐
│ 가격_캐럿_상관계수 ┆ 가격_깊이_상관계수 ┆ 가격_테이블_상관계수 ┆ 캐럿_깊이_상관계수 │
│ ---                ┆ ---                ┆ ---                  ┆ ---                │
│ f64                ┆ f64                ┆ f64                  ┆ f64                │
╞════════════════════╪════════════════════╪══════════════════════╪════════════════════╡
│ 0.921591           ┆ -0.010647          ┆ 0.127134             ┆ 0.028224           │
└────────────────────┴────────────────────┴──────────────────────┴────────────────────┘

특성간 상관관계:
shape: (1, 4)
┌────────────────────┬────────────────────┬──────────────────────┬────────────────────┐
│ 가격_캐럿_상관계수 ┆ 가격_깊이_상관계수 ┆ 가격_테이블_상관계수 ┆ 캐럿_깊이_상관계수 │
│ ---                ┆ ---                ┆ ---                  ┆ ---                │
│ f64                ┆ f64                ┆ f64                  ┆ f64                │
╞════════════════════╪════════════════════╪══════════════════════╪════════════════════╡
│ 0.921591           ┆ -0.010647          ┆ 0.127134             ┆ 0.028224           │
└────────────────────┴────────────────────┴──────────────────────┴────────────────────┘

# 그래프 크기 설정
plt.figure(figsize=(10, 8))
 
# 상관계수 행렬 생성
features = ['가격', '캐럿', '깊이', '테이블']
corr_matrix = np.array([
   [1.0, correlations['가격_캐럿_상관계수'][0],
    correlations['가격_깊이_상관계수'][0],
    correlations['가격_테이블_상관계수'][0]],
   [correlations['가격_캐럿_상관계수'][0], 1.0,
    correlations['캐럿_깊이_상관계수'][0], 0],
   [correlations['가격_깊이_상관계수'][0],
    correlations['캐럿_깊이_상관계수'][0], 1.0, 0],
   [correlations['가격_테이블_상관계수'][0], 0, 0, 1.0]
])
 
# 히트맵 생성
sns.heatmap(corr_matrix,
           annot=True,  # 값 표시
           fmt='.3f',   # 소수점 3자리
           cmap='RdYlBu_r',  # 색상 맵
           xticklabels=features,
           yticklabels=features,
           center=0,    # 중앙값 (색상 기준)
           vmin=-1, vmax=1)  # 값의 범위
 
plt.title('다이아몬드 특성간 상관관계 히트맵')
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(10, 8))
 
# 상관계수 행렬 생성
features = ['가격', '캐럿', '깊이', '테이블']
corr_matrix = np.array([
   [1.0, correlations['가격_캐럿_상관계수'][0],
    correlations['가격_깊이_상관계수'][0],
    correlations['가격_테이블_상관계수'][0]],
   [correlations['가격_캐럿_상관계수'][0], 1.0,
    correlations['캐럿_깊이_상관계수'][0], 0],
   [correlations['가격_깊이_상관계수'][0],
    correlations['캐럿_깊이_상관계수'][0], 1.0, 0],
   [correlations['가격_테이블_상관계수'][0], 0, 0, 1.0]
])
 
# 히트맵 생성
sns.heatmap(corr_matrix,
           annot=True,  # 값 표시
           fmt='.3f',   # 소수점 3자리
           cmap='RdYlBu_r',  # 색상 맵
           xticklabels=features,
           yticklabels=features,
           center=0,    # 중앙값 (색상 기준)
           vmin=-1, vmax=1)  # 값의 범위
 
plt.title('다이아몬드 특성간 상관관계 히트맵')
plt.tight_layout()
plt.show()

10. 고가 다이아몬드 분석 (상위 1%)

expensive_diamonds = (
    diamonds_df.filter(
        pl.col('price') >= pl.col('price').quantile(0.99)
    )
    .group_by(['cut', 'color', 'clarity'])
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.count().alias('개수')
    ])
    .sort('개수', descending=True)
    .head(10)
)
print("고가 다이아몬드 특성 (상위 1%):")
print(expensive_diamonds)

expensive_diamonds = (
    diamonds_df.filter(
        pl.col('price') >= pl.col('price').quantile(0.99)
    )
    .group_by(['cut', 'color', 'clarity'])
    .agg([
        pl.col('price').mean().alias('평균_가격'),
        pl.col('carat').mean().alias('평균_캐럿'),
        pl.count().alias('개수')
    ])
    .sort('개수', descending=True)
    .head(10)
)
print("고가 다이아몬드 특성 (상위 1%):")
print(expensive_diamonds)

고가 다이아몬드 특성 (상위 1%):
shape: (10, 6)
┌───────────┬───────┬─────────┬──────────────┬───────────┬──────┐
│ cut       ┆ color ┆ clarity ┆ 평균_가격    ┆ 평균_캐럿 ┆ 개수 │
│ ---       ┆ ---   ┆ ---     ┆ ---          ┆ ---       ┆ ---  │
│ cat       ┆ cat   ┆ cat     ┆ f64          ┆ f64       ┆ u32  │
╞═══════════╪═══════╪═════════╪══════════════╪═══════════╪══════╡
│ Ideal     ┆ H     ┆ SI1     ┆ 18050.481481 ┆ 2.064444  ┆ 27   │
│ Premium   ┆ H     ┆ SI1     ┆ 18045.545455 ┆ 2.080455  ┆ 22   │
│ Very Good ┆ H     ┆ SI1     ┆ 18073.761905 ┆ 2.058095  ┆ 21   │
│ Premium   ┆ I     ┆ VS2     ┆ 18062.529412 ┆ 2.122353  ┆ 17   │
│ Ideal     ┆ G     ┆ SI2     ┆ 18174.117647 ┆ 2.104706  ┆ 17   │
│ Ideal     ┆ F     ┆ SI2     ┆ 18110.75     ┆ 2.044375  ┆ 16   │
│ Premium   ┆ F     ┆ SI2     ┆ 18227.692308 ┆ 2.086923  ┆ 13   │
│ Premium   ┆ E     ┆ SI2     ┆ 18075.75     ┆ 2.06      ┆ 12   │
│ Premium   ┆ G     ┆ SI2     ┆ 17805.272727 ┆ 2.102727  ┆ 11   │
│ Premium   ┆ H     ┆ SI2     ┆ 17815.545455 ┆ 2.288182  ┆ 11   │
└───────────┴───────┴─────────┴──────────────┴───────────┴──────┘

고가 다이아몬드 특성 (상위 1%):
shape: (10, 6)
┌───────────┬───────┬─────────┬──────────────┬───────────┬──────┐
│ cut       ┆ color ┆ clarity ┆ 평균_가격    ┆ 평균_캐럿 ┆ 개수 │
│ ---       ┆ ---   ┆ ---     ┆ ---          ┆ ---       ┆ ---  │
│ cat       ┆ cat   ┆ cat     ┆ f64          ┆ f64       ┆ u32  │
╞═══════════╪═══════╪═════════╪══════════════╪═══════════╪══════╡
│ Ideal     ┆ H     ┆ SI1     ┆ 18050.481481 ┆ 2.064444  ┆ 27   │
│ Premium   ┆ H     ┆ SI1     ┆ 18045.545455 ┆ 2.080455  ┆ 22   │
│ Very Good ┆ H     ┆ SI1     ┆ 18073.761905 ┆ 2.058095  ┆ 21   │
│ Premium   ┆ I     ┆ VS2     ┆ 18062.529412 ┆ 2.122353  ┆ 17   │
│ Ideal     ┆ G     ┆ SI2     ┆ 18174.117647 ┆ 2.104706  ┆ 17   │
│ Ideal     ┆ F     ┆ SI2     ┆ 18110.75     ┆ 2.044375  ┆ 16   │
│ Premium   ┆ F     ┆ SI2     ┆ 18227.692308 ┆ 2.086923  ┆ 13   │
│ Premium   ┆ E     ┆ SI2     ┆ 18075.75     ┆ 2.06      ┆ 12   │
│ Premium   ┆ G     ┆ SI2     ┆ 17805.272727 ┆ 2.102727  ┆ 11   │
│ Premium   ┆ H     ┆ SI2     ┆ 17815.545455 ┆ 2.288182  ┆ 11   │
└───────────┴───────┴─────────┴──────────────┴───────────┴──────┘

# 그래프 크기 설정
plt.figure(figsize=(15, 6))
 
# 막대 그래프와 선 그래프를 결합할 주축과 보조축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 데이터 준비
x = np.arange(len(expensive_diamonds))
width = 0.35
 
# numpy 배열로 변환
prices = expensive_diamonds['평균_가격'].to_numpy()
carats = expensive_diamonds['평균_캐럿'].to_numpy()
counts = expensive_diamonds['개수'].to_numpy()
 
# 막대 그래프: 평균 가격 (스케일 조정 없이)
bars = ax1.bar(x - width/2, prices, width,
               label='평균 가격($)', color='lightblue')
 
# 막대 그래프: 평균 캐럿 (스케일 조정)
bars2 = ax1.bar(x + width/2, carats * 10000, width,
                label='평균 캐럿(×10000)', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'ro-', linewidth=2, label='개수')
 
# x축 레이블 생성
labels = [f"{row['cut']}\n{row['color']}-{row['clarity']}"
          for row in expensive_diamonds.iter_rows(named=True)]
 
# 그래프 꾸미기
plt.title('고가 다이아몬드(상위 1%) 특성 분석', pad=20, size=15)
ax1.set_xlabel('컷팅-색상-선명도')
ax1.set_ylabel('가격($) / 캐럿×10000')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, labels, rotation=45, ha='right')
 
# 데이터 레이블 추가
for i in x:
    # 평균 가격
    ax1.text(i - width/2, prices[i],
             f'${prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 평균 캐럿
    ax1.text(i + width/2, carats[i] * 10000,
             f'{carats[i]:.2f}ct',
             ha='center', va='bottom', rotation=90)
 
    # 개수
    ax2.text(i, counts[i],
             f'{counts[i]}개',
             ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(15, 6))
 
# 막대 그래프와 선 그래프를 결합할 주축과 보조축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 데이터 준비
x = np.arange(len(expensive_diamonds))
width = 0.35
 
# numpy 배열로 변환
prices = expensive_diamonds['평균_가격'].to_numpy()
carats = expensive_diamonds['평균_캐럿'].to_numpy()
counts = expensive_diamonds['개수'].to_numpy()
 
# 막대 그래프: 평균 가격 (스케일 조정 없이)
bars = ax1.bar(x - width/2, prices, width,
               label='평균 가격($)', color='lightblue')
 
# 막대 그래프: 평균 캐럿 (스케일 조정)
bars2 = ax1.bar(x + width/2, carats * 10000, width,
                label='평균 캐럿(×10000)', color='lightgreen')
 
# 선 그래프: 개수
line = ax2.plot(x, counts, 'ro-', linewidth=2, label='개수')
 
# x축 레이블 생성
labels = [f"{row['cut']}\n{row['color']}-{row['clarity']}"
          for row in expensive_diamonds.iter_rows(named=True)]
 
# 그래프 꾸미기
plt.title('고가 다이아몬드(상위 1%) 특성 분석', pad=20, size=15)
ax1.set_xlabel('컷팅-색상-선명도')
ax1.set_ylabel('가격($) / 캐럿×10000')
ax2.set_ylabel('개수')
 
# x축 레이블 설정
plt.xticks(x, labels, rotation=45, ha='right')
 
# 데이터 레이블 추가
for i in x:
    # 평균 가격
    ax1.text(i - width/2, prices[i],
             f'${prices[i]:,.0f}',
             ha='center', va='bottom', rotation=90)
 
    # 평균 캐럿
    ax1.text(i + width/2, carats[i] * 10000,
             f'{carats[i]:.2f}ct',
             ha='center', va='bottom', rotation=90)
 
    # 개수
    ax2.text(i, counts[i],
             f'{counts[i]}개',
             ha='center', va='bottom')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

11. 치수 분석

dimension_analysis = (
    diamonds_df.group_by('cut')
    .agg([
        pl.col('x').mean().alias('평균_길이'),
        pl.col('y').mean().alias('평균_너비'),
        pl.col('z').mean().alias('평균_높이'),
        ((pl.col('x') * pl.col('y') * pl.col('z')).mean()).alias('평균_부피')
    ])
    .sort('평균_부피', descending=True)
)
print("컷팅별 치수 분석:")
print(dimension_analysis)

dimension_analysis = (
    diamonds_df.group_by('cut')
    .agg([
        pl.col('x').mean().alias('평균_길이'),
        pl.col('y').mean().alias('평균_너비'),
        pl.col('z').mean().alias('평균_높이'),
        ((pl.col('x') * pl.col('y') * pl.col('z')).mean()).alias('평균_부피')
    ])
    .sort('평균_부피', descending=True)
)
print("컷팅별 치수 분석:")
print(dimension_analysis)

컷팅별 치수 분석:
shape: (5, 5)
┌───────────┬───────────┬───────────┬───────────┬────────────┐
│ cut       ┆ 평균_길이 ┆ 평균_너비 ┆ 평균_높이 ┆ 평균_부피  │
│ ---       ┆ ---       ┆ ---       ┆ ---       ┆ ---        │
│ cat       ┆ f64       ┆ f64       ┆ f64       ┆ f64        │
╞═══════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ Fair      ┆ 6.246894  ┆ 6.182652  ┆ 3.98277   ┆ 164.950549 │
│ Premium   ┆ 5.973887  ┆ 5.944879  ┆ 3.647124  ┆ 145.052128 │
│ Good      ┆ 5.838785  ┆ 5.850744  ┆ 3.639507  ┆ 136.257267 │
│ Very Good ┆ 5.740696  ┆ 5.770026  ┆ 3.559801  ┆ 130.999722 │
│ Ideal     ┆ 5.507451  ┆ 5.52008   ┆ 3.401448  ┆ 115.394912 │
└───────────┴───────────┴───────────┴───────────┴────────────┘

컷팅별 치수 분석:
shape: (5, 5)
┌───────────┬───────────┬───────────┬───────────┬────────────┐
│ cut       ┆ 평균_길이 ┆ 평균_너비 ┆ 평균_높이 ┆ 평균_부피  │
│ ---       ┆ ---       ┆ ---       ┆ ---       ┆ ---        │
│ cat       ┆ f64       ┆ f64       ┆ f64       ┆ f64        │
╞═══════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ Fair      ┆ 6.246894  ┆ 6.182652  ┆ 3.98277   ┆ 164.950549 │
│ Premium   ┆ 5.973887  ┆ 5.944879  ┆ 3.647124  ┆ 145.052128 │
│ Good      ┆ 5.838785  ┆ 5.850744  ┆ 3.639507  ┆ 136.257267 │
│ Very Good ┆ 5.740696  ┆ 5.770026  ┆ 3.559801  ┆ 130.999722 │
│ Ideal     ┆ 5.507451  ┆ 5.52008   ┆ 3.401448  ┆ 115.394912 │
└───────────┴───────────┴───────────┴───────────┴────────────┘

# 그래프 크기 설정
plt.figure(figsize=(12, 8))
 
# 데이터 준비
cuts = dimension_analysis['cut']
measures = ['평균_길이', '평균_너비', '평균_높이']
volumes = dimension_analysis['평균_부피'].to_numpy()
 
# 바 플롯을 위한 x 위치
x = np.arange(len(cuts))
width = 0.25
 
# 두 개의 축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 치수별 막대 그래프
for i, measure in enumerate(measures):
   values = dimension_analysis[measure].to_numpy()
   ax1.bar(x + (i-1)*width, values, width,
           label=measure.replace('평균_', ''),
           alpha=0.7)
 
# 부피는 선 그래프로 표시
line = ax2.plot(x, volumes, 'r-o', linewidth=2, label='부피', color='red')
 
# 그래프 꾸미기
plt.title('다이아몬드 컷팅별 치수와 부피 분석', pad=20, size=15)
ax1.set_xlabel('컷팅 등급')
ax1.set_ylabel('길이 (mm)')
ax2.set_ylabel('부피 (mm³)')
 
# x축 레이블
plt.xticks(x, cuts)
 
# 데이터 레이블 추가
for i, measure in enumerate(measures):
   values = dimension_analysis[measure].to_numpy()
   for j in range(len(values)):
       ax1.text(j + (i-1)*width, values[j],
               f'{values[j]:.2f}',
               ha='center', va='bottom',
               rotation=90)
 
for i in range(len(volumes)):
   ax2.text(i, volumes[i],
            f'{volumes[i]:.1f}',
            ha='center', va='bottom',
            color='red')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

# 그래프 크기 설정
plt.figure(figsize=(12, 8))
 
# 데이터 준비
cuts = dimension_analysis['cut']
measures = ['평균_길이', '평균_너비', '평균_높이']
volumes = dimension_analysis['평균_부피'].to_numpy()
 
# 바 플롯을 위한 x 위치
x = np.arange(len(cuts))
width = 0.25
 
# 두 개의 축 생성
ax1 = plt.gca()
ax2 = ax1.twinx()
 
# 치수별 막대 그래프
for i, measure in enumerate(measures):
   values = dimension_analysis[measure].to_numpy()
   ax1.bar(x + (i-1)*width, values, width,
           label=measure.replace('평균_', ''),
           alpha=0.7)
 
# 부피는 선 그래프로 표시
line = ax2.plot(x, volumes, 'r-o', linewidth=2, label='부피', color='red')
 
# 그래프 꾸미기
plt.title('다이아몬드 컷팅별 치수와 부피 분석', pad=20, size=15)
ax1.set_xlabel('컷팅 등급')
ax1.set_ylabel('길이 (mm)')
ax2.set_ylabel('부피 (mm³)')
 
# x축 레이블
plt.xticks(x, cuts)
 
# 데이터 레이블 추가
for i, measure in enumerate(measures):
   values = dimension_analysis[measure].to_numpy()
   for j in range(len(values)):
       ax1.text(j + (i-1)*width, values[j],
               f'{values[j]:.2f}',
               ha='center', va='bottom',
               rotation=90)
 
for i in range(len(volumes)):
   ax2.text(i, volumes[i],
            f'{volumes[i]:.1f}',
            ha='center', va='bottom',
            color='red')
 
# 범례 통합
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
 
plt.tight_layout()
plt.show()

12. 종합 통계

summary_stats = {
    '전체_다이아몬드_수': diamonds_df.shape[0],
    '평균_가격': round(diamonds_df['price'].mean(),2),
    '평균_캐럿': round(diamonds_df['carat'].mean(),3),
    '최고_가격': diamonds_df['price'].max(),
    '최대_캐럿': diamonds_df['carat'].max(),
    '가장_많은_컷팅': diamonds_df['cut'].mode()[0],
    '가장_많은_색상': diamonds_df['color'].mode()[0],
    '가장_많은_선명도': diamonds_df['clarity'].mode()[0]
}
print("종합 통계:")
print(summary_stats)

summary_stats = {
    '전체_다이아몬드_수': diamonds_df.shape[0],
    '평균_가격': round(diamonds_df['price'].mean(),2),
    '평균_캐럿': round(diamonds_df['carat'].mean(),3),
    '최고_가격': diamonds_df['price'].max(),
    '최대_캐럿': diamonds_df['carat'].max(),
    '가장_많은_컷팅': diamonds_df['cut'].mode()[0],
    '가장_많은_색상': diamonds_df['color'].mode()[0],
    '가장_많은_선명도': diamonds_df['clarity'].mode()[0]
}
print("종합 통계:")
print(summary_stats)

종합 통계:
{'전체_다이아몬드_수': 53940, '평균_가격': 3932.8, '평균_캐럿': 0.798, '최고_가격': 18823, '최대_캐럿': 5.01, '가장_많은_컷팅': 'Ideal', '가장_많은_색상': 'G', '가장_많은_선명도': 'SI1'}

종합 통계:
{'전체_다이아몬드_수': 53940, '평균_가격': 3932.8, '평균_캐럿': 0.798, '최고_가격': 18823, '최대_캐럿': 5.01, '가장_많은_컷팅': 'Ideal', '가장_많은_색상': 'G', '가장_많은_선명도': 'SI1'}

{"packages":["numpy","pandas","matplotlib","lxml"]}