견고한 파이썬 | 위니북스

자주 사용되는 라이브러리

1. 자주 사용되는 라이브러리

파이썬은 다양한 라이브러리를 제공합니다. 이러한 라이브러리를 사용하면 프로그램을 더 쉽게 작성할 수 있습니다. 어떤 코드를 작성하기 전 이미 거인들이 짜놓은 라이브러리가 없는지 찾아보는 것이 좋습니다. 라이브러리를 사용하면 코드를 더 간결하게 작성할 수 있고, 더 빠르게 작업을 수행할 수 있습니다. 이번 장에서는 자주 사용되는 라이브러리를 소개하겠습니다.

1.1 os 모듈

운영체제와 상호작용하는 데 사용되는 함수들을 제공합니다. 파일과 디렉토리를 만들고, 읽고, 쓰는 것을 비롯하여 환경 변수에 접근하거나 운영 체제 명령을 실행하는 등의 작업을 수행할 수 있습니다.

import os
 
os.mkdir('licat') # licat이란 폴더 생성, 삭제는 os.rmdir()
os.path.join('hello', 'world') # 경로를 합쳐줍니다.
os.rename('a.txt', 'b.txt') # a.txt파일을 b.txt파일로 변경
os.listdir() # 현재 디렉토리의 파일 목록을 반환합니다.
os.getcwd() # 현재 경로 반환
os.open('a.txt') # 파일 생성
os.remove('b.txt')

import os
 
os.mkdir('licat') # licat이란 폴더 생성, 삭제는 os.rmdir()
os.path.join('hello', 'world') # 경로를 합쳐줍니다.
os.rename('a.txt', 'b.txt') # a.txt파일을 b.txt파일로 변경
os.listdir() # 현재 디렉토리의 파일 목록을 반환합니다.
os.getcwd() # 현재 경로 반환
os.open('a.txt') # 파일 생성
os.remove('b.txt')

예를 들어, 아래와 같은 데이터가 있어 해당 데이터로 폴더를 만들고 파일을 생성하는 코드를 생각해볼 수 있습니다.

프로젝트1, 회의록.txt, 프로젝트 1에 회의록입니다.
프로젝트2, 보고서.txt, 프로젝트 2에 보고서입니다.
프로젝트3, 일정.txt, 프로젝트 3에 일정입니다.

프로젝트1, 회의록.txt, 프로젝트 1에 회의록입니다.
프로젝트2, 보고서.txt, 프로젝트 2에 보고서입니다.
프로젝트3, 일정.txt, 프로젝트 3에 일정입니다.

import os
import csv
 
with open('data.csv', 'r', encoding='utf-8') as f:
    data = csv.reader(f)
    data = list(data)
 
for i in data:
    # 폴더 생성
    if not os.path.exists(i[0]):
        os.mkdir(i[0])
    # 경로명 생성
    path = os.path.join(i[0], i[1].strip())
    # 파일 생성 및 내용 작성
    with open(path, 'w', encoding='utf-8') as f:
        f.write(i[2])

import os
import csv
 
with open('data.csv', 'r', encoding='utf-8') as f:
    data = csv.reader(f)
    data = list(data)
 
for i in data:
    # 폴더 생성
    if not os.path.exists(i[0]):
        os.mkdir(i[0])
    # 경로명 생성
    path = os.path.join(i[0], i[1].strip())
    # 파일 생성 및 내용 작성
    with open(path, 'w', encoding='utf-8') as f:
        f.write(i[2])

1.2 pathlib 모듈

os 모듈과 비슷한 기능을 제공하지만, os 모듈보다 더 간결하고 직관적인 코드를 작성할 수 있습니다.

from pathlib import Path as p
 
p.cwd() # 현재 경로 확인
pwd = p.cwd()
pwd / 'hello' / 'world' # /로 경로 결합
 
dire = pwd / 'hello100' / 'world100' / 'hello200'
dire.mkdir(parents=True, exist_ok=True) # 부모까지 폴더 생성
# parents 옵션을 제거하면 폴더 하나밖에 생성이 안됩니다.
 
for i in pwd.iterdir():
    print(i)

from pathlib import Path as p
 
p.cwd() # 현재 경로 확인
pwd = p.cwd()
pwd / 'hello' / 'world' # /로 경로 결합
 
dire = pwd / 'hello100' / 'world100' / 'hello200'
dire.mkdir(parents=True, exist_ok=True) # 부모까지 폴더 생성
# parents 옵션을 제거하면 폴더 하나밖에 생성이 안됩니다.
 
for i in pwd.iterdir():
    print(i)

아래와 같이 손쉽게 경로 이동이 가능합니다.

from pathlib import Path as p

pwd = p.cwd()
dire = pwd / 'hello100' / 'world100' / 'hello200'

dire.parent
dire.parent.parent

from pathlib import Path as p

pwd = p.cwd()
dire = pwd / 'hello100' / 'world100' / 'hello200'

dire.parent
dire.parent.parent

1.3 math 모듈

수학적 연산을 수행하는데 필요한 함수들을 제공합니다. 제곱근, 로그, 삼각함수 등의 기능이 포함되어 있습니다.

import math math.pi # 출력: 3.141592653589793

1.4 datetime 모듈

날짜와 시간을 다루는 클래스를 제공합니다. 현재 날짜와 시간을 가져오는 것부터 날짜와 시간의 차이를 계산하는 것까지 다양한 기능을 제공합니다.

가장 많이 사용되는 기능은 아래와 같습니다.

현재 시간을 가져오는 datetime.datetime.now()
시간 연산을 위한 datetime.timedelta()
날짜와 시간을 문자열로 변환하는 datetime.strftime()

import datetime s = datetime.datetime(2023, 9, 19, 14, 10) print(s) print(s.year, s.month, s.day, s.hour, s.minute) s = datetime.datetime(2023, 9, 18, 14, 10) print(s.weekday()) # 월요일0, 화요일1, 수요일2 ... 일요일6 today = datetime.date.today() days = datetime.timedelta(days=100) today + days # 100일 후 시간 graduation_date = datetime.date(2023, 12, 29) today = datetime.date.today() print(graduation_date - today) # 졸업까지 남은 일자 date = '2024-12-10' date = datetime.strptime(date, '%Y-%m-%d') print(date)

날짜 형식에 대한 자세한 내용은 날짜 형식을 참고하세요.

1.5 json 모듈

JSON 형식의 데이터를 읽고 쓰는 데 사용됩니다. Python 데이터 구조를 JSON 문자열로 직렬화하거나, JSON 문자열을 Python 데이터 구조로 역직렬화하는 기능을 제공합니다. 일반적으로 클래스로 구현하여 만든 인스턴스는 직렬화되지 않습니다.

직렬화란 데이터를 저장/전송 형식으로 변환하는 것을 얘기합니다. 보통 문자열로 변환됩니다.

import json d = { 'one': 1, 'two': 2, 'three': 3 } s = json.dumps(d) print(type(s)) # str d = json.loads(s) print(type(d)) # dict

다만 이 모듈을 통해 파이썬의 모든 자료형이 완벽히 JSON으로 변환되지는 않습니다. 예를 들어, 파이썬의 dict는 key로 숫자를 사용할 수 있지만, JSON은 key로 숫자를 사용할 수 없습니다.

1.6 collections 모듈

기본 데이터 컨테이너 외에도 다양한 데이터 컨테이너 타입을 제공합니다. deque, Counter, OrderedDict, defaultdict 등과 같은 고급 데이터 구조를 제공합니다.

import collections d = collections.deque([1, 2, 3, 4]) d.rotate(1) # 1번 오른쪽으로 쉬프트 합니다. 숫자를 2로 바꾸어 비교해보세요. print(d) c = collections.Counter('hello world') print(c) # 요소의 개수를 세어 딕셔너리로 반환합니다. print(c.most_common()) # 가장 많이 나온 문자를 반환합니다.

다만 해당 모듈 중 OrderedDict는 파이썬 3.7부터는 기본 딕셔너리가 순서를 보장하기 때문에 사용할 필요가 없어졌습니다. 이처럼 시간이 지나면서 라이브러리의 사용성이 변화할 수 있습니다.

1.7 itertools 모듈

반복자를 만드는 데 사용되는 함수들을 제공합니다. 이 모듈은 반복자를 만드는 데 사용되는 여러 유용한 함수들을 제공합니다. count, cycle, repeat, chain, compress, dropwhile, takewhile, groupby, islice, starmap, tee, zip_longest 등의 함수가 있습니다.

import itertools for i in itertools.count(10, 2): print(i) if i > 20: break

무한히 반복하는 반복자를 만들 수도 있습니다.

import itertools number = 10 for i in itertools.cycle('ABCD'): print(i) number += 1 if number > 20: break

import itertools number = 10 for i in itertools.repeat(None): print(i) number += 1 if number > 20: break

순열과 조합도 만들 수 있습니다.

import itertools arr = [1, 2, 3] print('순열') for i in itertools.permutations(arr, 2): print(i) print('조합') for i in itertools.combinations(arr, 2): print(i)

1.8 requests 모듈

HTTP 요청을 쉽게 보낼 수 있는 기능을 제공하는 외부 라이브러리입니다. REST API와 통신하거나 웹 페이지를 스크래핑하는 등의 작업을 수행하는 데 유용합니다. 보통 BeautifulSoup 모듈과 함께 사용됩니다. BeautifulSoup는 가지고 온 데이터를 우리가 보다 쉽게 접근할 수 있게 파싱합니다.

파싱은 문자열이나 데이터를 분석하여 구조화된 정보로 변환하는 과정을 말합니다.

import requests
 
# for i in range(1000): # 이렇게 하시면 안됩니다. 공격이에요.
paullab_url = 'https://paullab.co.kr/bookservice/'
response = requests.get(paullab_url)
response.encoding = 'utf-8'
response.text

import requests
 
# for i in range(1000): # 이렇게 하시면 안됩니다. 공격이에요.
paullab_url = 'https://paullab.co.kr/bookservice/'
response = requests.get(paullab_url)
response.encoding = 'utf-8'
response.text

위 요청은 단순히 요청을 보내는 코드입니다.

from bs4 import BeautifulSoup
 
html = '''
<h1>대제목</h1>
<p class='one'>hello world 1</p>
<p class='two'>hello world 2</p>
'''
 
soup = BeautifulSoup(html, 'html.parser')
soup.select('h1')
soup.select('h1')[0].text
soup.select('.one')[0].text

from bs4 import BeautifulSoup
 
html = '''
<h1>대제목</h1>
<p class='one'>hello world 1</p>
<p class='two'>hello world 2</p>
'''
 
soup = BeautifulSoup(html, 'html.parser')
soup.select('h1')
soup.select('h1')[0].text
soup.select('.one')[0].text

위 코드는 BeautifulSoup 모듈을 사용하여 HTML을 파싱하는 코드입니다.

import requests
from bs4 import BeautifulSoup
 
paullab_url = 'https://paullab.co.kr/bookservice/'
response = requests.get(paullab_url)
response.encoding = 'utf-8'
html = response.text
 
soup = BeautifulSoup(html, 'html.parser')
 
bookservices = soup.select('.col-lg-6 > h2') # col-lg-6 클래스 안의 h2 태그 탐색
for no, book in enumerate(bookservices, 1):
    print(no, book.text)

import requests
from bs4 import BeautifulSoup
 
paullab_url = 'https://paullab.co.kr/bookservice/'
response = requests.get(paullab_url)
response.encoding = 'utf-8'
html = response.text
 
soup = BeautifulSoup(html, 'html.parser')
 
bookservices = soup.select('.col-lg-6 > h2') # col-lg-6 클래스 안의 h2 태그 탐색
for no, book in enumerate(bookservices, 1):
    print(no, book.text)

이러한 코드를 조합하여 위와 같이 웹 페이지의 정보를 가져올 수 있습니다.

{"packages":["numpy","pandas","matplotlib","lxml"]}