Python Scraper

by devraphy 2020. 9. 7.

* An ad plays when the video starts (the video is hosted on Kakao Station, which inserts ads).

 

1) Features

  1. Scraping from three job boards

    Given a search word, the program scrapes remote job listings from three platforms: StackOverflow, WeWork, and RemoteOk.

  2. Faster repeat searches with a fake DB

    While the program is running, search results are stored in a fake DB (an in-memory array) linked to the search word. If a user searches the same word again, the program returns the stored results instead of scraping the sites again, which makes repeat searches faster. Because the fake DB exists only in memory, it is emptied when the program stops.

  3. Results page and CSV download

    The results page lists the jobs scraped from all three platforms, and the combined results can be downloaded as a single CSV file.

 

2) Tech stack

  • Languages/templating: Python, Jinja2, HTML/CSS
  • Framework: Flask
  • DB: none (a fake in-memory DB)
  • Other: BeautifulSoup (BS4)
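The project uses BeautifulSoup for parsing, but the post doesn't show that code. As a rough, dependency-free illustration of the same parsing step, the sketch below swaps in the standard library's `html.parser` to pull job titles and links out of a page. The sample HTML and the `job-title` class are hypothetical; the real sites' markup differs and changes over time. With BeautifulSoup, the equivalent lookup would be roughly `soup.find_all("a", class_="job-title")`.

```python
from html.parser import HTMLParser

# Hypothetical markup; real job boards use different (and changing) structures.
SAMPLE_HTML = """
<ul>
  <li class="job"><a class="job-title" href="/jobs/1">Python Developer</a></li>
  <li class="job"><a class="job-title" href="/jobs/2">Backend Engineer</a></li>
</ul>
"""


class JobTitleParser(HTMLParser):
    """Collects (title, href) pairs from <a class="job-title"> elements."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._in_title = False  # True while inside a matching <a> tag
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "job-title" in classes:
            self._in_title = True
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.jobs.append(("".join(self._text).strip(), self._href))
            self._in_title = False


def parse_jobs(html):
    """Return a list of (title, href) tuples found in `html`."""
    parser = JobTitleParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


if __name__ == "__main__":
    for title, href in parse_jobs(SAMPLE_HTML):
        print(title, href)
```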

 

3) GitHub

https://github.com/devraphy/python-crawler

 

4) Repl (run it yourself)

repl.it/@devraphy/Python-Scrapper-project
