BeautifulSoup Webscraper

This project's aim is to build a web scraper for data harvesting, both to better understand how scraping works and to explore its applications. It is built around two Python libraries: requests, which downloads web pages, and Beautiful Soup, which parses the HTML so that data can be pulled out of it. Together they are powerful tools for the job.

As this is for educational purposes, we will be targeting a dummy job-listings site built specifically for this purpose (https://realpython.github.io/fake-jobs/).

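Fetch and parse the HTML code:

import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)  # download the page; page.text holds the raw HTML
print(page.text)
soup = BeautifulSoup(page.content, "html.parser")  # parse it with Python's built-in parser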
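The fetch-and-parse step can be made a little more defensive. The sketch below is not the project's own code: `fetch_soup` and `make_soup` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the parsing is demonstrated on a small inline HTML string so it can be tried without a network connection.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"


# Hypothetical helper names; not part of the original project
def make_soup(html):
    """Parse an HTML string (or bytes) with Python's built-in html.parser backend."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed; raise if the request fails."""
    # Without a timeout, requests can wait indefinitely on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into a requests.HTTPError instead of parsing an error page
    response.raise_for_status()
    # response.content is raw bytes, so Beautiful Soup works out the encoding itself
    return make_soup(response.content)


# Offline demonstration on a hand-written fragment
soup = make_soup("<html><head><title>Fake Python</title></head><body></body></html>")
print(soup.title.string)
```

Calling `fetch_soup(URL)` performs the real request against the dummy site.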

Find elements by ID:

# Locate the <div id="ResultsContainer"> that wraps all of the job listings
results = soup.find(id="ResultsContainer")
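`find()` returns `None` when nothing matches, so a typo in the ID or a change to the site would otherwise only surface later as an `AttributeError`. A minimal sketch of guarding against that, run against a hypothetical, trimmed-down fragment that only assumes the page wraps its listings in `<div id="ResultsContainer">` with one `<div class="card-content">` per job:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page
html = """
<div id="ResultsContainer">
  <div class="card-content"><h2 class="title">Job A</h2></div>
  <div class="card-content"><h2 class="title">Job B</h2></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

results = soup.find(id="ResultsContainer")
if results is None:
    # Fail loudly and early rather than with a confusing error further down
    raise RuntimeError("No element with id='ResultsContainer'; has the page layout changed?")

# find_all() returns a (possibly empty) list of every matching descendant
job_elements = results.find_all("div", class_="card-content")
print(len(job_elements))
```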

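Extract text from the HTML elements:

# Each job posting sits in its own <div class="card-content"> inside the results container
job_elements = results.find_all("div", class_="card-content")

for job_element in job_elements:
    # Find the title, company and location elements within this job card
    title_element = job_element.find("h2", class_="title")
    company_element = job_element.find("h3", class_="company")
    location_element = job_element.find("p", class_="location")
    # .text gives the element's text content; .strip() trims surrounding whitespace
    print(title_element.text.strip())
    print(company_element.text.strip())
    print(location_element.text.strip())
    print()  # blank line between jobs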
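Two weaknesses of the loop above are that a card missing one of the elements raises `AttributeError` (because `find()` returned `None`), and that the data is only printed, not stored. A sketch of one way around both, collecting each job into a dictionary. `element_text` and `extract_jobs` are hypothetical names, and the HTML is a made-up fragment that assumes the same tags and class names used above:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second card deliberately has no location
html = """
<div id="ResultsContainer">
  <div class="card-content">
    <h2 class="title is-5">Example Developer</h2>
    <h3 class="subtitle is-6 company">Example Corp</h3>
    <p class="location">Sometown, AA</p>
  </div>
  <div class="card-content">
    <h2 class="title is-5">Example Analyst</h2>
    <h3 class="subtitle is-6 company">Sample Ltd</h3>
  </div>
</div>
"""


def element_text(parent, tag, css_class):
    """Return the stripped text of the first matching descendant, or None if it is missing."""
    # class_ matches any one of an element's classes, so "title" matches "title is-5"
    element = parent.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


def extract_jobs(soup):
    """Collect title, company and location for every job card into a list of dicts."""
    results = soup.find(id="ResultsContainer")
    if results is None:
        return []
    return [
        {
            "title": element_text(card, "h2", "title"),
            "company": element_text(card, "h3", "company"),
            "location": element_text(card, "p", "location"),
        }
        for card in results.find_all("div", class_="card-content")
    ]


jobs = extract_jobs(BeautifulSoup(html, "html.parser"))
for job in jobs:
    print(job)
```

Storing records as dictionaries makes it straightforward to hand them to `csv.DictWriter` or a similar tool later.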

Each target site would need to be inspected in advance to identify the tags, IDs and classes that hold the wanted data. That reconnaissance takes some effort, but once a template for a site is set up, the scraper can pull all of this data automatically, saving a great deal of manual work.
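The "template" idea can be made concrete by keeping the selectors found during reconnaissance in a data structure, so that adapting the scraper to another site means writing a new template rather than new code. The sketch below assumes CSS selectors, which Beautiful Soup supports through `select()` and `select_one()`; `FAKE_JOBS_TEMPLATE` and `scrape_with_template` are hypothetical names, the selectors simply restate the ID and classes used above, and the sample markup is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical per-site template: the CSS selectors found while inspecting the site
FAKE_JOBS_TEMPLATE = {
    "item": "#ResultsContainer .card-content",
    "fields": {
        "title": "h2.title",
        "company": "h3.company",
        "location": "p.location",
    },
}


def scrape_with_template(html, template):
    """Apply a selector template to a page and return one dict per matched item."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select(template["item"]):
        record = {}
        for field, selector in template["fields"].items():
            element = item.select_one(selector)
            # Missing fields become None instead of raising; compare to None explicitly
            # because an empty Tag is falsy
            record[field] = element.get_text(strip=True) if element is not None else None
        records.append(record)
    return records


# Made-up sample markup following the structure assumed above
html = """
<div id="ResultsContainer">
  <div class="card-content">
    <h2 class="title is-5">Example Developer</h2>
    <h3 class="subtitle is-6 company">Example Corp</h3>
    <p class="location">Sometown, AA</p>
  </div>
</div>
"""
print(scrape_with_template(html, FAKE_JOBS_TEMPLATE))
```

Supporting a second site would then mean adding another dictionary with that site's selectors and passing it to the same function.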
