Extraction of products from the site
Write a program that reads the product information from the following HTML and saves it to an Excel file.
Get the product title, price, and product link
Save the information in Excel
Display the products sorted by price, from low to high
To do this, you can use the BeautifulSoup library to extract the information from the HTML and pandas to save it to an Excel file. The relevant code is below:
import pandas as pd
from bs4 import BeautifulSoup
html_content = """
<!DOCTYPE html>
<html lang="fa">
<head>
<meta charset="UTF-8">
<title>فروشگاه تستی</title>
</head>
<body>
<div class="product">
<span class="title">لپتاپ ایسوس مدل X515</span>
<span class="price">25,000,000 تومان</span>
<a class="link" href="https://example.com/laptop-x515">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">گوشی سامسونگ Galaxy A54</span>
<span class="price">15,500,000 تومان</span>
<a class="link" href="https://example.com/galaxy-a54">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">هدفون بیسیم Sony WH-1000XM4</span>
<span class="price">9,200,000 تومان</span>
<a class="link" href="https://example.com/sony-wh1000xm4">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">مانیتور الجی مدل 24MP400</span>
<span class="price">6,500,000 تومان</span>
<a class="link" href="https://example.com/lg-24mp400">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">کنسول بازی PlayStation 5</span>
<span class="price">32,000,000 تومان</span>
<a class="link" href="https://example.com/ps5">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">موس گیمینگ Logitech G Pro X</span>
<span class="price">4,800,000 تومان</span>
<a class="link" href="https://example.com/logitech-gpro">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">کیبورد مکانیکی Razer BlackWidow V3</span>
<span class="price">7,300,000 تومان</span>
<a class="link" href="https://example.com/razer-bw-v3">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">ساعت هوشمند Apple Watch Series 8</span>
<span class="price">18,000,000 تومان</span>
<a class="link" href="https://example.com/apple-watch8">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">دوربین دیجیتال Canon EOS 250D</span>
<span class="price">22,500,000 تومان</span>
<a class="link" href="https://example.com/canon-eos250d">مشاهده محصول</a>
</div>
<div class="product">
<span class="title">هارد اکسترنال WD My Passport 2TB</span>
<span class="price">3,200,000 تومان</span>
<a class="link" href="https://example.com/wd-2tb">مشاهده محصول</a>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_content, 'html.parser')
products = []
for product in soup.find_all(class_='product'):
    title = product.find(class_='title').text
    price = int(product.find(class_='price').text.replace(',', '').replace(' تومان', ''))
    link = product.find(class_='link')['href']
    products.append({'Title': title, 'Price': price, 'Link': link})
# Sort products by price
products.sort(key=lambda x: x['Price'])
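# Create DataFrame and save to Excel (writing .xlsx requires the openpyxl package)
df = pd.DataFrame(products)
df.to_excel('products.xlsx', index=False)
# Display the products, cheapest first
print(df.to_string(index=False))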
This code extracts the product information from the HTML and saves it to an Excel file named products.xlsx. The products are sorted by price, from low to high.
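The price parsing above depends on every price ending in exactly " تومان" with no extra whitespace. If the real page varies, a more tolerant approach is to keep only the digits. The sketch below is one possible way to do that; the helper name parse_price is hypothetical and not part of the original solution.

```python
import re


def parse_price(text: str) -> int:
    """Return the integer price found in text such as '25,000,000 تومان'."""
    # Drop every character that is not a digit (commas, spaces, currency name).
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in price text: {text!r}")
    return int(digits)


print(parse_price("25,000,000 تومان"))  # 25000000
```

In the loop, the price line could then read price = parse_price(product.find(class_='price').text), assuming each product has exactly one numeric price.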