CodeSolved

Solved Programming Questions & Exercises

Extracting products from a website


Write a program that reads the product information from the following HTML file and saves it in an Excel file.

Get the product title, price, and product link
Save the information in an Excel file
Display the products in order of price, from low to high


<!DOCTYPE html>
<html lang="fa">
<head>
    <meta charset="UTF-8">
    <title>Test Store</title>
</head>
<body>
    <div class="product">
        <span class="title">Asus X515 Laptop</span>
        <span class="price">25,000,000 Toman</span>
        <a class="link" href="https://example.com/laptop-x515">View Product</a>
    </div>
    <div class="product">
        <span class="title">Samsung Galaxy A54 Phone</span>
        <span class="price">15,500,000 Toman</span>
        <a class="link" href="https://example.com/galaxy-a54">View Product</a>
    </div>
    <div class="product">
        <span class="title">Sony WH-1000XM4 Wireless Headphones</span>
        <span class="price">9,200,000 Toman</span>
        <a class="link" href="https://example.com/sony-wh1000xm4">View Product</a>
    </div>
    <div class="product">
        <span class="title">LG 24MP400 Monitor</span>
        <span class="price">6,500,000 Toman</span>
        <a class="link" href="https://example.com/lg-24mp400">View Product</a>
    </div>
    <div class="product">
        <span class="title">PlayStation 5 Game Console</span>
        <span class="price">32,000,000 Toman</span>
        <a class="link" href="https://example.com/ps5">View Product</a>
    </div>
    <div class="product">
        <span class="title">Logitech G Pro X Gaming Mouse</span>
        <span class="price">4,800,000 Toman</span>
        <a class="link" href="https://example.com/logitech-gpro">View Product</a>
    </div>
    <div class="product">
        <span class="title">Razer BlackWidow V3 Mechanical Keyboard</span>
        <span class="price">7,300,000 Toman</span>
        <a class="link" href="https://example.com/razer-bw-v3">View Product</a>
    </div>
    <div class="product">
        <span class="title">Apple Watch Series 8 Smart Watch</span>
        <span class="price">18,000,000 Toman</span>
        <a class="link" href="https://example.com/apple-watch8">View Product</a>
    </div>
    <div class="product">
        <span class="title">Canon EOS 250D Digital Camera</span>
        <span class="price">22,500,000 Toman</span>
        <a class="link" href="https://example.com/canon-eos250d">View Product</a>
    </div>
    <div class="product">
        <span class="title">WD My Passport 2TB External Hard Drive</span>
        <span class="price">3,200,000 Toman</span>
        <a class="link" href="https://example.com/wd-2tb">View Product</a>
    </div>
</body>
</html>

1 Answer

To do this, you can use the BeautifulSoup library to extract the information from the HTML and pandas to save it in an Excel file. The relevant code is below:

import pandas as pd
from bs4 import BeautifulSoup

html_content = """
<!DOCTYPE html>
<html lang="fa">
<head>
    <meta charset="UTF-8">
    <title>Test Store</title>
</head>
<body>
    <div class="product">
        <span class="title">Asus X515 Laptop</span>
        <span class="price">25,000,000 Toman</span>
        <a class="link" href="https://example.com/laptop-x515">View Product</a>
    </div>
    <div class="product">
        <span class="title">Samsung Galaxy A54 Phone</span>
        <span class="price">15,500,000 Toman</span>
        <a class="link" href="https://example.com/galaxy-a54">View Product</a>
    </div>
    <div class="product">
        <span class="title">Sony WH-1000XM4 Wireless Headphones</span>
        <span class="price">9,200,000 Toman</span>
        <a class="link" href="https://example.com/sony-wh1000xm4">View Product</a>
    </div>
    <div class="product">
        <span class="title">LG 24MP400 Monitor</span>
        <span class="price">6,500,000 Toman</span>
        <a class="link" href="https://example.com/lg-24mp400">View Product</a>
    </div>
    <div class="product">
        <span class="title">PlayStation 5 Game Console</span>
        <span class="price">32,000,000 Toman</span>
        <a class="link" href="https://example.com/ps5">View Product</a>
    </div>
    <div class="product">
        <span class="title">Logitech G Pro X Gaming Mouse</span>
        <span class="price">4,800,000 Toman</span>
        <a class="link" href="https://example.com/logitech-gpro">View Product</a>
    </div>
    <div class="product">
        <span class="title">Razer BlackWidow V3 Mechanical Keyboard</span>
        <span class="price">7,300,000 Toman</span>
        <a class="link" href="https://example.com/razer-bw-v3">View Product</a>
    </div>
    <div class="product">
        <span class="title">Apple Watch Series 8 Smart Watch</span>
        <span class="price">18,000,000 Toman</span>
        <a class="link" href="https://example.com/apple-watch8">View Product</a>
    </div>
    <div class="product">
        <span class="title">Canon EOS 250D Digital Camera</span>
        <span class="price">22,500,000 Toman</span>
        <a class="link" href="https://example.com/canon-eos250d">View Product</a>
    </div>
    <div class="product">
        <span class="title">WD My Passport 2TB External Hard Drive</span>
        <span class="price">3,200,000 Toman</span>
        <a class="link" href="https://example.com/wd-2tb">View Product</a>
    </div>
</body>
</html>
"""

soup = BeautifulSoup(html_content, 'html.parser')
products = []

# Each product sits in its own <div class="product"> block
for product in soup.find_all('div', class_='product'):
    title = product.find(class_='title').get_text(strip=True)
    price_text = product.find(class_='price').get_text(strip=True)
    # Keep only the digits, which drops the thousands separators and the currency label
    price = int(''.join(ch for ch in price_text if ch.isdigit()))
    link = product.find(class_='link')['href']
    products.append({'Title': title, 'Price': price, 'Link': link})

# Sort products by price
products.sort(key=lambda x: x['Price'])

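# Create DataFrame and save to Excel
# (pandas needs an Excel engine such as openpyxl installed to write .xlsx files)
df = pd.DataFrame(products)
df.to_excel('products.xlsx', index=False)

# Display the products from lowest to highest price
print(df.to_string(index=False))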

This code extracts the product information from the HTML and saves it in an Excel file named products.xlsx. The products are sorted by price from lowest to highest.
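
Price cleaning is the step most sensitive to how the page formats its numbers. As a minimal sketch, it can be pulled out into a standalone helper that keeps only the digits. It assumes the price is the only number in the text and has no decimal part. The helper name `parse_price` and the sample rows are hypothetical and not part of the original answer:

```python
def parse_price(text: str) -> int:
    """Return the integer price found in a string such as '25,000,000 Toman'.

    Assumption: the price is the only number in the text and has no decimal part.
    """
    # Drop thousands separators, spaces, and any currency label
    digits = ''.join(ch for ch in text if ch.isdigit())
    if not digits:
        raise ValueError(f'no digits found in price text: {text!r}')
    return int(digits)


# Hypothetical sample rows, sorted by price in ascending order
rows = [
    {'Title': 'A', 'Price': parse_price(' 25,000,000 Toman ')},
    {'Title': 'B', 'Price': parse_price('3,200,000 تومان')},
    {'Title': 'C', 'Price': parse_price('9200000')},
]
rows.sort(key=lambda row: row['Price'])
print([row['Title'] for row in rows])
```

Raising an error on text without digits makes a malformed product block fail loudly instead of silently saving a wrong price. Whether to skip such rows instead is a design choice that depends on the page being scraped.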
