[Code Sharing] Groundwork for web scraping: BeautifulSoup, a handy tool for parsing the HTML tree structure of web pages

Original poster

shadowaver

Posted on 2021-12-27 14:26:06


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Dec 27 13:52:57 2021

@author: apache
"""

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
The Dormouse's story
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
...
"""

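# Name the parser explicitly so the result is the same on every machine
soup = BeautifulSoup(html, 'html.parser')

# The line above builds the soup object from the html string defined earlier
# (to parse a local index.html instead, pass an open file handle).
# Next, print the contents of the soup object in formatted form.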

print(soup.prettify())

print(soup.title.string)

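import requests

# Fetch a live page and parse it the same way
s = requests.Session()
xq = s.get('https://sh.fang.lianjia.com/loupan/', timeout=10)
xq.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(xq.text, 'html.parser')

# Outside an interactive console, bare expressions show nothing, so print them
print(soup.prettify())
print(soup.title.string)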
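
The script above only prints the whole tree and the page title. As a rough sketch of the next step in scraping, the snippet below walks the tree to pull out individual tags and their attributes. It reuses a shortened copy of the sample markup from the post; the helper name `extract_links` is made up for this illustration and is not part of BeautifulSoup, and the closing `</body></html>` tags are added here only for tidiness.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup used in the post
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<a href="http://example.com/elsie" class="sister" id="link1"></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_links(soup):
    """Hypothetical helper: collect (id, href, text) for each a.sister tag."""
    links = []
    # find_all returns every matching tag; class_ filters on the CSS class
    for a in soup.find_all("a", class_="sister"):
        links.append((a.get("id"), a.get("href"), a.get_text(strip=True)))
    return links


links = extract_links(soup)
for link_id, href, text in links:
    print(link_id, href, text or "(no text)")

# A CSS selector is another way to reach a single tag
second = soup.select_one("a#link2")
print(second["href"])
```

The same calls should work on the `soup` built from the downloaded page, although the tag names and classes to search for depend on that site's markup and would need to be checked in the browser first.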