school
Published: 2019-06-08

'''
Scrape the names and official website addresses of the universities
in every province of China.
'''
import requests
from lxml import etree


class School(object):
    def __init__(self):
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
        }
        self.url = 'http://www.cnxiaoyuan.com/'

    # Collect the listing-page URL of each province
    def province_school_url(self):
        province = list()
        response = requests.get(url=self.url, headers=self.headers)
        html = etree.HTML(response.content.decode('utf-8'))
        # Province links; the [0:-3] slice drops the last three <li> entries
        li_list = html.xpath("//div[@id='homecate']/ul/li")[0:-3]
        for li in li_list:
            for href in li.xpath("./a/@href"):
                province.append(self.url + href)
        return province

    # Collect the title and website of every school on each province page
    def school_url(self, province):
        school_list = list()
        for school in province:
            response = requests.get(url=school, headers=self.headers)
            html = etree.HTML(response.content.decode('utf-8'))
            # Each <li> holds one school's title and URL
            li_list = html.xpath("//ul[@class='sitelist']/li")
            for li in li_list:
                school_title = li.xpath("./div/h3/a/text()")
                school_url = li.xpath("./div/address/a/text()")
                # Keep the title and URL together as one record
                school_list.append((school_title, school_url))
                print(school_title, school_url)
        return school_list


if __name__ == '__main__':
    s = School()
    province = s.province_school_url()
    # school_url() walks every province itself, so a single call is enough
    s.school_url(province)

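The scraper builds each province URL by concatenating the site root with the href. As a minimal sketch, assuming the hrefs might be relative, root-relative, or already absolute, `urllib.parse.urljoin` from the standard library resolves all three forms. The helper name `absolute_urls` and the sample hrefs are hypothetical and are not taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.cnxiaoyuan.com/'  # site root used by the scraper


def absolute_urls(hrefs, base=BASE_URL):
    """Resolve each href against base (hypothetical helper for illustration)."""
    return [urljoin(base, href) for href in hrefs]


# Hypothetical hrefs standing in for the values returned by li.xpath("./a/@href")
sample_hrefs = ['beijing/', '/shanghai/', 'http://www.cnxiaoyuan.com/tianjin/']
province_urls = absolute_urls(sample_hrefs)
for url in province_urls:
    print(url)
```

Plain string concatenation would turn a root-relative href such as `/shanghai/` into a URL with a doubled slash, which `urljoin` avoids.
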
Reposted from: https://www.cnblogs.com/victorstudy/p/11425914.html
