Python Web Scraping + Data Analysis: Which Cars on Dongchedi (懂车帝) Are Worth Going For Right Now

东西二王 · 2023-01-26 · Posted from Chongqing


2021-11-27 13:15 · 即将苏醒的Python


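I. Before we start

Folks, with all your enthusiasm I don't dare skip an update. Let's go!

Scraping pictures of girls is popular with everyone, but we can't do that all the time, right? Health comes first. Of course, if you know any good sites, feel free to recommend them, and I'll share the results after I've scraped them~

Reader: Admit it, you just want to look at them yourself.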

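II. Preparation

1. Key points

requests: send network requests
parsel: parse data
csv: save data

2. Software used

Environment: Python 3.8
Editor: PyCharm 2021.2

If you don't know how to install the software, see my earlier post: Python入门合集 (Python beginner's collection)

It covers installing Python, environment setup, installing PyCharm, basic operations, keyboard shortcuts, and permanent activation.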

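3. Third-party libraries

requests
parsel

These are third-party libraries that need to be installed; just install them with pip.

pip install requests
pip install parsel

If installation is slow, install from a mirror:

pip install requests -i https://pypi.tuna./simple/

There are many mirrors; I use Tsinghua's here.

If you really can't get a module installed, see my earlier article: Python安装第三方模块及解决pip下载慢/安装报错 (installing third-party modules and fixing slow pip downloads and install errors)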

III. Overall workflow

1. Find the target URL
   https://www./usedcar/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x?sh_city_name=%E5%85%A8%E5%9B%BD&page=1
   a. Decide what to collect: year, brand, …
   b. Determine the data source (static page: True, versus dynamic page)
2. Send the request
3. Get the data: the page's HTML source
4. Parse the data: re, css, xpath, bs4, …
5. Save the data
6. Data analysis: simple data visualization and a recommendation feature
   This step uses a different tool: Jupyter Notebook, which comes with Anaconda (a Python distribution)
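
The "static or dynamic page" question in the first step can be checked quickly: if text you can see in the browser also appears in the raw HTML returned by the server, the data is in a static page and requests alone is enough. A minimal sketch, where `looks_static` and the two sample HTML strings are hypothetical illustrations rather than anything taken from the site:

```python
def looks_static(html: str, visible_text: str) -> bool:
    """Return True if text seen in the browser is already present in the raw HTML."""
    return visible_text in html


# Hypothetical responses: one server-rendered page, one filled in later by JavaScript.
static_html = "<ul><li><p>Audi A4L 2019</p></li></ul>"
dynamic_html = "<div id='app'></div><script src='bundle.js'></script>"

print(looks_static(static_html, "Audi A4L"))   # True
print(looks_static(dynamic_html, "Audi A4L"))  # False
```

In practice you would pass in `requests.get(url).text` and a car name copied from the rendered page.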

IV. Code walkthrough

1. The scraper

1.1 Code

import requests     # send network requests
import parsel       # parse data
import csv          # save data

csv_dcd = open('dcd.csv', mode='a', encoding='utf-8', newline='')
csv_write = csv.writer(csv_dcd)
# Columns: brand, car age, mileage (10,000 km), city, certification,
# asking price (10,000 yuan), original price (10,000 yuan), link
csv_write.writerow(['品牌', '车龄', '里程(万公里)', '城市', '认证', '售价(万元)', '原价(万元)', '链接'])
for page in range(1, 168):
    # 1. Find the target URL
    url = f'https://www./usedcar/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x?sh_city_name=%E5%85%A8%E5%9B%BD&page={page}'
    # 2. Send the request
    # 3. Get the data: the page's HTML source
    # <Response [200]>: status code meaning the request succeeded
    html_data = requests.get(url).text
    # 4. Parse the data: re, css, xpath, bs4, ...
    selector = parsel.Selector(html_data)
    # get(): return the first match
    # getall(): return all matches
    lis = selector.css('#__next > div:nth-child(2) > div.new-main.new > div > div > div.wrap > ul li')
    for li in lis:
        # Second-level extraction
        # ::text: extract the text content
        # Brand
        title = li.css('a dl dt p::text').get()
        # Info: year, mileage, city
        # :nth-child(2): pseudo-class selector
        info = li.css('a dl dd:nth-child(2)::text').getall()
        # info is a list with two elements
        # Join the list into one string
        info_str = ''.join(info)
        # Split the string
        info_list = info_str.split('|')
        car_age = info_list[0]
        mileage = info_list[1].replace('万公里', '')
        city = info_list[2].strip()
        # Link
        link = 'https://www.' + li.css('a::attr(href)').get()
        dds = li.css('a dl dd')
        # If the current item has 4 dd tags
        if len(dds) == 4:
            # Dongchedi certification
            dcd_auth = li.css('a dl dd:nth-child(3) span::text').get()
            price = li.css('a dl dd:nth-child(4)::text').get()
            original_price = li.css('a dl dd:nth-child(5)::text').get()
        else:
            dcd_auth = '无认证'  # "no certification"
            price = li.css('a dl dd:nth-child(3)::text').get()
            original_price = li.css('a dl dd:nth-child(4)::text').get()
        # Strip the unit ("万") and the "new car price incl. tax" prefix
        price = price.replace('万', '')
        original_price = original_price.replace('新车含税价: ', '').replace('万', '')
        print(title, car_age, mileage, city, dcd_auth, price, original_price, link)
        csv_write.writerow([title, car_age, mileage, city, dcd_auth, price, original_price, link])
csv_dcd.close()
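
The loop above assumes every listing has all three info fields; a single short row raises an IndexError and stops the crawl. A hedged sketch of a more defensive parser for the info string, assuming it looks like `3年 | 2.5万公里 | 成都` (the helper name `parse_info` and the sample strings are made up for illustration):

```python
from typing import Optional, Tuple


def parse_info(info_str: Optional[str]) -> Tuple[str, str, str]:
    """Split 'age | mileage | city' into three cleaned fields.

    Missing fields come back as empty strings instead of raising IndexError.
    """
    parts = [p.strip() for p in (info_str or "").split("|")]
    parts += [""] * (3 - len(parts))          # pad short rows
    car_age, mileage, city = parts[:3]
    mileage = mileage.replace("万公里", "")    # keep only the number
    return car_age, mileage, city


print(parse_info("3年 | 2.5万公里 | 成都"))  # ('3年', '2.5', '成都')
print(parse_info("3年 | 2.5万公里"))         # ('3年', '2.5', '')
print(parse_info(None))                     # ('', '', '')
```

Rows with empty fields can then be skipped or written as-is, whichever suits the analysis.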

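2. Results

2.1 While scraping

The output printed in PyCharm looks partly garbled. The site uses font-based obfuscation here, so the obfuscated parts don't display properly. I won't cover how to decode it today.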
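
Font-based obfuscation commonly works by serving digits as characters from Unicode's Private Use Area, which only render correctly with the site's custom font. Assuming that is what happens here (the article does not confirm the exact scheme), a small check like the following can flag fields that still need decoding; `has_private_use_chars` is an illustrative helper, not part of any library:

```python
def has_private_use_chars(text: str) -> bool:
    """Return True if text contains code points in the Basic Multilingual
    Plane's Private Use Area (U+E000 to U+F8FF)."""
    return any(0xE000 <= ord(ch) <= 0xF8FF for ch in text)


print(has_private_use_chars("12.58"))            # False
print(has_private_use_chars("\ue54c\ue49d.8"))   # True
```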

2.2 Saved data

This is the saved data opened in Excel. The analysis below works on this saved data.
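
One practical note if you open the CSV in Excel: Excel often mis-detects plain UTF-8 files and shows Chinese text as gibberish, while a file written with a byte-order mark (Python's `utf-8-sig` codec) is recognized correctly. A minimal sketch; the file name `demo.csv` and the sample row are made up:

```python
import csv
import os
import tempfile

rows = [["品牌", "城市"], ["奥迪A4L", "成都"]]
path = os.path.join(tempfile.gettempdir(), "demo.csv")

# 'utf-8-sig' writes a BOM at the start of the file so Excel detects UTF-8.
with open(path, mode="w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading with the same codec strips the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```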

3. Data analysis

3.1 Importing modules

import pandas as pd
from pyecharts.charts import *
from pyecharts.commons.utils import JsCode
from pyecharts import options as opts

If you don't have pyecharts, install it first.

3.2 Data processing with pandas

3.2.1 Reading the data

df = pd.read_csv('dcd.csv', encoding = 'utf-8')
df.head()

3.2.2 Viewing descriptive statistics

df.describe()

There are 10,000 records in total.

3.2.3 Checking for missing data

df.isnull().sum()
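
Because some scraped values may be obfuscated or otherwise non-numeric, the price and mileage columns can end up as strings, and `isnull()` will not catch those. One possible cleanup, sketched here on a small made-up DataFrame with column names modeled on the CSV's, is to coerce them with `pd.to_numeric` so that bad values become NaN:

```python
import pandas as pd

# Made-up sample rows; the last price and the second mileage are not clean numbers.
df_demo = pd.DataFrame({
    "城市": ["成都", "北京", "长沙"],
    "售价(万元)": ["12.5", "30.8", "\ue54c.9"],
    "里程(万公里)": ["3.2", "abc", "8"],
})

for col in ["售价(万元)", "里程(万公里)"]:
    # errors='coerce' turns unparseable strings into NaN instead of raising.
    df_demo[col] = pd.to_numeric(df_demo[col], errors="coerce")

print(df_demo.isnull().sum())
```

After this step the missing-value count reflects the rows that could not be parsed, and the numeric columns can be averaged or compared safely.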

3.3 Visualization with pyecharts

3.3.1 Bar chart of used-car counts by city

counts = df.groupby('城市')['品牌'].count().sort_values(ascending=False).head(20)

bar = (
    Bar(init_opts=opts.InitOpts(height='500px', width='1000px', theme='dark'))
    .add_xaxis(counts.index.tolist())
    .add_yaxis(
        'Used cars per city',
        counts.values.tolist(),
        label_opts=opts.LabelOpts(is_show=True, position='top'),
        itemstyle_opts=opts.ItemStyleOpts(
            # Vertical red-to-teal gradient for the bars
            color=JsCode("""new echarts.graphic.LinearGradient(
            0, 0, 0, 1,[{offset: 0,color: 'rgb(255,99,71)'}, {offset: 1,color: 'rgb(32,178,170)'}])
            """
            )
        )
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Number of used cars by city'),
        xaxis_opts=opts.AxisOpts(
            name='City',
            type_='category',
            axislabel_opts=opts.LabelOpts(rotate=90),
        ),
        yaxis_opts=opts.AxisOpts(
            name='Count',
            min_=0,
            max_=1400.0,
            splitline_opts=opts.SplitLineOpts(is_show=True, linestyle_opts=opts.LineStyleOpts(type_='dash'))
        ),
        tooltip_opts=opts.TooltipOpts(trigger='axis', axis_pointer_type='cross')
    )
    .set_series_opts(
        # Reference lines for the mean, maximum, and minimum
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_='average', name='Average'),
                opts.MarkLineItem(type_='max', name='Max'),
                opts.MarkLineItem(type_='min', name='Min'),
            ]
        )
    )
)
bar.render_notebook()

Chengdu has the most used-car listings by a wide margin, far ahead of second place.

3.3.2 Bar chart of average used-car price by city

means = df.groupby('城市')['售价(万元)'].mean().astype('int64').head(20)

bar = (
    Bar(init_opts=opts.InitOpts(height='500px', width='1000px', theme='dark'))
    .add_xaxis(means.index.tolist())
    .add_yaxis(
        'Average used-car price per city',
        means.values.tolist(),
        label_opts=opts.LabelOpts(is_show=True, position='top'),
        itemstyle_opts=opts.ItemStyleOpts(
            # Vertical red-to-teal gradient for the bars
            color=JsCode("""new echarts.graphic.LinearGradient(
            0, 0, 0, 1,[{offset: 0,color: 'rgb(255,99,71)'}, {offset: 1,color: 'rgb(32,178,170)'}])
            """
            )
        )
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Average used-car price by city'),
        xaxis_opts=opts.AxisOpts(
            name='City',
            type_='category',
            axislabel_opts=opts.LabelOpts(rotate=90),
        ),
        yaxis_opts=opts.AxisOpts(
            name='Average price',
            min_=0,
            max_=40.0,
            splitline_opts=opts.SplitLineOpts(is_show=True, linestyle_opts=opts.LineStyleOpts(type_='dash'))
        ),
        tooltip_opts=opts.TooltipOpts(trigger='axis', axis_pointer_type='cross')
    )
    .set_series_opts(
        # Reference lines for the mean, maximum, and minimum
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_='average', name='Average'),
                opts.MarkLineItem(type_='max', name='Max'),
                opts.MarkLineItem(type_='min', name='Min'),
            ]
        )
    )
)
bar.render_notebook()

On price, though, Chengdu is fairly average, while Beijing is far ahead.
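
Note that `head(20)` on an unsorted groupby result simply takes the first 20 cities in index order, not the 20 most expensive ones. If a ranking is wanted, one option is to sort before taking the head. A sketch on made-up numbers:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "城市": ["成都", "成都", "北京", "北京", "长沙"],
    "售价(万元)": [10.0, 14.0, 30.0, 40.0, 8.0],
})

top_means = (
    df_demo.groupby("城市")["售价(万元)"]
    .mean()
    .sort_values(ascending=False)   # most expensive cities first
    .head(2)
)
print(top_means)
```

With the sample data this keeps 北京 (mean 35.0) and 成都 (mean 12.0) and drops 长沙.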

3.3.3 Share of used cars by brand

dcd_pinpai = df['品牌'].apply(lambda x:x.split(' ')[0])
df['品牌'] = dcd_pinpai
pinpai = df['品牌'].value_counts()
pinpai = pinpai[:5]
datas_pair_1 = [[i, int(j)] for i, j in zip(pinpai.index, pinpai.values)]
datas_pair_1

pie1 = (
    Pie(init_opts=opts.InitOpts(theme='dark', width='1000px', height='600px'))
    .add('', datas_pair_1, radius=['35%', '60%'])
    # {b} is the slice name, {d} its percentage
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title="Dongchedi used cars\n\nshare by brand",
            pos_left='center',
            pos_top='center',
            title_textstyle_opts=opts.TextStyleOpts(
                color='#F0F8FF',
                font_size=20,
                font_weight='bold'
            ),
        )
    )
)
pie1.render_notebook()

Looking at brands such as BMW and Audi, BMW has a slightly larger share of the used-car listings than Audi.

3.3.4 Used-car mileage ranges

def transform_mileage(x):
    # Map a mileage value (in units of 10,000 km) to a range label
    if x <= 5.0:
        return '0-50,000 km'
    elif x <= 10.0:
        return '50,000-100,000 km'
    elif x <= 15.0:
        return '100,000-150,000 km'
    elif x <= 20.0:
        return '150,000-200,000 km'
    else:
        return 'over 200,000 km'

# '里程分级' = mileage tier
df['里程分级'] = df['里程(万公里)'].apply(lambda x: transform_mileage(x))
mileage_counts = df['里程分级'].value_counts()
datas_pair_1 = [(i, int(j)) for i, j in zip(mileage_counts.index, mileage_counts.values)]

pie1 = (
    Pie(init_opts=opts.InitOpts(theme='dark', width='1000px', height='600px'))
    .add('', datas_pair_1, radius=['35%', '60%'])
    # {b} is the slice name, {d} its percentage
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title="Dongchedi used cars\n\nshare by mileage range",
            pos_left='center',
            pos_top='center',
            title_textstyle_opts=opts.TextStyleOpts(
                color='#F0F8FF',
                font_size=20,
                font_weight='bold'
            ),
        )
    )
)
pie1.render_notebook()

Most of the cars have under 100,000 km on them, so there are real bargains to be had. Looking at this makes me want to go grab a couple myself~
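
The if/elif ladder used for the mileage ranges can also be written as a table lookup. The sketch below uses the standard-library `bisect` module with the same boundaries; `mileage_bucket` is an illustrative name and the labels are shortened:

```python
from bisect import bisect_left

BOUNDS = [5.0, 10.0, 15.0, 20.0]   # upper edges, in units of 10,000 km
LABELS = ["0-5", "5-10", "10-15", "15-20", "20+"]


def mileage_bucket(x: float) -> str:
    """Map a mileage (in 10,000 km) to its range label; upper edges are inclusive."""
    return LABELS[bisect_left(BOUNDS, x)]


print([mileage_bucket(x) for x in (3.0, 5.0, 5.1, 20.0, 26.0)])
# ['0-5', '0-5', '5-10', '15-20', '20+']
```

`bisect_left` returns the index of the first boundary that is greater than or equal to `x`, which matches the `<=` comparisons in the ladder, so changing the ranges only means editing the two lists.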

3.4 Used-car recommendation

# float() rather than eval(): never evaluate raw user input
keyword = input('Enter a brand: ')
data5 = df.loc[df['品牌'].str.contains(str(keyword))]
keyword1 = float(input('Enter the maximum mileage (10,000 km): '))
data6 = data5[data5['里程(万公里)'] <= keyword1]
city = input('Enter a city: ')
data7 = data6[data6['城市'] == str(city)]
day1 = float(input('Enter the minimum price (10,000 yuan): '))
day2 = float(input('Enter the maximum price (10,000 yuan): '))
data8 = data7[(data7['售价(万元)'] >= day1) & (data7['售价(万元)'] <= day2)]
data8

Ha, Changsha turns out to have no Audis at all. Disappointing!
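
The same filter can be wrapped in a function so it can be reused and tested without typing answers at the prompt. A sketch under the assumption that the price and mileage columns are already numeric; `recommend` and the sample rows are hypothetical:

```python
import pandas as pd


def recommend(df, brand, max_mileage, city, min_price, max_price):
    """Return listings matching brand, city, a mileage cap, and a price range."""
    mask = (
        df["品牌"].str.contains(brand, regex=False)        # plain substring match
        & (df["里程(万公里)"] <= max_mileage)
        & (df["城市"] == city)
        & df["售价(万元)"].between(min_price, max_price)    # inclusive on both ends
    )
    return df[mask]


df_demo = pd.DataFrame({
    "品牌": ["奥迪", "奥迪", "宝马"],
    "里程(万公里)": [3.0, 12.0, 4.0],
    "城市": ["成都", "成都", "成都"],
    "售价(万元)": [20.0, 15.0, 25.0],
})
result = recommend(df_demo, "奥迪", 5.0, "成都", 10.0, 30.0)
print(result)
```

Only the first sample row survives: the second 奥迪 exceeds the mileage cap and the third row is a different brand.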

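4. Running the data analysis code

Data analysis code usually comes in .ipynb format, which can be confusing if you are just starting out with data analysis, so here is a quick guide.

First open the folder where the code is stored, then type jupyter notebook into the address bar and press Enter.

If you really can't find where the code is stored, right-click the file and open Properties.

For example, mine is in C:\Users\Administrator\Desktop

Then open a new File Explorer window, paste that path into the address bar, and press Enter to go there.

Picking up where we left off: after you press Enter, the Jupyter window pops up.

Find the code you want to run and click it to open it.

Running works the same as always: just click Run. Make sure the data you downloaded is ready before you run it. You can't analyze anything without data, right~

Folks, if the article alone isn't enough, I've pinned the video tutorial at the top of the comments.

Original article
https://blog.csdn.net/fei347795790/article/details/121516389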
