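Disclaimer:
This article only covers the lawful scraping of publicly available data; the data is not used or redistributed improperly.
Series:
This is Part 1 of the Python project column (real-time A-share data analysis with Python)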


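Preface:
This chapter is aimed at readers who are new to A-share data analysis. It works through two examples: fetching A-share data from Eastmoney (东方财富网), and analyzing the stock of Enlight Media (光线传媒), the company behind Ne Zha 2.
Use cases: the same approach applies to sites that present tabular data in a similar way to Eastmoney, such as government procurement websites.
For a deeper treatment, see
Part 2 of the Python project column (real-time A-share data analysis with Python, advanced edition)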

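I. Fetching A-share data from Eastmoney
1. Eastmoney home page --> 沪深京 (Shanghai/Shenzhen/Beijing) --> 沪深京个股 (individual stocks) --> 沪深京A股 (A-shares)

As of June 15, 2025, this page lists the stocks of 5723 listed companies.

2. Open the developer tools --> Network --> All --> refresh the page --> search for 科力股份 --> open the matching request --> copy the request URL from the Headers tab
Build the request headers and request that data URL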

import requests
import re
url='https://push2.eastmoney.com/api/qt/clist/get?np=1&fltt=1&invt=2&cb=jQuery37109326831773859228_1749957681848&fs=m%3A0%2Bt%3A6%2Cm%3A0%2Bt%3A80%2Cm%3A1%2Bt%3A2%2Cm%3A1%2Bt%3A23%2Cm%3A0%2Bt%3A81%2Bs%3A2048&fields=f12%2Cf13%2Cf14%2Cf1%2Cf2%2Cf4%2Cf3%2Cf152%2Cf5%2Cf6%2Cf7%2Cf15%2Cf18%2Cf16%2Cf17%2Cf10%2Cf8%2Cf9%2Cf23&fid=f3&pn=1&pz=20&po=1&dect=1&ut=fa5fd1943c7b386f172d6893dbfba10b&wbp2u=%7C0%7C0%7C0%7Cweb&_=1749957681852'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36 Edg/137.0.0.0',
         'Referer':'https://quote.eastmoney.com/center/gridlist.html',
         'Cookie':'qgqp_b_id=1f6f4d1fe5f6dc768b0c86464ff4ca12; websitepoptg_api_time=1749954789719; st_si=15357815110768; st_asi=delete; fullscreengg=1; fullscreengg2=1; st_pvi=60193717251270; st_sp=2025-06-15%2010%3A33%3A09; st_inirUrl=https%3A%2F%2Fcn.bing.com%2F; st_sn=7; st_psi=2025061511212222-113200301321-0920417507'}
res=requests.get(url,headers=headers)
print(res.text)

3. From the response text, extract the name, code, price and the other fields with re regular expressions, and print them in a for loop

name = re.findall('"f14":"(.*?)","f15"', res.text)
code = re.findall('"f12":"(.*?)","f13"', res.text)
new_price = re.findall('"f2":(.*?),"f3"', res.text)
open_price = re.findall('"f17":(.*?),"f18"', res.text)  # today's open; renamed so the built-in open() is not shadowed
prev_close = re.findall('"f18":(.*?),"f23"', res.text)  # f18 is the previous close
high = re.findall('"f15":(.*?),"f16"', res.text)
low = re.findall('"f16":(.*?),"f17"', res.text)
volume = re.findall('"f5":(.*?),"f6"', res.text)
amount = re.findall('"f6":(.*?),"f7"', res.text)
amplitude = re.findall('"f7":(.*?),"f8"', res.text)
price_limit = re.findall('"f3":(.*?),"f4"', res.text)
price_limit_amount = re.findall('"f4":(.*?),"f5"', res.text)
turnover_rate = re.findall('"f8":(.*?),"f9"', res.text)
for i in range(len(name)):
    print(name[i],code[i],new_price[i],open_price[i],prev_close[i],high[i],low[i],volume[i],amount[i],amplitude[i],price_limit[i],price_limit_amount[i],turnover_rate[i])
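
The response is JSONP: a JSON object wrapped in the callback named by the cb parameter. As an alternative to regular expressions, the wrapper can be stripped and the payload parsed with the json module. The sketch below is only an illustration: the sample string, the placeholder stock code, the helper name parse_jsonp and the data --> diff layout are assumptions, so check them against the actual res.text before relying on them.

```python
import json
import re

def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and return the parsed JSON."""
    match = re.search(r'\((.*)\)', text, flags=re.S)
    if match is None:
        raise ValueError('no JSONP payload found')
    return json.loads(match.group(1))

# Hypothetical, shortened sample of what res.text might look like.
# "000000" is a placeholder code, not the real one.
sample = 'jQuery123_456({"data":{"total":1,"diff":[{"f2":4114,"f12":"000000","f14":"科力股份"}]}});'

payload = parse_jsonp(sample)
rows = payload['data']['diff']   # assumed location of the per-stock records
first = rows[0]
print(first['f14'], first['f12'], first['f2'] / 100)
```

Reading fields by key does not depend on the order in which the server emits them, which the regular expressions above do.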

4. Fetch the listed-company data from every page and save it to A股原始底层数据获取.csv
url='https://push2.eastmoney.com/api/qt/clist/get?np=1&fltt=1&invt=2&cb=jQuery37109326831773859228_1749957681848&fs=m%3A0%2Bt%3A6%2Cm%3A0%2Bt%3A80%2Cm%3A1%2Bt%3A2%2Cm%3A1%2Bt%3A23%2Cm%3A0%2Bt%3A81%2Bs%3A2048&fields=f12%2Cf13%2Cf14%2Cf1%2Cf2%2Cf4%2Cf3%2Cf152%2Cf5%2Cf6%2Cf7%2Cf15%2Cf18%2Cf16%2Cf17%2Cf10%2Cf8%2Cf9%2Cf23&fid=f3&pn=1&pz=20&po=1&dect=1&ut=fa5fd1943c7b386f172d6893dbfba10b&wbp2u=%7C0%7C0%7C0%7Cweb&_=1749957681852'
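The URL shows that pn=1 is the first page, pn=2 the second, pn=3 the third, and so on. With 20 rows per page (pz=20), the 5723 stocks span pages 1 through 287.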
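
The page count follows from the total number of stocks and the page size. The short sketch below reproduces that arithmetic and shows how the paging part of the query string changes per page; page_query is a hypothetical helper introduced here for illustration, and no request is sent.

```python
import math
from urllib.parse import urlencode

TOTAL_STOCKS = 5723   # count shown on the page on 2025-06-15
PAGE_SIZE = 20        # the pz parameter in the URL

total_pages = math.ceil(TOTAL_STOCKS / PAGE_SIZE)
print(total_pages)    # 287

def page_query(pn, pz=PAGE_SIZE):
    """Return the paging part of the query string for page number pn (hypothetical helper)."""
    return urlencode({'pn': pn, 'pz': pz})

print(page_query(1))
print(page_query(total_pages))
```

Because range excludes its upper bound, looping over all pages means range(1, total_pages + 1), which is the range(1, 288) used in the code below.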

import requests
import re
import pandas as pd
stock_list=[]
for pn in range(1,288):
    url=f'https://push2.eastmoney.com/api/qt/clist/get?np=1&fltt=1&invt=2&cb=jQuery37109326831773859228_1749957681848&fs=m%3A0%2Bt%3A6%2Cm%3A0%2Bt%3A80%2Cm%3A1%2Bt%3A2%2Cm%3A1%2Bt%3A23%2Cm%3A0%2Bt%3A81%2Bs%3A2048&fields=f12%2Cf13%2Cf14%2Cf1%2Cf2%2Cf4%2Cf3%2Cf152%2Cf5%2Cf6%2Cf7%2Cf15%2Cf18%2Cf16%2Cf17%2Cf10%2Cf8%2Cf9%2Cf23&fid=f3&pn={pn}&pz=20&po=1&dect=1&ut=fa5fd1943c7b386f172d6893dbfba10b&wbp2u=%7C0%7C0%7C0%7Cweb&_=1749957681852'
    headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36 Edg/137.0.0.0',
                 'Referer':'https://quote.eastmoney.com/center/gridlist.html',
                 'Cookie':'qgqp_b_id=1f6f4d1fe5f6dc768b0c86464ff4ca12; websitepoptg_api_time=1749954789719; st_si=15357815110768; st_asi=delete; fullscreengg=1; fullscreengg2=1; st_pvi=60193717251270; st_sp=2025-06-15%2010%3A33%3A09; st_inirUrl=https%3A%2F%2Fcn.bing.com%2F; st_sn=7; st_psi=2025061511212222-113200301321-0920417507'}
    res=requests.get(url,headers=headers)
    # print(res.text)
    name = re.findall('"f14":"(.*?)","f15"', res.text)
    code = re.findall('"f12":"(.*?)","f13"', res.text)
    new_price = re.findall('"f2":(.*?),"f3"', res.text)
    open_price = re.findall('"f17":(.*?),"f18"', res.text)  # today's open; renamed so the built-in open() is not shadowed
    prev_close = re.findall('"f18":(.*?),"f23"', res.text)  # f18 is the previous close
    high = re.findall('"f15":(.*?),"f16"', res.text)
    low = re.findall('"f16":(.*?),"f17"', res.text)
    volume = re.findall('"f5":(.*?),"f6"', res.text)
    amount = re.findall('"f6":(.*?),"f7"', res.text)
    amplitude = re.findall('"f7":(.*?),"f8"', res.text)
    price_limit = re.findall('"f3":(.*?),"f4"', res.text)
    price_limit_amount = re.findall('"f4":(.*?),"f5"', res.text)
    turnover_rate = re.findall('"f8":(.*?),"f9"', res.text)
    for i in range(len(name)):
        stock = [name[i],code[i],new_price[i],open_price[i],prev_close[i],high[i],low[i],volume[i],amount[i],amplitude[i],price_limit[i],price_limit_amount[i],turnover_rate[i]]
        print(stock)
        stock_list.append(stock)
# The fifth column holds f18, the previous close, so it is labelled 昨收
data=pd.DataFrame(stock_list,columns=['公司名称','股票代码','最新价','开盘','昨收','最高','最低','成交量','成交额','振幅','涨跌幅','涨跌额','换手率'])
data.to_csv(r'E:\data_pachong\A股实时数据分析\A股原始底层数据获取.csv',index=False,encoding='gbk')

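5. Some values do not match the format shown on the site. For example, the latest price of 科力股份 is 41.14, but the scraped value is 4114.
The raw data therefore needs cleaning. This can be done in Excel, for example with C2/100 for the latest price and =J2/100&"%" for the amplitude.

6. Since this is a Python column, though, doing the cleaning in Python is recommended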

import requests
import re
import pandas as pd
# Empty list that collects one row per stock
stock_list = []

for pn in range(1, 288):  # iterate over pages 1 to 287
    url = f'https://push2.eastmoney.com/api/qt/clist/get?np=1&fltt=1&invt=2&cb=jQuery37109326831773859228_1749957681848&fs=m%3A0%2Bt%3A6%2Cm%3A0%2Bt%3A80%2Cm%3A1%2Bt%3A2%2Cm%3A1%2Bt%3A23%2Cm%3A0%2Bt%3A81%2Bs%3A2048&fields=f12%2Cf13%2Cf14%2Cf1%2Cf2%2Cf4%2Cf3%2Cf152%2Cf5%2Cf6%2Cf7%2Cf15%2Cf18%2Cf16%2Cf17%2Cf10%2Cf8%2Cf9%2Cf23&fid=f3&pn={pn}&pz=20&po=1&dect=1&ut=fa5fd1943c7b386f172d6893dbfba10b&wbp2u=%7C0%7C0%7C0%7Cweb&_=1749957681852'
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36 Edg/137.0.0.0',
        'Referer': 'https://quote.eastmoney.com/center/gridlist.html',
        'Cookie': 'qgqp_b_id=1f6f4d1fe5f6dc768b0c86464ff4ca12; websitepoptg_api_time=1749954789719; st_si=15357815110768; st_asi=delete; fullscreengg=1; fullscreengg2=1; st_pvi=60193717251270; st_sp=2025-06-15%2010%3A33%3A09; st_inirUrl=https%3A%2F%2Fcn.bing.com%2F; st_sn=7; st_psi=2025061511212222-113200301321-0920417507'
    }
    
    # Send the request
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()  # raise an exception on an HTTP error status
    
    # Extract each field with a regular expression
    name = re.findall('"f14":"(.*?)","f15"', res.text)
    code = re.findall('"f12":"(.*?)","f13"', res.text)
    new_price = re.findall('"f2":(.*?),"f3"', res.text)
    open_price = re.findall('"f17":(.*?),"f18"', res.text)  # today's open
    close_yesterday = re.findall('"f18":(.*?),"f23"', res.text)  # previous close
    high = re.findall('"f15":(.*?),"f16"', res.text)
    low = re.findall('"f16":(.*?),"f17"', res.text)
    volume = re.findall('"f5":(.*?),"f6"', res.text)
    amount = re.findall('"f6":(.*?),"f7"', res.text)
    amplitude = re.findall('"f7":(.*?),"f8"', res.text)
    price_limit = re.findall('"f3":(.*?),"f4"', res.text)  # percentage change
    price_limit_amount = re.findall('"f4":(.*?),"f5"', res.text)  # price change
    turnover_rate = re.findall('"f8":(.*?),"f9"', res.text)  # turnover rate
    
    # Process each stock on this page
    for i in range(len(name)):
        # Convert and format the values
        # Price fields: divide by 100 to get yuan
        latest_price = float(new_price[i]) / 100
        open_val = float(open_price[i]) / 100
        close_yes = float(close_yesterday[i]) / 100
        high_val = float(high[i]) / 100
        low_val = float(low[i]) / 100
        
        # Volume: convert to units of 10,000 lots (rounded to 2 decimals below)
        volume_hand = float(volume[i]) / 10000
        
        # Turnover: convert to units of 100 million yuan (rounded to 2 decimals below)
        amount_yuan = float(amount[i]) / 100000000
        
        # Percentage change, amplitude and turnover rate: divide by 100 to get percent values
        change_percent = float(price_limit[i]) / 100
        amplitude_val = float(amplitude[i]) / 100
        turnover_val = float(turnover_rate[i]) / 100
        
        # Price change: divide by 100 to get yuan
        change_amount = float(price_limit_amount[i]) / 100
        
        # Assemble the row for this stock
        stock = [
            name[i], 
            code[i], 
            round(latest_price, 2), 
            round(open_val, 2), 
            round(close_yes, 2), 
            round(high_val, 2), 
            round(low_val, 2), 
            round(volume_hand, 2), 
            round(amount_yuan, 2), 
            round(amplitude_val, 2), 
            round(change_percent, 2), 
            round(change_amount, 2), 
            round(turnover_val, 2)
        ]
        
        stock_list.append(stock)
        print(f"Processed: {name[i]}-{code[i]}")

# Build the DataFrame
columns = [
    '名称', '代码', '最新价(元)', '今开(元)', '昨收(元)', 
    '最高(元)', '最低(元)', '成交量(万手)', '成交额(亿元)', 
    '振幅(%)', '涨跌幅(%)', '涨跌额(元)', '换手率(%)'
]

data = pd.DataFrame(stock_list, columns=columns)

# Save as a CSV file
save_path = r'E:\data_pachong\A股实时数据分析\A股数据获取.csv'
data.to_csv(save_path, index=False, encoding='gbk')
print(f"Data saved to: {save_path}")
print(f"Processed {len(data)} stock records in total")
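
The same scaling can also be applied column by column to the raw table from step 4 instead of value by value. The sketch below is a minimal illustration under stated assumptions: the sample rows are made up, scale_columns is a hypothetical helper, and the "-" placeholder for a missing quote is an assumption about what the raw data may contain. pd.to_numeric with errors='coerce' turns such non-numeric values into NaN, whereas float() would raise a ValueError on them.

```python
import pandas as pd

# Hypothetical raw rows in the column layout of step 4 (values are illustrative)
raw = pd.DataFrame({
    '公司名称': ['科力股份', '示例公司'],
    '最新价': ['4114', '-'],   # '-' stands in for a missing quote (assumption)
    '振幅': ['1250', '-'],
})

def scale_columns(df, columns, divisor=100):
    """Convert the given columns to numbers and divide by divisor; bad values become NaN."""
    out = df.copy()
    for col in columns:
        out[col] = (pd.to_numeric(out[col], errors='coerce') / divisor).round(2)
    return out

clean = scale_columns(raw, ['最新价', '振幅'])
print(clean)
```

Rows with NaN can then be dropped with dropna or kept, depending on whether stocks without a quote should appear in the output.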

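7. Alternatively, the akshare module can fetch the stock data in a single call. Compared with the requests-plus-regex approach above, akshare is more concise, and the data it returns is already well formatted, so no extra cleaning step is needed.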

import akshare as ak
data=ak.stock_zh_a_spot()
data.to_csv(r'E:\data_pachong\A股实时数据分析\A股实时数据获取akshare.csv',index=False,encoding='gbk')

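II. Analyzing Enlight Media stock data

1. This part uses Enlight Media's historical prices as the example (Ne Zha 2 was a huge hit and the share price moved sharply, so patterns in the data stand out)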

import akshare as ak
data=ak.stock_zh_a_hist(symbol="300251",period="daily",start_date="20180101",end_date="20250531",adjust="qfq")
data.to_csv(r'E:\data_pachong\A股实时数据分析\光线传媒[300251]历史股价.csv',index=False,encoding='gbk')

2. Read 光线传媒[300251]历史股价.csv and use the date column as the index

import pandas as pd
df=pd.read_csv(r'E:\data_pachong\A股实时数据分析\光线传媒[300251]历史股价.csv',index_col=0,encoding='gbk')
print(df)
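
With index_col=0 alone, the dates in the index stay plain strings. Passing parse_dates=True as well asks pandas to convert the index to datetimes, which allows selecting rows by month or year. The sketch below uses a made-up in-memory excerpt rather than the real file; the 日期 header and all values are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical two-column excerpt standing in for the saved CSV file
csv_text = "日期,收盘\n2025-01-02,9.50\n2025-01-03,9.80\n2025-02-05,15.20\n"

df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(df.index).__name__)

january = df.loc['2025-01']   # partial-string selection of one month
print(len(january))
```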

3. Taking the closing price as the example, compute and plot its 5-day, 10-day, 30-day and 60-day moving averages

import pandas as pd
import matplotlib
matplotlib.use('TkAgg')  # use the Tkinter-based backend for desktop windows
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = ['SimHei']  # font with Chinese glyphs, needed for Chinese labels
plt.figure(figsize=(12,8))
df=pd.read_csv(r'E:\data_pachong\A股实时数据分析\光线传媒[300251]历史股价.csv',index_col=0,encoding='gbk')
# Plot the 5-day, 10-day, 30-day and 60-day moving averages of the close
df['收盘'].rolling(window=5).mean().plot(label='5-day MA')
df['收盘'].rolling(window=10).mean().plot(label='10-day MA')
df['收盘'].rolling(window=30).mean().plot(label='30-day MA')
df['收盘'].rolling(window=60).mean().plot(label='60-day MA')
plt.legend(loc='best')
plt.show()
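
To see what rolling(window=n).mean() computes, it helps to run it on a tiny series. The prices below are illustrative, and a window of 3 is used so the result can be checked by hand.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])  # illustrative prices
ma3 = close.rolling(window=3).mean()
print(ma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```

The first window - 1 values are NaN because a full window is not yet available, which is why the 60-day line in the chart starts later than the 5-day line.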

4. If only the four columns open, close, high and low are selected, they are on the same scale and can be plotted directly

import pandas as pd
import matplotlib
matplotlib.use('TkAgg')  # use the Tkinter-based backend for desktop windows
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = ['SimHei']  # font with Chinese glyphs, needed for Chinese labels
df=pd.read_csv(r'E:\data_pachong\A股实时数据分析\光线传媒[300251]历史股价.csv',index_col=0,encoding='gbk')
df=df[['开盘','收盘','最高','最低']]
df.plot(figsize=(12,8))
plt.show()

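5. If the five columns open, close, high, low and turnover (成交额) are selected, turnover is on a very different scale from the other four, so normalize first and then plot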

import pandas as pd
import matplotlib
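matplotlib.use('TkAgg')  # use the Tkinter-based backend for desktop windows
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = ['SimHei']  # font with Chinese glyphs, needed for Chinese labels
df=pd.read_csv(r'E:\data_pachong\A股实时数据分析\光线传媒[300251]历史股价.csv',index_col=0,encoding='gbk')
# Turnover (成交额) is on a much larger scale than the four price columns
df=df[['开盘','收盘','最高','最低','成交额']]
# Min-max normalization: rescale every column to the range [0, 1]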
df_max_min=(df-df.min())/(df.max()-df.min())
df_max_min.plot(figsize=(12,8))
plt.show()

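For example, in early 2025 both turnover and the share price are high, which indicates heavy buying. This matches what actually happened: Ne Zha 2 became a hit in early 2025 and Enlight Media's share price surged.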
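
The min-max normalization used in step 5 maps each column to the range 0 to 1 independently, using (x - min) / (max - min). A toy frame with made-up numbers shows the effect on two columns of very different magnitude.

```python
import pandas as pd

# Illustrative values only: a price column and a much larger turnover column
toy = pd.DataFrame({
    '收盘': [8.0, 10.0, 12.0],
    '成交额': [1.0e8, 3.0e8, 9.0e8],
})

scaled = (toy - toy.min()) / (toy.max() - toy.min())
print(scaled)
```

After scaling, both columns run from 0 to 1 and can share one axis. A column whose values are all identical would give 0 / 0 and therefore NaN, so such columns need separate handling.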
