pandas——数据透视表

数据透视表import pandas as pdexample = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment','

One_T.

1624人浏览 · 2022-03-26 20:45:57

One_T. · 2022-03-26 20:45:57 发布

数据透视表

import pandas as pd

example = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],
              'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment'],
              'Month': ['January', 'January', 'January', 'January', 
                        'February', 'February', 'February', 'February', 
                        'Marth', 'Marth', 'Marth', 'Marth']})
example

	Amount	Category	Month
0	74.0	Transportation	January
1	235.0	Grocery	January
2	175.0	Household	January
3	100.0	Entertainment	January
4	115.0	Transportation	February
5	245.0	Grocery	February
6	180.0	Household	February
7	90.0	Entertainment	February
8	88.0	Transportation	Marth
9	129.0	Grocery	Marth
10	273.0	Household	Marth
11	300.0	Entertainment	Marth

上面这个数据表感觉非常的杂乱，数据透视的意思就是按照我们的意思重新组织这张数据表

new_example = example.pivot(index='Category', columns='Month', values='Amount')

index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性

values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格

new_example

Month	February	January	Marth
Category
Entertainment	90.0	100.0	300.0
Grocery	245.0	235.0	129.0
Household	180.0	175.0	273.0
Transportation	115.0	74.0	88.0

new_example.sum(axis=1)

Category
Entertainment     490.0
Grocery           609.0
Household         628.0
Transportation    277.0
dtype: float64

new_example.sum(axis=0)

Month
February    630.0
January     584.0
Marth       790.0
dtype: float64

经过重新组织后，数据表的呈现方式更有意义

df = pd.read_csv('../../datasets/titanic/test.csv')
df.head(5)

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	NaN	Q
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	NaN	S
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	NaN	Q
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	NaN	S
4	896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	NaN	S

需求：统计男女在不同船舱等级的票价

df.pivot_table(index='Sex', columns='Pclass', values='Fare')   # 默认就是求平均值

Pclass	1	2	3
Sex
female	115.591168	26.438750	13.735129
male	75.586551	20.184654	11.826350

df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='min')  # 求最小值

Pclass	1	2	3
Sex
female	25.7	10.5000	6.9500
male	0.0	9.6875	3.1708

需求：统计不同船舱等级里男女的平均年龄

df.pivot_table(index='Pclass', columns='Sex', values='Age')

Sex	female	male
Pclass
1	41.333333	40.520000
2	24.376552	30.940678
3	23.073400	24.525104

DAMO开发者矩阵

DAMO开发者矩阵，由阿里巴巴达摩院和中国互联网协会联合发起，致力于探讨最前沿的技术趋势与应用成果，搭建高质量的交流与分享平台，推动技术创新与产业应用链接，围绕“人工智能与新型计算”构建开放共享的开发者生态。

更多推荐

KingbaseES数据库：医疗信创潮涌沅江，国产化信创打造医疗新质生产力

DAMO开发者矩阵

告别迁移焦虑：金仓KDMS V4带你轻松搞定数据库国产化替代

DAMO开发者矩阵

KingbaseES数据库在常德二院全栈国产化信创中产生新质生产力,医疗信创的部署如何实现，如何操作？

DAMO开发者矩阵

所有评论(0)

查看更多评论

One_T.

@D_Low

已为社区贡献1条内容

pandas——数据透视表

One_T.

数据透视表

上面这个数据表感觉非常的杂乱，数据透视的意思就是按照我们的意思重新组织这张数据表

index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性

values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格

经过重新组织后，数据表的呈现方式更有意义

需求： 统计男女在不同船舱等级的票价

需求： 统计不同船舱等级里男女的平均年龄

所有评论(0)

One_T.

需求：统计男女在不同船舱等级的票价

需求：统计不同船舱等级里男女的平均年龄