数据透视表
import pandas as pd
example = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],
'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment',
'Transportation', 'Grocery', 'Household', 'Entertainment',
'Transportation', 'Grocery', 'Household', 'Entertainment'],
'Month': ['January', 'January', 'January', 'January',
'February', 'February', 'February', 'February',
'Marth', 'Marth', 'Marth', 'Marth']})
example
|
Amount |
Category |
Month |
0 |
74.0 |
Transportation |
January |
1 |
235.0 |
Grocery |
January |
2 |
175.0 |
Household |
January |
3 |
100.0 |
Entertainment |
January |
4 |
115.0 |
Transportation |
February |
5 |
245.0 |
Grocery |
February |
6 |
180.0 |
Household |
February |
7 |
90.0 |
Entertainment |
February |
8 |
88.0 |
Transportation |
Marth |
9 |
129.0 |
Grocery |
Marth |
10 |
273.0 |
Household |
Marth |
11 |
300.0 |
Entertainment |
Marth |
上面这个数据表感觉非常的杂乱,数据透视的意思就是按照我们的意思重新组织这张数据表
new_example = example.pivot(index='Category', columns='Month', values='Amount')
index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性
values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格
new_example
Month |
February |
January |
Marth |
Category |
|
|
|
Entertainment |
90.0 |
100.0 |
300.0 |
Grocery |
245.0 |
235.0 |
129.0 |
Household |
180.0 |
175.0 |
273.0 |
Transportation |
115.0 |
74.0 |
88.0 |
new_example.sum(axis=1)
Category
Entertainment 490.0
Grocery 609.0
Household 628.0
Transportation 277.0
dtype: float64
new_example.sum(axis=0)
Month
February 630.0
January 584.0
Marth 790.0
dtype: float64
经过重新组织后,数据表的呈现方式更有意义
df = pd.read_csv('../../datasets/titanic/test.csv')
df.head(5)
|
PassengerId |
Pclass |
Name |
Sex |
Age |
SibSp |
Parch |
Ticket |
Fare |
Cabin |
Embarked |
0 |
892 |
3 |
Kelly, Mr. James |
male |
34.5 |
0 |
0 |
330911 |
7.8292 |
NaN |
Q |
1 |
893 |
3 |
Wilkes, Mrs. James (Ellen Needs) |
female |
47.0 |
1 |
0 |
363272 |
7.0000 |
NaN |
S |
2 |
894 |
2 |
Myles, Mr. Thomas Francis |
male |
62.0 |
0 |
0 |
240276 |
9.6875 |
NaN |
Q |
3 |
895 |
3 |
Wirz, Mr. Albert |
male |
27.0 |
0 |
0 |
315154 |
8.6625 |
NaN |
S |
4 |
896 |
3 |
Hirvonen, Mrs. Alexander (Helga E Lindqvist) |
female |
22.0 |
1 |
1 |
3101298 |
12.2875 |
NaN |
S |
需求: 统计男女在不同船舱等级的票价
df.pivot_table(index='Sex', columns='Pclass', values='Fare')
Pclass |
1 |
2 |
3 |
Sex |
|
|
|
female |
115.591168 |
26.438750 |
13.735129 |
male |
75.586551 |
20.184654 |
11.826350 |
df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='min')
Pclass |
1 |
2 |
3 |
Sex |
|
|
|
female |
25.7 |
10.5000 |
6.9500 |
male |
0.0 |
9.6875 |
3.1708 |
需求: 统计不同船舱等级里男女的平均年龄
df.pivot_table(index='Pclass', columns='Sex', values='Age')
Sex |
female |
male |
Pclass |
|
|
1 |
41.333333 |
40.520000 |
2 |
24.376552 |
30.940678 |
3 |
23.073400 |
24.525104 |
所有评论(0)