数据分析 如何使用sql_我如何使用SQL分析DVD租赁数据
数据分析 如何使用sqlby Okoh Anita 通过安妮塔(Okoh Anita)我如何使用SQL分析DVD租赁数据 (How I analyzed DVD rental data with SQL)介绍 (Introduction)I recently completed some training in Data Foundation facilitated by Bertel...
数据分析 如何使用sql
by Okoh Anita
通过安妮塔(Okoh Anita)
我如何使用SQL分析DVD租赁数据 (How I analyzed DVD rental data with SQL)
介绍 (Introduction)
I recently completed some training in Data Foundation facilitated by Bertelsmann’s School of Data Science (in partnership with Udacity). For a personal project, I decided to analyze the database for a DVD rental company we will call Rent A Film. Let’s take a look at a case study detailing my process and output.
我最近完成了贝塔斯曼数据科学学院(与Udacity合作)的数据基础培训。 对于个人项目,我决定分析一家DVD租赁公司的数据库,我们将其称为Rent A Film 。 让我们看一个详细研究我的过程和输出的案例研究。
数据集 (Data-set)
I began by taking a look at the database. The database DvdRental has 15 tables. Below are the different tables and a brief description of them.
我首先看一下数据库。 数据库DvdRental具有15个表。 以下是不同的表格及其简要说明。
- actor — contains actors data including first name and last name. 演员-包含演员数据,包括名字和姓氏。
- film — contains films data such as title, release year, length, rating, etc. 电影—包含电影数据,例如标题,发行年份,时长,等级等。
- film_actor — contains the relationships between films and actors. film_actor —包含电影和演员之间的关系。
- category — contains film’s categories data. 类别-包含电影的类别数据。
- film_category — containing the relationships between films and categories. film_category —包含电影和类别之间的关系。
- store — contains the store data including manager staff and address. 商店-包含商店数据,包括经理人员和地址。
- inventory — stores inventory data. 库存-存储库存数据。
- rental — stores rental data. 租赁—存储租赁数据。
- payment — stores customer’s payments. 付款-存储客户的付款。
- staff — stores staff data. 人员-存储人员数据。
- customer — stores customer’s data. 客户-存储客户的数据。
- address — stores address data for staff and customers 地址-存储员工和客户的地址数据
- city — stores the city names. city-存储城市名称。
- country — stores the country names. country-存储国家名称。
Note: I analyzed this database using PostgreSQL. You can get details to install PostgreSQL here and download the DVD rental database here.
注意:我使用PostgreSQL分析了该数据库。 你可以得到细节安装PostgreSQL 这里并下载DVD租赁数据库这里 。
目的与目标 (Objective & Goals)
In this project, I’ll aim to answer the following questions:
在这个项目中,我将致力于回答以下问题:
- What are the top and least rented (in-demand) genres and what are their total sales? 需求最高和租金最低的类型是什么,它们的总销售额是多少?
- Can we know how many distinct users have rented each genre? 我们可以知道每种类型有多少不同的用户租用了吗?
- What is the average rental rate for each genre? (from the highest to the lowest) 每个流派的平均租金是多少? (从最高到最低)
- How many rented films were returned late, early, and on time? 有多少部租借的电影迟,早,按时归还?
-
In which countries does Rent A Film have a presence and what is the customer base in each country? What are the total sales in each country? (from most to least)
租借电影在哪些国家/地区拥有业务,每个国家/地区的客户群是什么? 每个国家的总销售额是多少? (从最大到最小)
-
Who are the top 5 customers per total sales and can we get their details just in case Rent A Film wants to reward them?
谁是销售总额的前5名客户?我们可以获取他们的详细信息,以防Rent A Film想要奖励他们吗?
Before getting started with analyses, I first tried understanding the ERM (Entity Relationship Model) of this database also known as Schema. Here is the Schema below:
在开始分析之前,我首先尝试了解该数据库的ERM(实体关系模型),也称为Schema。 这是下面的模式:
You can view my code on my GitHub profile here.
你可以在我的GitHub的个人资料查看我的代码在这里 。
分析 (Analysis)
To answer the first question “What are the top and least rented (in-demand) genres and what are what are their total sales?”, I first identified with tables I would need to Join, which are:
要回答第一个问题: “需求最高,租金最低的类型是什么?总销售额是多少?” ,我首先确定了需要联接的表,这些表是:
Category >film_Category >film>inventory>rental >customer >payment
分类>电影类别>电影>库存>租金>库斯托梅尔>付款
Below is the query I used to extract to answer the question:
下面是我用来提取来回答问题的查询:
Insights
见解
From the above table, we can draw 3 major insights:
从上表中,我们可以得出3个主要见解:
-
Rent A Film has 16 available genres
租借电影有16种类型
- The sports category seems to be the most rented genre in terms of the number of times being rented, and it also has the highest total sales in terms of money. 就租用次数而言,体育类似乎是租用最多的类型,在金钱方面,它也有最高的总销售额。
- The music category is the least rented genre in terms of the number of times being rented and has the lowest total sales in terms of money 就租用次数而言,音乐类别是租借类型最少的类型,在金钱方面,其总销售额最低
Question 2: Can we know how many distinct users have rented each genre? In short, yes we can.
问题2:我们可以知道每种类型的租户有多少吗? 简而言之,是的,我们可以。
The tables to join are as follows:
要连接的表如下:
Category > film_Category > film > inventory > rental > customer
类别>电影类别>电影>库存>租借 >客户
Below is my query for this question:
以下是我对这个问题的查询:
Insights
见解
I wanted to know how many distinct customers rented each of the genres. One fascinating thing from the query is that although the music genre has the least total rented record, it does not have the least number of distinct customers who rented the genre. The travel genre holds that record.
我想知道有多少不同的客户租用了每种流派。 该查询令人着迷的一件事是,尽管音乐流派的总租借唱片数量最少,但其租借该流派的独特客户数量却最少。 旅行类型保持该记录。
By taking a step back and connecting the insights derived from question 1 and 2, we can say that the travel genre was re-rented more times than the music genre.
通过退后一步并结合从问题1和问题2得出的见解,我们可以说旅行类型的重租次数比音乐类型的重租次数更多。
And of course, the sports genre has the highest number of distinct customers who rented the genre.
当然,体育类型在租用该类型的客户中拥有最多的数量。
Question 3: What is the Average rental rate for each genre? (from the highest to the lowest)
问题3:每种类型的平均租金是多少? (从最高到最低)
The tables to join are as follows:
要连接的表如下:
Category > film_Category >; film
类别>电影类别> ; 电影
Below is my query for this question:
以下是我对这个问题的查询:
Insights
见解
I went ahead to see if the number of times a category has been rented has anything to do the average rental rate of each genre. From the above table, we can easily conclude that average rental rate may not be a factor.
我继续查看某个类别的租借次数是否与每种类型的平均租借率有关。 从上表中,我们可以轻松得出结论,平均租金率可能不是一个因素。
Although the game genre has the lowest average rental rate, it is one of the top five most rented genre. Surprisingly, the Music genre is not the most expensive — Action is, even though the action genre is one of the most rented genres.
尽管游戏类型的平均租金率最低,但它是租金最高的前五种游戏类型之一。 令人惊讶的是,音乐类型不是最昂贵的-动作是,即使动作类型是租借最多的类型之一。
We can safely say that most of the customers are lovers of sport-related movies and are least interested in musical movies.
可以肯定地说,大多数顾客是体育电影的爱好者,对音乐电影的兴趣最小。
Question 4: How many rented films were returned late, early and on time?
问题4:有多少部租借的影片迟到,早到,准时归还?
The tables to join are as follows:
要连接的表如下:
film > inventory > rental
电影>库存>租赁
Insights
见解
The return status of movies is arguably is one the most important aspects to monitor in a DVD rental business. From the above query, 48% of the movies are returned earlier than the due date while 41% of the movies are returned late and 11% arrive on time.
电影的归还状态可以说是DVD租赁业务中最重要的方面之一。 通过以上查询,有48%的电影比到期日期早退回,而有41%的电影被延迟退回,有11%的电影按时到达。
There could be a number of factors for why this could be happening, like the shipping distance of these movies from stores which could be totally out of the control of the customers and so on. We would need to dive deeper into the data to get the gist of the issue.
造成这种情况发生的原因可能有很多,例如这些电影从商店的运输距离可能完全不受客户控制,等等。 我们需要更深入地研究数据,以获取问题的要点。
However, it is wise to note that a significant percentage of movies are returned late. Introducing a penalty fee for late arrival could be an extra source of income and in turn, discourage late returns.
但是,明智的是要注意,有很大比例的电影迟到了。 引入迟到的罚款可能是一种额外的收入来源,反过来又阻止了迟到的回报。
But such a decision can only make sense if we know why the issue is occurring.
但是,只有当我们知道问题发生的原因时,这样的决定才有意义。
Question 5: In which countries do Rent A Film have a presence in and what is the customer base in each country? What are the total sales in each country? (From most to least)
问题5:“租借电影”在哪些国家/地区拥有业务,每个国家/地区的客户群是什么? 每个国家的总销售额是多少? (从最大到最小)
The tables to join are as follows:
要连接的表如下:
Country > City > Address > customer > payment
国家>城市>地址>客户 >付款
See query below:
请参阅下面的查询:
Insights
见解
Rent A Film has a presence in 108 countries with India having the highest customer base of 60 customers and the largest total sales in terms of money. Afghanistan has the smallest total sales in terms of money, even though it is not the only country with the smallest customer base of 1 customer
Rent A Film在全球108个国家/地区拥有业务,印度拥有60个客户,客户群最高,以货币计算,总销售额最大。 就货币而言,阿富汗的总销售额最小,尽管它不是唯一拥有1名客户的客户群最少的国家
Question 6: Who are the top 5 customers per total sales and can we get their detail just in case Rent A Film wants to reward them?
问题6:谁是每笔销售总额的前5名客户?我们可以了解他们的详细信息,以防Rent A Film想要奖励他们吗?
The tables to join are as follows:
要连接的表如下:
Country > City > Address > customer > payment
国家>城市>地址>客户 >付款
See query below:
请参阅下面的查询:
Insights
见解
Assuming we wish to reward or send physical gifts to the top customers, the above table shows their full names, addresses, email etc.
假设我们希望奖励或向顶级客户发送礼物,上表显示了他们的全名,地址,电子邮件等。
This information can be sent to the marketing team of the company in order to use their domain knowledge to decide how to reward them.
可以将这些信息发送给公司的营销团队,以使用他们的领域知识来决定如何奖励他们。
结论 (Conclusion)
In this project, we analyzed data from a DVD rental company we decided to call ‘Rent A Film’ to find insights about the customers and their preference. We got 3 major conclusions:
在这个项目中,我们分析了DVD租赁公司的数据,我们决定将其称为“出租电影”,以找到有关客户及其偏好的见解。 我们得出3个主要结论:
- The company has sport-loving customers and they would be advisable to stock more sport-related films to increase total sales compared to music-related movies. It would be a good idea to increase the average rental rate of sport genre films since it is not a major factor in renting for the customers. This, in turn, increases total revenue. However further analysis needs to be done to conclude on this. 该公司拥有爱好运动的客户,因此建议他们库存更多与运动有关的电影,以增加与音乐有关的电影的总销量。 提高体育类型电影的平均租金率是一个好主意,因为这不是吸引客户租借的主要因素。 反过来,这增加了总收入。 但是,需要做进一步的分析以得出结论。
- There is potential to have an extra source of revenue through a fee on late film returns. 通过延迟电影归还费用有可能获得额外的收入来源。
-
Rent A Film has a presence in 108 countries with India been the largest market in people and revenue. Additionally, 20% of the countries they have presence in contribute to 80% of the total customer base.
Rent A Film在108个国家/地区拥有业务,其中印度是最大的人员和收入市场。 此外,他们在20%的国家/地区中占有80%的总客户群。
P.S Like me, anyone can learn to be a data analyst and if you want to be notified on my next project or updates on my learning, feel free to sign up to my newsletter
PS与我一样,任何人都可以学习成为数据分析师,并且如果您想在我的下一个项目中得到通知或对我的学习进行更新,请随时订阅我的新闻通讯
翻译自: https://www.freecodecamp.org/news/project-1-analyzing-dvd-rentals-with-sql-fd12dd674a64/
数据分析 如何使用sql

DAMO开发者矩阵,由阿里巴巴达摩院和中国互联网协会联合发起,致力于探讨最前沿的技术趋势与应用成果,搭建高质量的交流与分享平台,推动技术创新与产业应用链接,围绕“人工智能与新型计算”构建开放共享的开发者生态。
更多推荐
所有评论(0)