Introduction

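I recently worked on a task that required filtering duplicate records so that only one row per person is kept. The queries in this post keep each person's most recently inserted row (the largest insert_time). If you need the earliest row instead, use MIN instead of MAX, or ORDER BY insert_time ASC instead of DESC. Below are a few ways to approach this kind of deduplication problem.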

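First, the sample data. The table user_test holds two users, 小明 and 小兰, whose records were inserted several times. The full CREATE TABLE and INSERT statements are given at the end of this post.

id | u_name | u_sex | u_phone     | insert_time         | update_by
1  | 小明   | 男    | 13288888888 | 2020-10-28 09:44:16 | admin
2  | 小明   | 男    | 13288888888 | 2020-10-28 09:45:01 | admin
3  | 小明   | 男    | 13288888888 | 2020-10-28 09:45:35 | admin
4  | 小兰   | 女    | 16896969696 | 2020-10-28 09:45:45 | admin
5  | 小兰   | 女    | 16896969696 | 2020-10-28 09:46:14 | admin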

Approach 1: use GROUP BY to find each person's latest insert time (insert_time), then join the table back on that time to pick out each person's latest row.

1. First, GROUP BY u_name to find each person's latest insert_time:

SELECT
    T.u_name,
    MAX(T.insert_time) AS t_inserttime
FROM user_test T
GROUP BY T.u_name;
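To check what step 1 produces, here is a minimal sketch. It assumes an in-memory SQLite database, created with Python's built-in sqlite3 module, can stand in for the MySQL table. Column types are simplified, and insert_time is stored as ISO text. Because that fixed-width format sorts chronologically, MAX() still returns the latest time. The helper name load_sample is made up for illustration.

```python
import sqlite3

# Rows copied from the INSERT statements in the post's sample data.
# insert_time is ISO-8601 text; in this fixed format, string order equals
# chronological order, so MAX() returns the latest time.
ROWS = [
    (1, "小明", "男", "13288888888", "2020-10-28 09:44:16", "admin"),
    (2, "小明", "男", "13288888888", "2020-10-28 09:45:01", "admin"),
    (3, "小明", "男", "13288888888", "2020-10-28 09:45:35", "admin"),
    (4, "小兰", "女", "16896969696", "2020-10-28 09:45:45", "admin"),
    (5, "小兰", "女", "16896969696", "2020-10-28 09:46:14", "admin"),
]

# Step 1 from the post: latest insert_time per user name.
STEP1_SQL = """
SELECT T.u_name, MAX(T.insert_time) AS t_inserttime
FROM user_test T
GROUP BY T.u_name
"""


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_test (id INTEGER PRIMARY KEY, u_name TEXT,"
        " u_sex TEXT, u_phone TEXT, insert_time TEXT, update_by TEXT)"
    )
    conn.executemany("INSERT INTO user_test VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# fetchall() yields (u_name, t_inserttime) pairs; dict() maps name -> time.
latest = dict(load_sample().execute(STEP1_SQL).fetchall())
print(latest)
```

With the sample rows, latest maps 小明 to 2020-10-28 09:45:35 and 小兰 to 2020-10-28 09:46:14. This gives one time per user, but not yet the full rows, which is why step 2 is needed.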

2. Join the table back to that result on u_name and insert_time to pick out each person's latest row:

SELECT
    T1.id,
    T1.u_name,
    T1.u_sex,
    T1.u_phone,
    T1.insert_time,
    T1.update_by
FROM `user_test` T1
JOIN (
    SELECT T2.u_name, MAX(T2.insert_time) AS t_inserttime
    FROM user_test T2
    GROUP BY T2.u_name
) T3
    ON T1.u_name = T3.u_name
   AND T1.insert_time = T3.t_inserttime;

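Result: when run against the sample data at the end of this post, the query returns one row per user. These are id 3 (小明, inserted 2020-10-28 09:45:35) and id 5 (小兰, inserted 2020-10-28 09:46:14).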
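One caveat of the GROUP BY + join approach: the join matches on insert_time. If two rows of the same user share the maximum insert_time, both rows are returned. The sketch below reproduces this with Python's built-in sqlite3 module. It assumes an in-memory SQLite table can stand in for the MySQL table, with types simplified and insert_time stored as ISO text. Row 6 is a hypothetical extra record added only to create the tie, and the function name latest_rows is made up for illustration. The ROW_NUMBER() approach does not have this problem, because it assigns RW = 1 to exactly one row per group. However, which of several tied rows gets that number is unspecified unless ORDER BY includes a tie-breaker such as id.

```python
import sqlite3

# Rows copied from the INSERT statements in the post's sample data.
ROWS = [
    (1, "小明", "男", "13288888888", "2020-10-28 09:44:16", "admin"),
    (2, "小明", "男", "13288888888", "2020-10-28 09:45:01", "admin"),
    (3, "小明", "男", "13288888888", "2020-10-28 09:45:35", "admin"),
    (4, "小兰", "女", "16896969696", "2020-10-28 09:45:45", "admin"),
    (5, "小兰", "女", "16896969696", "2020-10-28 09:46:14", "admin"),
]

# Step 2 from the post with an explicit JOIN (same semantics as the comma
# join plus WHERE), returning only the columns needed here.
JOIN_SQL = """
SELECT T1.id, T1.u_name, T1.insert_time
FROM user_test T1
JOIN (SELECT T2.u_name, MAX(T2.insert_time) AS t_inserttime
      FROM user_test T2
      GROUP BY T2.u_name) T3
  ON T1.u_name = T3.u_name
 AND T1.insert_time = T3.t_inserttime
ORDER BY T1.id
"""


def latest_rows(rows):
    """Load rows into an in-memory SQLite table and run the join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_test (id INTEGER PRIMARY KEY, u_name TEXT,"
        " u_sex TEXT, u_phone TEXT, insert_time TEXT, update_by TEXT)"
    )
    conn.executemany("INSERT INTO user_test VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn.execute(JOIN_SQL).fetchall()


kept = [row[0] for row in latest_rows(ROWS)]

# Hypothetical row 6: a second 小兰 record with the same insert_time as row 5.
tie = (6, "小兰", "女", "16896969696", "2020-10-28 09:46:14", "admin")
kept_with_tie = [row[0] for row in latest_rows(ROWS + [tie])]
print(kept, kept_with_tie)
```

Here kept is [3, 5], while kept_with_tie is [3, 5, 6]: both tied 小兰 rows survive the join. If the data can contain such ties, add a further rule (for example, keep the larger id) or use the ROW_NUMBER() approach with a tie-breaker.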

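Approach 2: use ROW_NUMBER() OVER(). This works in Oracle and in any database that supports window functions, including MySQL 8.0 and later. MySQL 5.7 and earlier do not support it.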

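ROW_NUMBER() OVER() numbers rows within groups. PARTITION BY splits the rows into groups and ORDER BY sorts each group, which is similar to GROUP BY + ORDER BY. The difference is that no rows are collapsed: every row is kept and receives its position within its group.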

SELECT
    FI.*
FROM (
    SELECT
        T.*,
        ROW_NUMBER() OVER (PARTITION BY T.u_name ORDER BY T.insert_time DESC) AS RW
    FROM user_test T
) FI
WHERE FI.RW = 1;

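Here PARTITION BY groups the rows by user name, and ORDER BY sorts each group by insert time in descending order. ROW_NUMBER() then numbers the rows of each group starting from 1. Keeping only the rows with RW = 1 therefore yields the latest row for each user.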
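To see the numbering that ROW_NUMBER() produces before the RW = 1 filter is applied, here is a sketch using Python's built-in sqlite3 module. It assumes an in-memory SQLite table can stand in for the MySQL table, with types simplified and insert_time stored as ISO text. SQLite supports window functions from version 3.25.0, which the code checks first. The two queries are the post's inner subquery and the full query, each trimmed to return ids only.

```python
import sqlite3

# ROW_NUMBER() and other window functions need SQLite 3.25.0 or newer.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "SQLite too old for window functions"

# Rows copied from the INSERT statements in the post's sample data.
ROWS = [
    (1, "小明", "男", "13288888888", "2020-10-28 09:44:16", "admin"),
    (2, "小明", "男", "13288888888", "2020-10-28 09:45:01", "admin"),
    (3, "小明", "男", "13288888888", "2020-10-28 09:45:35", "admin"),
    (4, "小兰", "女", "16896969696", "2020-10-28 09:45:45", "admin"),
    (5, "小兰", "女", "16896969696", "2020-10-28 09:46:14", "admin"),
]

# Inner query of the post: number each user's rows, newest first.
NUMBERED_SQL = """
SELECT T.id,
       ROW_NUMBER() OVER (PARTITION BY T.u_name ORDER BY T.insert_time DESC) AS RW
FROM user_test T
"""

# Full query of the post: keep only the row numbered 1 in each group.
LATEST_SQL = """
SELECT FI.id
FROM (SELECT T.*,
             ROW_NUMBER() OVER (PARTITION BY T.u_name
                                ORDER BY T.insert_time DESC) AS RW
      FROM user_test T) FI
WHERE FI.RW = 1
ORDER BY FI.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_test (id INTEGER PRIMARY KEY, u_name TEXT,"
    " u_sex TEXT, u_phone TEXT, insert_time TEXT, update_by TEXT)"
)
conn.executemany("INSERT INTO user_test VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# id -> RW for every row; unlike GROUP BY, no rows are collapsed.
numbering = dict(conn.execute(NUMBERED_SQL).fetchall())
# ids of the rows that got RW = 1, i.e. each user's latest row.
kept = [row[0] for row in conn.execute(LATEST_SQL)]
print(numbering, kept)
```

With the sample data, ids 1, 2, 3 (小明) get RW 3, 2, 1, and ids 4, 5 (小兰) get RW 2, 1. kept is [3, 5], the same rows the GROUP BY + join approach returns.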

Sample data (MySQL):

DROP TABLE IF EXISTS `user_test`;
CREATE TABLE `user_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `u_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `u_sex` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `u_phone` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `insert_time` datetime NULL DEFAULT NULL,
  `update_by` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 6 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;

-- ----------------------------
-- Records of user_test
-- ----------------------------
INSERT INTO `user_test` VALUES (1, '小明', '男', '13288888888', '2020-10-28 09:44:16', 'admin');
INSERT INTO `user_test` VALUES (2, '小明', '男', '13288888888', '2020-10-28 09:45:01', 'admin');
INSERT INTO `user_test` VALUES (3, '小明', '男', '13288888888', '2020-10-28 09:45:35', 'admin');
INSERT INTO `user_test` VALUES (4, '小兰', '女', '16896969696', '2020-10-28 09:45:45', 'admin');
INSERT INTO `user_test` VALUES (5, '小兰', '女', '16896969696', '2020-10-28 09:46:14', 'admin');

SET FOREIGN_KEY_CHECKS = 1;
