阿里数据分析岗的秋招SQL题(2019.8.28)

题目

name: table_t001

image-20200908201600557

表table_t001, user_id(用户名)、dt(购买日期)、amt(购买金额),找出购买天数最多的用户和购买金额最多的用户,按要求输出用户id、购买天数、购买金额,备注是购买天数最多还是购买金额最多。

思路

  1. 建立临时表,按user_id 统计购买天数和购买总金额;
  2. 对临时表查找购买天数最大和购买总金额最多的记录;
  3. 相应添加备注。

SQL 初实现

  1. 创建临时表

    -- 临时表 v_tablet001
    
    CREATE VIEW v_tablet001 AS
    SELECT user_id, 
        COUNT(DISTINCT dt) AS dt_count,
        SUM(amt) AS acc_amt
    FROM table_t001
    GROUP BY user_id;
    

    临时表 v_tablet001

    image-20200908201756480

  2. 查找购买天数最大和购买总金额最多的记录,分别备注,两条记录用UNION合并。

    -- 查询
    
    SELECT  user_id, 
            dt_count '购买天数', 
            acc_amt '购买金额', 
            '最大购买天数' AS '备注'
    FROM v_tablet001
    WHERE dt_count = (SELECT MAX(dt_count) FROM v_tablet001)
    UNION
    SELECT  user_id, 
            dt_count '购买天数', 
            acc_amt '购买金额', 
            '最大购买金额' AS '备注' 
    FROM v_tablet001
    WHERE acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001);
    

    image-20200908202010106

题目的严谨性

至此,已经筛选出符合题目条件的记录。

但是,你有没有想过

  1. 如果有「同时是最大购买天数和最大购买金额」的情况呢?
  2. 如何根据情况判断,自动匹配备注,而不是手动提前判断并备注呢?

完善一下题目,

找出购买天数最多的用户和购买金额最多的用户,按要求输出用户id、购买天数、购买金额,备注是「购买天数最多」还是「购买金额最多」还是「同时购买天数和购买金额都最多」

新添加一条数据,创造第三种情况。新数据如下,

image-20200908202147448

更严谨的SQL实现

  1. 同样,建临时表,按user_id 统计购买天数和购买总金额

    -- 建临时表 v_tablet001
    
    CREATE VIEW v_tablet001 AS
    SELECT user_id, 
        COUNT(DISTINCT dt) AS dt_count,
        SUM(amt) AS acc_amt
    FROM table_t001
    GROUP BY user_id;
    
  2. 考虑用CASE WHEN 分情况添加备注。

    -- 查询,CASE WHEN 条件判断
    
    SELECT user_id, dt_count '购买天数', acc_amt '购买金额', note '备注' FROM
    (SELECT *,
        CASE 
        WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001) AND 
            acc_amt <> (SELECT MAX(acc_amt) FROM v_tablet001) THEN '最大购买天数'
        WHEN acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001) AND
            dt_count <> (SELECT MAX(dt_count) FROM v_tablet001) THEN '最大购买金额'
        WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001 ) AND 
            acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001) THEN '同时购买天数和购买金额都最多'
        END AS note 
    FROM v_tablet001) AS new_table
    WHERE note IS NOT NULL;
    

    image-20200908202530392

SQL优化

  1. 第四部分的代码,在查询「购买天数最多」、「购买金额最多」、「同时购买天数和购买金额都最多」还可以优化,毕竟子查询的效率不够高

    思路:用RANK()分别对购买天数和购买金额逆序排序,筛选排名第一的记录。

    -- 查询,CASE WHEN 条件判断
    
    SELECT user_id, dt_count '购买天数', acc_amt '购买金额', 
        CASE
        WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
        WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
        WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
        END AS note
    FROM
    (SELECT user_id, dt_count, acc_amt, 
        RANK() OVER (ORDER BY dt_count DESC) AS rank1,
        RANK() OVER (ORDER BY acc_amt DESC) AS rank2
    FROM v_tablet001) AS t
    WHERE rank1 = 1 OR rank2 = 1;
    
  2. 如果直接用查询,不产生临时表,效率更高。所以建立临时表,和查询过程可以嵌套在一起,一步完成。

    SELECT user_id, dt_count '购买天数', acc_amt '购买金额', 
    CASE
        WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
        WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
        WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
        END AS note
    FROM
        (SELECT user_id, dt_count, acc_amt, 
            RANK() OVER (ORDER BY dt_count DESC) AS rank1,
            RANK() OVER (ORder BY acc_amt DESC) AS rank2
        FROM 
            (
            SELECT user_id, 
                COUNT(DISTINCT dt) AS dt_count,
                SUM(amt) AS acc_amt
            FROM table_t001
            GROUP BY user_id
            ) AS t1  -- t1对应原本的临时表
        ) AS t2
    WHERE rank1 = 1 OR rank2 = 1;
    

    笔记

    1. 复习rank()dense_rank()row_number() 的区别:

      image-20200908203600495

    2. MySQL 临时表的建立:

      • temporary 创建临时表

        create temporary table tmp_table
        select *
        from table_name;
        
      • view 创建表临时表:

        create view view_name as 
        select *
        from table_name;
        

参考

Update time: 2020-09-08

results matching ""

    No results matching ""