CoCalc -- 41.MySQL新特性.md

CoCalc provides the best real-time collaborative environment for Jupyter Notebooks, LaTeX documents, and SageMath, scalable from individual users to large groups and classes!

GitHub Repository: jackfrued/Python-100-Days
Path: blob/master/Day36-45/41.MySQL新特性.md
Views: ⁷²⁹

MySQL新特性

JSON类型

很多开发者在使用关系型数据库做数据持久化的时候，常常感到结构化的存储缺乏灵活性，因为必须事先设计好所有的列以及对应的数据类型。在业务发展和变化的过程中，如果需要修改表结构，这绝对是比较麻烦和难受的事情。从 MySQL 5.7 版本开始，MySQL引入了对 JSON 数据类型的支持（MySQL 8.0 解决了 JSON 的日志性能瓶颈问题），用好 JSON 类型，其实就是打破了关系型数据库和非关系型数据库之间的界限，为数据持久化操作带来了更多的便捷。

JSON 类型主要分为 JSON 对象和 JSON数组两种，如下所示。

JSON 对象

{"name": "骆昊", "tel": "13122335566", "QQ": "957658"}

JSON 数组

[1, 2, 3]

[{"name": "骆昊", "tel": "13122335566"}, {"name": "王大锤", "QQ": "123456"}]

哪些地方需要用到JSON类型呢？举一个简单的例子，现在很多产品的用户登录都支持多种方式，例如手机号、微信、QQ、新浪微博等，但是一般情况下我们又不会要求用户提供所有的这些信息，那么用传统的设计方式，就需要设计多个列来对应多种登录方式，可能还需要允许这些列存在空值，这显然不是很好的选择；另一方面，如果产品又增加了一种登录方式，那么就必然要修改之前的表结构，这就更让人痛苦了。但是，有了 JSON 类型，刚才的问题就迎刃而解了，我们可以做出如下所示的设计。

create table `tb_test`
(
`user_id` bigint unsigned,
`login_info` json,
primary key (`user_id`)
) engine=innodb;

insert into `tb_test` values 
    (1, '{"tel": "13122335566", "QQ": "654321", "wechat": "jackfrued"}'),
    (2, '{"tel": "13599876543", "weibo": "wangdachui123"}');

如果要查询用户的手机和微信号，可以用如下所示的 SQL 语句。

select 
    `user_id`,
    json_unquote(json_extract(`login_info`, '$.tel')) as 手机号,
    json_unquote(json_extract(`login_info`, '$.wechat')) as 微信 
from `tb_test`;

+---------+-------------+-----------+
| user_id | 手机号      | 微信       |
+---------+-------------+-----------+
|       1 | 13122335566 | jackfrued |
|       2 | 13599876543 | NULL      |
+---------+-------------+-----------+

因为支持 JSON 类型，MySQL 也提供了配套的处理 JSON 数据的函数，就像上面用到的json_extract和json_unquote。当然，上面的 SQL 还有更为便捷的写法，如下所示。

select 
	`user_id`,
    `login_info` ->> '$.tel' as 手机号,
    `login_info` ->> '$.wechat' as 微信
from `tb_test`;

再举个例子，如果我们的产品要实现用户画像功能（给用户打标签），然后基于用户画像给用户推荐平台的服务或消费品之类的东西，我们也可以使用 JSON 类型来保存用户画像数据，示意代码如下所示。

创建画像标签表。

create table `tb_tags`
(
`tag_id` int unsigned not null comment '标签ID',
`tag_name` varchar(20) not null comment '标签名',
primary key (`tag_id`)
) engine=innodb;

insert into `tb_tags` (`tag_id`, `tag_name`) 
values
    (1, '70后'),
    (2, '80后'),
    (3, '90后'),
    (4, '00后'),
    (5, '爱运动'),
    (6, '高学历'),
    (7, '小资'),
    (8, '有房'),
    (9, '有车'),
    (10, '爱看电影'),
    (11, '爱网购'),
    (12, '常点外卖');

为用户打标签。

create table `tb_users_tags`
(
`user_id` bigint unsigned not null comment '用户ID',
`user_tags` json not null comment '用户标签'
) engine=innodb;

insert into `tb_users_tags` values 
    (1, '[2, 6, 8, 10]'),
    (2, '[3, 10, 12]'),
    (3, '[3, 8, 9, 11]');

接下来，我们通过一组查询来了解 JSON 类型的巧妙之处。

查询爱看电影（有10这个标签）的用户ID。

select `user_id` from `tb_users_tags` where 10 member of (`user_tags`->'$');

查询爱看电影（有10这个标签）的80后（有2这个标签）用户ID。

select `user_id` from `tb_users_tags` where json_contains(`user_tags`->'$', '[2, 10]');

查询爱看电影或80后或90后的用户ID。

select `user_id` from `tb_users_tags` where json_overlaps(user_tags->'$', '[2, 3, 10]');

说明：上面的查询用到了member of谓词和两个 JSON 函数，json_contains可以检查 JSON 数组是否包含了指定的元素，而json_overlaps可以检查 JSON 数组是否与指定的数组有重叠部分。

窗口函数

MySQL 从8.0开始支持窗口函数，大多数商业数据库和一些开源数据库早已提供了对窗口函数的支持，有的也将其称之为 OLAP（联机分析和处理）函数，听名字就知道跟统计和分析相关。为了帮助大家理解窗口函数，我们先说说窗口的概念。

窗口可以理解为记录的集合，窗口函数也就是在满足某种条件的记录集合上执行的特殊函数，对于每条记录都要在此窗口内执行函数。窗口函数和我们上面讲到的聚合函数比较容易混淆，二者的区别主要在于聚合函数是将多条记录聚合为一条记录，窗口函数是每条记录都会执行，执行后记录条数不会变。窗口函数不仅仅是几个函数，它是一套完整的语法，函数只是该语法的一部分，基本语法如下所示：

<窗口函数> over (partition by <用于分组的列名> order by <用于排序的列名>  rows between ... and ...)
<窗口函数> over (partition by <用于分组的列名> order by <用于排序的列名> range between ... and ...)

上面语法中，窗口函数的位置可以放以下两种函数：

专用窗口函数，包括：lead、lag、first_value、last_value、rank、dense_rank和row_number等。
聚合函数，包括：sum、avg、max、min和count等。

下面为大家举几个使用窗口函数的简单例子，我们直接使用上一课创建的 hrs 数据库。

例子1：查询按月薪从高到低排在第4到第6名的员工的姓名和月薪。

select * from (
	select 
		`ename`, `sal`,
		row_number() over (order by `sal` desc) as `rank`
	from `tb_emp`
) `temp` where `rank` between 4 and 6;

说明：上面使用的函数row_number()可以为每条记录生成一个行号，在实际工作中可以根据需要将其替换为rank()或dense_rank()函数，三者的区别可以参考官方文档或阅读《通俗易懂的学会：SQL窗口函数》进行了解。在MySQL 8以前的版本，我们可以通过下面的方式来完成类似的操作。
select `rank`, `ename`, `sal` from (
       select @a:=@a+1 as `rank`, `ename`, `sal` 
       from `tb_emp`, (select @a:=0) as t1 order by `sal` desc
) as `temp` where `rank` between 4 and 6;

例子2：查询每个部门月薪最高的两名的员工的姓名和部门名称。

select `ename`, `sal`, `dname` 
from (
    select 
        `ename`, `sal`, `dno`,
        rank() over (partition by `dno` order by `sal` desc) as `rank`
    from `tb_emp`
) as `temp` natural join `tb_dept` where `rank`<=2;

说明：在MySQL 8以前的版本，我们可以通过下面的方式来完成类似的操作。
select `ename`, `sal`, `dname` from `tb_emp` as `t1` 

natural join tb_dept where ( select count(*) from tb_emp as t2 where t1.dno=t2.dno and t2.sal>t1.sal )<2 order by dno asc, sal desc;

CoCalc provides the best real-time collaborative environment for Jupyter Notebooks, LaTeX documents, and SageMath, scalable from individual users to large groups and classes!

MySQL新特性

JSON类型

窗口函数

公用表表达式（CTE）

Product

Resources

Company