安装好 ClickHouse 之后,先来个例子感受下如何用的,
创建表 与 导入测试数据
CREATE TABLE trips(`trip_id` UInt32,`vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),`pickup_date` Date,`pickup_datetime` DateTime,`dropoff_date` Date,`dropoff_datetime` DateTime,`store_and_fwd_flag` UInt8,`rate_code_id` UInt8,`pickup_longitude` Float64,`pickup_latitude` Float64,`dropoff_longitude` Float64,`dropoff_latitude` Float64,`passenger_count` UInt8,`trip_distance` Float64,`fare_amount` Float32,`extra` Float32,`mta_tax` Float32,`tip_amount` Float32,`tolls_amount` Float32,`ehail_fee` Float32,`improvement_surcharge` Float32,`total_amount` Float32,`payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),`trip_type` UInt8,`pickup` FixedString(25),`dropoff` FixedString(25),`cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),`pickup_nyct2010_gid` Int8,`pickup_ctlabel` Float32,`pickup_borocode` Int8,`pickup_ct2010` String,`pickup_boroct2010` String,`pickup_cdeligibil` String,`pickup_ntacode` FixedString(4),`pickup_ntaname` String,`pickup_puma` UInt16,`dropoff_nyct2010_gid` UInt8,`dropoff_ctlabel` Float32,`dropoff_borocode` UInt8,`dropoff_ct2010` String,`dropoff_boroct2010` String,`dropoff_cdeligibil` String,`dropoff_ntacode` FixedString(4),`dropoff_ntaname` String,`dropoff_puma` UInt16)ENGINE = MergeTreePARTITION BY toYYYYMM(pickup_date)ORDER BY pickup_datetime;
插入数据
INSERT INTO tripsSELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{1..2}.gz','TabSeparatedWithNames')
由于我没有安装 client 使用的是 dataGrip 通过 JDBC 连接的 clickhouse(方式可查看 Docker 安装 ClickHouse),所以在执行上面的语句的时候会超时(暂时不知道如何设置超时时间),所以先将文件下载下来,使用 docker 容器的客户端再导入
# 下载有点慢,需要耐心等待,每个文件 78M,下载后然后解压wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_1.gzwget https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_2.gz
下载成功之后,将文件复制到 docker 容器里面
# 找到 clickhouse 容器名称➜ clickhouse docker ps | grep clickhouse36705c628e62 clickhouse/clickhouse-server:22.6.4.35 "/entrypoint.sh" About an hour ago Up About an hour 9009/tcp, 0.0.0.0:18123->8123/tcp, 0.0.0.0:19000->9000/tcp clickhouseca43ee3855c5 yandex/clickhouse-server:20.3.9.70 "/entrypoint.sh" 3 weeks ago Up 5 days (healthy) 8123/tcp, 9000/tcp, 9009/tcp sentry-self-hosted-clickhouse-1# 当前目录下有两个文件(解压后的文件)➜ clickhouse lsconfig.yaml data logs trips_1 trips_2# 将文件复制到容器中➜ clickhouse docker cp ./trips_1 clickhouse:/➜ clickhouse docker cp ./trips_2 clickhouse:/# 进入容器➜ clickhouse docker exec -it clickhouse /bin/bash# 这里在容器里面的话,加不加用户名密码都可以root@36705c628e62:/# clickhouse-client --user root --password root --query "INSERT INTO trips FORMAT TabSeparatedWithNames" < trips_1root@36705c628e62:/# clickhouse-client --user root --password root --query "INSERT INTO trips FORMAT TabSeparatedWithNames" < trips_2
现在就可以使用 dataGrip(有如何用 dataGrip 链接) 的控制台查询了
select count() from trips;-- 响应数量为 1999657
dataGrip 中也能直接查看表数据了
下面看看有多快
SELECT DISTINCT(pickup_ntaname) FROM trips
该语句处理 1,999,657 行数据,并返回 190 个值,大约耗时 0.05 秒
default> SELECT DISTINCT(pickup_ntaname) FROM trips[2022-07-28 16:01:52] 在 58 ms (execution: 31 ms, fetching: 27 ms) 内检索到从 1 开始的 190 行
