ClickHouse中的建表语句如下:
CREATE TABLE trace_local on cluster default
`serviceName` LowCardinality(String),
`host` LowCardinality(String),
`ip` String,
`spanName` String,
`spanId` String,
`pid` LowCardinality(String),
`parentSpanId` String,
`ppid` String,
`duration` Int64,
`rpcType` Int32,
`startTime` Int64,
`traceId` String,
`tags.k` Array(String),
`tags.v` Array(String),
`events` String,
KEY trace_idx traceId TYPE range
) ENGINE = MergeTree()
PARTITION BY intDiv(startTime, toInt64(7200000000))
PRIMARY KEY (serviceName, host, ip, pid, spanName)
ORDER BY (serviceName, host, ip, pid, spanName, tags.k);
CREATE TABLE trace on cluster default as trace_local
engine = Distributed(default, default, trace_local, rand());
ClickHouse中的查询语句如下:
select *
from trace
prewhere
traceId ='ccc6084420b76183'
where startTime > 1597968000300000 and startTime < 1598054399099000 settings max_threads = 1;
select count(*) count, spanName as name from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000
group by spanName
order by count desc limit 1000;
select host as name, count(*) count
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000
group by host;
select count(*) count, tags.k as name from trace
array join tags.k
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000
group by tags.k;
select count(*) spancount,
sum(duration) as sumDuration, intDiv(startTime, 1440000000) as timeSel
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000
group by timeSel;
select count(*) spanCount,
countIf(duration <=1000000), countIf(duration > 1000000), countIf(duration > 3000000)
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000;
select host, startTime,traceId,spanName,duration,tags.k,tags.v
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000 and startTime < 1598054399099000 limit 1000000;
性能对比如下:
Ontime测试集是ClickHouse官网上推荐的一个分析型查询benchmark,为了更加公证公开地对比ClickHouse和Elasticsearch在分析查询上的性能差异。作者也引入了这个数据集进行测试比对,结果如下所示,ClickHouse在纯分析型查询场景下具有巨大性能优势。Elasticsearch和ClickHouse完整版建表语句和查询下载:聚合分析场景
用户画像场景也是用户比较难选择使用Elasticsearch还是ClickHouse的一个典型场景,该场景的具体特点是超大宽表,大批量更新写入,查询返回的数据量大,筛选条件复杂多变。用户在使用Elasticsearch时遇到的难点问题主要有两个:数据写不进去,导入慢;数据拉不出来,返回大规模明细数据非常慢。针对这个场景,作者根据真实用户的业务场景,mock了一张接近150列的大宽表进行相关的查询测试,具体的查询如下所示,每条查询返回的结果集在10万到100万行级别。Elasticsearch和ClickHouse完整版建表语句和查询下载:用户画像场景
ClickHouse中的查询语句如下:
select user_id
from person_tag
where mock3d_like > 8 and mock3d_consume_content_cnt > 8 and mock_10_day_product_avg_amt < 1 settings append_squashing_after_filter = 1;
select user_id
from person_tag
where mock_7_day_receive_cnt > 8 and like_fitness = 1 and mock14d_share_cnt > 8 settings append_squashing_after_filter = 1;
select user_id
from person_tag
where home_perfer_mock_score > 8 and mock7d_access_homepage_cnt > 8 settings append_squashing_after_filter = 1;
select user_id
from person_tag
where is_send_register_coupon > 8 and mock1d_like > 8 settings append_squashing_after_filter = 1;
select user_id
from person_tag
where like_sports = 1 and like_3c = 1 and sex = 1 and like_dance = 1 and mock1d_share_cnt > 6 settings append_squashing_after_filter = 1;
select user_id
from person_tag
where mock14d_access_homepage_cnt > 8 and like_anime = 1 settings append_squashing_after_filter = 1;
select user_id,offline_ver,is_visitor,mock1d_comment_like,reg_days,mock14d_share_cnt,mock_30_order_avg_delivery_time_cnt,mock7d_comment_cnt,performance_rate,mock3d_valid_user_follow_cnt,mock30d_consume_content_cnt,like_cnt,like_photo,ls90_day_access_days,mock3d_release_trend_cnt,mock14d_access_homepage_range,qutdoor_perfer_mock_score,mock3d_access_homepage_cnt,mock_15_order_avg_delivery_time_cnt,mock7d_release_trend_cnt,like_food,mock30d_follow_topic_cnt,mock7d_is_access_topic,like_music,mock3d_interactive_cnt,mock14d_valid_user_follow_cnt,reg_platform,mock_7_day_lottery_participate_cnt,pre_churn_users,etl_time,like_anime,mock14d_access_homepage_cnt,mock14d_consume_content_cnt,like_travel,like_watches,mock14d_comment_like,ls30_day_access_days,mock14d_release_trend_cnt,ftooeawr_perfer_mock_score,mock7d_valid_user_follow_cnt,beauty_perfer_mock_score
from person_tag
where mock3d_like > 8 and mock3d_consume_content_cnt > 8 and mock_10_day_product_avg_amt < 1 settings append_squashing_after_filter = 1;
查询性能结果对比如下,可以看出Elasticsearch在扫描导出大量结果数据的场景下,性能非常大,返回的结果集越大越慢,其中Q5是查询命中结果集很小的对比case。
在分析查询业务场景中,用户难免会有几个明细点查case,例如根据日志traceId查询明细信息。开源ClickHouse因为没有二级索引能力,在遇到这种情况时,查询性能对比Elasticsearch完全落后。阿里云ClickHouse自研了二级索引能力,补齐了这方面的短板,作者在这里专门加了一个二级索引点查的场景来进行性能对比测试。Elasticsearch和ClickHouse完整版建表语句和查询下载:二级索引点查场景
ClickHouse中的建表语句如下:
CREATE TABLE point_search_test_local on cluster default (
`PRI_KEY` String,
`SED_KEY` String,
`INT_0` UInt32,
`INT_1` UInt32,
`INT_2` UInt32,
`INT_3` UInt32,
`INT_4` UInt32,
`LONG_0` UInt64,
`LONG_1` UInt64,
`LONG_2` UInt64,
`LONG_3` UInt64,
`LONG_4` UInt64,
`STR_0` String,
`STR_1` String,
`STR_2` String,
`STR_3` String,
`STR_4` String,
`FIXSTR_0` FixedString(16),
`FIXSTR_1` FixedString(16),
`FIXSTR_2` FixedString(16),
`FIXSTR_3` FixedString(16),
`FIXSTR_4` FixedString(16),
KEY SED_KEY_IDX SED_KEY Type range
) ENGINE = MergeTree ORDER BY PRI_KEY
SETTINGS index_granularity_bytes = 4096, secondary_key_segment_min_rows = 1000000000, min_rows_for_wide_part = 2000000000;
CREATE TABLE point_search_test on cluster default as point_search_test_local
engine = Distributed(default, default, point_search_test_local, rand());
ClickHouse中的查询模板语句如下:
select * from point_search_test where SED_KEY = 'XXX' settings max_threads = 1;
最终的查询性能对比如下,阿里云ClickHouse具备二级索引能力后,其点查能力完全不弱于Elasticsearch,存储原生支持的二级索引能力,具有极致性能。(阿里云ClickHouse二级索引文档)
前面列举的所有数据集数据,作者都使用了ESSD本地文件导入的方式测试对比了Elasticsearch和ClickHouse的导入性能。ClickHouse可以直接使用ClickHouse-Client读取各种格式的本地文件进行导入,而Elasticsearch则是通过配置Logstash任务。具体耗时结果如下:
Elasticsearch最擅长的主要是完全搜索场景(where过滤后的记录数较少),在内存富裕运行环境下可以展现出非常出色的并发查询能力。但是在大规模数据的分析场景下(where过滤后的记录数较多),ClickHouse凭借极致的列存和向量化计算会有更加出色的并发表现,并且查询支持完备度也更好。ClickHouse的并发处理能力立足于磁盘吞吐,而Elasticsearch的并发处理能力立足于内存Cache,这使得两者的成本区间有很大差异,ClickHouse更加适合低成本、大数据量的分析场景,它能够充分利用磁盘的带宽能力。数据导入和存储成本上,ClickHouse更加具有绝对的优势。
简介: 本文的主旨在于通过彻底剖析ClickHouse和Elasticsearch的内核架构,从原理上讲明白两者的优劣之处,同时会附上一份覆盖多场景的测试报告给读者作为参考。
作者:阿里云数据库OLAP产品部 仁劼
Clickhouse是俄罗斯搜索巨头Yandex开发的完全列式存储计算的分析型数据库。ClickHouse在这两年的OLAP领域中一直非常热门,国内互联网大厂都有大规模使用。Elasticsearch是一个近实时的分布式搜索分析引擎,它的底层存储完全构建在Lucene之上。简单来说是通过扩展
优点:凝天小说网 https://www.2792.info
ClickHouse写入吞吐量大,单服务器日志写入量在50MB到200MB/s,每秒写入超过60w记录数,是ES的5倍以上。
查询速度快,官方宣称数据在pagecache中,单服务器查询速率大约在2-30GB/s;没在pagecache的情况下,查询速度取决于磁盘的读取速率和数据的压缩率。。
ClickHouse比ES服务器成本更低。一方面ClickHouse的数据压缩比比ES高,相同数据占用的磁盘空间只有ES的1/3到1/30,节省了磁盘空间.
https://developer.aliyun.com/adc/scenario/4ea641d7eca44034932eebeecfb087aa?spm=a2c6h.14164896.0.0.db784417TfScOX
在右侧远程桌面中打开Chromium浏览器,复制左侧云产品资源中提供的阿里云子用户名称和密码登录控制台。
复制如下阿里云Elasticsearch控制台地址至浏览器地址栏,登录Elasticsearch管理控制台,本实验资源区域限定为华东2(上海)
在右侧概览页面Elasticsear
转自:Elasticsearch和Clickhouse基本查询对比 - 知乎
Elasticsearch 是一个实时的分布式搜索分析引擎,它的底层是构建在Lucene之上的。简单来说是通过扩展Lucene的搜索能力,使其具有分布式的功能。ES通常会和其它两个开源组件logstash(日志采集)和Kibana(仪表盘)一起提供端到端的日志/搜索分析的功能,常常被简称为ELK。
Clickhouse是俄罗斯搜索巨头Yandex开发的面向列式存储的关系型数据库。ClickHouse是过去两年中OLAP领域中最
The APR based Apache Tomcat Native library which allows optimal performance in production 问题的解决
159132
Ubuntu elasticsearch max virtual memory areas vm.max_map_count [65530] is too low, increase to at le
63772