From Denormalization to JOINS: Why ClickHouse Can't Keep Up

Name: From Denormalization to JOINS: Why ClickHouse Can't Keep Up
Uploaded: 2023-12-05T03:35:18+03:00
Duration: 46 min 53 s
Description: From Denormalization to JOINS: Why ClickHouse Can't Keep Up

ClickHouse has long been praised for its performance, but that performance is limited to the local maximum offered by solutions dependent on denormalization. Significant advances in JOIN technology now allow you to ditch denormalization and enjoy record-setting performance improvements in return. Join our data engineering expert, Sida Shen, for this insightful review of what's new when it comes to JOINs and why now is the time to graduate from denormalization and solutions like ClickHouse.

Highlights:
- Why denormalization is required if you are using ClickHouse
- What costs and challenges come with denormalization, especially in real-time analytics
- How StarRocks replaces denormalization with on-the-fly JOINs
- Where the technical differences are between StarRocks and ClickHouse and which is right for you

If ClickHouse is no longer cutting it, or you're tired of being held back by denormalization, this webinar offers you a way forward.

Timestamps:
00:00 Intro
00:26 Agenda
01:25 Data Modeling Best Practices - Normalization vs. Denormalization
03:41 The Cost of Denormalization
05:58 Complex Real-Time Data Pipeline
07:14 Introducing StarRocks
08:24 SSB Benchmark Test - StarRocks vs. ClickHouse vs. Druid
10:19 TPC-DS Benchmark Test - StarRocks vs. Trino
11:23 Airbnb Case Study
13:22 Tencent Games Case Study
14:57 How Queries Work - From SQL Query to Result
16:42 Query Planning
18:39 ClickHouse Query Planner - Rule-Based Optimizer
19:49 StarRocks Query Planning - Cost-Based Optimizer
21:02 Data Pruning - Global Runtime Filter
23:10 Compute Architecture - How Does It Affect JOINs?
23:22 JOIN Related Concept
25:15 How To Execute JOINs at Scale
27:35 Local JOINs - Collocated JOIN
28:19 Distributed JOINs - Broadcast JOIN
29:33 Distributed JOINs - Shuffle JOIN
30:22 Distributed JOINs - Bucket Shuffle JOIN
30:52 Recap: JOIN Strategies
32:07 Compute Architecture - Scatter/Gather, MapReduce and MPP
34:10 StarRocks Architecture
35:22 StarRocks vs. ClickHouse
37:10 Q & A
37:24 How different is the query optimizer, including JOINs, from the Spark optimizer? Was there any motivation from other optimizers while building StarRocks?
38:27 Why do I see ClickHouse outperform StarRocks on ClickBench when your data say otherwise?
39:21 If the internal storage and the compute node are decoupled, doesn't it increase the network overhead? What is the recommended design?
40:53 Can you speak to the join algorithms and strategies of each database?
43:16 Are there any drawbacks with shuffle join?
44:20 Where can I get the performance benchmarks?
44:52 Is there any active development work for improving StarRocks joins and, more generally, the optimizer?

Learn more at https://celerdata.com/

Connect with us:
LinkedIn: https://www.linkedin.com/company/celerdata/
Twitter: https://twitter.com/celerdata
StarRocks GitHub: https://github.com/StarRocks/StarRocks
StarRocks Website: https://www.starrocks.io/
Slack: https://starrocks.slack.com/join/shared_invite/zt-z5zxqr0k-U5lrTVlgypRIV8RbnCIAz #/shared-invite/email

#DataAnalytics #DataEngineering #RealTimeAnalytics #RealTimeData #OLAP #DataAnalyst #DataEngineer #DataInfrastructure #UserFacingAnalytics #Database #AnalyticalDatabase #Denormalization #DataScience #ClickHouse #ApacheDruid #Trino

12+

17 views