Data Warehousing Techniques for High-Cardinality, High-Frequency Time-Series Analytics

Khrystyna Terletska

Citation: Khrystyna Terletska, "Data Warehousing Techniques for High-Cardinality, High-Frequency Time-Series Analytics", Universal Library of Engineering Technology, Volume 02, Issue 03.

Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This article examines modern methods of organizing cloud data warehouses for the analytics of high-frequency (HF), high-cardinality (HC) time series (TS), motivated by data volumes that now reach millions of points per second. The relevance of the study is substantiated by forecasts that the number of IoT devices will grow to 41.6 billion by 2025 and by the corresponding demand for HF-HC-TS analysis in real-time financial, telecommunications, and cloud monitoring systems. The primary objective of the study is to review and contrast data warehouse (DWH) approaches in light of high-velocity HF-HC-TS streams. The analysis is based on a systematization of 16 primary sources. The novelty lies in formalizing a set of methods that combine ingest optimization, late-event handling via two timestamps, and a trust window. Key results demonstrate that an append-only model, used in conjunction with the Capacitor storage format, ensures linear scalability of streaming ingest and minimizes storage costs through delta encoding and dictionary compression. Introducing two timestamps and watermarks in Dataflow/Apache Beam enables correct handling of late-arriving points without rebuilding historical data. The article shows that combining the architectural features of BigQuery with disciplined operational practices, from mandatory partition filters and ingest-lag control to the judicious choice of clustering columns and the avoidance of JavaScript UDFs, creates a stable balance between performance and budget. The proposed set of anti-patterns helps identify and correct inefficient query plans promptly, returning the system to its optimal operating state. Together, these techniques make it possible to meet latency and cost SLAs for data streams containing billions of records per day. The article is of practical value to businesses that rely on real-time analysis and monitoring, including finance, IoT reporting, telecommunications, and building management.

Keywords: High-Frequency Time Series, High Cardinality, BigQuery, Partitioning, Clustering, Capacitor, Append-Only, Watermarks, Surrogate Keys, Change Data Capture.
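As an illustration of the late-event handling summarized above, the following is a minimal Apache Beam (Python SDK) sketch of event-time windowing with a watermark trigger and an allowed-lateness ("trust") window. The record fields (series_id, event_ts, value), the one-minute window, and the ten-minute lateness bound are illustrative assumptions rather than values taken from the article.

```python
import apache_beam as beam
from apache_beam.transforms.trigger import (AccumulationMode, AfterCount,
                                            AfterWatermark)
from apache_beam.transforms.window import FixedWindows, TimestampedValue
from apache_beam.utils.timestamp import Duration


def to_event_time(record):
    # Re-stamp each element with its event timestamp (epoch seconds); the
    # ingest timestamp stays in the payload, giving the two timestamps used
    # for late-arrival handling.
    return TimestampedValue(record, record["event_ts"])


def with_late_event_handling(points):
    """points: PCollection of dicts with 'series_id', 'event_ts', 'value'."""
    return (
        points
        | "ToEventTime" >> beam.Map(to_event_time)
        | "KeyBySeries" >> beam.Map(lambda r: (r["series_id"], r["value"]))
        | "Window" >> beam.WindowInto(
            FixedWindows(60),                            # 1-minute event-time windows
            trigger=AfterWatermark(late=AfterCount(1)),  # re-fire for each late element
            accumulation_mode=AccumulationMode.ACCUMULATING,
            allowed_lateness=Duration(seconds=600))      # 10-minute trust window
        | "SumPerSeries" >> beam.CombinePerKey(sum)
    )
```

Under this configuration the watermark drives the on-time firing of each window, while points arriving within the allowed-lateness bound trigger an updated result instead of forcing a rebuild of historical aggregates.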
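Similarly, the operational practices named for BigQuery (partitioning on the event timestamp, clustering on the high-cardinality series identifier, and a mandatory partition filter) can be sketched with the google-cloud-bigquery Python client. The project, dataset, table, and schema names below are placeholders, not the article's own.

```python
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("series_id", "STRING"),
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("ingest_ts", "TIMESTAMP"),  # second timestamp for late-arrival tracking
    bigquery.SchemaField("value", "FLOAT"),
]

table = bigquery.Table("my-project.metrics.hf_points", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts")
table.clustering_fields = ["series_id"]   # prune blocks for selective series lookups
table.require_partition_filter = True     # reject queries that would scan every partition
client.create_table(table)

# With the filter requirement in place, queries must constrain the partition
# column, which keeps the scanned volume (and therefore cost) bounded.
query = """
    SELECT series_id, AVG(value) AS avg_value
    FROM `my-project.metrics.hf_points`
    WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    GROUP BY series_id
"""
rows = client.query(query).result()
```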