Cardinality
Cardinality describes the uniqueness of the data values in a specific column of a database table. In databases, the term can refer to two things:
- Relationship Cardinality – In database design, cardinality describes the kind of relationship between tables: one-to-one, one-to-many, many-to-one or many-to-many.
- Data Cardinality – This meaning matters most for query performance. Here, cardinality refers to the number of distinct values in a column relative to the number of rows in the table.
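To make the data cardinality definition concrete, here is a minimal Python sketch that counts the distinct values in a column and relates them to the row count. The function name `cardinality_ratio` and the sample columns are assumptions made for illustration only.

```python
def cardinality_ratio(values):
    """Return (distinct_count, row_count, ratio) for a column given as a list."""
    rows = len(values)
    distinct = len(set(values))
    # Guard against an empty column to avoid dividing by zero.
    ratio = distinct / rows if rows else 0.0
    return distinct, rows, ratio

# Hypothetical columns from an orders table.
order_ids = [1001, 1002, 1003, 1004, 1005, 1006]
statuses = ["paid", "paid", "shipped", "paid", "shipped", "paid"]

print(cardinality_ratio(order_ids))  # every value is unique
print(cardinality_ratio(statuses))   # only two distinct values repeat
```

A ratio close to 1 means almost every row carries its own value, while a ratio close to 0 means a few values repeat across many rows.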
Cardinality is usually described as high or low. A column with many distinct values has high cardinality; a column with many repeated values has low cardinality. Cardinality also impacts query performance because it affects the query execution plan. Let's look at an example of how cardinality impacts query performance.
When customers visit your e-commerce website, they browse products, fill in subscription forms, select products, make payments and so on. All of these activities can help business units such as sales, marketing and finance analyze the customer journey and gain insights that help them retain customers. To analyze the whole customer journey, they need to count all the events, for instance the cookies set on the website by different customers, or distinct visits. They then run year-over-year and week-over-week comparisons of these metrics across various dimensions to understand how the business is performing and how it can be improved.
This is where they run into an obstacle. Many large-scale enterprises deal with columns whose distinct counts run into the millions, which makes those distinct counts hard to calculate. Some of the difficulties are:
- Overall execution time increases as the size of the data increases.
- Calculating a distinct count means storing the distinct values, which needs a lot of memory. Hashing with compression reduces the memory needed, but it still increases execution time.
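The trade-off in the list above can be illustrated with a small Python sketch that compares an exact distinct count, which stores every distinct value, with an approximate one that stores only a fixed number of hashes. The approximation shown is the K-Minimum Values (KMV) estimator, a well-known general technique chosen here for illustration; it is not presented as the method any particular product uses. The function names and the cookie data are hypothetical.

```python
import hashlib
import heapq

def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def exact_distinct(values):
    """Exact count: memory grows with the number of distinct values."""
    return len(set(values))

def kmv_estimate(values, k=1024):
    """Approximate count: memory is bounded by k stored hashes."""
    heap = []        # max-heap (via negation) holding the k smallest hashes
    members = set()  # hashes currently in the heap, to skip duplicates
    for v in values:
        h = hash01(v)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest stored hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)

# Hypothetical event stream: 200,000 page views from 50,000 distinct cookies.
events = [f"cookie-{i % 50000}" for i in range(200000)]
print("exact:", exact_distinct(events))
print("approximate:", kmv_estimate(events))
```

The exact version has to keep all 50,000 cookie values in memory, while the KMV version keeps at most `k` hashes regardless of how many distinct values appear, at the cost of a small estimation error that shrinks as `k` grows.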
How does Kyvos help in solving this issue?
Kyvos has solved this problem for many large-scale enterprises by giving them a way to calculate both accurate and approximate distinct counts.
Kyvos enables them to slice and dice these values interactively across any number of dimensions using their existing BI tools. It also lets enterprises create an optimized OLAP model using both accurate and approximate distinct count measures, so they can get insights in seconds, depending on the cardinality.
When new data is added, Kyvos also provides an incremental refresh of the cube, which helps you keep your data and distinct count counters up to date.
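Why a distinct count needs an incremental refresh rather than a simple running total can be shown with a short Python sketch. It is a generic illustration and makes no claim about how Kyvos implements its refresh; the class `DistinctCounter` and the sample batches are hypothetical.

```python
class DistinctCounter:
    """Keeps the state needed to update a distinct count batch by batch."""

    def __init__(self):
        self._seen = set()

    def add_batch(self, values):
        """Fold a new batch into the counter and return the updated count."""
        self._seen.update(values)
        return len(self._seen)

    @property
    def count(self):
        return len(self._seen)

# Hypothetical daily batches of visitor cookies.
day1 = ["c1", "c2", "c3"]
day2 = ["c2", "c4"]

counter = DistinctCounter()
counter.add_batch(day1)
counter.add_batch(day2)

# Adding the per-day distinct counts would double-count "c2".
naive_sum = len(set(day1)) + len(set(day2))
print(counter.count, naive_sum)
```

Because distinct counts are not additive, the counter has to retain enough state (here, the full set of seen values) to absorb each new batch without rescanning the old data.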