Перейти к основному содержимому
Версия: 2.0.x

ds_hll_count_distinct

Возвращает приблизительное значение агрегатной функции, аналогичное результату COUNT(DISTINCT col). APPROX_COUNT_DISTINCT(expr) — похожая функция.

ds_hll_count_distinct быстрее, чем комбинация COUNT и DISTINCT, и использует память фиксированного размера, поэтому для столбцов с высокой кардинальностью требуется меньше памяти.

Работает медленнее, чем APPROX_COUNT_DISTINCT(expr), но с более высокой точностью, поскольку использует Apache Datasketches. Для получения дополнительной информации см. HyperLogLog Sketches.

Синтаксис

ds_hll_count_distinct(expr, [log_k], [tgt_type])
  • log_k: Целое число. Диапазон [4, 21]. По умолчанию: 17.
  • tgt_type: Допустимые значения: HLL_4, HLL_6 (по умолчанию) и HLL_8.

Примеры

mysql> CREATE TABLE t1 (
-> id BIGINT NOT NULL,
-> province VARCHAR(64),
-> age SMALLINT,
-> dt VARCHAR(10) NOT NULL
-> )
-> DUPLICATE KEY(id)
-> DISTRIBUTED BY HASH(id) BUCKETS 4;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t1 select generate_series, generate_series, generate_series % 100, "2024-07-24" from table(generate_series(1, 100000));

Query OK, 100000 rows affected (0.29 sec)

mysql> select ds_hll_count_distinct(id), ds_hll_count_distinct(province), ds_hll_count_distinct(age), ds_hll_count_distinct(dt) from t1 order by 1, 2;
+---------------------------+---------------------------------+----------------------------+---------------------------+
| ds_hll_count_distinct(id) | ds_hll_count_distinct(province) | ds_hll_count_distinct(age) | ds_hll_count_distinct(dt) |
+---------------------------+---------------------------------+----------------------------+---------------------------+
| 100090 | 100140 | 100 | 1 |
+---------------------------+---------------------------------+----------------------------+---------------------------+
1 row in set (0.07 sec)

mysql> select ds_hll_count_distinct(id, 21), ds_hll_count_distinct(province, 21), ds_hll_count_distinct(age, 21), ds_hll_count_distinct(dt, 21) from t1 order by 1, 2;
+-------------------------------+-------------------------------------+--------------------------------+-------------------------------+
| ds_hll_count_distinct(id, 21) | ds_hll_count_distinct(province, 21) | ds_hll_count_distinct(age, 21) | ds_hll_count_distinct(dt, 21) |
+-------------------------------+-------------------------------------+--------------------------------+-------------------------------+
| 99995 | 100001 | 100 | 1 |
+-------------------------------+-------------------------------------+--------------------------------+-------------------------------+
1 row in set (0.07 sec)


mysql> select ds_hll_count_distinct(id, 10, "HLL_8"), ds_hll_count_distinct(province, 10, "HLL_8"), ds_hll_count_distinct(age, 10, "HLL_8"), ds_hll_count_distinct(dt, 10, "HLL_8") from t1 order by 1, 2;
+----------------------------------------+----------------------------------------------+-----------------------------------------+----------------------------------------+
| ds_hll_count_distinct(id, 10, 'HLL_8') | ds_hll_count_distinct(province, 10, 'HLL_8') | ds_hll_count_distinct(age, 10, 'HLL_8') | ds_hll_count_distinct(dt, 10, 'HLL_8') |
+----------------------------------------+----------------------------------------------+-----------------------------------------+----------------------------------------+
| 99844 | 101905 | 96 | 1 |
+----------------------------------------+----------------------------------------------+-----------------------------------------+----------------------------------------+
1 row in set (0.09 sec)

Ключевые слова

DS_HLL_COUNT_DISTINCT,APPROX_COUNT_DISTINCT