d
Hey - weird one, ref upgrades. If we're upgrading manually, can you confirm how schema evolution should work for ClickHouse? Currently our process is to find your new tagged release, e.g. 0.54.0, identify any containers that have changed (collector, query service, migration service, UI service, etc.), and run a script to cache those containers locally in ECR. We then run the migration container, then redeploy the other containers using those images. It's all worked perfectly, until today.
Copy code
2024-09-12T17:00:59.545Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:code: 10, message: Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (211bdbc2-196d-43ad-9f49-a31b13627556)", "interval": "3.047923669s"}
I presumed the migration service handled all that for us
Copy code
SHOW CREATE TABLE signoz_logs.attribute_keys_bool_final_mv

CREATE MATERIALIZED VIEW signoz_logs.attribute_keys_bool_final_mv TO signoz_logs.logs_attribute_keys
(
    `name` LowCardinality(String),
    `datatype` String
)
AS SELECT DISTINCT
    arrayJoin(mapKeys(attributes_bool)) AS name,
    'Bool' AS datatype
FROM signoz_logs.logs_v2
ORDER BY name ASC │
that i presume is okay
Copy code
DESCRIBE TABLE logs_v2
<snip>
14. │ attributes_bool      │ Map(LowCardinality(String), Bool)    │              │                    │         │ ZSTD(1)          │                │
</snip>
In fact, selecting from that view is working too - so perhaps an exporter problem - hmm 🙂
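If it helps, I can also dump every materialized view definition in signoz_logs to check whether any of them still reference the old array-style columns (attributes_bool_key / attributes_bool_value) - something like this, using only standard system tables:
Copy code
-- List every materialized view in signoz_logs with its definition,
-- to spot any MV still written against the pre-logs_v2 schema.
SELECT name, create_table_query
FROM system.tables
WHERE database = 'signoz_logs'
  AND engine = 'MaterializedView'
FORMAT Vertical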
n
Can you share which version you upgraded from? Also, can you run this command on ClickHouse and share the output?
Copy code
select * from system.mutations where is_done=0
d
Morning 🙂
Copy code
ip-192-168-2-148.eu-west-2.compute.internal :) select * from system.mutations where is_done=0 
Ok.
0 rows in set. Elapsed: 0.003 sec.
I was on 0.53 -> 0.54, although I definitely did a few upgrades before that too. If I had missed a step, i.e. say I went from 0.51 -> 0.53, does it take that into account? Or do I need to make 100% sure I go release by release by release?
Interestingly, if I select * from system.mutations I only see 17 rows, all for traces - no logs at all.
Which is why I'm wondering if I've missed something.
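If it's useful, I can also pull the full mutation history for the logs database rather than just the unfinished ones - roughly:
Copy code
-- Show recent mutations on signoz_logs tables, finished or not,
-- including any recorded failure reason.
SELECT database, table, command, create_time, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'signoz_logs'
ORDER BY create_time DESC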
The other bit of clarity I'd appreciate is the migration container vs the exporters themselves. I must admit I assumed that the migration container did all the schema lifting, but re-reading about the
DOCKER_MULTI_NODE_CLUSTER
bit I'm not sure - that is only to set up the distributed versions of the schema, right? The exporters don't do any schema migration themselves?
n
You have one shard of ClickHouse, right? Or multiple ones?
d
I have 1 shard, but 2 servers (replicas).
Copy code
<remote_servers>
        <cluster>
            <shard>
               <replica>
                  <host>clickhouse-1.signoz-dev.internal</host>
                  <port>9000</port>
               </replica>
               <replica>
                  <host>clickhouse-2.signoz-dev.internal</host>
                  <port>9000</port>
               </replica>
               <internal_replication>true</internal_replication>
            </shard>
        </cluster>
</remote_servers>
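Happy to confirm the cluster definition is picked up on both boxes too, e.g.:
Copy code
-- Verify ClickHouse sees one shard with two replicas under the 'cluster' name.
SELECT cluster, shard_num, replica_num, host_name, port, is_local
FROM system.clusters
WHERE cluster = 'cluster'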
n
That might be the issue - in the Docker deployment we haven't enabled ReplicatedMergeTree by default. When installing, did you make some changes in the schema to support replication?
d
Yeah, I had to retrofit it - I'm using Terraform and ECS to deploy.
d
Copy code
{
      name  = "migration_service"
      image = var.ecr_image_schema-migrator
      command = ["--dsn=<tcp://clickhouse-1>.${local.fqdn}:9000", "--replication=true"]
      logConfiguration = {
Yep, I spotted that bit when I built it - so that's definitely set.
And I have the macros set:
<macros> <shard>01</shard> <replica>01</replica> </macros> <- on box 1 for example; box 2 has shard 01 & replica 02 as necessary
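I can double-check them from SQL too if that helps:
Copy code
-- Show the macro substitutions each node actually has loaded.
SELECT * FROM system.macros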
n
Got it. Can you share the schema of the tables from all the replicas?
Copy code
show create table logs
show create table logs_v2

show create table distributed_logs
show create table distributed_logs_v2

show create table logs_v2_resource
show create table distributed_logs_v2_resource
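If it's easier, you can also pull them from both replicas in one query with clusterAllReplicas (assuming the cluster is named 'cluster', as in your remote_servers config):
Copy code
-- Fetch the CREATE statements from every replica in one go,
-- so the two copies can be diffed side by side.
SELECT hostName() AS replica, name, create_table_query
FROM clusterAllReplicas('cluster', system.tables)
WHERE database = 'signoz_logs'
  AND name IN ('logs', 'logs_v2', 'distributed_logs', 'distributed_logs_v2',
               'logs_v2_resource', 'distributed_logs_v2_resource')
FORMAT Vertical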
d
I also have the distributed tables there:
Copy code
4. │ distributed_logs                │
 5. │ distributed_logs_attribute_keys │
 6. │ distributed_logs_resource_keys  │
 7. │ distributed_logs_v2             │
 8. │ distributed_logs_v2_resource    │
 9. │ distributed_tag_attributes      │
10. │ distributed_usage               │
Replica 1 - logs:
Copy code
───────────────────────────────────────────────────────────────────────────────────────────┐
1. │ CREATE TABLE signoz_logs.logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string_key` Array(String) CODEC(ZSTD(1)),
    `scope_string_value` Array(String) CODEC(ZSTD(1)),
    INDEX body_idx body TYPE ngrambf_v1(4, 60000, 5, 0) GRANULARITY 1,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4,
    INDEX scope_name_idx scope_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1 │
Replica 1: Logs v2:
Copy code
1. │ CREATE TABLE signoz_logs.logs_v2
(
    `ts_bucket_start` UInt64 CODEC(DoubleDelta, LZ4),
    `resource_fingerprint` String CODEC(ZSTD(1)),
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX body_idx lower(body) TYPE ngrambf_v1(4, 60000, 5, 0) GRANULARITY 1,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4,
    INDEX scope_name_idx scope_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX attributes_string_idx_key mapKeys(attributes_string) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_string_idx_val mapValues(attributes_string) TYPE ngrambf_v1(4, 5000, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_key mapKeys(attributes_number) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_val mapValues(attributes_number) TYPE bloom_filter GRANULARITY 1,
    INDEX attributes_bool_idx_key mapKeys(attributes_bool) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
Replica 1 - Dist logs:
Copy code
───────────────────────────────────────────────────────────────────────────────────────────┐
1. │ CREATE TABLE signoz_logs.distributed_logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string_key` Array(String) CODEC(ZSTD(1)),
    `scope_string_value` Array(String) CODEC(ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs', cityHash64(id)) │
Replica 1 - dist logs v2:
Copy code
1. │ CREATE TABLE signoz_logs.distributed_logs_v2
(
    `ts_bucket_start` UInt64 CODEC(DoubleDelta, LZ4),
    `resource_fingerprint` String CODEC(ZSTD(1)),
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string` Map(LowCardinality(String), String) CODEC(ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs_v2', cityHash64(id)) │
Replica 1: logs_v2_resource:
Copy code
1. │ CREATE TABLE signoz_logs.logs_v2_resource
(
    `labels` String CODEC(ZSTD(5)),
    `fingerprint` String CODEC(ZSTD(1)),
    `seen_at_ts_bucket_start` Int64 CODEC(Delta(8), ZSTD(1)),
    INDEX idx_labels lower(labels) TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 1
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(seen_at_ts_bucket_start / 1000)
ORDER BY (labels, fingerprint, seen_at_ts_bucket_start)
TTL (toDateTime(seen_at_ts_bucket_start) + toIntervalSecond(1296000)) + toIntervalSecond(1800)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
Replica 1: distributed_logs_v2_resource:
Copy code
1. │ CREATE TABLE signoz_logs.distributed_logs_v2_resource
(
    `labels` String CODEC(ZSTD(5)),
    `fingerprint` String CODEC(ZSTD(1)),
    `seen_at_ts_bucket_start` Int64 CODEC(Delta(8), ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs_v2_resource', cityHash64(labels, fingerprint)) │
Note: currently, whilst we have a second box, the query service and exporter datasources are only pointing to box 1, as is the migration service. I was going to ask you the best practices for this, as I couldn't find them - i.e. do I add an NLB and just round robin? Or failover? What's best in your opinion with your stack?
I'll go fetch replica 2 now
Replica 2: logs:
Copy code
1. │ CREATE TABLE signoz_logs.logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string_key` Array(String) CODEC(ZSTD(1)),
    `scope_string_value` Array(String) CODEC(ZSTD(1)),
    INDEX body_idx body TYPE ngrambf_v1(4, 60000, 5, 0) GRANULARITY 1,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4,
    INDEX scope_name_idx scope_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1 │
Replica 2: logs_v2:
Copy code
1. │ CREATE TABLE signoz_logs.logs_v2
(
    `ts_bucket_start` UInt64 CODEC(DoubleDelta, LZ4),
    `resource_fingerprint` String CODEC(ZSTD(1)),
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX body_idx lower(body) TYPE ngrambf_v1(4, 60000, 5, 0) GRANULARITY 1,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4,
    INDEX scope_name_idx scope_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX attributes_string_idx_key mapKeys(attributes_string) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_string_idx_val mapValues(attributes_string) TYPE ngrambf_v1(4, 5000, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_key mapKeys(attributes_number) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_val mapValues(attributes_number) TYPE bloom_filter GRANULARITY 1,
    INDEX attributes_bool_idx_key mapKeys(attributes_bool) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
Replica 2: distributed_logs:
Copy code
1. │ CREATE TABLE signoz_logs.distributed_logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string_key` Array(String) CODEC(ZSTD(1)),
    `scope_string_value` Array(String) CODEC(ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs', cityHash64(id)) │
Replica 2: distributed_logs_v2:
Copy code
1. │ CREATE TABLE signoz_logs.distributed_logs_v2
(
    `ts_bucket_start` UInt64 CODEC(DoubleDelta, LZ4),
    `resource_fingerprint` String CODEC(ZSTD(1)),
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string` Map(LowCardinality(String), String) CODEC(ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs_v2', cityHash64(id)) │
Replica 2: logs_v2_resource:
Copy code
1. │ CREATE TABLE signoz_logs.logs_v2_resource
(
    `labels` String CODEC(ZSTD(5)),
    `fingerprint` String CODEC(ZSTD(1)),
    `seen_at_ts_bucket_start` Int64 CODEC(Delta(8), ZSTD(1)),
    INDEX idx_labels lower(labels) TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 1
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(seen_at_ts_bucket_start / 1000)
ORDER BY (labels, fingerprint, seen_at_ts_bucket_start)
TTL (toDateTime(seen_at_ts_bucket_start) + toIntervalSecond(1296000)) + toIntervalSecond(1800)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
Replica 2: distributed_logs_v2_resource:
Copy code
1. │ CREATE TABLE signoz_logs.distributed_logs_v2_resource
(
    `labels` String CODEC(ZSTD(5)),
    `fingerprint` String CODEC(ZSTD(1)),
    `seen_at_ts_bucket_start` Int64 CODEC(Delta(8), ZSTD(1))
)
ENGINE = Distributed('cluster', 'signoz_logs', 'logs_v2_resource', cityHash64(labels, fingerprint)) │
n
attribute_keys_bool_final_mv
Can you share the schema for this table as well?
d
Replica 2:
Copy code
1. │ CREATE MATERIALIZED VIEW signoz_logs.attribute_keys_bool_final_mv TO signoz_logs.logs_attribute_keys
(
    `name` LowCardinality(String),
    `datatype` String
)
AS SELECT DISTINCT
    arrayJoin(mapKeys(attributes_bool)) AS name,
    'Bool' AS datatype
FROM signoz_logs.logs_v2
ORDER BY name ASC │
Replica 1:
Copy code
1. │ CREATE MATERIALIZED VIEW signoz_logs.attribute_keys_bool_final_mv TO signoz_logs.logs_attribute_keys
(
    `name` LowCardinality(String),
    `datatype` String
)
AS SELECT DISTINCT
    arrayJoin(mapKeys(attributes_bool)) AS name,
    'Bool' AS datatype
FROM signoz_logs.logs_v2
ORDER BY name ASC │
At first glance they all appear to be consistent - and GPT-4 agrees 🙂
n
Yeah, it looks correct.
Can you share if there are any more logs from the otel collector?
d
Whilst I fish that out, I realised I pasted my question and statement in between all the spam:
Note: Currently whilst we have a second box, the query service, and exporter datasources are only pointing to box 1. As is the migration service.
I was going to ask you the best practices for this, as I couldn't find them - i.e. do I add an NLB and just round robin? Or failover? Or what's best in your opinion with your stack?
My initial thoughts were:
• schema migrator definitely should point to just the main replica and let ClickHouse do its thing.
• query service COULD round robin, I guess? Unless you already have a way of pointing to multiple replicas that I've missed.
• exporter service - I wasn't sure if I could round robin that and clickhouse-keeper will replicate 02->01 as well as 01->02, or if I should just point to 01 and do failover.
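Before deciding any of that, I'd probably sanity-check that replication from box 1 to box 2 is actually healthy - a minimal check using only standard system tables:
Copy code
-- Replication health for the logs tables: readonly state, delay,
-- queue backlog and how many replicas are active.
SELECT database, table, is_readonly, absolute_delay, queue_size,
       total_replicas, active_replicas
FROM system.replicas
WHERE database = 'signoz_logs'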
I wrote a little shell script to be able to send data easily, scaled down to just 1 collector, and the logging exporter is in the export pipeline as well as clickhouselogsexporter:
Copy code
$ ./otel_gen.sh -t log -c 1 -s abc -m "abc test"
Creating 1 LOG event(s)...
Sending to endpoint: https://ingest.signoz.observationdeck.io
Service name: abc
Successfully sent LOG event. Response: {"partialSuccess":{}}
This resulted in the collector logging (output trimmed):
Copy code
LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 1}
Resource attributes:
  collector_service
  -> service.name: Str(abc)
LogRecord #0
ObservedTimestamp: 2024-09-13 08:20:25.589427423 +0000 UTC
collector_service
Body: Str(abc test)
.....
So it can see the 1 log event okay. Your exporter then says:
Copy code
exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:code: 10, message: Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (211bdbc2-196d-43ad-9f49-a31b13627556)", "interval": "5.551237308s"}
And then:
Copy code
exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:code: 10, message: Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (211bdbc2-196d-43ad-9f49-a31b13627556)", "interval": "5.459340989s"}
Then after a minute or two:
Copy code
2024-09-13T08:25:01.856Z error exporterhelper/queue_sender.go:101 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "no more retries left: StatementSend:code: 10, message: Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (211bdbc2-196d-43ad-9f49-a31b13627556)", "dropped_items": 1}

go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
	/home/runner/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/exporterhelper/queue_sender.go:101
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
	/home/runner/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
	/home/runner/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/internal/queue/consumers.go:43
In SigNoz, however, the logs arrive - but duplicated many times over:
image.png
Copy code
SELECT * FROM distributed_logs_v2 WHERE body = 'abc test'
  ┌─ts_bucket_start─┬─resource_fingerprint──────────────────────┬───────────timestamp─┬──observed_timestamp─┬─id──────────────────────────┬─trace_id─┬─span_id─┬─trace_flags─┬─severity_text─┬─severity_number─┬─body─────┬─attributes_string──────────┬─attributes_number─┬─attributes_bool─┬─resources_string───────┬─scope_name─┬─scope_version─┬─scope_string─┐
 1. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo2JomAkYq8qRkyGpANRjpKP │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 2. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo2QNZb1BH8oVS6YB78wiLWX │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 3. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo34m6taZGn1U6U0V0xKXUQa │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 4. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo3FHkbhTV5hk3SU7pgKtgy1 │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 5. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo49rokpXNSlknbyPfJYLe4R │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 6. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo5GbpyEMZltQUK5hnVTCsuz │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 7. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo6Ska2c4N6uNaQ13kumCIwk │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 8. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo7jWcRZkSY5t3v0K34EQq0q │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
 9. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo98hUhWprwO2YgIuaQLX8On │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
10. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo9DgZ3XVLicqXIT0iPTZ90x │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
11. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo9o8BX1EW3RjkRsLwDp3lJm │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
12. │      1726214400 │ service.name=abc;hash=8639156975360962647 │ 1726215625479421772 │ 1726215625589427423 │ 2m0Zo9xbIKoMr7kPaPUOQl9yAG5 │          │         │           0 │ INFO          │               0 │ abc test │ {'event.domain':'example'} │ {}                │ {}              │ {'service.name':'abc'} │            │               │ {}           │
    └─────────────────┴───────────────────────────────────────────┴─────────────────────┴─────────────────────┴─────────────────────────────┴──────────┴─────────┴─────────────┴───────────────┴─────────────────┴──────────┴────────────────────────────┴───────────────────┴─────────────────┴────────────────────────┴────────────┴───────────────┴──────────────┘

12 rows in set. Elapsed: 0.014 sec. Processed 11.04 thousand rows, 4.51 MB (804.42 thousand rows/s., 328.47 MB/s.)
Peak memory usage: 4.23 MiB.
so 1 log generated 12 rows :S
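The duplicates look consistent with the exporter retries rather than replication fan-out - every row has a different id but the same timestamp/observed_timestamp. A quick aggregate against the same table would confirm it:
Copy code
-- If distinct_ids == total_rows but distinct_timestamps == 1, every copy
-- came from a re-sent insert of the same original log record.
SELECT count() AS total_rows,
       uniqExact(id) AS distinct_ids,
       uniqExact(timestamp) AS distinct_timestamps
FROM signoz_logs.distributed_logs_v2
WHERE body = 'abc test'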
n
Something is wrong on the ClickHouse side only. Can you try restarting ClickHouse once, and also check the ClickHouse logs?
d
I'll try to restart it - but the only errors in the ClickHouse server log are things like this:
Copy code
2024.09.13 08:24:17.296544 [ 516027 ] {2ca0f204-c24b-4340-ad76-345ab6e6a8e2} <Error> executeQuery: Code: 10. DB::Exception: Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (211bdbc2-196d-43ad-9f49-a31b13627556). (NOT_FOUND_COLUMN_IN_BLOCK) (version 24.8.2.3 (official build)) (from 192.168.2.50:47028) (in query: INSERT INTO signoz_logs.distributed_logs_v2 ( ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string ) VALUES), Stack trace (when copying this message, always include the lines below):
Restarted node 1 and tried again - same error. Restarted node 2 as well now - same error.
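One more thing I can grab if useful is the MV's target table, since the error fires while pushing to the view - e.g.:
Copy code
-- The view writes into logs_attribute_keys; worth comparing this
-- definition across both replicas as well.
SHOW CREATE TABLE signoz_logs.logs_attribute_keys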
n
I am out of ideas right now. Can you check if this returns an empty result:
Copy code
SELECT
    database,
    table,
    error_count,
    last_exception,
    data_files,
    data_compressed_bytes
FROM system.distribution_queue
WHERE error_count > 0
d
yep zero rows 🙂
I do find the weird ones \o/
n
@Srikanth Chekuri any ideas on this? The schema seems to be in a correct state but ClickHouse is complaining.
d
Copy code
clickhouse-server --version
ClickHouse server version 24.8.2.3 (official build).
Just going to check my one in China too - for perf. Next week I need to build out production for them, hence the questions above about the DB.
China is on 0.53 atm.
n
Does it have a replicated cluster as well in docker ?
d
yes, they're all the same
image.png
the extra LB in front of Clickhouse is a fib atm, hence the question 🙂
ClickHouse DBs are EC2 instances, the rest are Fargate ECS containers.
n
One thing you can try is dropping the materialized views and related tables - it won't cause data loss.
Since you have pasted the CREATE TABLE statements above, you can recreate them.
Just drop this:
signoz_logs.attribute_keys_bool_final_mv
on both replicas and see if data is ingested properly after that.
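Roughly this on each replica (the CREATE below is just the definition you pasted earlier):
Copy code
DROP TABLE IF EXISTS signoz_logs.attribute_keys_bool_final_mv;

CREATE MATERIALIZED VIEW signoz_logs.attribute_keys_bool_final_mv TO signoz_logs.logs_attribute_keys
(
    `name` LowCardinality(String),
    `datatype` String
)
AS SELECT DISTINCT
    arrayJoin(mapKeys(attributes_bool)) AS name,
    'Bool' AS datatype
FROM signoz_logs.logs_v2
ORDER BY name ASC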
d
k thx
n
let me know what happens.
d
Copy code
replica 1: DROP TABLE IF EXISTS signoz_logs.attribute_keys_bool_final_mv;
replica 2: DROP TABLE IF EXISTS signoz_logs.attribute_keys_bool_final_mv;
I did that and tried sending a log, to see what happens without recreating it. Interestingly, it's now complaining about:
Copy code
2024-09-13T09:02:48.666Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:code: 10, message: Not found column attributes_string.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_string_final_mv (44688cda-629a-4b09-a3d1-a564774a84ce)", "interval": "37.013097477s"}
I will add the keys_bool one back now, using the CREATE TABLE statement.
n
Can you delete these as well:
Copy code
attribute_keys_string_final_mv
attribute_keys_float64_final_mv
these won’t lead to data loss
d
Okay - when I re-added the keys_bool one, the error went back to our first one:
Copy code
Not found column attributes_bool.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.attribute_keys_bool_final_mv (96565ac5-0c8c-42d8-86fc-8fcce7355841)", "interval": "6.130953205s"}
Interesting. I'll export the schema and delete all 3 now then.
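For the export I'll just dump the definitions straight out of system.tables, something like:
Copy code
-- Save every *_final_mv definition so they can be recreated afterwards.
SELECT create_table_query
FROM system.tables
WHERE database = 'signoz_logs'
  AND engine = 'MaterializedView'
  AND name LIKE '%final_mv'
FORMAT TSVRaw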
Okay, done - retested and now we get:
Copy code
2024-09-13T09:10:57.302Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:code: 10, message: Not found column resources_string.keys in block. There are only columns: ts_bucket_start, resource_fingerprint, timestamp, observed_timestamp, id, trace_id, span_id, trace_flags, severity_text, severity_number, body, attributes_string, attributes_number, attributes_bool, resources_string, scope_name, scope_version, scope_string: while pushing to view signoz_logs.resource_keys_string_final_mv (d4857f49-db69-4684-9886-68f75be959e3)", "interval": "6.535330254s"}
something is proper broken lol
s
I couldn't follow because the conversation is too long. It would be good to get on a call, maybe?
n
@Darren Smith if you are available we can get on a call.
d
Hey @Srikanth Chekuri - basically, since upgrading to 0.54, logs have started throwing those export errors and we end up with many dups arriving in ClickHouse, so @nitya-signoz has been helping me walk through comparing the schemas. We have just 2 nodes, replicating, and 1 shard. But happy to jump on a call if that's easier - I'd appreciate it, in fact 🙂 If I can get this to be a success, they want to use SigNoz elsewhere in China 🙂
Does Huddle work okay? Otherwise happy to provide a Teams link or whatever - really appreciate it.
n
Let’s wait for Srikanth to get back,
d
yea no problem 🙂
s
Let's use zoom?
d
no problem 🙂
Hey folks, hope you have a good weekend! Just wanted to say thank you again for helping yesterday. I replaced the DB with ClickHouse 24.1.2.5 and it's now working perfectly. ❤️ I also added the NLB in front of ClickHouse and pointed the collectors and the query service at it. Next step: redundancy on the query service, and I'm all prepped for 1.0 and going live next week. Looking forward to seeing what you have in store.