# support
m
My data write volume is about 1.25 Gb/s (2.5M c/s), and the data is retained for two days. During operation I run into many small-parts problems (Too many parts (300 with average size of 21.80 KiB) in table 'signoz_metrics.samples_v4').
ClickHouse runs with 12 shards / 1 replica, and the collector sets send_batch_size: 100000.
After running for a while, about two days, there were a lot of MergeTree merge operations in ClickHouse, and then the whole system OOMed and became unusable.
Can anyone give me some optimization suggestions?
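For context on the "Too many parts" error above: ClickHouse rejects inserts when the number of active parts in one partition crosses the parts_to_throw_insert MergeTree threshold (and starts delaying inserts earlier, at parts_to_delay_insert). It signals that inserts are arriving as many small parts faster than background merges can combine them, so the real fix is fewer, larger inserts. Below is a minimal sketch of the server-side thresholds, assuming a ClickHouse version that accepts YAML config fragments (older versions take the same keys in XML); raising them only buys headroom while batching is fixed, and the values are illustrative, not recommendations.

```yaml
# e.g. /etc/clickhouse-server/config.d/merge_tree.yaml (hypothetical file name)
# Default MergeTree settings applied server-wide.
merge_tree:
  # Start throttling inserts once a partition has this many active parts.
  parts_to_delay_insert: 300
  # Reject inserts with "Too many parts" at this count.
  parts_to_throw_insert: 600
```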
s
What is your collector configuration? How many collectors are you running?
m
6 collectors are running on 3 nodes. The collector config is as below:

```yaml
apiVersion: v1
data:
  otel-collector-config.yaml: |-
    exporters:
      clickhouselogsexporter:
        dsn: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_LOG_DATABASE}
        timeout: 10s
      clickhousemetricswrite:
        endpoint: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_DATABASE}
        resource_to_telemetry_conversion:
          enabled: true
        timeout: 15s
      clickhousetraces:
        datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
        low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
      prometheus:
        endpoint: 0.0.0.0:8889
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      pprof:
        endpoint: localhost:1777
      zpages:
        endpoint: localhost:55679
    processors:
      batch:
        send_batch_size: 100000
        timeout: 10s
      k8sattributes:
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.pod.start_time
            - k8s.deployment.name
            - k8s.node.name
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: connection
      probabilistic_sampler:
        sampling_percentage: 20
      resourcedetection:
        detectors:
          - env
          - system
        system:
          hostname_sources:
            - dns
            - os
        timeout: 2s
      signozspanmetrics/cumulative:
        dimensions:
          - default: default
            name: service.namespace
          - default: default
            name: deployment.environment
          - name: signoz.collector.id
        dimensions_cache_size: 100000
        latency_histogram_buckets:
          - 100us
          - 1ms
          - 2ms
          - 6ms
          - 10ms
          - 50ms
          - 100ms
          - 250ms
          - 500ms
          - 1000ms
          - 1400ms
          - 2000ms
          - 5s
          - 10s
          - 20s
          - 40s
          - 60s
        metrics_exporter: clickhousemetricswrite
      signozspanmetrics/delta:
        aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
        dimensions:
          - default: default
            name: service.namespace
          - default: default
            name: deployment.environment
          - name: signoz.collector.id
        dimensions_cache_size: 100000
        latency_histogram_buckets:
          - 100us
          - 1ms
          - 2ms
          - 6ms
          - 10ms
          - 50ms
          - 100ms
          - 250ms
          - 500ms
          - 1000ms
          - 1400ms
          - 2000ms
          - 5s
          - 10s
          - 20s
          - 40s
          - 60s
        metrics_exporter: clickhousemetricswrite
    receivers:
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          load: {}
          memory: {}
          network: {}
      httplogreceiver/heroku:
        endpoint: 0.0.0.0:8081
        source: heroku
      httplogreceiver/json:
        endpoint: 0.0.0.0:8082
        source: json
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16
          http:
            endpoint: 0.0.0.0:4318
      otlp/spanmetrics:
        protocols:
          grpc:
            endpoint: localhost:12345
    service:
      extensions:
        - health_check
        - zpages
        - pprof
      pipelines:
        logs:
          exporters:
            - clickhouselogsexporter
          processors:
            - batch
          receivers:
            - otlp
            - httplogreceiver/heroku
            - httplogreceiver/json
        metrics:
          exporters:
            - clickhousemetricswrite
          processors:
            - batch
          receivers:
            - otlp
        metrics/internal:
          exporters:
            - clickhousemetricswrite
          processors:
            - resourcedetection
            - k8sattributes
            - batch
          receivers:
            - hostmetrics
        traces:
          exporters:
            - clickhousetraces
          processors:
            - signozspanmetrics/cumulative
            - signozspanmetrics/delta
            - batch
            - probabilistic_sampler
          receivers:
            - otlp
            - jaeger
      telemetry:
        logs:
          encoding: json
        metrics:
          address: 0.0.0.0:8888
  otel-collector-opamp-config.yaml: 'server_endpoint: "ws://signoz-query-service:4320/v1/opamp"'
```
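A side note on the config above: all pipelines share the single `batch` processor. The OpenTelemetry Collector allows several instances of the same processor, named with a type/name suffix, so the metrics pipeline that feeds signoz_metrics.samples_v4 could get its own, larger batcher without changing logs and traces. A minimal sketch under that assumption; batch/metrics is a hypothetical second instance and the numbers are illustrative only:

```yaml
processors:
  batch:                 # unchanged; still used by the other pipelines
    send_batch_size: 100000
    timeout: 10s
  batch/metrics:         # hypothetical larger batcher just for the metrics pipeline
    send_batch_size: 200000
    send_batch_max_size: 200000
    timeout: 20s

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [clickhousemetricswrite]
```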
I am still adding logs, and the write volume is now 1.6 GiB/s. Can you give me some performance suggestions? My idea is to increase the number of rows the otel-collector writes to ClickHouse per batch, thereby reducing the number of part merges. But the send_batch_size: 100000 setting I changed does not seem to be taking effect.
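On whether send_batch_size is taking effect: in the batch processor, send_batch_size is only the trigger that flushes a batch once that many items are buffered, timeout flushes whatever has accumulated when it expires, and send_batch_max_size (unset in the config above) is the hard cap that splits oversized batches. One way to check which trigger is actually firing is the collector's self-metrics on :8888, for example the otelcol_processor_batch_batch_send_size histogram and the otelcol_processor_batch_*_trigger_send counters (metric names as emitted by recent collector versions). A sketch of a more aggressive batcher, assuming the goal is larger, less frequent ClickHouse inserts; the numbers are illustrative:

```yaml
processors:
  batch:
    # Size trigger: flush once this many items are buffered (not a hard cap).
    send_batch_size: 200000
    # Hard cap: batches larger than this are split before export.
    send_batch_max_size: 200000
    # Time trigger: flush whatever is buffered when this expires.
    timeout: 20s
```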
s
Which version of SigNoz are you running?
m
We use Helm on EKS. CHART VERSION: 0.54.2; APP VERSION: 0.56.0
s
Use the latest version and see if you are still seeing the small-parts issue.