# support
m
My data write volume is about 1.25 Gb/s (2.5M c/s), and the data is retained for two days. During operation I run into many small-parts problems (Too many parts (300 with average size of 21.80 KiB) in table 'signoz_metrics.samples_v4').
ClickHouse runs with 12 shards / 1 replica, and the collector sets send_batch_size: 100000.
After running for a while, about two days, there were a lot of MergeTree merge operations in ClickHouse, and then the whole system OOMed and became unusable.
Can anyone give me some optimization suggestions?
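For context on the "Too many parts" error above: ClickHouse rejects inserts when the number of active parts in one partition crosses the parts_to_throw_insert MergeTree threshold (and starts delaying inserts earlier, at parts_to_delay_insert). It signals that inserts are arriving as many small parts faster than background merges can combine them, so the real fix is fewer, larger inserts. Below is a minimal sketch of the server-side thresholds, assuming a ClickHouse version that accepts YAML config fragments (older versions take the same keys in XML); raising them only buys headroom while batching is fixed, and the values are illustrative, not recommendations.

```yaml
# e.g. /etc/clickhouse-server/config.d/merge_tree.yaml (hypothetical file name)
# Default MergeTree settings applied server-wide.
merge_tree:
  # Start throttling inserts once a partition has this many active parts.
  parts_to_delay_insert: 300
  # Reject inserts with "Too many parts" at this count.
  parts_to_throw_insert: 600
```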
s
What is your collector configuration? How many collectors are you running?
m
6 collectors are running on 3 nodes. The collector config is as below:

```yaml
apiVersion: v1
data:
  otel-collector-config.yaml: |-
    exporters:
      clickhouselogsexporter:
        dsn: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_LOG_DATABASE}
        timeout: 10s
      clickhousemetricswrite:
        endpoint: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_DATABASE}
        resource_to_telemetry_conversion:
          enabled: true
        timeout: 15s
      clickhousetraces:
        datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
        low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
      prometheus:
        endpoint: 0.0.0.0:8889
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      pprof:
        endpoint: localhost:1777
      zpages:
        endpoint: localhost:55679
    processors:
      batch:
        send_batch_size: 100000
        timeout: 10s
      k8sattributes:
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.pod.start_time
            - k8s.deployment.name
            - k8s.node.name
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: connection
      probabilistic_sampler:
        sampling_percentage: 20
      resourcedetection:
        detectors:
          - env
          - system
        system:
          hostname_sources:
            - dns
            - os
        timeout: 2s
      signozspanmetrics/cumulative:
        dimensions:
          - default: default
            name: service.namespace
          - default: default
            name: deployment.environment
          - name: signoz.collector.id
        dimensions_cache_size: 100000
        latency_histogram_buckets:
          - 100us
          - 1ms
          - 2ms
          - 6ms
          - 10ms
          - 50ms
          - 100ms
          - 250ms
          - 500ms
          - 1000ms
          - 1400ms
          - 2000ms
          - 5s
          - 10s
          - 20s
          - 40s
          - 60s
        metrics_exporter: clickhousemetricswrite
      signozspanmetrics/delta:
        aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
        dimensions:
          - default: default
            name: service.namespace
          - default: default
            name: deployment.environment
          - name: signoz.collector.id
        dimensions_cache_size: 100000
        latency_histogram_buckets:
          - 100us
          - 1ms
          - 2ms
          - 6ms
          - 10ms
          - 50ms
          - 100ms
          - 250ms
          - 500ms
          - 1000ms
          - 1400ms
          - 2000ms
          - 5s
          - 10s
          - 20s
          - 40s
          - 60s
        metrics_exporter: clickhousemetricswrite
    receivers:
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          load: {}
          memory: {}
          network: {}
      httplogreceiver/heroku:
        endpoint: 0.0.0.0:8081
        source: heroku
      httplogreceiver/json:
        endpoint: 0.0.0.0:8082
        source: json
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16
          http:
            endpoint: 0.0.0.0:4318
      otlp/spanmetrics:
        protocols:
          grpc:
            endpoint: localhost:12345
    service:
      extensions:
        - health_check
        - zpages
        - pprof
      pipelines:
        logs:
          exporters:
            - clickhouselogsexporter
          processors:
            - batch
          receivers:
            - otlp
            - httplogreceiver/heroku
            - httplogreceiver/json
        metrics:
          exporters:
            - clickhousemetricswrite
          processors:
            - batch
          receivers:
            - otlp
        metrics/internal:
          exporters:
            - clickhousemetricswrite
          processors:
            - resourcedetection
            - k8sattributes
            - batch
          receivers:
            - hostmetrics
        traces:
          exporters:
            - clickhousetraces
          processors:
            - signozspanmetrics/cumulative
            - signozspanmetrics/delta
            - batch
            - probabilistic_sampler
          receivers:
            - otlp
            - jaeger
      telemetry:
        logs:
          encoding: json
        metrics:
          address: 0.0.0.0:8888
  otel-collector-opamp-config.yaml: 'server_endpoint: "ws://signoz-query-service:4320/v1/opamp"'
```
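A side note on the config above: all pipelines share the single `batch` processor. The OpenTelemetry Collector allows several instances of the same processor, named with a type/name suffix, so the metrics pipeline that feeds signoz_metrics.samples_v4 could get its own, larger batcher without changing logs and traces. A minimal sketch under that assumption; batch/metrics is a hypothetical second instance and the numbers are illustrative only:

```yaml
processors:
  batch:                 # unchanged; still used by the other pipelines
    send_batch_size: 100000
    timeout: 10s
  batch/metrics:         # hypothetical larger batcher just for the metrics pipeline
    send_batch_size: 200000
    send_batch_max_size: 200000
    timeout: 20s

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [clickhousemetricswrite]
```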
I am still adding logs, and the write volume is now 1.6 GiB/s. Can you give me some performance suggestions? My idea is to increase the number of rows the otel-collector writes to ClickHouse per batch, thereby reducing the number of part merges. But the send_batch_size: 100000 setting I changed does not seem to be taking effect.
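On whether send_batch_size is taking effect: in the batch processor, send_batch_size is only the trigger that flushes a batch once that many items are buffered, timeout flushes whatever has accumulated when it expires, and send_batch_max_size (unset in the config above) is the hard cap that splits oversized batches. One way to check which trigger is actually firing is the collector's self-metrics on :8888, for example the otelcol_processor_batch_batch_send_size histogram and the otelcol_processor_batch_*_trigger_send counters (metric names as emitted by recent collector versions). A sketch of a more aggressive batcher, assuming the goal is larger, less frequent ClickHouse inserts; the numbers are illustrative:

```yaml
processors:
  batch:
    # Size trigger: flush once this many items are buffered (not a hard cap).
    send_batch_size: 200000
    # Hard cap: batches larger than this are split before export.
    send_batch_max_size: 200000
    # Time trigger: flush whatever is buffered when this expires.
    timeout: 20s
```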
s
Which version of SigNoz are you running?
m
We use Helm on EKS. CHART VERSION: 0.54.2; APP VERSION: 0.56.0
s
Use the latest version and see if you are still seeing the small-parts issue.