# general
Teymour:
Hey all, our company is looking for a monitoring solution and I've brought up a SigNoz setup locally for testing. I modified the default compose file and adjusted the SigNoz otel configs. Basically I just want to send some host metrics to SigNoz to start with. I don't get any errors on container startup, and
docker run -it --rm signoz/troubleshoot checkEndpoint --endpoint=<...>
runs fine. But I don't see anything on the SigNoz frontend/UI. What am I missing? I'll link the config setup/files as a reply to this message.
version: "2.4"

services:
  clickhouse:
    image: clickhouse/clickhouse-server:22.8.8-alpine
    # ports:
    # - "9000:9000"
    # - "8123:8123"
    tty: true
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      # - ./clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml
      - ./data/clickhouse/:/var/lib/clickhouse/
    restart: on-failure
    logging:
      options:
        max-size: 50m
        max-file: "3"
    healthcheck:
      # "clickhouse", "client", "-u ${CLICKHOUSE_USER}", "--password ${CLICKHOUSE_PASSWORD}", "-q 'SELECT 1'"
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3

  alertmanager:
    image: signoz/alertmanager:0.23.0-0.2
    volumes:
      - ./data/alertmanager:/data
    depends_on:
      query-service:
        condition: service_healthy
    restart: on-failure
    command:
      - --queryService.url=http://query-service:8085
      - --storage.path=/data

# Notes for Maintainers/Contributors who will change Line Numbers of Frontend & Query-Section. Please Update Line Numbers in `./scripts/commentLinesForSetup.sh` & `./CONTRIBUTING.md`

  query-service:
    image: signoz/query-service:0.11.4
    container_name: query-service
    command: ["-config=/root/config/prometheus.yml"]
    # ports:
    #   - "6060:6060"     # pprof port
    #   - "8080:8080"     # query-service port
    volumes:
      - ./prometheus.yml:/root/config/prometheus.yml
      - ../dashboards:/root/config/dashboards
      - ./data/signoz/:/var/lib/signoz/
    environment:
      - ClickHouseUrl=tcp://clickhouse:9000/?database=signoz_traces
      - ALERTMANAGER_API_PREFIX=http://alertmanager:9093/api/
      - SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
      - DASHBOARDS_PATH=/root/config/dashboards
      - STORAGE=clickhouse
      - GODEBUG=netdns=go
      - TELEMETRY_ENABLED=true
      - DEPLOYMENT_TYPE=docker-standalone-amd
    restart: on-failure
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8080/api/v1/version"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      clickhouse:
        condition: service_healthy

  frontend:
    image: signoz/frontend:0.11.4
    container_name: frontend
    restart: on-failure
    depends_on:
      - alertmanager
      - query-service
    ports:
      - "3301:3301"
    volumes:
      - ../common/nginx-config.conf:/etc/nginx/conf.d/default.conf

  otel-collector:
    image: signoz/signoz-otel-collector:0.63.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    depends_on:
      clickhouse:
        condition: service_healthy
I removed the demo app `load-hotrod` and `otel-collector-metrics` (not sure about the latter) from the docker compose config.
I adjusted the SigNoz collector config to:
receivers:                                                                                                             
  otlp:
    protocols:
      grpc:                               
        endpoint: 0.0.0.0:4317                       
      http:  
        endpoint: 0.0.0.0:4318
                                                                                                                       
processors:                                                                                                                                                                                                                                   
  batch:                                                                                                               
    send_batch_size: 10000                                                                                             
    send_batch_max_size: 11000                                                                                         
    timeout: 10s                                                                                                       
                                                                                                                       
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces

  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true

  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    timeout: 5s                                                                                                        
    sending_queue:                                                                                                     
      queue_size: 100                                                                                                  
    retry_on_failure:                                                                                                  
      enabled: true                                                                                                    
      initial_interval: 5s                                                                                             
      max_interval: 30s                                                                                                
      max_elapsed_time: 300s                                                                                           
                                                                                                                       
service:                                                                                                               
  pipelines:                                                                                                           
    traces:                                                                                                            
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhousetraces]                                                                                    
    metrics:                                                                                                           
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhousemetricswrite]                                                                              
    logs:                                                                                                              
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhouselogsexporter]
And then I am running an `otel/opentelemetry-collector-contrib:0.66.0` docker image locally with the following config:
receivers:                                                                                                                                                                                                                                    
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:

processors:
  batch:
    send_batch_size: 1000
    timeout: 1s

exporters:
  otlp:
    endpoint: 192.168.1.47:4317
    tls:
      insecure: true

service:
  pipelines:                                                                                                           
    metrics/hostmetrics:                                                                                               
      receivers: [hostmetrics]                                                                                         
      processors: [batch]                                                                                              
      exporters: [otlp]
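(Editor's note: one way to check whether the hostmetrics receiver is producing data at all is to add the collector's logging exporter alongside otlp — a debugging sketch; the pipeline name matches the config above, and the logging exporter with detailed verbosity also appears in a later config in this thread:)

```yaml
# Debugging sketch: print scraped metrics to the container's stdout
# so you can confirm data is produced before it leaves the box.
exporters:
  otlp:
    endpoint: 192.168.1.47:4317
    tls:
      insecure: true
  logging:
    verbosity: detailed

service:
  pipelines:
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlp, logging]
```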
So I'm unsure what I'm doing wrong or missing.
Pranay:
@Teymour how are you trying to visualise the infra metrics? Check these dashboards - https://github.com/SigNoz/dashboards/tree/main/hostmetrics
Teymour:
@Pranay: thanks, I'll try it out right away. But it looked like the clickhouse db `signoz_metrics` was not getting any data. But I have to double check that; I'll delete the clickhouse data folder first, then retry.
@Pranay: importing/creating the dashboard went fine, but as I suspected there is no data landing in clickhouse. Assuming the db I should look at is `signoz_metrics`:
d4498bd93bf3 :) show databases;

SHOW DATABASES

Query id: 762036c7-9949-4641-b91f-c60874517a3c

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ signoz_logs        │
│ signoz_metrics     │
│ signoz_traces      │
│ system             │
└────────────────────┘

7 rows in set. Elapsed: 0.002 sec. 

d4498bd93bf3 :) use signoz_metrics

USE signoz_metrics

Query id: c42370b4-97a2-43b5-b516-583beb6f9000

Ok.

0 rows in set. Elapsed: 0.001 sec. 

d4498bd93bf3 :) show tables;

SHOW TABLES

Query id: 3b0a069c-910c-4b1e-b684-1ccaf45a44a3

┌─name───────────┐
│ samples_v2     │
│ time_series_v2 │
└────────────────┘

2 rows in set. Elapsed: 0.003 sec. 

d4498bd93bf3 :) select count(*) from samples_v2;

SELECT count(*)
FROM samples_v2

Query id: f713cdd8-1da9-4520-9d40-1e9cdccfbc3f

┌─count()─┐
│       0 │
└─────────┘

1 row in set. Elapsed: 0.002 sec. 

d4498bd93bf3 :) select count(*) from time_series_v2;

SELECT count(*)
FROM time_series_v2

Query id: bf2079ba-b982-4a48-96de-6585d6449af8

┌─count()─┐
│       0 │
└─────────┘

1 row in set. Elapsed: 0.004 sec.
Pranay:
ah, curious - are you able to get trace data from the sample app? Can you check if you are able to send data from your app to SigNoz https://signoz.io/docs/install/troubleshooting/
Teymour:
Yep, I tried that already with:
docker run -it --rm signoz/troubleshoot checkEndpoint --endpoint=192.168.1.47:4317
2022-12-05T12:49:07.751Z	INFO	troubleshoot/main.go:28	STARTING!
2022-12-05T12:49:07.753Z	INFO	checkEndpoint/checkEndpoint.go:41	checking reachability of SigNoz endpoint
2022-12-05T12:49:07.784Z	INFO	troubleshoot/main.go:46	Successfully sent sample data to signoz ...
I suspect it's my local `otel/opentelemetry-collector-contrib:0.66.0` collector setup that is wrong. As I understand it, this collector should send data to the SigNoz collector via port 4317.
I'll try this in the meantime:
Pranay:
@Srikanth Chekuri do you have more insights on what could be possible issue here?
Teymour:
not at my desk right now, but I have a suspicion. I assumed the image `otel/opentelemetry-collector-contrib:0.66.0` was based on a linux box, but it's just a go executable,
so I need to mount my local root volume into that container (so that hostmetrics has the right files/folders to get the metrics from)
at least that's my assumption, will try it out next 😃
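(Editor's note: if that is the case, the receiver also needs to be told where the host filesystem is mounted — a sketch, assuming the host root is mounted at /hostfs inside the container; the `root_path` option shows up in the working config later in this thread:)

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    # Read /proc, /sys etc. from the mounted host filesystem
    # instead of the container's own.
    root_path: /hostfs
    scrapers:
      cpu:
      memory:
```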
Pranay:
cool, let us know how it goes
Teymour:
@Pranay: the problem was that I was mounting the config file to the wrong location (for the `otel/opentelemetry-collector-contrib` image).
I am now able to send logs and metrics from a container running `otel/opentelemetry-collector-contrib` to the SigNoz collector (`signoz/signoz-otel-collector`).
I'm just having problems getting the resourcedetection context onto the metrics. It works fine for the logs though.
As an example, looking at the clickhouse db logs:
Row 1:
──────
timestamp:                1670326695000000000
observed_timestamp:       1670323095350083091
id:                       2IXLVZ5u9kIcqwX24q7fXIA0gNe
trace_id:                 
span_id:                  
trace_flags:              0
severity_text:            info
severity_number:          9
body:                     Joining mDNS multicast group on interface veth01247bf.IPv6 with address fe80::d496:2ff:fe42:216.
resources_string_key:     ['host_name','os_type']
resources_string_value:   ['firemonkey','linux']
attributes_string_key:    ['hostname','proc_id','appname']
attributes_string_value:  ['firemonkey','1114','avahi-daemon']
attributes_int64_key:     ['facility','priority']
attributes_int64_value:   [3,30]
attributes_float64_key:   []
attributes_float64_value: []
It has the host_name and os_type, but the metrics data doesn't have the appropriate labels:
Row 1:
──────
metric_name:  system_cpu_load_average_15m
fingerprint:  10345722606695730287
timestamp_ms: 1670323105697
labels:       {"__name__":"system_cpu_load_average_15m"}
only the name label =/
Pranay:
How are you currently sending metrics to SigNoz?
Teymour:
using the `otel/opentelemetry-collector-contrib` image with the following setup:
receivers:
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      load: {}

  syslog:
    tcp:
      listen_address: "0.0.0.0:54527"
    protocol: rfc3164
    location: UTC                                                                                                      
    operators:                                                                                                         
      - type: move                                                                                                     
        from: attributes.message                                                                                       
        to: body                                                                                                       
                                                                                                                       
processors:                                                                                                            
  batch:                                                                                                               
    send_batch_size: 1000                                                                                              
    timeout: 1s                                                                                                        
                                                                                                                                                                                                                                              
  resourcedetection:
    detectors: [docker, env]
    timeout: 2s
    override: false

exporters:
  otlp:
    endpoint: 192.168.1.47:4317
    tls:
      insecure: true

  logging:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp]

    logs:
      receivers: [syslog]
      processors: [resourcedetection, batch]
      exporters: [otlp]
Both containers (the otel contrib and the SigNoz collector) run on the same machine.
I'm starting the otel contrib container with the following options:
docker run -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml -v /:/hostfs -v /var/run/docker.sock:/var/run/docker.sock -p 0.0.0.0:54527:54527/tcp otel/opentelemetry-collector-contrib:0.66.0
And the SigNoz collector has only 1 receiver configured; here's the config:
receivers:                                                                                                                                                                                                                                    
  otlp:        
    protocols:              
      grpc:                               
        endpoint: 0.0.0.0:4317
      http:    
        endpoint: 0.0.0.0:4318
              
processors:              
  batch:                              
    send_batch_size: 10000
    send_batch_max_size: 11000                                                                                         
    timeout: 10s                                                                                                       
                                                                                                                       
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces

  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics

  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    timeout: 5s                                                                                                        
    sending_queue:                                                                                                     
      queue_size: 100                                                                                                  
    retry_on_failure:                                                                                                  
      enabled: true                                                                                                    
      initial_interval: 5s                                                                                             
      max_interval: 30s                                                                                                
      max_elapsed_time: 300s                                                                                           
                                                                                                                       
service:                                                                                                               
  pipelines:                                                                                                           
    traces:                                                                                                            
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhousetraces]                                                                                    
    metrics:                                                                                                           
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhousemetricswrite]                                                                              
    logs:                                                                                                              
      receivers: [otlp]                                                                                                
      processors: [batch]                                                                                              
      exporters: [clickhouselogsexporter]
it's weird that the resource labels are correctly transferred for the logs, but not the metrics
Pranay:
Yeah, pretty weird. Would it be possible to create an issue for this with the above details? I don't have an immediate solution in mind but can try to look more closely later: https://github.com/SigNoz/signoz/issues/new/choose
Teymour:
sure, can do
Srikanth Chekuri:
Why did you get rid of this under the clickhousemetricswrite exporter in the SigNoz otel collector config?
resource_to_telemetry_conversion:
  enabled: true
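(Editor's note: for reference, restoring that option under the metrics exporter — as in the first collector config earlier in this thread — would look like this sketch:)

```yaml
exporters:
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    # Copy resource attributes (host.name, os.type, ...) onto
    # the stored metric labels.
    resource_to_telemetry_conversion:
      enabled: true
```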
Teymour:
ah, that could be the problem then. I tried to get rid of as many options as possible to keep the config slim. Might have been overzealous 😃
I'll try it with this option
is there a doc for these exporters that details the options?
I created an issue; if the option fixes it (which seems likely looking at the name), I'll close it
looks good, thanks @Srikanth Chekuri!
will close the issue