k

Kaouther Abrougui

07/19/2022, 2:10 PM
Question: are there any benchmarks on the performance and scalability of SigNoz traces? How does it compare to Jaeger backed by Elasticsearch? I see in the architecture that SigNoz is backed by Kafka?
a

Ankit Nayan

07/19/2022, 4:40 PM
@Kaouther Abrougui We used to have a Kafka + Druid setup but removed Kafka, as it is not needed until you hit a huge scale. A comparison with Elasticsearch is interesting; maybe a blog post would be helpful. We have done some internal tests, and a columnar DB like ClickHouse seems quite performant for trace data. Some companies like Uber and Cloudflare have moved from Elasticsearch to ClickHouse for logs too. Attaching a screenshot from a talk, and a link to the talk, for the perf advantages found for logs.

https://youtu.be/me0fcwn5gXA?t=22773

What is the scale you are talking about? Handling 10K rps (100K events/s) without sampling should be easy with SigNoz using around 20 CPUs. Further scale can be handled with sampling, which can reduce the load 100-1000 times.
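A rough sketch of the sizing arithmetic in the message above: 10K requests/s at 100K events/s implies about 10 events per request, and keeping one request in 100 or one in 1000 cuts the ingested volume proportionally. The function name and the uniform-sampling assumption are illustrative only and are not taken from SigNoz.

```python
def ingested_events_per_second(requests_per_s: int,
                               events_per_request: int,
                               keep_one_in: int = 1) -> float:
    """Events/s reaching the backend when one in `keep_one_in` requests is kept."""
    return requests_per_s * events_per_request / keep_one_in


# 10K rps and 10 events per request are the figures implied by the message;
# the sampling ratios are the 100x and 1000x reductions it mentions.
for keep_one_in in (1, 100, 1000):
    rate = ingested_events_per_second(10_000, 10, keep_one_in)
    print(f"keep 1 in {keep_one_in}: {rate:,.0f} events/s")
```

Real sampling is rarely perfectly uniform, so this is only an order-of-magnitude guide.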
k

Kaouther Abrougui

07/19/2022, 7:15 PM
Thanks for sharing Ankit. 100K events/s seems great... I don't have an exact rate per second, but I have traces with an average of 150K spans generated within a couple of minutes
a

Ankit Nayan

07/20/2022, 3:11 AM
150K spans/min should not be an issue
there are some companies who are using signoz at around 200K spans/min
👍 1
k

Kaouther Abrougui

07/20/2022, 1:33 PM
@Ankit Nayan where can I increase the CPU to 20?
a

Ankit Nayan

07/20/2022, 1:50 PM
using helm installation? If there are no limits on the resources of clickhouse, we need not change anything, and 20 CPUs ideally won't be needed for 150K/min ingestion
how much are you ingesting now?
k

Kaouther Abrougui

07/20/2022, 2:03 PM
around 150K/min
yes using helm installation
no resource limit at this moment
I was able to use an old image of my service that works with the old collector version (while waiting for the newer collector to be available). With the default installation on k8s with the helm chart, the collector pod is crashing, I think due to load. I also see many spans on the UI
When I look at the count graph, I can see the upper point at 285K+
I am getting missing span error also, probably due to collector crashing and restarting
Do I need to scale anything to support the expected load in order to avoid missing spans?
a

Ankit Nayan

07/20/2022, 3:12 PM
missing spans can be due to
• otel collector crashing
• parent span not yet received. If a trace takes a few minutes to complete, this is usually the case
k

Kaouther Abrougui

07/20/2022, 3:12 PM
I saw otel collector crashing and I see 17 restarts, what needs to be done to avoid that?
adding more cpu and memory to the collector? are those configurable?
a

Ankit Nayan

07/20/2022, 3:13 PM
for now let's fix otel-collector crashing. Can you post the logs of the crashing otel-collector?
k

Kaouther Abrougui

07/20/2022, 3:17 PM
Ok, thank you I'll change that and run another round
as for logs of the crashing collector, what exactly do you need to see? The logs of the collector look clean, then it crashes and a new pod is started. Are the collector logs saved somewhere I can fetch?
a

Ankit Nayan

07/20/2022, 3:19 PM
and definitely logs would be very useful.. Usually otel-collector does not crash due to resource limits..probably it will just drop data
k

Kaouther Abrougui

07/20/2022, 3:20 PM
are the logs persisted somewhere?
a

Ankit Nayan

07/20/2022, 3:21 PM
are the logs persisted somewhere?
not for K8s IMO. @Prashant Shahi is this correct?
p

Prashant Shahi

07/20/2022, 3:29 PM
you can only get logs of current pod and previous.
current logs:
OTEL_COLLECTOR_POD=$(kubectl get pods -n platform -o jsonpath='{..metadata.name}' -l "app.kubernetes.io/component=otel-collector")

kubectl logs -n platform $OTEL_COLLECTOR_POD
previous logs:
kubectl logs -n platform $OTEL_COLLECTOR_POD --previous
a

Ankit Nayan

07/20/2022, 3:30 PM
so, exit code with errors of last run should be available, right?
k

Kaouther Abrougui

07/20/2022, 3:32 PM
looks clean
this is what I get with previous flag
p

Prashant Shahi

07/20/2022, 3:32 PM
try this:
kubectl get events -n platform
okay. if previous instance of the container looks clean..
k

Kaouther Abrougui

07/20/2022, 3:33 PM
LAST SEEN   TYPE      REASON      OBJECT                                               MESSAGE
40m         Normal    Pulling     pod/signoz-release-otel-collector-678f68755c-fnj24   Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"
40m         Warning   BackOff     pod/signoz-release-otel-collector-678f68755c-fnj24   Back-off restarting failed container
49m         Normal    Pulled      pod/signoz-release-otel-collector-678f68755c-fnj24   Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 671.012383ms
30m         Normal    Created     pod/signoz-release-query-service-0                   Created container signoz-release-query-service
30m         Normal    Started     pod/signoz-release-query-service-0                   Started container signoz-release-query-service
30m         Normal    Pulled      pod/signoz-release-query-service-0                   Container image "docker.io/signoz/query-service:0.10.0" already present on machine
30m         Warning   BackOff     pod/signoz-release-query-service-0                   Back-off restarting failed container
30m         Warning   Unhealthy   pod/signoz-release-query-service-0                   Readiness probe failed: Get "http://10.16.169.30:8080/api/v1/version": dial tcp 10.16.169.30:8080: connect: connection refused
p

Prashant Shahi

07/20/2022, 3:35 PM
query-service seems to be unhealthy before..
can you run this as well?
kubectl describe -n platform pod/$OTEL_COLLECTOR_POD
k

Kaouther Abrougui

07/20/2022, 3:37 PM
kubectl describe -n platform pod/$OTEL_COLLECTOR_POD
W0720 16:35:56.616742   68666 gcp.go:120] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:         signoz-release-otel-collector-678f68755c-fnj24
Namespace:    platform
Priority:     0
Node:         xxx
Start Time:   Wed, 20 Jul 2022 13:10:00 +0100
Labels:       app.kubernetes.io/component=otel-collector
              app.kubernetes.io/instance=signoz-release
              app.kubernetes.io/name=signoz
              pod-template-hash=678f68755c
Annotations:  checksum/config: 7511037609a6822915f6adc83937e8de5da3aceca3ef20b4f83f4cba72d2eaf5
Status:       Running
IP:           10.16.1.79
IPs:
  IP:           10.16.1.79
Controlled By:  ReplicaSet/signoz-release-otel-collector-678f68755c
Init Containers:
  signoz-release-otel-collector-init:
    Container ID:  containerd://1f1c64a60323017a3b84f6bc576e2de21481a7d23d5aa737bb0a2534eacdc22d
    Image:         docker.io/busybox:1.35
    Image ID:      docker.io/library/busybox@sha256:8c40df61d40166f5791f44b3d90b77b4c7f59ed39a992fd9046886d3126ffa68
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until wget --spider -q signoz-release-clickhouse:8123/ping; do echo -e "waiting for clickhouseDB"; sleep 5; done; echo -e "clickhouse ready, starting otel collector now";
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 Jul 2022 13:10:03 +0100
      Finished:     Wed, 20 Jul 2022 13:11:12 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5jzhr (ro)
Containers:
  signoz-release-otel-collector:
    Container ID:  containerd://20b6eb0471ab94c2049f66c3d49b0091b0c9798c0cd36265b4b051dc77cbe655
    Image:         docker.io/signoz/otelcontribcol:0.45.1-1.1
    Image ID:      docker.io/signoz/otelcontribcol@sha256:f3378be7a69b38ebb03c4cfa941fa35715f927e374519b7051f525a6c5a020c3
    Port:          <none>
    Host Port:     <none>
    Command:
      /otelcontribcol
      --config=/conf/otel-collector-config.yaml
    State:          Running
      Started:      Wed, 20 Jul 2022 15:52:18 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:43:51 +0100
      Finished:     Wed, 20 Jul 2022 15:52:04 +0100
    Ready:          True
    Restart Count:  16
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  400Mi
    Environment:
      CLICKHOUSE_HOST:            signoz-release-clickhouse
      CLICKHOUSE_PORT:            9000
      CLICKHOUSE_HTTP_PORT:       8123
      CLICKHOUSE_CLUSTER:         cluster
      CLICKHOUSE_DATABASE:        signoz_metrics
      CLICKHOUSE_TRACE_DATABASE:  signoz_traces
      CLICKHOUSE_USER:            admin
      CLICKHOUSE_PASSWORD:        27ff0399-0d3a-4bd8-919d-17c2181e6fb9
      CLICKHOUSE_SECURE:          false
      CLICKHOUSE_VERIFY:          false
    Mounts:
      /conf from otel-collector-config-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5jzhr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  otel-collector-config-vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      signoz-release-otel-collector
    Optional:  false
  kube-api-access-5jzhr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulled   52m                   kubelet  Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 671.012383ms
  Warning  BackOff  43m (x215 over 148m)  kubelet  Back-off restarting failed container
  Normal   Pulling  43m (x17 over 3h24m)  kubelet  Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"
p

Prashant Shahi

07/20/2022, 3:38 PM
Got it!
Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:43:51 +0100
      Finished:     Wed, 20 Jul 2022 15:52:04 +0100
cc @Ankit Nayan @Srikanth Chekuri
🙌 1
previous logs look clean, because it OOMed without any logs..
👍 1
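For readers wondering why exit code 137 points to a kill rather than an application error: exit codes above 128 conventionally mean "128 plus the signal number", and 137 - 128 = 9, which is SIGKILL, the signal the kernel OOM killer sends. A process killed this way gets no chance to write a final log line, which fits the clean-looking previous logs. A minimal sketch of that decoding (the helper name is hypothetical, and signal names are resolved for the platform the script runs on):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "exited normally" if code == 0 else f"exited with error status {code}"


print(describe_exit_code(137))
print(describe_exit_code(0))
```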
k

Kaouther Abrougui

07/20/2022, 3:39 PM
great! so increasing the resources as Ankit pointed out earlier could help?
a

Ankit Nayan

07/20/2022, 3:39 PM
yeah
k

Kaouther Abrougui

07/20/2022, 3:40 PM
should I just reinstall the chart again with the --set flags for those pointed parameters?
a

Ankit Nayan

07/20/2022, 3:41 PM
upgrading should work right @Prashant Shahi?
p

Prashant Shahi

07/20/2022, 3:45 PM
yes, that should work.
Though using override-values.yml is recommended.
k

Kaouther Abrougui

07/20/2022, 3:46 PM
yes, I prefer using that as well
👍 2
p

Prashant Shahi

07/20/2022, 3:46 PM
or else, your future helm upgrade commands would overwrite back to defaults
a

Ankit Nayan

07/20/2022, 3:46 PM
yeah..that's a good approach.. overriding changes can be version controlled then
and similarly kubectl describe ... would help in finding the issue with the query-service pod
k

Kaouther Abrougui

07/20/2022, 3:52 PM
ok I'll get that too
OOMKilled also
Containers:
  signoz-release-query-service:
    Container ID:  containerd://84efdf05080a6e07b646a34bc6ef204bd19f924bff2711674a1e6c3de2a62de6
    Image:         docker.io/signoz/query-service:0.10.0
    Image ID:      docker.io/signoz/query-service@sha256:1cbf6d2e0b55f1a2a7e8bb0f9b199c438198340096879511e57e4d9f8edf8cf8
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      -config=/root/config/prometheus.yml
    State:          Running
      Started:      Wed, 20 Jul 2022 16:02:34 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:56:53 +0100
      Finished:     Wed, 20 Jul 2022 16:02:06 +0100
I'll add resources for query service as well.. that should be here right? https://github.com/SigNoz/charts/blob/36283d3da7f19a612c63409c7071c2aa173b21c2/charts/signoz/values.yaml#L315
p

Prashant Shahi

07/20/2022, 4:04 PM
yeah, you can increase the limits
👍 1
a

Ankit Nayan

07/20/2022, 4:09 PM
@Kaouther Abrougui let us know if the otel-collector and query-service crashing issues are resolved
k

Kaouther Abrougui

07/20/2022, 4:10 PM
Yes, sure, I am going to start a build to get the spans, it will take some time before spans start coming, will let you know asap
👍 1
Thank you very much for your support!!
👍 1
collector OOMed again, I used
Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  2Gi
I'll bump memory to 10Gi
it OOMed again even after I upgraded the limit memory to 10Gi
a

Ankit Nayan

07/20/2022, 5:28 PM
weird...can you try 3 replicas of 2CPU and 4GB RAM. Is the load sudden? Possible to increase load slowly once they are in running state? Others are not facing any such issues
k

Kaouther Abrougui

07/20/2022, 5:32 PM
I can try decrease the load, and will try with replicas
a

Ankit Nayan

07/20/2022, 5:33 PM
I have seen otel-collector running with 2 CPUs and 4GB RAM handling 20K spans/s. So, averaging to 1.2M spans/min. Might be due to a spike of incoming data.
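The unit conversion behind that estimate, as a small sketch: 20K spans/s is 1.2M spans/min, while the roughly 150K spans/min reported earlier in the thread averages out to 2,500 spans/s. The helper names are illustrative. An average hides bursts, which is why a spike is still a plausible cause.

```python
def to_per_minute(rate_per_second: float) -> float:
    """Convert a per-second rate to a per-minute rate."""
    return rate_per_second * 60


def to_per_second(rate_per_minute: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return rate_per_minute / 60


collector_spans_per_min = to_per_minute(20_000)  # figure quoted for 2 CPUs / 4GB RAM
reported_spans_per_min = 150_000                 # load reported earlier in the thread

print(f"quoted capacity: {collector_spans_per_min:,.0f} spans/min")
print(f"reported load:   {to_per_second(reported_spans_per_min):,.0f} spans/s on average")
print(f"ratio:           {collector_spans_per_min / reported_spans_per_min:.0f}x")
```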
k

Kaouther Abrougui

07/20/2022, 5:36 PM
ok, by default I have 6 parallel processes that are responsible for generating 6 traces, I'll test with 1 process at a time
a

Ankit Nayan

07/20/2022, 5:37 PM
6 traces? what's the size of 1 trace?
k

Kaouther Abrougui

07/20/2022, 5:39 PM
otel collector errored with the events below, but I see it refers to the requested size, not the limit, so I'll put the upper value on the request too
on average I saw traces with 150K spans in two and a half minutes...
Events:
  Type     Reason               Age                    From                Message
  ----     ------               ----                   ----                -------
  Warning  FailedScheduling     32m                    default-scheduler   0/151 nodes are available: 146 node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate, 3 Insufficient memory, 5 Insufficient cpu.
  Normal   NotTriggerScaleUp    32m                    cluster-autoscaler  pod didn't trigger scale-up: 3 node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate, 1 Insufficient cpu, 1 Insufficient memory, 1 max node group size reached
  Normal   Scheduled            32m                    default-scheduler   Successfully assigned platform/signoz-release-otel-collector-dc4fb6bff-rz4l8 to gke-benchmarks-clust-pool-e2-standard-9a225627-76rw
  Normal   Pulling              32m                    kubelet             Pulling image "docker.io/busybox:1.35"
  Normal   Pulled               32m                    kubelet             Successfully pulled image "docker.io/busybox:1.35" in 8.72024562s
  Normal   Created              32m                    kubelet             Created container signoz-release-otel-collector-init
  Normal   Started              32m                    kubelet             Started container signoz-release-otel-collector-init
  Normal   Pulled               31m                    kubelet             Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 6.005665963s
  Normal   Pulling              14m (x2 over 32m)      kubelet             Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"
  Normal   Pulled               14m                    kubelet             Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 524.349113ms
  Normal   Created              14m (x2 over 31m)      kubelet             Created container signoz-release-otel-collector
  Normal   Started              14m (x2 over 31m)      kubelet             Started container signoz-release-otel-collector
  Warning  Evicted              2m42s                  kubelet             The node was low on resource: memory. Container signoz-release-otel-collector was using 5119956Ki, which exceeds its request of 2Gi.
  Normal   Killing              2m41s                  kubelet             Stopping container signoz-release-otel-collector
  Warning  Evicted              2m36s                  kubelet             The node was low on resource: memory. Container signoz-release-otel-collector was using 5365200Ki, which exceeds its request of 2Gi.
  Warning  ExceededGracePeriod  2m26s (x2 over 2m32s)  kubelet             Container runtime did not kill the pod within specified grace period.
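The eviction messages above report usage in Ki and compare it with the 2Gi request. When a node runs low on memory, pods using more than they requested are candidates for eviction even if they are still under their limit, which is consistent with raising the request as well. A small sketch for converting these quantities (it handles only the Ki/Mi/Gi suffixes that appear in this thread; the helper names are illustrative):

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a binary-suffix memory quantity such as '2Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def to_gib(quantity: str) -> float:
    """Express a memory quantity in GiB."""
    return to_bytes(quantity) / 1024**3


request = "2Gi"
for usage in ("5119956Ki", "5365200Ki"):  # values from the Evicted events
    over_request = to_bytes(usage) > to_bytes(request)
    print(f"{usage} = {to_gib(usage):.2f} GiB, above the {request} request: {over_request}")
```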
a

Ankit Nayan

07/20/2022, 5:41 PM
1 trace with 150K spans which get completed in avg somewhat more than 2 minutes?
that will be too many spans in 1 trace...I don't think the trace-detail UI page will work for this now
😬 1
k

Kaouther Abrougui

07/20/2022, 5:43 PM
yes, however, I was using jaeger and it was also dropping spans... but I used to see in the logs that it is dropping spans and the number of spans dropped
a

Ankit Nayan

07/20/2022, 5:45 PM
holy..moly
k

Kaouther Abrougui

07/20/2022, 5:45 PM
haha, yeah a lot of spans, and expected 0 loss !!
🤯 1
a

Ankit Nayan

07/20/2022, 5:45 PM
are such traces rare? I am thinking how many such traces will appear in a min
k

Kaouther Abrougui

07/20/2022, 5:47 PM
no they are not rare, that is the standard... the rare ones are ... I got once a trace with 450K over 20 min
a

Ankit Nayan

07/20/2022, 5:47 PM
we might have to experiment on that a bit... anyway to generate such trace so that we can test signoz on that?
no they are not rare, that is the standard.
and how many of such traces are produced in an hr or day? having such traces being produced say at 10K traces/s and each of them being 150K spans, it would be 1.5B spans/s ingestion
k

Kaouther Abrougui

07/20/2022, 5:51 PM
"anyway to generate such trace so that we can test signoz on that?" not sure... initially I wanted a tracegen with such load so I can experiment with configs, but didn't find something available
will need those traces once a day on a schedule, otherwise anytime on demand
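To put the two scenarios side by side: the hypothetical 10K traces/s at 150K spans each gives 1.5B spans/s, whereas one scheduled run per day averages out to a tiny sustained rate. The sketch assumes six traces per run (the six parallel processes mentioned earlier) and ignores on-demand runs. Both are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def spans_per_second(traces_per_second: float, spans_per_trace: int) -> float:
    """Sustained ingestion rate for a given trace rate and trace size."""
    return traces_per_second * spans_per_trace


SPANS_PER_TRACE = 150_000

hypothetical = spans_per_second(10_000, SPANS_PER_TRACE)
# Assumption: six traces per daily scheduled run, spread over the whole day.
daily_average = spans_per_second(6 / SECONDS_PER_DAY, SPANS_PER_TRACE)

print(f"10K traces/s:          {hypothetical:,.0f} spans/s")
print(f"6 traces once per day: {daily_average:,.1f} spans/s averaged over a day")
```

The daily average says nothing about the burst while a run is in progress, which is the part the collector has to absorb.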
a

Ankit Nayan

07/20/2022, 5:54 PM
okay...then it seems reasonable... you want us to give it a try to make it work? Some eng bandwidth would get involved 😛
k

Kaouther Abrougui

07/20/2022, 5:59 PM
So 2 aspects here: first getting all spans of traces and not losing any, and second being able to visualize traces on UI...
a

Ankit Nayan

07/20/2022, 5:59 PM
can you try once again by replacing the line at https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L881 with
processors: [batch]
and removing the section at https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L883-L889
👍 1
So 2 aspects here: first getting all spans of traces and not losing any, and second being able to visualize traces on UI...
correct ✅
k

Kaouther Abrougui

07/20/2022, 11:27 PM
updates here: after increasing the memory request to 4Gi and using only 1 process to generate traces (to avoid spikes), the collector pod did not crash, but I see some missing spans. This could be due to another reason, but how can we rule out that the collector dropped spans? If it did not, I can chase this issue on the instrumentation side
This is the time scale in case
so this seems to be 216K+ spans in a little over 1 min
s

Srikanth Chekuri

07/21/2022, 2:35 AM
Just curious, what do you have in those spans? I have seen cases of a few thousand spans in the past (with a max of around 11k), but 216k is the first time I am hearing of. Are all these spans useful, or is something getting instrumented at the micro-instruction level?
k

Kaouther Abrougui

07/21/2022, 7:44 AM
we're instrumenting everything at this time, then based on usefulness we will filter out the unnecessary ones
I generated a very low scale trace, this time 11k. Still see missing span. Is it possible to get more details about this missing span? like where in the sequence it is detected missing? also is it possible to confirm whether or not it is missing because it was dropped or because it was never received?
just noticed that this is also happening in a little less than 3 seconds... this could explain the high spike when running large scale and in parallel
Oh, I think SigNoz uses the red dotted square to show where the span is missing... so from this I understand that only one span is missing at that root level, correct?
Actually, for the complete end to end trace, there should be 2 spans coming from another cluster with some context information, I guess those are the flagged missing spans... to get the complete trace I need to expose the SigNoz collector and send those 2 spans... Is it easy to expose the SigNoz collector? I saw this issue being faced: https://github.com/SigNoz/signoz/issues/672
s

Srikanth Chekuri

07/21/2022, 8:53 AM
I am not aware of your setup but the signoz collector should be accessible to all the clients sending the telemetry data
k

Kaouther Abrougui

07/21/2022, 8:55 AM
I see a cluster IP associated with the collector, so only internal clients in the same cluster can access the collector... if I need to send spans from a completely different cluster, then I need to have an ingress, right?
s

Srikanth Chekuri

07/21/2022, 8:58 AM
are you running SigNoz within the same cluster as another application? or do you have a different one for it?
k

Kaouther Abrougui

07/21/2022, 9:00 AM
my application is distributed across 2 different clusters, currently I am running SigNoz in one of the clusters and need to get spans from the other cluster
Ideally I need to run signoz out of the 2 clusters, so will have 3 clusters in total. I am trying this now
s

Srikanth Chekuri

07/21/2022, 9:01 AM
Yeah, that would be ideal.
👍 1
k

Kaouther Abrougui

07/21/2022, 11:13 AM
Another execution with high scale this time, but not parallel trace generation, again only 1 span missing from the root (working on that part)... 131K+ spans in 47 seconds 🙂
🙌 1
By the way, any update on collector v 0.55?
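The burst sizes reported so far can be turned into approximate per-second rates and set against the 20K spans/s figure quoted earlier for a 2 CPU / 4GB collector. The durations are the rough ones given in the thread ("a little less than 3 seconds", "a little over 1 min"), so the results are estimates, and the six-way multiplication simply assumes six identical traces emitted at the same time.

```python
def burst_rate(spans: int, seconds: float) -> float:
    """Average spans/s over the window in which one trace was emitted."""
    return spans / seconds


# (spans, seconds) as quoted in the thread; durations are approximate.
reported = {
    "11K trace in under 3 s": (11_000, 3),
    "131K trace in 47 s": (131_000, 47),
    "216K trace in about 1 min": (216_000, 60),
}
QUOTED_CAPACITY = 20_000  # spans/s, figure quoted earlier in the thread
PARALLEL_PROCESSES = 6    # default number of parallel generators, per the thread

for label, (spans, seconds) in reported.items():
    single = burst_rate(spans, seconds)
    combined = single * PARALLEL_PROCESSES
    verdict = "above" if combined > QUOTED_CAPACITY else "below"
    print(f"{label}: ~{single:,.0f} spans/s alone, "
          f"~{combined:,.0f} spans/s with {PARALLEL_PROCESSES} in parallel "
          f"({verdict} {QUOTED_CAPACITY:,} spans/s)")
```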
a

Ankit Nayan

07/21/2022, 12:10 PM
yes..that is under test.. need it today or a couple of days wait is fine?
k

Kaouther Abrougui

07/21/2022, 12:11 PM
would be great if I get it today.
I created the SigNoz collector in a separate cluster and exposed an ingressRoute to get spans for the same trace from different clusters... there is a step in my pipeline to create the collector with my application in one of the clusters, and that is the step I worked around by using an old version for my app... now that step seems to be stalling. I guess my workaround is in the way, so getting the supported collector version would avoid this issue
a

Ankit Nayan

07/21/2022, 4:28 PM
@Srikanth Chekuri can you please help her with the config changes needed to experiment with the v0.55.0 otel collector?
s

Srikanth Chekuri

07/21/2022, 6:34 PM
@Kaouther Abrougui You can try with signoz/signoz-otel-collector:0.55.0-rc.1
k

Kaouther Abrougui

07/21/2022, 6:35 PM
Thank you! on it right now
@Srikanth Chekuri, is below config in helm override-values.yaml correct? I am getting ImagePullBackOff on collectors pods
otelCollector:
  image:
    tag: 0.55.0-rc.1
otelCollectorMetrics:
  image:
    tag: 0.55.0-rc.1
s

Srikanth Chekuri

07/21/2022, 6:57 PM
You have to provide registry and repo also
k

Kaouther Abrougui

07/21/2022, 6:58 PM
tried this and got error too:
otelCollector:
  image:
    tag: signoz/signoz-otel-collector:0.55.0-rc.1
otelCollectorMetrics:
  image:
    tag: signoz/signoz-otel-collector:0.55.0-rc.1
is that not correct?
s

Srikanth Chekuri

07/21/2022, 6:59 PM
Under the image try this
registry: docker.io
repository: signoz/signoz-otel-collector
tag: 0.55.0-rc.1
🙌 1
k

Kaouther Abrougui

07/21/2022, 6:59 PM
oh ok, thanks!
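Why the first two attempts probably failed: the image names in the earlier pod events (for example docker.io/signoz/otelcontribcol:0.45.1-1.1) look like registry/repository:tag. Assuming the chart joins the three fields in that way, which the thread does not confirm, overriding only the tag leaves the old repository in place, and putting the repository inside the tag produces a malformed reference. A sketch of that composition (the function is hypothetical, not the chart's template):

```python
def image_ref(registry: str, repository: str, tag: str) -> str:
    """Join image fields as registry/repository:tag (assumed layout)."""
    return f"{registry}/{repository}:{tag}"


DEFAULT_REPOSITORY = "signoz/otelcontribcol"  # repository seen in the earlier pod events

# Attempt 1: only the tag overridden, so the old repository is kept.
print(image_ref("docker.io", DEFAULT_REPOSITORY, "0.55.0-rc.1"))

# Attempt 2: repository placed inside the tag, giving a reference with two colons.
print(image_ref("docker.io", DEFAULT_REPOSITORY, "signoz/signoz-otel-collector:0.55.0-rc.1"))

# Suggested fix: registry, repository and tag set separately.
print(image_ref("docker.io", "signoz/signoz-otel-collector", "0.55.0-rc.1"))
```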
p

Prashant Shahi

07/21/2022, 7:11 PM
@Kaouther Abrougui would appreciate any feedback or suggestions
k

Kaouther Abrougui

07/21/2022, 7:12 PM
getting this at the moment
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  2m11s              default-scheduler  Successfully assigned platform/signoz-release-otel-collector-5bfbf89f9-vbh8z to xxx
  Normal   Pulled     2m10s              kubelet            Container image "docker.io/busybox:1.35" already present on machine
  Normal   Created    2m10s              kubelet            Created container signoz-release-otel-collector-init
  Normal   Started    2m10s              kubelet            Started container signoz-release-otel-collector-init
  Normal   Pulled     84s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 428.412212ms
  Normal   Pulled     83s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 426.96839ms
  Normal   Pulled     69s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 530.406348ms
  Normal   Pulling    46s (x4 over 84s)  kubelet            Pulling image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1"
  Normal   Created    45s (x4 over 84s)  kubelet            Created container signoz-release-otel-collector
  Warning  Failed     45s (x4 over 83s)  kubelet            Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/otelcontribcol": stat /otelcontribcol: no such file or directory: unknown
  Normal   Pulled     45s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 444.906797ms
  Warning  BackOff    16s (x6 over 82s)  kubelet            Back-off restarting failed container
tried upgrade then delete/install but same
any reason for that error?
p

Prashant Shahi

07/21/2022, 7:17 PM
oh right.
signoz otel binary name is updated to signoz-collector
s

Srikanth Chekuri

07/21/2022, 7:19 PM
@Prashant Shahi Command needs to be updated from /otelcontribcol to the new name, right?
let me create a branch in charts repo with these changes.
k

Kaouther Abrougui

07/21/2022, 7:23 PM
ok, let me know when ready
p

Prashant Shahi

07/21/2022, 8:02 PM
@Kaouther Abrougui Check out the otel-0.55-changes branch.
git clone https://github.com/SigNoz/charts.git && cd charts

git checkout otel-0.55-changes

helm upgrade my-release -n platform -f override-values.yaml charts/signoz
k

Kaouther Abrougui

07/21/2022, 8:43 PM
Thanks! Trying it
Awesome! Thank you very much! It is upgraded successfully!
🎉 1
p

Prashant Shahi

07/21/2022, 8:52 PM
That's great to hear!
🙌 1
k

Kaouther Abrougui

08/17/2022, 10:56 PM
@Prashant Shahi