This message was deleted SigNoz Community #general

# general

Slackbot

07/19/2022, 2:10 PM

This message was deleted.

Ankit Nayan

07/19/2022, 4:40 PM

@Kaouther Abrougui We used to have a Kafka + Druid setup but removed Kafka as it is not needed until you hit a huge scale. A comparison with Elasticsearch is interesting; maybe a blog would be helpful. We have done some internal tests, and a columnar DB like ClickHouse seems quite performant for trace data. Some companies like Uber and Cloudflare have moved from Elastic to ClickHouse for logs too. Attaching a screenshot from a talk and a link to the talk for the perf advantages found for logs.

https://youtu.be/me0fcwn5gXA?t=22773

Ankit Nayan

07/19/2022, 4:43 PM

what is the scale you are talking about... handling 10K rps (100K events/s) without sampling should be easy with signoz using around 20 CPUs. Further scale can be handled with sampling, which can reduce the load 100-1000 times

Kaouther Abrougui

07/19/2022, 7:15 PM

Thanks for sharing Ankit. 100K events/s seems great... I don't have an exact rate/s but I have traces with an average of 150K spans generated within a couple of minutes

Ankit Nayan

07/20/2022, 3:11 AM

150K spans/min should not be an issue

Ankit Nayan

07/20/2022, 3:12 AM

there are some companies who are using signoz at around 200K spans/min

👍 1

Kaouther Abrougui

07/20/2022, 1:33 PM

@Ankit Nayan where can I increase the CPU to 20?

Ankit Nayan

07/20/2022, 1:50 PM

using helm installation? If there are no limits on the resources of clickhouse, you need not change anything, and 20 CPUs ideally won't be needed for 150K/min ingestion

Ankit Nayan

07/20/2022, 1:50 PM

how much are you ingesting now?

Kaouther Abrougui

07/20/2022, 2:03 PM

around 150K/min

Kaouther Abrougui

07/20/2022, 2:04 PM

yes using helm installation

Kaouther Abrougui

07/20/2022, 2:04 PM

no resource limit at this moment

Kaouther Abrougui

07/20/2022, 2:06 PM

was able to use an old image of my service that worked with the old collector version (while waiting for the newer collector to be available), and with the default installation on k8s with the helm chart the collector pod is crashing, I think due to load. I also see many spans on the UI

Kaouther Abrougui

07/20/2022, 2:07 PM

When I look at the count graph, I can see the upper point at 285K+

Kaouther Abrougui

07/20/2022, 2:15 PM

I am getting missing span error also, probably due to collector crashing and restarting

Kaouther Abrougui

07/20/2022, 2:45 PM

Do I need to scale anything to support the expected load in order to avoid missing spans?

Ankit Nayan

07/20/2022, 3:12 PM

missing spans can be due to:
• otel collector crashing
• parent span not yet received. If a trace takes a few minutes to complete, this is usually the case

Kaouther Abrougui

07/20/2022, 3:12 PM

I saw otel collector crashing and I see 17 restarts, what needs to be done to avoid that?

Kaouther Abrougui

07/20/2022, 3:13 PM

adding more cpu and memory to the collector? are those configurable?

Ankit Nayan

07/20/2022, 3:13 PM

for now let's fix otel-collector crashing. Can you post the logs of the crashing otel-collector?

Ankit Nayan

07/20/2022, 3:15 PM

you can change at https://github.com/SigNoz/charts/blob/36283d3da7f19a612c63409c7071c2aa173b21c2/charts/signoz/values.yaml#L762

Ankit Nayan

07/20/2022, 3:16 PM

and https://github.com/SigNoz/charts/blob/36283d3da7f19a612c63409c7071c2aa173b21c2/charts/signoz/values.yaml#L984

Kaouther Abrougui

07/20/2022, 3:17 PM

Ok, thank you I'll change that and run another round

Kaouther Abrougui

07/20/2022, 3:19 PM

as for logs of the crashing collector, what exactly do you need to see? the logs of the collector look clean, then it crashes and a new pod is restarted. are the collector logs saved somewhere I can fetch?

Ankit Nayan

07/20/2022, 3:19 PM

and definitely logs would be very useful.. Usually otel-collector does not crash due to resource limits..probably it will just drop data

Kaouther Abrougui

07/20/2022, 3:20 PM

are the logs persisted somewhere?

Ankit Nayan

07/20/2022, 3:21 PM

"are the logs persisted somewhere?"

not for K8s IMO. @Prashant Shahi is this correct?

Prashant Shahi

07/20/2022, 3:29 PM

you can only get logs of current pod and previous.

Prashant Shahi

07/20/2022, 3:30 PM

current logs:

Copy code

OTEL_COLLECTOR_POD=$(kubectl get pods -n platform -o jsonpath='{..metadata.name}' -l "app.kubernetes.io/component=otel-collector")

kubectl logs -n platform $OTEL_COLLECTOR_POD

previous logs:

Copy code

kubectl logs -n platform $OTEL_COLLECTOR_POD --previous

Ankit Nayan

07/20/2022, 3:30 PM

so, exit code with errors of last run should be available, right?

Kaouther Abrougui

07/20/2022, 3:32 PM

looks clean

Kaouther Abrougui

07/20/2022, 3:32 PM

this is what I get with previous flag

Prashant Shahi

07/20/2022, 3:32 PM

try this:

Copy code

kubectl get events -n platform

Prashant Shahi

07/20/2022, 3:33 PM

okay. if previous instance of the container looks clean..

Kaouther Abrougui

07/20/2022, 3:33 PM

Copy code

LAST SEEN   TYPE      REASON      OBJECT                                               MESSAGE
40m         Normal    Pulling     pod/signoz-release-otel-collector-678f68755c-fnj24   Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"
40m         Warning   BackOff     pod/signoz-release-otel-collector-678f68755c-fnj24   Back-off restarting failed container
49m         Normal    Pulled      pod/signoz-release-otel-collector-678f68755c-fnj24   Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 671.012383ms
30m         Normal    Created     pod/signoz-release-query-service-0                   Created container signoz-release-query-service
30m         Normal    Started     pod/signoz-release-query-service-0                   Started container signoz-release-query-service
30m         Normal    Pulled      pod/signoz-release-query-service-0                   Container image "docker.io/signoz/query-service:0.10.0" already present on machine
30m         Warning   BackOff     pod/signoz-release-query-service-0                   Back-off restarting failed container
30m         Warning   Unhealthy   pod/signoz-release-query-service-0                   Readiness probe failed: Get "http://10.16.169.30:8080/api/v1/version": dial tcp 10.16.169.30:8080: connect: connection refused

Prashant Shahi

07/20/2022, 3:35 PM

query-service seems to be unhealthy before..

Prashant Shahi

07/20/2022, 3:35 PM

can you run this as well?

Copy code

kubectl describe -n platform pod/$OTEL_COLLECTOR_POD

Kaouther Abrougui

07/20/2022, 3:37 PM

Copy code

kubectl describe -n platform pod/$OTEL_COLLECTOR_POD
W0720 16:35:56.616742   68666 gcp.go:120] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:         signoz-release-otel-collector-678f68755c-fnj24
Namespace:    platform
Priority:     0
Node:         xxx
Start Time:   Wed, 20 Jul 2022 13:10:00 +0100
Labels:       app.kubernetes.io/component=otel-collector
              app.kubernetes.io/instance=signoz-release
              app.kubernetes.io/name=signoz
              pod-template-hash=678f68755c
Annotations:  checksum/config: 7511037609a6822915f6adc83937e8de5da3aceca3ef20b4f83f4cba72d2eaf5
Status:       Running
IP:           10.16.1.79
IPs:
  IP:           10.16.1.79
Controlled By:  ReplicaSet/signoz-release-otel-collector-678f68755c
Init Containers:
  signoz-release-otel-collector-init:
    Container ID:  containerd://1f1c64a60323017a3b84f6bc576e2de21481a7d23d5aa737bb0a2534eacdc22d
    Image:         docker.io/busybox:1.35
    Image ID:      docker.io/library/busybox@sha256:8c40df61d40166f5791f44b3d90b77b4c7f59ed39a992fd9046886d3126ffa68
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until wget --spider -q signoz-release-clickhouse:8123/ping; do echo -e "waiting for clickhouseDB"; sleep 5; done; echo -e "clickhouse ready, starting otel collector now";
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 Jul 2022 13:10:03 +0100
      Finished:     Wed, 20 Jul 2022 13:11:12 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5jzhr (ro)
Containers:
  signoz-release-otel-collector:
    Container ID:  containerd://20b6eb0471ab94c2049f66c3d49b0091b0c9798c0cd36265b4b051dc77cbe655
    Image:         docker.io/signoz/otelcontribcol:0.45.1-1.1
    Image ID:      docker.io/signoz/otelcontribcol@sha256:f3378be7a69b38ebb03c4cfa941fa35715f927e374519b7051f525a6c5a020c3
    Port:          <none>
    Host Port:     <none>
    Command:
      /otelcontribcol
      --config=/conf/otel-collector-config.yaml
    State:          Running
      Started:      Wed, 20 Jul 2022 15:52:18 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:43:51 +0100
      Finished:     Wed, 20 Jul 2022 15:52:04 +0100
    Ready:          True
    Restart Count:  16
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  400Mi
    Environment:
      CLICKHOUSE_HOST:            signoz-release-clickhouse
      CLICKHOUSE_PORT:            9000
      CLICKHOUSE_HTTP_PORT:       8123
      CLICKHOUSE_CLUSTER:         cluster
      CLICKHOUSE_DATABASE:        signoz_metrics
      CLICKHOUSE_TRACE_DATABASE:  signoz_traces
      CLICKHOUSE_USER:            admin
      CLICKHOUSE_PASSWORD:        27ff0399-0d3a-4bd8-919d-17c2181e6fb9
      CLICKHOUSE_SECURE:          false
      CLICKHOUSE_VERIFY:          false
    Mounts:
      /conf from otel-collector-config-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5jzhr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  otel-collector-config-vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      signoz-release-otel-collector
    Optional:  false
  kube-api-access-5jzhr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulled   52m                   kubelet  Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 671.012383ms
  Warning  BackOff  43m (x215 over 148m)  kubelet  Back-off restarting failed container
  Normal   Pulling  43m (x17 over 3h24m)  kubelet  Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"

Prashant Shahi

07/20/2022, 3:38 PM

Got it!

Copy code

Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:43:51 +0100
      Finished:     Wed, 20 Jul 2022 15:52:04 +0100

cc @Ankit Nayan @Srikanth Chekuri

🙌 1

Prashant Shahi

07/20/2022, 3:39 PM

previous logs look clean, because it OOMed without any logs..

👍 1
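
For readers following along: exit code 137 in the Last State block above is 128 + 9, meaning the container was ended by SIGKILL, which is the signal the kernel OOM killer sends when a container goes past its memory limit (2Gi here). That would also explain why the --previous logs look clean, since the process gets no chance to log anything. A minimal shell sketch, using the quoted lines as sample input (the variable names are illustrative only):

```shell
# Sample input: the "Last State" lines quoted from `kubectl describe` above.
describe_output='    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137'

# Pull out the termination reason and the exit code.
reason=$(printf '%s\n' "$describe_output" | awk '/Reason:/ {print $2}')
exit_code=$(printf '%s\n' "$describe_output" | awk '/Exit Code:/ {print $3}')

# Exit codes above 128 mean "killed by signal (code - 128)".
signal_number=$((exit_code - 128))
signal_name=$(kill -l "$signal_number")

echo "reason=$reason exit_code=$exit_code signal=$signal_number ($signal_name)"
```

On a live cluster the same fields should also be readable without describe, for example with kubectl get pod and a jsonpath such as {.status.containerStatuses[0].lastState.terminated.reason}.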

Kaouther Abrougui

07/20/2022, 3:39 PM

great! so increasing the resource as Ankit pointed earlier could help?

Ankit Nayan

07/20/2022, 3:39 PM

yeah

Kaouther Abrougui

07/20/2022, 3:40 PM

should I just reinstall the chart again with the --set flags for those pointed parameters?

Ankit Nayan

07/20/2022, 3:41 PM

upgrading should work right @Prashant Shahi?

Prashant Shahi

07/20/2022, 3:45 PM

yes, that should work.

Prashant Shahi

07/20/2022, 3:45 PM

Though using override-values.yml is recommended.

Kaouther Abrougui

07/20/2022, 3:46 PM

yes, I prefer using that as well

👍 2

Prashant Shahi

07/20/2022, 3:46 PM

or else, your future helm upgrade commands would overwrite back to defaults

Ankit Nayan

07/20/2022, 3:46 PM

yeah..that's a good approach.. overriding changes can be version controlled then

Ankit Nayan

07/20/2022, 3:51 PM

and similarly kubectl describe ... would help in finding the issue with the query-service pod

Kaouther Abrougui

07/20/2022, 3:52 PM

ok I'll get that too

Kaouther Abrougui

07/20/2022, 3:53 PM

OOMKilled also

Copy code

Containers:
  signoz-release-query-service:
    Container ID:  containerd://84efdf05080a6e07b646a34bc6ef204bd19f924bff2711674a1e6c3de2a62de6
    Image:         docker.io/signoz/query-service:0.10.0
    Image ID:      docker.io/signoz/query-service@sha256:1cbf6d2e0b55f1a2a7e8bb0f9b199c438198340096879511e57e4d9f8edf8cf8
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      -config=/root/config/prometheus.yml
    State:          Running
      Started:      Wed, 20 Jul 2022 16:02:34 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 20 Jul 2022 15:56:53 +0100
      Finished:     Wed, 20 Jul 2022 16:02:06 +0100

Kaouther Abrougui

07/20/2022, 3:57 PM

I'll add resources for query service as well.. that should be here right? https://github.com/SigNoz/charts/blob/36283d3da7f19a612c63409c7071c2aa173b21c2/charts/signoz/values.yaml#L315

Prashant Shahi

07/20/2022, 4:04 PM

yeah, you can increase the limits

👍 1
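
A minimal sketch of what such an override file might look like. The key paths (otelCollector.resources and queryService.resources) are assumed from the values.yaml lines linked in this thread and should be checked against the chart version in use; the numbers are placeholders, not recommendations.

```shell
# Write the overrides to a file so they can be version controlled.
# Key names are assumptions based on the linked values.yaml; verify them
# against your chart version. Resource numbers are illustrative only.
cat > override-values.yaml <<'EOF'
otelCollector:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 4Gi
queryService:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi
EOF

# Then apply it with an upgrade instead of a reinstall, for example:
#   helm upgrade <release-name> -n platform -f override-values.yaml <chart>
echo "wrote override-values.yaml"
```

Keeping the file next to the deployment scripts means every later helm upgrade can pass the same -f file, so the values do not fall back to the chart defaults.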

Ankit Nayan

07/20/2022, 4:09 PM

@Kaouther Abrougui let us know if the otel-collector and query-service crashing issues are resolved

Kaouther Abrougui

07/20/2022, 4:10 PM

Yes, sure, I am going to start a build to get the spans, it will take some time before spans start coming, will let you know asap

👍 1

Kaouther Abrougui

07/20/2022, 4:10 PM

Thank you very much for your support!!

👍 1

Kaouther Abrougui

07/20/2022, 5:00 PM

collector OOMed again, I used

Copy code

Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  2Gi

Kaouther Abrougui

07/20/2022, 5:00 PM

I'll bump memory to 10Gi

Kaouther Abrougui

07/20/2022, 5:26 PM

it OOMed again even after I upgraded the limit memory to 10Gi

Ankit Nayan

07/20/2022, 5:28 PM

weird...can you try 3 replicas of 2CPU and 4GB RAM. Is the load sudden? Possible to increase load slowly once they are in running state? Others are not facing any such issues

Kaouther Abrougui

07/20/2022, 5:32 PM

I can try decrease the load, and will try with replicas

Ankit Nayan

07/20/2022, 5:33 PM

I have seen otel-collector running with 2CPUs and 4GB RAM handling 20K spans/s. So, averaging to 1.2M spans/min. Might be due to a spike of incoming data.

Kaouther Abrougui

07/20/2022, 5:36 PM

ok, by default I have 6 parallel processes that are responsible for generating 6 traces, I'll test with 1 process at a time

Ankit Nayan

07/20/2022, 5:37 PM

6 traces? what's the size of 1 trace?

Kaouther Abrougui

07/20/2022, 5:39 PM

otel collector errored with the events below, but I see it tried to reach the requested size, not the limit, so I'll put the upper value on the request too

Kaouther Abrougui

07/20/2022, 5:39 PM

on average I saw traces with 150K spans in two and a half minutes...

Kaouther Abrougui

07/20/2022, 5:40 PM

Copy code

Events:
  Type     Reason               Age                    From                Message
  ----     ------               ----                   ----                -------
  Warning  FailedScheduling     32m                    default-scheduler   0/151 nodes are available: 146 node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate, 3 Insufficient memory, 5 Insufficient cpu.
  Normal   NotTriggerScaleUp    32m                    cluster-autoscaler  pod didn't trigger scale-up: 3 node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate, 1 Insufficient cpu, 1 Insufficient memory, 1 max node group size reached
  Normal   Scheduled            32m                    default-scheduler   Successfully assigned platform/signoz-release-otel-collector-dc4fb6bff-rz4l8 to gke-benchmarks-clust-pool-e2-standard-9a225627-76rw
  Normal   Pulling              32m                    kubelet             Pulling image "docker.io/busybox:1.35"
  Normal   Pulled               32m                    kubelet             Successfully pulled image "docker.io/busybox:1.35" in 8.72024562s
  Normal   Created              32m                    kubelet             Created container signoz-release-otel-collector-init
  Normal   Started              32m                    kubelet             Started container signoz-release-otel-collector-init
  Normal   Pulled               31m                    kubelet             Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 6.005665963s
  Normal   Pulling              14m (x2 over 32m)      kubelet             Pulling image "docker.io/signoz/otelcontribcol:0.45.1-1.1"
  Normal   Pulled               14m                    kubelet             Successfully pulled image "docker.io/signoz/otelcontribcol:0.45.1-1.1" in 524.349113ms
  Normal   Created              14m (x2 over 31m)      kubelet             Created container signoz-release-otel-collector
  Normal   Started              14m (x2 over 31m)      kubelet             Started container signoz-release-otel-collector
  Warning  Evicted              2m42s                  kubelet             The node was low on resource: memory. Container signoz-release-otel-collector was using 5119956Ki, which exceeds its request of 2Gi.
  Normal   Killing              2m41s                  kubelet             Stopping container signoz-release-otel-collector
  Warning  Evicted              2m36s                  kubelet             The node was low on resource: memory. Container signoz-release-otel-collector was using 5365200Ki, which exceeds its request of 2Gi.
  Warning  ExceededGracePeriod  2m26s (x2 over 2m32s)  kubelet             Container runtime did not kill the pod within specified grace period.
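
The eviction messages report usage in Ki while the request is given in Gi, which makes the comparison hard to eyeball. A small sketch of the conversion (1Gi = 1048576Ki), using the two usage figures from the events above; the helper name ki_to_gi is made up for this example:

```shell
# Convert a Ki quantity to Gi with two decimals (1Gi = 1024 * 1024 Ki).
ki_to_gi() {
  awk -v ki="$1" 'BEGIN { printf "%.2f", ki / 1048576 }'
}

first_eviction=$(ki_to_gi 5119956)
second_eviction=$(ki_to_gi 5365200)

echo "usage at first eviction:  ${first_eviction}Gi (request: 2Gi)"
echo "usage at second eviction: ${second_eviction}Gi (request: 2Gi)"
```

Both figures sit far above the 2Gi request. Under node memory pressure the kubelet prefers to evict pods whose usage exceeds their request, which is consistent with the wording of the events: the comparison is against the request, not the limit.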

Ankit Nayan

07/20/2022, 5:41 PM

1 trace with 150K spans which get completed in avg somewhat more than 2 minutes?

Ankit Nayan

07/20/2022, 5:43 PM

that will be too many spans in 1 trace...I don't think the trace-detail UI page will work for this now

😬 1

Kaouther Abrougui

07/20/2022, 5:43 PM

yes, however, I was using jaeger and it was also dropping spans... but I used to see in the logs that it is dropping spans and the number of spans dropped

Ankit Nayan

07/20/2022, 5:45 PM

holy..moly

Kaouther Abrougui

07/20/2022, 5:45 PM

haha, yeah a lot of spans, and expected 0 loss !!

🤯 1

Ankit Nayan

07/20/2022, 5:45 PM

are such traces rare? I am thinking how many such traces will appear in a min

Kaouther Abrougui

07/20/2022, 5:47 PM

no they are not rare, that is the standard... the rare ones are ... I got once a trace with 450K over 20 min

Ankit Nayan

07/20/2022, 5:47 PM

we might have to experiment on that a bit... anyway to generate such trace so that we can test signoz on that?

Ankit Nayan

07/20/2022, 5:50 PM

"no they are not rare, that is the standard."

and how many of such traces are produced in an hr or day? having such traces being produced say at 10K traces/s and each of them being 150K spans, it would be 1.5B spans/s ingestion
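
The figures quoted in this thread can be checked with shell integer arithmetic. This is only a back-of-the-envelope sketch: it treats "150K spans in two and a half minutes" as exactly 150 seconds, and an average rate says nothing about short bursts.

```shell
# Benchmark quoted earlier: 20K spans/s on a 2 CPU / 4GB RAM collector.
benchmark_per_sec=20000
benchmark_per_min=$((benchmark_per_sec * 60))

# One large trace: ~150K spans over ~2.5 minutes (150 s, an approximation).
trace_spans=150000
trace_seconds=150
trace_per_sec=$((trace_spans / trace_seconds))

# Six generator processes running in parallel, as described above.
parallel_per_sec=$((trace_per_sec * 6))

# Hypothetical worst case from the message above: 10K such traces per second.
hypothetical_per_sec=$((10000 * trace_spans))

echo "benchmark:        $benchmark_per_min spans/min"
echo "one trace:        $trace_per_sec spans/s on average"
echo "six in parallel:  $parallel_per_sec spans/s on average"
echo "10K traces/s:     $hypothetical_per_sec spans/s"
```

On averages alone, even six parallel traces stay below the quoted 20K spans/s, which fits the suspicion that the crashes come from spikes rather than from sustained load.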

Kaouther Abrougui

07/20/2022, 5:51 PM

"anyway to generate such trace so that we can test signoz on that?" not sure... initially I wanted a tracegen with such load so I can experiment with configs, but didn't find something available

Kaouther Abrougui

07/20/2022, 5:52 PM

will need those traces once a day on a schedule, otherwise anytime on demand

Ankit Nayan

07/20/2022, 5:54 PM

okay...then it seems reasonable... you want us to give it a try to make it work? Some eng bandwidth would get involved 😛

Kaouther Abrougui

07/20/2022, 5:59 PM

So 2 aspects here: first getting all spans of traces and not losing any, and second being able to visualize traces on UI...

Ankit Nayan

07/20/2022, 5:59 PM

can you try once again by replacing the line below with processors: [batch] https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L881 and removing the section https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L883-L889

👍 1

Ankit Nayan

07/20/2022, 6:00 PM

"So 2 aspects here: first getting all spans of traces and not losing any, and second being able to visualize traces on UI..."

correct ✅

Kaouther Abrougui

07/20/2022, 11:27 PM

updates here: after increasing request memory to 4Gi and using 1 process only to generate traces (to avoid spikes), the collector pod did not crash, but I see some missing spans, this could be due to another reason, but how can we rule out that the collector did not drop any span? in such case I can chase this issue on instrumentation side

Kaouther Abrougui

07/20/2022, 11:31 PM

This is the time scale in case

Kaouther Abrougui

07/20/2022, 11:31 PM

so this seems to be 216K+ spans in a little over 1 min

Srikanth Chekuri

07/21/2022, 2:35 AM

Just curious, what do you have in those spans? I have seen cases of a few thousand spans in the past (with a max of around 11k), but 216k is the first time I am hearing of it. Are all these spans useful, or is something getting instrumented at the micro-instruction level?

Kaouther Abrougui

07/21/2022, 7:44 AM

we're instrumenting everything at this time, then based on usefulness we will filter out the unnecessary ones

Kaouther Abrougui

07/21/2022, 7:48 AM

I generated a very low scale trace, this time 11k. Still see missing span. Is it possible to get more details about this missing span? like where in the sequence it is detected missing? also is it possible to confirm whether or not it is missing because it was dropped or because it was never received?

Kaouther Abrougui

07/21/2022, 7:56 AM

just noticed that this is also happening in a little less than 3 seconds... this could explain the high spike when running large scale and in parallel

Kaouther Abrougui

07/21/2022, 8:02 AM

Oh, I think SigNoz uses the red dotted square to show where the span is missing... so from this I understand that only one span is missing at that root level, correct?

Kaouther Abrougui

07/21/2022, 8:14 AM

Actually, for the complete end to end trace, there should be 2 spans coming from another cluster with some context information, I guess those are the flagged missing spans... to get the complete trace I need to expose SigNoz collector and send those 2 spans... Is exposing SigNoz collector possible easily, I saw this issue faced https://github.com/SigNoz/signoz/issues/672

Srikanth Chekuri

07/21/2022, 8:53 AM

I am not aware of your setup but the signoz collector should be accessible to all the clients sending the telemetry data

Kaouther Abrougui

07/21/2022, 8:55 AM

I see a cluster IP associated with the collector, so only internal clients in the same cluster can access the collector... if I need to send spans from a completely different cluster, then I need to have an ingress, right?

Srikanth Chekuri

07/21/2022, 8:58 AM

are you running SigNoz within the same cluster as another application? or do you have a different one for it?

Kaouther Abrougui

07/21/2022, 9:00 AM

my application is distributed across 2 different clusters, currently I am running SigNoz in one of the clusters and need to get spans from the other cluster

Kaouther Abrougui

07/21/2022, 9:00 AM

Ideally I need to run signoz out of the 2 clusters, so will have 3 clusters in total. I am trying this now

Srikanth Chekuri

07/21/2022, 9:01 AM

Yeah, that would be ideal.

👍 1

Kaouther Abrougui

07/21/2022, 11:13 AM

Another execution with high scale this time, but not parallel trace generation, again only 1 span missing from the root (working on that part)... 131K+ spans in 47 seconds 🙂

🙌 1

Kaouther Abrougui

07/21/2022, 12:09 PM

By the way, any update on collector v 0.55?

Ankit Nayan

07/21/2022, 12:10 PM

yes..that is under test.. need it today or a couple of days wait is fine?

Kaouther Abrougui

07/21/2022, 12:11 PM

would be great if I get it today.

Kaouther Abrougui

07/21/2022, 12:14 PM

I created the signoz collector in a separate cluster and exposed an ingressRoute to get spans for the same trace from different clusters... there is a step in my pipeline to create the collector with my application in one of the clusters, and that is the step where I worked around it by using an old version of my app... now that step seems to be stagnating.. I guess my workaround is in the way, so if I get the collector version supported, it would avoid this issue

Ankit Nayan

07/21/2022, 4:28 PM

@Srikanth Chekuri can you please help her with the config changes needed to experiment with the v0.55.0 otel collector?

Srikanth Chekuri

07/21/2022, 6:34 PM

@Kaouther Abrougui You can try with signoz/signoz-otel-collector:0.55.0-rc.1

Kaouther Abrougui

07/21/2022, 6:35 PM

Thank you! on it right now

Kaouther Abrougui

07/21/2022, 6:56 PM

@Srikanth Chekuri, is below config in helm override-values.yaml correct? I am getting ImagePullBackOff on collectors pods

Copy code

otelCollector:
  image:
    tag: 0.55.0-rc.1
otelCollectorMetrics:
  image:
    tag: 0.55.0-rc.1

Srikanth Chekuri

07/21/2022, 6:57 PM

You have to provide registry and repo also

Kaouther Abrougui

07/21/2022, 6:58 PM

tried this and got error too:

Copy code

otelCollector:
  image:
    tag: signoz/signoz-otel-collector:0.55.0-rc.1
otelCollectorMetrics:
  image:
    tag: signoz/signoz-otel-collector:0.55.0-rc.1

Kaouther Abrougui

07/21/2022, 6:58 PM

is that not correct?

Srikanth Chekuri

07/21/2022, 6:59 PM

Under the image try this

Copy code

registry: docker.io
repository: signoz/signoz-otel-collector
tag: 0.55.0-rc.1

🙌 1
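
Put together with the earlier attempt, the suggestion above would look roughly like the sketch below: the three keys go under image for both collector deployments. The file name collector-image-values.yaml is made up for this example so that an existing override-values.yaml is not overwritten; the same keys can be merged into that file instead.

```shell
# Sketch of the image override: registry, repository and tag nested under
# `image`, using the top-level keys from the earlier attempt in this thread.
# The file name is hypothetical; merge into your own override file if you
# already keep one.
cat > collector-image-values.yaml <<'EOF'
otelCollector:
  image:
    registry: docker.io
    repository: signoz/signoz-otel-collector
    tag: 0.55.0-rc.1
otelCollectorMetrics:
  image:
    registry: docker.io
    repository: signoz/signoz-otel-collector
    tag: 0.55.0-rc.1
EOF

# Count how many image blocks point at the new repository.
image_blocks=$(grep -c 'repository: signoz/signoz-otel-collector' collector-image-values.yaml)
echo "image blocks updated: $image_blocks"
```

The earlier ImagePullBackOff fits this: with only tag overridden, the chart presumably combined the new tag with its default repository, and putting the full image reference into tag does not produce a valid image name either.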

Kaouther Abrougui

07/21/2022, 6:59 PM

oh ok, thanks!

Prashant Shahi

07/21/2022, 7:11 PM

@Kaouther Abrougui would appreciate any feedback or suggestions

Kaouther Abrougui

07/21/2022, 7:12 PM

getting this at the moment

Copy code

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  2m11s              default-scheduler  Successfully assigned platform/signoz-release-otel-collector-5bfbf89f9-vbh8z to xxx
  Normal   Pulled     2m10s              kubelet            Container image "docker.io/busybox:1.35" already present on machine
  Normal   Created    2m10s              kubelet            Created container signoz-release-otel-collector-init
  Normal   Started    2m10s              kubelet            Started container signoz-release-otel-collector-init
  Normal   Pulled     84s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 428.412212ms
  Normal   Pulled     83s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 426.96839ms
  Normal   Pulled     69s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 530.406348ms
  Normal   Pulling    46s (x4 over 84s)  kubelet            Pulling image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1"
  Normal   Created    45s (x4 over 84s)  kubelet            Created container signoz-release-otel-collector
  Warning  Failed     45s (x4 over 83s)  kubelet            Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/otelcontribcol": stat /otelcontribcol: no such file or directory: unknown
  Normal   Pulled     45s                kubelet            Successfully pulled image "docker.io/signoz/signoz-otel-collector:0.55.0-rc.1" in 444.906797ms
  Warning  BackOff    16s (x6 over 82s)  kubelet            Back-off restarting failed container

Kaouther Abrougui

07/21/2022, 7:12 PM

tried upgrade then delete/install but same

Kaouther Abrougui

07/21/2022, 7:14 PM

any reason for that error?

Prashant Shahi

07/21/2022, 7:17 PM

oh right.

Prashant Shahi

07/21/2022, 7:18 PM

signoz otel binary name is updated to signoz-collector

Srikanth Chekuri

07/21/2022, 7:19 PM

@Prashant Shahi Command needs to be updated from /otelcontribcol to the new name, right?

Prashant Shahi

07/21/2022, 7:20 PM

otelcontribcol should be updated to signoz-collector in the following templates of the chart. https://github.com/SigNoz/charts/blob/main/charts/signoz/templates/otel-collector/deployment.yaml#L39 https://github.com/SigNoz/charts/blob/main/charts/signoz/templates/otel-collector-metrics/deployment.yaml#L39

Prashant Shahi

07/21/2022, 7:20 PM

let me create a branch in charts repo with these changes.

Kaouther Abrougui

07/21/2022, 7:23 PM

ok, let me know when ready

Prashant Shahi

07/21/2022, 8:02 PM

@Kaouther Abrougui Check out the otel-0.55-changes branch.

Copy code

git clone https://github.com/SigNoz/charts.git && cd charts

git checkout otel-0.55-changes

helm upgrade my-release -n platform -f override-values.yaml charts/signoz

Kaouther Abrougui

07/21/2022, 8:43 PM

Thanks! Trying it

Kaouther Abrougui

07/21/2022, 8:48 PM

Awesome! Thank you very much! It is upgraded successfully!

🎉 1

Prashant Shahi

07/21/2022, 8:52 PM

That's great to hear!

🙌 1

Kaouther Abrougui

08/17/2022, 10:56 PM

@Prashant Shahi
