# general
p
@Prashant Shahi do you have any insights on this?
p
Hi, I have to demo this to my company on Monday. Will we be able to look into this issue before that? Let me try Jaeger in parallel in case this doesn't work.
p
Hi @Priytam Pandey 👋 It could be caused either by misconfigured instrumentation or at the ingress level.
Perhaps you can try with `tracegen` to see if you are able to send traces. To install the `tracegen` binary:
```
go install github.com/open-telemetry/opentelemetry-collector-contrib/tracegen@v0.63.0
```
Execute the command below to send sample trace data using `tracegen`:
```
tracegen -traces 1 -otlp-endpoint otelcollector.dash101.com:80 -otlp-insecure
```
@Srikanth Chekuri any idea about the above error message in instrumentation?
s
@Priytam Pandey can you update the receiver message size for gRPC in the otel collector and give it a try?
```
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 25
```
This one specifically: `max_recv_msg_size_mib: 25`
And can you also confirm you are connecting to the correct gRPC server? https://github.com/grpc/grpc-java/issues/8164#issuecomment-838877435
This is not an issue with instrumentation. Please make sure the server you have configured to export telemetry to is a gRPC server.
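Since this is the Helm install, the receiver change above would typically land in your override-values.yaml rather than a raw collector config. A rough sketch only; the exact key paths depend on the chart version you are using:
```
otelCollector:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 25
```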
p
```
signoz-otel-collector                ClusterIP   172.20.243.65    <none>        14250/TCP,14268/TCP,8888/TCP,4317/TCP,4318/TCP
```
I believe port 4317 is gRPC, right?
```
nginx.ingress.kubernetes.io/backend-protocol: GRPC
```
and the ingress is annotated with the gRPC protocol
How do I verify otherwise?
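One way to check is from outside the cluster with grpcurl. A sketch, assuming grpcurl is installed and using the hostname from the tracegen command above:
```
# If the ingress forwards gRPC correctly, this reaches the collector over HTTP/2.
# The OTLP receiver may not expose gRPC reflection, so an error such as
# "server does not support the reflection API" still means the gRPC connection itself worked;
# a connection reset or an HTTP/1.x-style error usually means the ingress is not forwarding gRPC.
grpcurl -plaintext otelcollector.dash101.com:80 list
```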
p
@Priytam Pandey which K8s version are you using?
p
1.21, the nodes are amd arch, and Helm is v3.
p
You seem to be missing `ingressClassName: nginx` from the ingress.
Only in K8s versions < 1.18 can we pass the ingress class name via an annotation. For >= 1.18, we have to set `ingressClassName`:
```
otelCollector:
  ingress:
    enabled: true
    className: nginx
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    ....
```
p
My default ingress class is nginx; that is why I chose not to mention it.
p
Also, for instrumentation, you might have to pass the port `80` next to the address.
p
Ohh, I was under the impression that the HTTP default port is 80 😞 will try it.
p
Also, some OTel SDKs don't accept an `http` or `https` prefix for the gRPC endpoint.
p
So you are suggesting to remove the http:// prefix? Something like below:
```
OTEL_EXPORTER_OTLP_ENDPOINT="otelcollector.****.com:80"
```
p
Before that, you can try with `http://otelcollector.dash101.com:80`.
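For quick testing, both endpoint forms can be tried via the standard OTLP environment variable. A sketch; the host is taken from the messages above, and depending on the SDK you may also need to force the gRPC protocol:
```
# Option 1: bare host:port (some gRPC exporters reject a scheme prefix)
export OTEL_EXPORTER_OTLP_ENDPOINT="otelcollector.dash101.com:80"

# Option 2: explicit scheme, as suggested above
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otelcollector.dash101.com:80"

# Optional, if the SDK defaults to http/protobuf instead of gRPC
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```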
p
```
2022-12-11T15:44:41.421+0530	INFO	tracegen@v0.55.0/main.go:80	starting gRPC exporter
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel created	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] original dial target is: "otelcollector.****.com:80"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] parsed dial target is: {Scheme:otelcollector.****.com Authority: Endpoint:80 URL:{Scheme:otelcollector.****.com Opaque:80 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] fallback to scheme "passthrough"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] parsed dial target is: {Scheme:passthrough Authority: Endpoint:otelcollector.****.com:80 URL:{Scheme:passthrough Opaque: User: Host: Path:/otelcollector.*****.com:80 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel authority set to "otelcollector.****.com:80"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "otelcollector.****.com:80",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel switches to new LB policy "pick_first"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel created	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.***.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.514+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.514+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.***.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.516+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.516+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.578+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {READY <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	tracegen/config.go:105	generation of traces isn't being throttled
2022-12-11T15:44:42.580+0530	INFO	tracegen/worker.go:91	traces generated	{"worker": 0, "traces": 1}
2022-12-11T15:44:42.581+0530	INFO	tracegen@v0.55.0/main.go:98	stop the batch span processor
2022-12-11T15:44:42.617+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.143+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.143+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.****.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.148+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.148+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.247+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.247+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {READY <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.248+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.298+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022/12/11 15:44:52 context deadline exceeded
2022-12-11T15:44:52.585+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to SHUTDOWN	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel deleted	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel deleted	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.587+0530	INFO	tracegen@v0.55.0/main.go:89	stopping the exporter
```
None of the above solutions helped, sorry for the late response.
But the deeper I go, the more it looks like there is a problem at my end in the cluster; will keep you posted here.
For now, since the nginx ingress is not working, I have declared the service as a LoadBalancer and am using port 4317 directly.
@Prashant Shahi @Srikanth Chekuri @Pranay @Ankit Nayan, can we group services on the landing page and provide access as per the experience unit in the same organization?
p
@Priytam Pandey Not clear what you mean. Can you give an example? Do you mean different groups of people should see only a particular set of services?
p
Hey @Prashant Shahi, this init command has been failing since yesterday's release, after the Helm chart started using a ClickHouse cluster (distributed):
```
until wget --user "${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}" --spider -q signoz-clickhouse:8123/ping; do echo -e "waiting for clickhouseDB"; sleep 5; done; echo -e "clickhouse ready, starting otel collector now"
```
I don't see any svc with that name open on port 8123:
```
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                                   PORT(S)                                                                        AGE
signoz-alertmanager                  ClusterIP      172.20.246.103   <none>                                                                        9093/TCP                                                                       15m
signoz-alertmanager-headless         ClusterIP      None             <none>                                                                        9093/TCP                                                                       15m
signoz-clickhouse-operator-metrics   ClusterIP      172.20.50.41     <none>                                                                        8888/TCP                                                                       15m
signoz-frontend                      ClusterIP      172.20.247.80    <none>                                                                        3301/TCP                                                                       15m
signoz-k8s-infra-otel-agent          ClusterIP      172.20.178.168   <none>                                                                        13133/TCP,8888/TCP,4317/TCP,4318/TCP                                           15m
signoz-k8s-infra-otel-deployment     ClusterIP      172.20.163.26    <none>                                                                        13133/TCP,8888/TCP,4317/TCP,4318/TCP                                           15m
signoz-otel-collector                LoadBalancer   172.20.85.211    <deleted>   14250:31331/TCP,14268:30177/TCP,8888:32012/TCP,4317:32618/TCP,4318:31842/TCP   15m
signoz-otel-collector-metrics        ClusterIP      172.20.3.174     <none>                                                                        13133/TCP                                                                      15m
signoz-query-service                 ClusterIP      172.20.216.22    <none>                                                                        8080/TCP,8085/TCP                                                              15m
signoz-zookeeper                     ClusterIP      172.20.158.0     <none>                                                                        2181/TCP,2888/TCP,3888/TCP                                                     15m
signoz-zookeeper-headless            ClusterIP      None             <none>                                                                        2181/TCP,2888/TCP,3888/TCP                                                     15m
priytam.pandey@GL1276-X0 ~ %
```
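A quick way to confirm what that init container is waiting on. A sketch, assuming the platform namespace and the service name from the wget loop above:
```
# The init container loops until this service answers /ping on 8123
kubectl -n platform get svc signoz-clickhouse

# If the service does exist, test the ping endpoint the same way the init container does
kubectl -n platform port-forward svc/signoz-clickhouse 8123:8123 &
curl -s http://localhost:8123/ping   # ClickHouse returns "Ok." when healthy
```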
p
Hi @Priytam Pandey, that happens when the ClickHouse instance is removed from the cluster. Can you share the output of the following?
```
kubectl -n platform get pods,chi
```
p
```
NAME                                                    READY   STATUS     RESTARTS   AGE
pod/signoz-alertmanager-0                               0/1     Init:0/1   0          95m
pod/signoz-clickhouse-operator-847fc56977-9jw8v         2/2     Running    0          95m
pod/signoz-frontend-554f685b4c-sk2lq                    0/1     Init:0/1   0          95m
pod/signoz-k8s-infra-otel-agent-2k56l                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-agent-m4hbb                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-agent-wr9fc                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-deployment-5c547bcb47-h9ngs   1/1     Running    0          95m
pod/signoz-otel-collector-6b5f9d46c5-8gxz4              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-6b5f9d46c5-kkfk2              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-6b5f9d46c5-rqqn6              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-metrics-9547956d5-98x5z       0/1     Init:0/1   0          95m
pod/signoz-query-service-0                              0/1     Init:0/1   0          95m
pod/signoz-zookeeper-0                                  1/1     Running    0          95m
pod/signoz-zookeeper-1                                  1/1     Running    0          95m
```
There is no resource called `chi`, @Prashant Shahi.
p
@Priytam Pandey The ClickHouse resources (except the operator) seem to be missing. Did you run `helm uninstall` or remove the ClickHouse instances manually?
You could try the `helm upgrade` command with the `override-values.yaml`.
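For reference, that would look roughly like this. A sketch only; the release name my-release, the namespace platform, and the signoz Helm repo alias are assumptions:
```
helm repo update
helm -n platform upgrade my-release signoz/signoz -f override-values.yaml
```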
p
Since the upgrade to the new version failed, I did `helm uninstall`, but a few resources were left, so I had to delete them manually. I tried reinstalling, but no luck. What is the option to clean install again?
Uninstalling and installing again helped, sorry for troubling you a lot.
p
no worries 🙂
Glad that the issue is resolved
Next time, you can follow the instructions below to cleanly uninstall: https://signoz.io/docs/operate/kubernetes/#uninstall
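Roughly, a clean uninstall boils down to something like the following. A sketch only; the release name my-release and namespace platform are assumptions, and the linked docs remain the source of truth:
```
# Remove the Helm release
helm -n platform uninstall my-release

# Delete any leftover PersistentVolumeClaims (this removes stored data)
kubectl -n platform get pvc
kubectl -n platform delete pvc --all

# Finally, remove the namespace itself
kubectl delete namespace platform
```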
p
Do you guys have docs for tracing the nginx (ingress) HTTP module?
p
@Priytam Pandey Can you share some more context? Do you mean adding traces from nginx modules - so that a request is tracked from nginx to individual services?
p
Yes, I meant the same. We are using nginx and Contour as ingress controllers, and I am checking whether there is an option to start tracing from the ingress down to individual services using SigNoz.
There is support for these tools (Zipkin, Jaeger, and Datadog), but currently only the Datadog lib is actively maintained: https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/opentracing/. I am not able to find any docs for connecting with the otel collector.
s
OTEL also has support for nginx
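For reference, wiring up the OpenTelemetry nginx module looks roughly like this. A sketch only: the module path and service names are placeholders, the collector address assumes the in-cluster service shown earlier, and the directive names follow the otel-webserver-module documentation:
```
# main context of nginx.conf -- module path depends on where the webserver SDK is unpacked
load_module /opt/opentelemetry-webserver-sdk/WebServerModule/Nginx/ngx_http_opentelemetry_module.so;

# included inside the http {} block (often as a separate opentelemetry_module.conf)
NginxModuleEnabled ON;
NginxModuleOtelSpanExporter otlp;
NginxModuleOtelExporterEndpoint signoz-otel-collector.platform.svc.cluster.local:4317;
NginxModuleServiceName ingress-nginx;
NginxModuleServiceNamespace default;
NginxModuleServiceInstanceId ingress-nginx-1;
NginxModuleResolveBackends ON;
```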
p
> @Priytam Pandey Not clear what you mean. Can you give an example? Do you mean different groups of people should see only a particular set of services?
@Pranay yes, exactly the same. It will help us give a personalized view to each unit, though I understand the attribute filters on the landing page and custom dashboards may help.
@Srikanth Chekuri ohh, I tried this one, but was not able to build/find a proper .so file (ngx_http_opentelemetry_module.so) for our nginx. Will give it a fresh try again tomorrow; none of the releases was working with arm-arch-based nodes.
The experience so far is awesome. The Exceptions page looks very important to us but needs a better experience in filtering and grouping. After registering 18-19 microservices, many exceptions are being missed in pagination, and the page is becoming very lengthy and messy. Also, the table search is buggy and not working properly. I am not able to figure out how to draw a custom panel of exception groups by type and service name; can you guys hint to me how to do that?
p
Nice to hear that, @Priytam Pandey. What would be an ideal experience for you for exceptions? Would you want to group them by services? Also, what types of exceptions do you look for on the Exceptions page? It would be great if you could share some more details in an issue at signoz.io/gh so that we can discuss more about how this experience can be made better.
p
Group by Exception Type / Service Name / Attributes, or filter on these fields, and on top of the table a bar chart showing exceptions per minute for the applied filters, so that we can easily see what is going on. Clicking one bar of the chart should show the selected exceptions in the table below. Basically, the idea is to easily find exceptions when 80-90 microservices are registered. I will raise the table issue in GitHub.