# general
p
@Prashant Shahi do you have any insights on this?
p
Hi, I have to demo this to my company on Monday. Will we be able to look into this issue before that? Let me try Jaeger in parallel in case this doesn't work.
p
Hi @Priytam Pandey 👋 It could be caused either by misconfigured instrumentation or at the ingress level.
Perhaps you can try with `tracegen` to see if you are able to send traces. To install the `tracegen` binary:
```
go install github.com/open-telemetry/opentelemetry-collector-contrib/tracegen@v0.63.0
```
Execute the command below to send sample trace data using `tracegen`:
```
tracegen -traces 1 -otlp-endpoint otelcollector.dash101.com:80 -otlp-insecure
```
@Srikanth Chekuri any idea about the above error message in instrumentation?
s
@Priytam Pandey can you update the receiver message size for gRPC in the otel collector and give it a try?
```
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 25
```
This one specifically: `max_recv_msg_size_mib: 25`
And can you also confirm you are connecting to the correct gRPC server? https://github.com/grpc/grpc-java/issues/8164#issuecomment-838877435
This is not an issue with instrumentation. Please make sure the server you have configured to export telemetry to is a gRPC server.
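Since this is the Helm install, the receiver change above would typically land in your override-values.yaml rather than a raw collector config. A rough sketch only; the exact key paths depend on the chart version you are using:
```
otelCollector:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 25
```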
p
```
signoz-otel-collector                ClusterIP   172.20.243.65    <none>        14250/TCP,14268/TCP,8888/TCP,4317/TCP,4318/TCP
```
I believe port 4317 is gRPC, right?
```
nginx.ingress.kubernetes.io/backend-protocol: GRPC
```
and the ingress is annotated with the gRPC protocol
How do I verify otherwise?
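One way to check is from outside the cluster with grpcurl. A sketch, assuming grpcurl is installed and using the hostname from the tracegen command above:
```
# If the ingress forwards gRPC correctly, this reaches the collector over HTTP/2.
# The OTLP receiver may not expose gRPC reflection, so an error such as
# "server does not support the reflection API" still means the gRPC connection itself worked;
# a connection reset or an HTTP/1.x-style error usually means the ingress is not forwarding gRPC.
grpcurl -plaintext otelcollector.dash101.com:80 list
```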
p
@Priytam Pandey which K8s version are you using?
p
1.21, the nodes are amd arch, and Helm is v3.
p
You seem to be missing `ingressClassName: nginx` from the ingress.
Only in K8s versions < 1.18 can we pass the ingress class name via an annotation. For >= 1.18, we have to set `ingressClassName`:
```
otelCollector:
  ingress:
    enabled: true
    className: nginx
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    ....
```
p
My default ingress class is nginx; that is why I chose not to mention it.
p
Also, for instrumentation, you might have to pass the port `80` next to the address.
p
Ohh, I was under the impression that the HTTP default port is 80 😞 will try it.
p
Also, some OTel SDKs don't accept an `http` or `https` prefix for the gRPC endpoint.
p
So you are suggesting to remove the http:// prefix? Something like below:
```
OTEL_EXPORTER_OTLP_ENDPOINT="otelcollector.****.com:80"
```
p
Before that, you can try with `http://otelcollector.dash101.com:80`.
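For quick testing, both endpoint forms can be tried via the standard OTLP environment variable. A sketch; the host is taken from the messages above, and depending on the SDK you may also need to force the gRPC protocol:
```
# Option 1: bare host:port (some gRPC exporters reject a scheme prefix)
export OTEL_EXPORTER_OTLP_ENDPOINT="otelcollector.dash101.com:80"

# Option 2: explicit scheme, as suggested above
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otelcollector.dash101.com:80"

# Optional, if the SDK defaults to http/protobuf instead of gRPC
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```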
p
```
2022-12-11T15:44:41.421+0530	INFO	tracegen@v0.55.0/main.go:80	starting gRPC exporter
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel created	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] original dial target is: "otelcollector.****.com:80"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] parsed dial target is: {Scheme:otelcollector.****.com Authority: Endpoint:80 URL:{Scheme:otelcollector.****.com Opaque:80 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] fallback to scheme "passthrough"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] parsed dial target is: {Scheme:passthrough Authority: Endpoint:otelcollector.****.com:80 URL:{Scheme:passthrough Opaque: User: Host: Path:/otelcollector.*****.com:80 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.428+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel authority set to "otelcollector.****.com:80"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "otelcollector.****.com:80",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel switches to new LB policy "pick_first"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.429+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel created	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.***.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:41.430+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.514+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.514+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.515+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.***.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.516+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.516+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.578+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {READY <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.579+0530	INFO	tracegen/config.go:105	generation of traces isn't being throttled
2022-12-11T15:44:42.580+0530	INFO	tracegen/worker.go:91	traces generated	{"worker": 0, "traces": 1}
2022-12-11T15:44:42.581+0530	INFO	tracegen@v0.55.0/main.go:98	stop the batch span processor
2022-12-11T15:44:42.617+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:42.617+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.143+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.143+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel picks a new address "otelcollector.****.com:80" to connect	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.148+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {CONNECTING <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.148+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to CONNECTING	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.247+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.247+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {READY <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.248+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to READY	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.298+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	grpclog/component.go:71	[transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	grpclog/component.go:71	[core]pickfirstBalancer: UpdateSubConnState: 0xc000354c30, {IDLE <nil>}	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:48.299+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to IDLE	{"system": "grpc", "grpc_log": true}
2022/12/11 15:44:52 context deadline exceeded
2022-12-11T15:44:52.585+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel Connectivity change to SHUTDOWN	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1 SubChannel #2] Subchannel deleted	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.586+0530	INFO	channelz/funcs.go:340	[core][Channel #1] Channel deleted	{"system": "grpc", "grpc_log": true}
2022-12-11T15:44:52.587+0530	INFO	tracegen@v0.55.0/main.go:89	stopping the exporter
```
None of the above solutions helped, sorry for the late response.
But the deeper I go, the more it looks like there is a problem at my end in the cluster; will keep you posted here.
For now, since the nginx ingress is not working, I have declared the service as a LoadBalancer and am using port 4317 directly.
@Prashant Shahi @Srikanth Chekuri @Pranay @Ankit Nayan, can we group services on the landing page and provide access as per the experience unit in the same organization?
p
@Priytam Pandey Not clear what you mean. Can you give an example? Do you mean different groups of people should see only a particular set of services?
p
Hey @Prashant Shahi, this init command has been failing since yesterday's release, after the Helm chart started using a ClickHouse cluster (distributed):
```
until wget --user "${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}" --spider -q signoz-clickhouse:8123/ping; do echo -e "waiting for clickhouseDB"; sleep 5; done; echo -e "clickhouse ready, starting otel collector now"
```
I don't see any svc with that name open on port 8123:
```
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                                   PORT(S)                                                                        AGE
signoz-alertmanager                  ClusterIP      172.20.246.103   <none>                                                                        9093/TCP                                                                       15m
signoz-alertmanager-headless         ClusterIP      None             <none>                                                                        9093/TCP                                                                       15m
signoz-clickhouse-operator-metrics   ClusterIP      172.20.50.41     <none>                                                                        8888/TCP                                                                       15m
signoz-frontend                      ClusterIP      172.20.247.80    <none>                                                                        3301/TCP                                                                       15m
signoz-k8s-infra-otel-agent          ClusterIP      172.20.178.168   <none>                                                                        13133/TCP,8888/TCP,4317/TCP,4318/TCP                                           15m
signoz-k8s-infra-otel-deployment     ClusterIP      172.20.163.26    <none>                                                                        13133/TCP,8888/TCP,4317/TCP,4318/TCP                                           15m
signoz-otel-collector                LoadBalancer   172.20.85.211    <deleted>   14250:31331/TCP,14268:30177/TCP,8888:32012/TCP,4317:32618/TCP,4318:31842/TCP   15m
signoz-otel-collector-metrics        ClusterIP      172.20.3.174     <none>                                                                        13133/TCP                                                                      15m
signoz-query-service                 ClusterIP      172.20.216.22    <none>                                                                        8080/TCP,8085/TCP                                                              15m
signoz-zookeeper                     ClusterIP      172.20.158.0     <none>                                                                        2181/TCP,2888/TCP,3888/TCP                                                     15m
signoz-zookeeper-headless            ClusterIP      None             <none>                                                                        2181/TCP,2888/TCP,3888/TCP                                                     15m
priytam.pandey@GL1276-X0 ~ %
```
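A quick way to confirm what that init container is waiting on. A sketch, assuming the platform namespace and the service name from the wget loop above:
```
# The init container loops until this service answers /ping on 8123
kubectl -n platform get svc signoz-clickhouse

# If the service does exist, test the ping endpoint the same way the init container does
kubectl -n platform port-forward svc/signoz-clickhouse 8123:8123 &
curl -s http://localhost:8123/ping   # ClickHouse returns "Ok." when healthy
```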
p
Hi @Priytam Pandey, that happens when the ClickHouse instance is removed from the cluster. Can you share the output of the following?
```
kubectl -n platform get pods,chi
```
p
```
NAME                                                    READY   STATUS     RESTARTS   AGE
pod/signoz-alertmanager-0                               0/1     Init:0/1   0          95m
pod/signoz-clickhouse-operator-847fc56977-9jw8v         2/2     Running    0          95m
pod/signoz-frontend-554f685b4c-sk2lq                    0/1     Init:0/1   0          95m
pod/signoz-k8s-infra-otel-agent-2k56l                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-agent-m4hbb                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-agent-wr9fc                   1/1     Running    0          96m
pod/signoz-k8s-infra-otel-deployment-5c547bcb47-h9ngs   1/1     Running    0          95m
pod/signoz-otel-collector-6b5f9d46c5-8gxz4              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-6b5f9d46c5-kkfk2              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-6b5f9d46c5-rqqn6              0/1     Init:0/1   0          95m
pod/signoz-otel-collector-metrics-9547956d5-98x5z       0/1     Init:0/1   0          95m
pod/signoz-query-service-0                              0/1     Init:0/1   0          95m
pod/signoz-zookeeper-0                                  1/1     Running    0          95m
pod/signoz-zookeeper-1                                  1/1     Running    0          95m
```
There is no resource called `chi`, @Prashant Shahi.
p
@Priytam Pandey The ClickHouse resources (except the operator) seem to be missing. Did you run `helm uninstall` or remove the ClickHouse instances manually?
You could try the `helm upgrade` command with the `override-values.yaml`.
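For reference, that would look roughly like this. A sketch only; the release name my-release, the namespace platform, and the signoz Helm repo alias are assumptions:
```
helm repo update
helm -n platform upgrade my-release signoz/signoz -f override-values.yaml
```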
p
Since the upgrade to the new version failed, I did `helm uninstall`, but a few resources were left, so I had to delete them manually. I tried reinstalling, but no luck. What is the option to clean install again?
Uninstalling and installing again helped, sorry for troubling you a lot.
p
no worries 🙂
Glad that the issue is resolved
Next time, you can follow the instructions below to cleanly uninstall: https://signoz.io/docs/operate/kubernetes/#uninstall
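Roughly, a clean uninstall boils down to something like the following. A sketch only; the release name my-release and namespace platform are assumptions, and the linked docs remain the source of truth:
```
# Remove the Helm release
helm -n platform uninstall my-release

# Delete any leftover PersistentVolumeClaims (this removes stored data)
kubectl -n platform get pvc
kubectl -n platform delete pvc --all

# Finally, remove the namespace itself
kubectl delete namespace platform
```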
p
Do you guys have docs for tracing the nginx (ingress) HTTP module?
p
@Priytam Pandey Can you share some more context? Do you mean adding traces from nginx modules - so that a request is tracked from nginx to individual services?
p
Yes, I meant the same. We are using nginx and Contour as ingress controllers, and I am checking whether there is an option to start tracing from the ingress down to individual services using SigNoz.
There is support for these tools (Zipkin, Jaeger, and Datadog), but currently only the Datadog lib is actively maintained: https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/opentracing/. I am not able to find any docs for connecting with the otel collector.
s
OTEL also has support for nginx
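For reference, wiring up the OpenTelemetry nginx module looks roughly like this. A sketch only: the module path and service names are placeholders, the collector address assumes the in-cluster service shown earlier, and the directive names follow the otel-webserver-module documentation:
```
# main context of nginx.conf -- module path depends on where the webserver SDK is unpacked
load_module /opt/opentelemetry-webserver-sdk/WebServerModule/Nginx/ngx_http_opentelemetry_module.so;

# included inside the http {} block (often as a separate opentelemetry_module.conf)
NginxModuleEnabled ON;
NginxModuleOtelSpanExporter otlp;
NginxModuleOtelExporterEndpoint signoz-otel-collector.platform.svc.cluster.local:4317;
NginxModuleServiceName ingress-nginx;
NginxModuleServiceNamespace default;
NginxModuleServiceInstanceId ingress-nginx-1;
NginxModuleResolveBackends ON;
```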
p
> @Priytam Pandey Not clear what you mean. Can you give an example? Do you mean different groups of people should see only a particular set of services?
@Pranay yes, exactly the same. It will help us give a personalized view to each unit, though I understand the attribute filters on the landing page and custom dashboards may help.
@Srikanth Chekuri ohh, I tried this one, but was not able to build/find a proper .so file (ngx_http_opentelemetry_module.so) for our nginx. Will give it a fresh try again tomorrow; none of the releases was working with arm-arch-based nodes.
The experience so far is awesome. The Exceptions page looks very important to us but needs a better experience in filtering and grouping. After registering 18-19 microservices, many exceptions are being missed in pagination, and the page is becoming very lengthy and messy. Also, the table search is buggy and not working properly. I am not able to figure out how to draw a custom panel of exception groups by type and service name; can you guys hint to me how to do that?
p
Nice to hear that, @Priytam Pandey. What would be an ideal experience for you for exceptions? Would you want to group them by services? Also, what types of exceptions do you look for on the Exceptions page? It would be great if you could share some more details in an issue at signoz.io/gh so that we can discuss more about how this experience can be made better.
p
Group by Exception Type / Service Name / Attributes, or filter on these fields, and on top of the table a bar chart showing exceptions per minute for the applied filters, so that we can easily see what is going on. Clicking one bar of the chart should show the selected exceptions in the table below. Basically, the idea is to easily find exceptions when 80-90 microservices are registered. I will raise the table issue in GitHub.