Hi, everyone, I followed this link to test django ...
# support
a
Hi, everyone, I followed this link to test whether Django can send data to SigNoz (https://signoz.io/blog/opentelemetry-django/). I deployed SigNoz successfully and then created this Django app in Kubernetes, but got the error below
exec user process caused "exec format error"
when using the default image ("signoz/sample-django:latest"). I googled and found that I need to rebuild the Docker image after adding #!/bin/bash in the Dockerfile. After that I got the error below. Can you help me understand why this happens? I used the official Dockerfile to build the image.
/bin/sh: 1: [opentelemetry-instrument,: not found
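A note on this error, not from the thread: the message has the shape Docker produces when a Dockerfile CMD or ENTRYPOINT is meant to be in exec form but is not a valid JSON array (for example, unquoted or single-quoted items). Docker then treats the line as shell form and passes it to /bin/sh, which tries to run "[opentelemetry-instrument," as a program. The Dockerfile is not shown here, so whether this applies is an assumption, and the CMD strings below are hypothetical. A minimal Python sketch of the check:

```python
import json


def is_exec_form(cmd_text: str) -> bool:
    """Return True if cmd_text parses as a JSON array of strings (exec form)."""
    try:
        parsed = json.loads(cmd_text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list) and all(isinstance(p, str) for p in parsed)


# Hypothetical CMD values for illustration; not taken from the real Dockerfile.
unquoted = "[opentelemetry-instrument, gunicorn, mysite.wsgi]"
single_quoted = "['opentelemetry-instrument', 'gunicorn', 'mysite.wsgi']"
valid = '["opentelemetry-instrument", "gunicorn", "mysite.wsgi"]'

for text in (unquoted, single_quoted, valid):
    # Anything that is not valid JSON would be run via /bin/sh -c instead.
    print(text, "->", "exec form" if is_exec_form(text) else "falls back to shell form")
```

Passing the full command on the docker run command line or in the Kubernetes manifest overrides the image's CMD, which would explain why one way of running the image can work while another fails.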
p
Hey @alan! That error usually happens when running on an unsupported architecture. Could you please share information about the cluster nodes? Do you have AMD or ARM nodes?
a
@Prashant Shahi
root@uls-tst01-gen1-7b64d94f9c-2zlr8:/# uname -a
Linux uls-tst01-gen1-7b64d94f9c-2zlr8 5.4.0-1025-gkeop #26~18.04.1-Ubuntu SMP Mon Oct 4 03:14:34 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@uls-tst01-gen1-7b64d94f9c-2zlr8:/#
When I run the docker command locally, it works, but I get this error
/bin/sh: 1: [opentelemetry-instrument,: not found
when creating the pod in Kubernetes, so the Docker image should be fine.
p
@alan it looks like the SigNoz Django image is built for ARM. I suppose you would have to build your own AMD Django image and use that.
a
@Prashant Shahi my Mac has a 2.2 GHz 6-core Intel Core i7, so it should not be ARM
This is not a CPU architecture issue. I rebuilt the image on a Kubernetes node but got the same issue. Does running opentelemetry-instrument in k8s require some dependency? Please help @Prashant Shahi
p
@alan let me build that on my machine and get back to you
s
Also looking into it...
p
@alan I was able to reproduce the issue when building on both ARM and AMD. I was able to resolve it by passing the appropriate environment variables and the complete command.
docker run --env OTEL_METRICS_EXPORTER=none \
    --env OTEL_SERVICE_NAME=djangoApp \
    --env OTEL_EXPORTER_OTLP_ENDPOINT=http://<IP of SigNoz>:4317 \
    --env DJANGO_SETTINGS_MODULE=mysite.settings \
    -p 8000:8000 \
    -t signoz/sample-django:latest opentelemetry-instrument gunicorn mysite.wsgi -c gunicorn.config.py --workers 2 --threads 2 --reload --bind 0.0.0.0:8000
Meanwhile, I am working on improving the Dockerfile to fix this and reduce the image size. Also, I believe @Srikanth Chekuri would be the best person who would have more knowledge on the Django sample app.
a
@Prashant Shahi the Dockerfile is fine, I fixed that issue, and I can run it with Docker locally without any issues. But when I package it and run it in Kubernetes, it throws this error:
/bin/sh: 1: [opentelemetry-instrument,: not found
That's the point. Have you run this in Kubernetes? I am not able to send instrumentation data to SigNoz.
@Srikanth Chekuri can you help as well? Thanks
p
Can you share the k8s manifest for the sample django app?
@Prashant Shahi
p
I see you are missing command. Can you include the equivalent environment variables and the command from the following in the manifest?
docker run --env OTEL_METRICS_EXPORTER=none \
    --env OTEL_SERVICE_NAME=djangoApp \
    --env OTEL_EXPORTER_OTLP_ENDPOINT=http://<IP of SigNoz>:4317 \
    --env DJANGO_SETTINGS_MODULE=mysite.settings \
    -p 8000:8000 \
    -t signoz/sample-django:latest opentelemetry-instrument gunicorn mysite.wsgi -c gunicorn.config.py --workers 2 --threads 2 --reload --bind 0.0.0.0:8000
a
same thing, that's the reason I removed the command, @Prashant Shahi
p
You would need to include it. Also, you mentioned that you had built an AMD image locally. Can you point k8s to that locally built image?
a
Yes, I did, but it had the same issue.
I can include the command, use the image built on the Kubernetes node, and try again. Will let you know if anything changes this time.
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  2m44s                 default-scheduler  Successfully assigned alantest/sample-django-deployment-799456b8cc-8vtr2 to uls-tst01-gen1-7b64d94f9c-p949m
  Normal   Pulled     2m43s                 kubelet            Successfully pulled image "artifactory.wdc.com:6560/bdp-eng-docker/sample-django:amd" in 97.653019ms
  Normal   Pulled     2m42s                 kubelet            Successfully pulled image "artifactory.wdc.com:6560/bdp-eng-docker/sample-django:amd" in 69.621653ms
  Normal   Pulled     2m27s                 kubelet            Successfully pulled image "artifactory.wdc.com:6560/bdp-eng-docker/sample-django:amd" in 65.713606ms
  Normal   Created    119s (x4 over 2m43s)  kubelet            Created container sample-django
  Warning  Failed     119s (x4 over 2m43s)  kubelet            Error: failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "opentelemetry-instrument gunicorn mysite.wsgi -c gunicorn.config.py --workers 2 --threads 2 --reload --bind 0.0.0.0:8000": executable file not found in $PATH: unknown
  Normal   Pulled     119s                  kubelet            Successfully pulled image "artifactory.wdc.com:6560/bdp-eng-docker/sample-django:amd" in 65.166385ms
  Warning  BackOff    84s (x8 over 2m41s)   kubelet            Back-off restarting failed container
  Normal   Pulling    73s (x5 over 2m43s)   kubelet            Pulling image "artifactory.wdc.com:6560/bdp-eng-docker/sample-django:amd"
This is the same error I got previously, which is why I removed the command and args. Now I have added them back, but the issue persists.
@Prashant Shahi @Srikanth Chekuri
p
replace command with this:
command: ["opentelemetry-instrument", "gunicorn", "mysite.wsgi", "-c", "gunicorn.config.py", "--workers", "2", "--threads", "2", "--reload", "--bind", "0.0.0.0:8000"]
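The Failed event earlier in this thread shows the whole command line being looked up as a single executable name, because Kubernetes runs command without a shell and does not split strings on spaces. As a small sketch, the one-line command can be split into list form with Python's standard shlex module (the variable names are illustrative only):

```python
import json
import shlex

# The one-line command from the docker run example in this thread.
cmd = (
    "opentelemetry-instrument gunicorn mysite.wsgi -c gunicorn.config.py "
    "--workers 2 --threads 2 --reload --bind 0.0.0.0:8000"
)

# Split the way a POSIX shell would, so each token becomes its own list item.
command = shlex.split(cmd)

# JSON list syntax is also valid YAML flow syntax, so this can be pasted after "command:".
print(json.dumps(command))
```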
a
gunicorn: error: unrecognized arguments: none djangoApp http://my-release-signoz-otel-collector:4317 mysite.settings
@Prashant Shahi
Are the args not needed?
p
yes, can you remove the args?
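The gunicorn error suggests the environment values had been passed as args, which Kubernetes appends to command. Below is a sketch of a container section with those values under env instead, built as a Python dict and printed as JSON (kubectl also accepts JSON manifests). The values come from this thread; that args held exactly these four values is inferred from the error message, and the rest of the Deployment is omitted.

```python
import json

# Container section only; in a real Deployment it sits under
# spec.template.spec.containers.
container = {
    "name": "sample-django",
    "image": "signoz/sample-django:latest",
    "ports": [{"containerPort": 8000}],
    # Settings are passed as environment variables. If they were listed under
    # "args", they would be appended to "command" and reach gunicorn as
    # unrecognized arguments.
    "env": [
        {"name": "OTEL_METRICS_EXPORTER", "value": "none"},
        {"name": "OTEL_SERVICE_NAME", "value": "djangoApp"},
        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT",
         "value": "http://my-release-signoz-otel-collector:4317"},
        {"name": "DJANGO_SETTINGS_MODULE", "value": "mysite.settings"},
    ],
    # One list item per token, no "args" key.
    "command": [
        "opentelemetry-instrument", "gunicorn", "mysite.wsgi",
        "-c", "gunicorn.config.py",
        "--workers", "2", "--threads", "2",
        "--reload", "--bind", "0.0.0.0:8000",
    ],
}

print(json.dumps(container, indent=2))
```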
a
sure will try
Yes, this is working, cool @Prashant Shahi
🎉 1
p
That's great!
👍 1
a
But now the issue is that SigNoz still has not received any data from this Django app 🙂
C02XGC2HJG5H:apmtest ah1000259263$ k get svc -n alantest
NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                                AGE
clickhouse-operator-metrics                ClusterIP   172.30.240.102   <none>        8888/TCP                                                                               14d
jaeger-test-agent                          ClusterIP   172.30.241.78    <none>        5775/UDP,6831/UDP,6832/UDP,5778/TCP,14271/TCP                                          10d
jaeger-test-cassandra                      ClusterIP   None             <none>        7000/TCP,7001/TCP,7199/TCP,9042/TCP,9160/TCP                                           10d
jaeger-test-collector                      ClusterIP   172.30.241.211   <none>        14250/TCP,14268/TCP,14269/TCP                                                          10d
jaeger-test-query                          ClusterIP   172.30.241.46    <none>        80/TCP,16687/TCP                                                                       10d
my-release-clickhouse                      ClusterIP   172.30.240.62    <none>        8123/TCP,9000/TCP                                                                      14d
my-release-signoz-alertmanager             ClusterIP   172.30.241.35    <none>        9093/TCP                                                                               14d
my-release-signoz-alertmanager-headless    ClusterIP   None             <none>        9093/TCP                                                                               14d
my-release-signoz-frontend                 ClusterIP   172.30.241.173   <none>        3301/TCP                                                                               14d
my-release-signoz-otel-collector           ClusterIP   172.30.240.254   <none>        4317/TCP,4318/TCP,55680/TCP,55681/TCP,14250/TCP,14268/TCP,9411/TCP,8888/TCP,8889/TCP   14d
my-release-signoz-otel-collector-metrics   ClusterIP   172.30.241.20    <none>        4317/TCP,4318/TCP,55680/TCP,55681/TCP,14250/TCP,14268/TCP,9411/TCP,8888/TCP            14d
my-release-signoz-query-service            ClusterIP   172.30.240.200   <none>        8080/TCP                                                                               14d
my-release-zookeeper                       ClusterIP   172.30.240.207   <none>        2181/TCP,2888/TCP,3888/TCP                                                             14d
my-release-zookeeper-headless              ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP
I set the OTEL exporter endpoint as below in the deployment. Is this correct? @Prashant Shahi
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://my-release-signoz-otel-collector:4317"
p
@alan As mentioned in the blog, you would have to create a poll and visit the polls: https://signoz.io/blog/opentelemetry-django/
Steps for browsing the app and checking data with SigNoz:
a. Visit http://localhost:8000/admin and create a question for the poll
b. Then visit the list of polls at http://localhost:8000/polls/ and explore the polls
c. The data should now be visible in SigNoz at http://<IP of SigNoz>:3301
If you haven't updated the Dockerfile, the admin user is ankitnayan and the password is password.
a
Yes, will try and get back to you.
I suspect some issue with "--bind", "0.0.0.0:8000" in the YAML. When I create a virtual service, I get a 400 error @Prashant Shahi
If so, what else do I need to change? I am not familiar with Django, so I'm asking @Prashant Shahi
Is binding to http://0.0.0.0:8000 correct?
@Prashant Shahi
@Srikanth Chekuri
p
can you port-forward and check if it works in localhost?
Try the following endpoints:
http://localhost:8000/admin
http://localhost:8000/polls/
a
OK, port-forward works
and I created a poll. I can see the logs get updated, but SigNoz didn't get updated
@Prashant Shahi
p
visit polls.. vote..
check SigNoz.
it should show up.
a
p
check SigNoz UI
a
p
hit refresh.
also, check otel-collector logs.
First of all, you should check whether the otel-collector OTLP endpoint is accessible. Follow the troubleshooting guide for k8s: https://signoz.io/docs/install/troubleshooting/
kubectl -n platform run troubleshoot --image=signoz/troubleshoot \
  --restart='OnFailure' -i --tty --rm --command -- ./troubleshoot checkEndpoint \
  --endpoint=my-release-signoz-otel-collector.platform.svc.cluster.local:4317
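As a rougher first check than the troubleshoot image, a plain TCP connect from a pod in the cluster can confirm that the collector port accepts connections. This is only a sketch: it tests reachability of the port and, unlike the troubleshoot tool, does not send any OTLP data. The function name is hypothetical, and the host in the usage comment is the service name used in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example usage from inside the cluster (adjust the namespace to your install):
# can_connect("my-release-signoz-otel-collector.platform.svc.cluster.local", 4317)
```

A successful connect only shows the service is reachable; it says nothing about whether the application is actually exporting spans.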
a
will check
C02XGC2HJG5H:jaeger ah1000259263$ kubectl -n alantest run troubleshoot --image=artifactory.wdc.com:6609/signoz/troubleshoot -l e2-criticality=2,e2-environment=DEV,e2-owner=E2.Owner,e2-project=e2-migration,e2-support-contact=E2.Support \
>   --restart='OnFailure' -i --tty --rm --command -- ./troubleshoot checkEndpoint \
>   --endpoint=my-release-signoz-otel-collector:4317
If you don't see a command prompt, try pressing enter.
Error attaching, falling back to logs: unable to upgrade connection: container troubleshoot not found in pod troubleshoot_alantest
2022-04-28T09:10:37.205Z	INFO	troubleshoot/main.go:28	STARTING!
2022-04-28T09:10:37.208Z	INFO	checkEndpoint/checkEndpoint.go:41	checking reachability of SigNoz endpoint
2022-04-28T09:10:37.335Z	INFO	troubleshoot/main.go:46	Successfully sent sample data to signoz ...
pod "troubleshoot" deleted
@Prashant Shahi this means the otel-collector OTLP endpoint is accessible, correct?
troubleshoot/main.go:46	Successfully sent sample data to signoz
p
yes, that means SigNoz cluster is up and running properly.
a
Then why does SigNoz show no data?
Let me know anything you need. This is the log from my-release-signoz-otel-collector-7d895854c9-b7srl @
1. Django is now running successfully
2. Ran the troubleshoot pod against my-release-signoz-otel-collector:4317, also good
3. Added a few polls in Django
4. But SigNoz still shows no data
@Prashant Shahi let me know anything you need for troubleshooting
Does http need to be removed?
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://my-release-signoz-otel-collector:4317
p
it all looks good to me.. can you create another poll and vote again?
@alan
a
sure
checked but still no data
p
The http:// prefix is optional in some SDKs for the gRPC port.
a
nothing changed in my-release-signoz-otel-collector-7d895854c9-b7srl
🥲
@Prashant Shahi not sure if this conflicts: I installed an Instrumentation by following this article: https://medium.com/opentelemetry/using-opentelemetry-auto-instrumentation-agents-in-kubernetes-869ec0f42377