#support

James Henrich

04/11/2023, 3:35 AM
My frontend dashboard stopped displaying all metric graphs / trace graphs after a certain point in time. There are no issues with the collector / query service as far as I can tell: logs look good and I am getting alert notifications on metrics I am emitting. Restarting the service doesn't help. Any ideas on how to troubleshoot this? Some info: SigNoz version v0.15.0, running via docker-compose.

Vishal Sharma

04/11/2023, 5:11 AM
Please check the retention period in settings: https://signoz.io/docs/userguide/retention-period/ By default, retention is set to 7 days for logs and traces, and 30 days for metrics.
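For reference, the retention currently in effect can also be read straight from ClickHouse, since it is stored as a table TTL. A minimal sketch, assuming the default schema (the signoz_metrics database name appears in the query-service logs later in the thread; the local table names samples_v2 and time_series_v2 are an assumption):

-- The TTL clause in engine_full reflects the configured retention.
SELECT database, name, engine_full
FROM system.tables
WHERE database = 'signoz_metrics'
  AND name IN ('samples_v2', 'time_series_v2')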

James Henrich

04/11/2023, 11:15 AM
I don't think this is a retention issue. The default time view is 15 minutes, and no metrics/logs are showing even though metrics are in fact being emitted within that window.

Vishal Sharma

04/11/2023, 11:31 AM
Can you check the response of the APIs on the services page?

James Henrich

04/11/2023, 11:33 AM
I am getting no data on the services page; I remember it showed information on SigNoz-related components when it was working.
I did a bit of debugging myself. Here is an example request the frontend sends to the query service:
{
  "start": 1681421689000,
  "end": 1681422589000,
  "step": 60,
  "variables": {
    "SIGNOZ_START_TIME": 1681421689000,
    "SIGNOZ_END_TIME": 1681422589000
  },
  "dataSource": 1,
  "compositeMetricQuery": {
    "queryType": 1,
    "panelType": 1,
    "builderQueries": {
      "A": {
        "queryName": "A",
        "aggregateOperator": 18,
        "metricName": "scalper_service_down",
        "tagFilters": {
          "items": [],
          "op": "AND"
        },
        "groupBy": [
          "profile"
        ],
        "expression": "A",
        "disabled": false
      }
    }
  }
}
resp:
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": null
  }
}
query service logs from request:
2023-04-13T20:52:16.269Z        INFO    app/server.go:236       /api/v1/version timeTaken: 155.978µs
2023-04-13T20:52:16.271Z        INFO    app/server.go:236       /api/v1/featureFlags    timeTaken: 269.238µs
2023-04-13T20:52:16.323Z        INFO    app/server.go:236       /api/v1/configs timeTaken: 45.034907ms
2023-04-13T20:52:16.449Z        INFO    app/server.go:236       /api/v1/dashboards/{uuid}       timeTaken: 1.375588ms
2023-04-13T20:52:16.640Z        INFO    api/metrics.go:21       CustomMetricsFunction feature is not enabled in this plan
2023-04-13T20:52:16.640Z        INFO    clickhouseReader/reader.go:3071 Executing metric result query: SELECT profile,  ts, sum(value) as value FROM (SELECT profile,  ts, runningDifference(value)/runningDifference(ts) as value FROM(SELECT fingerprint, profile,  toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 GLOBAL INNER JOIN (SELECT  JSONExtractString(labels, 'profile') as profile, fingerprint FROM signoz_metrics.distributed_time_series_v2 WHERE metric_name = 'scalper_service_down') as filtered_time_series USING fingerprint WHERE metric_name = 'scalper_service_down' AND timestamp_ms >= 1681421689000 AND timestamp_ms <= 1681422540000 GROUP BY fingerprint, profile,ts ORDER BY fingerprint, profile,  ts) OFFSET 1) GROUP BY profile,ts ORDER BY profile,  ts
So I'm noticing the following: "CustomMetricsFunction feature is not enabled in this plan". Looking at the plans, nothing mentions that posting custom metrics is a premium feature. Did something change recently?
@Vishal Sharma
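To check whether that window has any raw data at all, a simplified version of the query from the log above can be run directly in ClickHouse (a sketch; table, column, and metric names are taken verbatim from the request and the logged query):

-- Count raw samples for the metric in the exact window the frontend requested.
SELECT count() AS samples,
       toDateTime(intDiv(min(timestamp_ms), 1000)) AS first_sample,
       toDateTime(intDiv(max(timestamp_ms), 1000)) AS last_sample
FROM signoz_metrics.distributed_samples_v2
WHERE metric_name = 'scalper_service_down'
  AND timestamp_ms >= 1681421689000
  AND timestamp_ms <= 1681422589000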

Vishal Sharma

04/14/2023, 4:19 AM
@Srikanth Chekuri Can you please have a look?

Srikanth Chekuri

04/14/2023, 4:41 AM
It’s a spurious message. Nothing about the plans changed, and it’s not a premium feature.

James Henrich

04/14/2023, 7:38 AM
Thanks for the response. Is there any other reason why the frontend can't see any data, then, or any advice on how to fix this problem? Alerts are working properly and metrics are being stored, it's just that the data on the dashboard has been broken for about a week now. Restarts aren't helping.

Srikanth Chekuri

04/14/2023, 7:40 AM
> Alerts are working properly and metrics are being stored, it's just that the data on the dashboard has been broken for about a week now.
Are these alerts on the same data or different metrics?

James Henrich

04/14/2023, 7:40 AM
The same data, yes

Srikanth Chekuri

04/14/2023, 7:44 AM
That shouldn’t be the case. Are you saying alerts are firing, but the services/dashboards are not working, and you are also confirming that metrics are being stored?

James Henrich

04/14/2023, 7:49 AM
I am not sure how alerts can be triggered properly on all the metrics I am posting if they weren't being stored properly.
I've confirmed they are being emitted; I haven't checked the DB directly, no.

Srikanth Chekuri

04/14/2023, 7:54 AM
> and metrics are being stored
You mentioned metrics are being stored, so from that statement I assumed you had confirmed that the data exists in the DB.
> My frontend dashboard stopped displaying all metric graphs / trace graphs after a certain point in time
You mentioned this in your original post. I suspect your applications have stopped sending data, or there is some issue in the SDK when sending data.
> I am not sure how alerts can be triggered properly on all the metrics I am posting if they weren't being stored properly
Alerts use the exact same queries and database. It would be very surprising to me if the same kind of query worked in an alert but not in the dashboard. Alerts don't break if the query returns no data. Can you share some of these alerts where the dashboard doesn't work but the alert does?

James Henrich

04/14/2023, 8:06 AM
An example of logs from the query-service for an alert currently being fired ("Gerald Scalper Service Down" -> metric: scalper_service_down) with no metric data in the dashboard:
2023-04-14T08:00:10.196Z        DEBUG   rules/thresholdRule.go:625      ruleid:%!(EXTRA string=14, string=       runQueries:, map[string]string=map[A:SELECT profile,  ts, sum(value) as value FROM (SELECT profile,  ts, runningDifference(value)/runningDifference(ts) as value FROM(SELECT fingerprint, profile,  toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 GLOBAL INNER JOIN (SELECT  JSONExtractString(labels, 'profile') as profile, fingerprint FROM signoz_metrics.distributed_time_series_v2 WHERE metric_name = 'scalper_service_down' AND JSONExtractString(labels, 'profile') NOT IN ['gerald']) as filtered_time_series USING fingerprint WHERE metric_name = 'scalper_service_down' AND timestamp_ms >= 1681458310194 AND timestamp_ms <= 1681459210194 GROUP BY fingerprint, profile,ts ORDER BY fingerprint, profile,  ts) OFFSET 1) GROUP BY profile,ts ORDER BY profile,  ts])
2023-04-14T08:00:10.196Z        DEBUG   rules/thresholdRule.go:643      ruleId: %!(EXTRA string=14, string=      result query label:, string=A)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:525      ruleid:%!(EXTRA string=14, string=       resultmap(potential alerts):, int=5)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:326      target:%!(EXTRA float64=0, float64=0.001)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:326      target:%!(EXTRA float64=0, float64=0.001)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:326      target:%!(EXTRA float64=0, float64=0.001)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:326      target:%!(EXTRA float64=1, float64=0.001)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:326      target:%!(EXTRA float64=0, float64=0.001)
2023-04-14T08:00:10.229Z        DEBUG   rules/thresholdRule.go:534      ruleid:%!(EXTRA string=14, string=       result (found alerts):, int=1)
2023-04-14T08:00:10.229Z        INFO    rules/thresholdRule.go:740      rule:Gerald Scalper Service Down         alerts found: 1
2023-04-14T08:00:10.229Z        INFO    rules/thresholdRule.go:295      msg:sending alerts       rule:Gerald Scalper Service Down
And correct, I did not verify directly, as I'm not too comfortable with ClickHouse. It was just a strong assumption based on alert behaviour.

Srikanth Chekuri

04/14/2023, 8:09 AM
> An example of logs from the query-service for an alert currently being fired ("Gerald Scalper Service Down" -> metric: scalper_service_down) with no metric data in the dashboard:
You are not seeing data because the service is down and not sending any data?

James Henrich

04/14/2023, 8:11 AM
This is a separate service that performs health checks on other microservices. It is up and sending metrics; this is just one metric I chose as an example. I am very confident metrics are being sent properly, as these alerts are correct and in this example the scalper service is in fact down.

Srikanth Chekuri

04/14/2023, 8:13 AM
> I am very confident metrics are being sent properly, as these alerts are correct and in this example the scalper service is in fact down
I was asking about a microservice which has alerts set up and working but the dashboard not working. Is there such a service you can share? How are you confirming that the metrics are being sent from these microservices?

James Henrich

04/14/2023, 8:20 AM
This is an example of that. Alerts are working properly on the "scalper_service_down" metric, but I cannot see said metric data in the dashboard. I don't think it's relevant here that the service the metric relates to isn't the one posting it, but I can grab another one if you disagree. I am checking both that the application is emitting (looking at OpenTelemetry debug information) and that the otel collector service is healthy (not many logs here that I can see).
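A DB-side check would complement those SDK/collector checks. A minimal sketch, assuming the default schema (the table name and the 'profile' label come from the queries in the logs above): list which time series ClickHouse has registered for the metric.

-- Which series (by the 'profile' label) ClickHouse knows about for this metric.
SELECT JSONExtractString(labels, 'profile') AS profile,
       count() AS series
FROM signoz_metrics.distributed_time_series_v2
WHERE metric_name = 'scalper_service_down'
GROUP BY profile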

Srikanth Chekuri

04/14/2023, 8:25 AM
IMO, that service is not a good example: it is down, alerts are being fired for the right reasons, and it makes sense that you don't see any other data for the said service, because it's down and not sending any data. Are you seeing any traces under the Traces tab for the service?

James Henrich

04/14/2023, 8:28 AM
No, they've existed before. Nothing under the services tab either. That specific service isn't posting any metrics, so there's no data that can be missing.
I stopped the service strictly to get a controlled example to share.

Srikanth Chekuri

04/14/2023, 8:30 AM
> No, they've existed before. Nothing under the services tab either.
This is highly likely because there is some issue sending data to SigNoz, since you mentioned the collector is healthy and has no error logs. Can you select a longer time range and check whether it shows the services? Can you exec into ClickHouse and run some queries that I can share, to confirm whether any data was received in the last X minutes/hours/days?
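A minimal sketch of such a check (the table name comes from the query-service logs above; the container name for docker exec depends on your docker-compose setup): from a clickhouse-client session, list the most recently received samples per metric.

-- Latest sample per metric over the last hour; an empty result means nothing
-- has been ingested recently.
SELECT metric_name,
       count() AS samples,
       toDateTime(intDiv(max(timestamp_ms), 1000)) AS last_seen
FROM signoz_metrics.distributed_samples_v2
WHERE timestamp_ms >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
GROUP BY metric_name
ORDER BY last_seen DESC
LIMIT 20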

James Henrich

04/14/2023, 8:34 AM
Ahh, interesting. It seems that I am getting no data for all intervals <= 30 minutes, but above that (>= 60) metric/service/trace data shows up. Before, this was not the case and I could see data for the 5, 15, and 30 minute intervals. And yes, I can run some queries if we still need to.

Srikanth Chekuri

04/14/2023, 8:37 AM
What is the last timestamp in the chart for the >= 60 minute range? Is there any chance that a time zone difference could cause this?
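One way to check for clock skew from ClickHouse itself (a sketch; table and column names are taken from the logs above): compare the server clock against the newest sample for the metric; a gap of roughly an hour would point to a time zone / clock offset.

-- Compare the ClickHouse server time with the newest sample for the metric.
SELECT now() AS server_time,
       toDateTime(intDiv(max(timestamp_ms), 1000)) AS latest_sample
FROM signoz_metrics.distributed_samples_v2
WHERE metric_name = 'scalper_service_down'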

James Henrich

04/14/2023, 8:42 AM
Ahhh, I found it. It is indeed a timezone issue: the machine I am using is an hour ahead of the clock of the applications reporting metrics / SigNoz. Setting the time back has fixed the problem. Sorry for the trouble and for this being a pretty dumb mistake; thanks for helping out.