# support
Gil
Hey I have an issue. I'm collecting datapoints with httpcheck to ping a service to see if it's up, with an interval of 30s. But working with it I had a strange issue where I can see I have no data. Looking at the DB I see that I have data only every hour. One of my httpcheck receivers is like
```yaml
receivers:
  # Healthcheck - one receiver per service
  httpcheck/passport:
    collection_interval: 30s
    targets:
      - endpoint: "https://foobar.com/healthcheck"
        method: GET
        tls:
          insecure_skip_verify: true
```
In the db I do
```
SELECT
    DISTINCT(CAST(JSONExtractString(labels, 'health_score') AS Int32)) AS health_score,
    unix_milli
FROM signoz_metrics.time_series_v4
WHERE
    metric_name = 'httpcheck_status'
    AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
    AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
    AND JSONExtractString(labels, 'health_score') IS NOT NULL
    AND JSONExtractString(labels, 'health_score') != ''
    AND JSONExtractString(labels, 'health_score') != '0'
ORDER BY unix_milli DESC

Query id: cc24e792-b1b7-4329-84b2-6960482ea16f

    ┌─health_score─┬────unix_milli─┐
 1. │            3 │ 1747810800000 │
 2. │            3 │ 1747807200000 │
 3. │            3 │ 1747803600000 │
 4. │            3 │ 1747800000000 │
 5. │            3 │ 1747796400000 │
 6. │            3 │ 1747792800000 │
 7. │            3 │ 1747789200000 │
 8. │            3 │ 1747785600000 │
 9. │            3 │ 1747782000000 │
10. │            3 │ 1747778400000 │
11. │            3 │ 1747774800000 │
12. │            3 │ 1747771200000 │
13. │            3 │ 1747767600000 │
14. │            3 │ 1747764000000 │
15. │            3 │ 1747760400000 │
16. │            3 │ 1747756800000 │
17. │            3 │ 1747753200000 │
18. │            3 │ 1747749600000 │
    └──────────────┴───────────────┘
```
Nagesh Bansal
Hey @Gil Let me check on this once
Gil
Hey @Nagesh Bansal
I just tried something... I stopped the collector, truncated all the tables, and restarted the collector with only a healthcheck for the one service I want to check. It was 9:33 when I started the collector, and the unix time I got in the DB was 1748242800000, which is 9:00. So the collector may collect data every 30s but write it in the DB rounded to the hour.
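As a quick sanity check on that rounding, the stored timestamp can be converted back to a readable datetime in ClickHouse (a minimal sketch; the 'Europe/Paris' timezone is my assumption based on the host being French):

```sql
-- Convert the stored unix_milli back to a local datetime.
-- 1748242800000 ms -> 2025-05-26 09:00:00 Europe/Paris,
-- i.e. the value sits exactly on the start of the hour.
SELECT fromUnixTimestamp64Milli(toInt64(1748242800000), 'Europe/Paris') AS ts;
```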
Nagesh Bansal
Can you also check in the otel-collector logs whether it's pinging the service every 30s?
Gil
How can I do that?
Nagesh Bansal
You can use the debug exporter to check when the httpcheck is getting triggered
Gil
Hi @Nagesh Bansal, as requested. With PromQL I did something like
```
max(last_over_time(httpcheck_status{health_score!="", health_score!='0', service_name="PASSPORT"}[5m]))
```
(or a version of this). Not knowing the PromQL syntax well, I tried to go with SQL.
```sql
SELECT
    CAST(JSONExtractString(labels, 'health_score') AS Int32) AS health_score,
    JSONExtractString(labels, 'service_name') AS service_name,
    JSONExtractString(labels, 'deployment_environment') AS deployment_environment,
    JSONExtractString(labels, 'http_status_code') AS http_status_code,
    unix_milli,
    labels
FROM signoz_metrics.time_series_v4
WHERE
    metric_name = 'httpcheck_status'
    AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
    AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
    AND unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
ORDER BY unix_milli DESC
LIMIT 10
```
I changed the `INTERVAL` to 1 min or 30s but I'm getting no data. Like we saw, the collector sends data every 30s as expected, but in the `signoz_metrics.time_series_v4` table only datapoints rounded to the hour are present (see my previous response for an example). So maybe it's perfectly normal and this table is supposed to get one datapoint per hour through some process, and the PromQL query targets another table.
Hi @Nagesh Bansal, any news?
I think I have found something
Nagesh Bansal
Hey @Gil, sorry for the delay.
I would recommend using the query builder, because when we upgrade the schema it might break your ClickHouse queries.
But if you still want to use ClickHouse queries and want to understand which table to look at, you can use this doc: https://signoz.io/docs/userguide/write-a-metrics-clickhouse-query/
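Roughly, the idea from that doc is to filter the series by labels in time_series_v4 and join on fingerprint to the samples table for the actual values. A sketch along those lines (column names may differ between versions, so please verify against the doc and your schema):

```sql
-- Per-scrape values for one service: select the matching series
-- from time_series_v4, then join the raw samples on fingerprint.
SELECT
    s.unix_milli,
    s.value,
    JSONExtractString(t.labels, 'http_status_class') AS http_status_class
FROM signoz_metrics.samples_v4 AS s
INNER JOIN
(
    SELECT DISTINCT fingerprint, labels
    FROM signoz_metrics.time_series_v4
    WHERE
        metric_name = 'httpcheck_status'
        AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
        AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
) AS t ON s.fingerprint = t.fingerprint
WHERE
    s.metric_name = 'httpcheck_status'
    AND s.unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
ORDER BY s.unix_milli DESC
LIMIT 20
```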
Gil
@Nagesh Bansal Thank you for your advice and the link. The issue I have found is this. In a previous discussion, one of your colleagues told me that the *_v2 tables are kept for backward compatibility and are not intended to be used from now on. Like I showed you, the collector sends data every 30s as expected, but in the *_v4 tables we find `unix_milli` rounded down to the last hour. E.g. a ping at 10:41:30 would be written as 10:00:00 (in unix time).
```
Row 1:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    2673516990800643293 -- 2.67 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"4xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true

Row 2:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    5068959940230499294 -- 5.07 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"1xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true

Row 3:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    7659024227477066828 -- 7.66 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"3xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true
```
Using PromQL I had some good results in the past, but I ran into issues a few weeks ago. Since I'm not familiar with the Query Builder (can you show me how to do it here?) nor with the PromQL grammar, I tried a simple SQL command from the dashboard. Then I tried the adapted query on the *_v2 tables:
```
Row 1:
──────
metric_name:  httpcheck_duration
fingerprint:  15655094791583317282 -- 15.66 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_duration","__temporality__":"Unspecified","deployment_environment":"DEV08","host_name":"FRPAR3DXDEV08","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Unspecified
description:  Measures the duration of the HTTP check.
unit:         ms
type:         Gauge
is_monotonic: false

Row 2:
──────
metric_name:  httpcheck_status
fingerprint:  2673516990800643293 -- 2.67 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"4xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Cumulative
description:  1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:         1
type:         Sum
is_monotonic: false

Row 3:
──────
metric_name:  httpcheck_status
fingerprint:  5068959940230499294 -- 5.07 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"1xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Cumulative
description:  1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:         1
type:         Sum
is_monotonic: false
```
Here the column's name is `timestamp_ms` but the data is right! So I think something happens on your side, at least for the httpcheck receiver, in the *_v4 tables that rounds the timestamps to the hour.
I totally understand that the schema can change in the future. I'm more experienced in SQL than with the other options, and I can't find (but that's maybe just me) how to use the Query Builder correctly.