# support
Gil
Hey I have an issue. I'm collecting datapoints with httpcheck to ping a service to see if it's up, with an interval of 30s. But working with it I had a strange issue where I can see I have no data. Looking at the DB I see that I have data only every hour. One of my httpcheck receivers is like
```yaml
receivers:
  # Healthcheck - one receiver per service
  httpcheck/passport:
    collection_interval: 30s
    targets:
      - endpoint: "https://foobar.com/healthcheck"
        method: GET
        tls:
          insecure_skip_verify: true
```
In the db I do
```
SELECT
    DISTINCT(CAST(JSONExtractString(labels, 'health_score') AS Int32)) AS health_score,
    unix_milli
FROM signoz_metrics.time_series_v4
WHERE
    metric_name = 'httpcheck_status'
    AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
    AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
    AND JSONExtractString(labels, 'health_score') IS NOT NULL
    AND JSONExtractString(labels, 'health_score') != ''
    AND JSONExtractString(labels, 'health_score') != '0'
ORDER BY unix_milli DESC

Query id: cc24e792-b1b7-4329-84b2-6960482ea16f

    ┌─health_score─┬────unix_milli─┐
 1. │            3 │ 1747810800000 │
 2. │            3 │ 1747807200000 │
 3. │            3 │ 1747803600000 │
 4. │            3 │ 1747800000000 │
 5. │            3 │ 1747796400000 │
 6. │            3 │ 1747792800000 │
 7. │            3 │ 1747789200000 │
 8. │            3 │ 1747785600000 │
 9. │            3 │ 1747782000000 │
10. │            3 │ 1747778400000 │
11. │            3 │ 1747774800000 │
12. │            3 │ 1747771200000 │
13. │            3 │ 1747767600000 │
14. │            3 │ 1747764000000 │
15. │            3 │ 1747760400000 │
16. │            3 │ 1747756800000 │
17. │            3 │ 1747753200000 │
18. │            3 │ 1747749600000 │
    └──────────────┴───────────────┘
```
Nagesh Bansal
Hey @Gil Let me check on this once
Gil
Hey @Nagesh Bansal
I just tried something... I stopped the collector, truncated all the tables, and restarted the collector with only a healthcheck for the one service I want to check. It was 9:33 when I started the collector, and the unix time I got in the DB was 1748242800000, which is 9:00. So the collector may collect data every 30s but write it in the DB rounded to the hour.
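As a quick sanity check on that rounding, the stored timestamp can be converted back to a readable datetime in ClickHouse (a minimal sketch; the 'Europe/Paris' timezone is my assumption based on the host being French):

```sql
-- Convert the stored unix_milli back to a local datetime.
-- 1748242800000 ms -> 2025-05-26 09:00:00 Europe/Paris,
-- i.e. the value sits exactly on the start of the hour.
SELECT fromUnixTimestamp64Milli(toInt64(1748242800000), 'Europe/Paris') AS ts;
```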
Nagesh Bansal
Can you also check in the otel-collector logs whether it's pinging the service every 30s?
Gil
How can I do that?
Nagesh Bansal
You can use the debug exporter to check when the httpcheck is getting triggered
Gil
Hi @Nagesh Bansal, as requested. With PromQL I did something like
```
max(last_over_time(httpcheck_status{health_score!="", health_score!='0', service_name="PASSPORT"}[5m]))
```
(or a version of this). Not knowing the PromQL syntax well, I tried to go with SQL.
```sql
SELECT
    CAST(JSONExtractString(labels, 'health_score') AS Int32) AS health_score,
    JSONExtractString(labels, 'service_name') AS service_name,
    JSONExtractString(labels, 'deployment_environment') AS deployment_environment,
    JSONExtractString(labels, 'http_status_code') AS http_status_code,
    unix_milli,
    labels
FROM signoz_metrics.time_series_v4
WHERE
    metric_name = 'httpcheck_status'
    AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
    AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
    AND unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
ORDER BY unix_milli DESC
LIMIT 10
```
I changed the `INTERVAL` to 1 min or 30s but I'm getting no data. Like we saw, the collector sends data every 30s as expected, but in the `signoz_metrics.time_series_v4` table only datapoints rounded to the hour are present (see my previous response for an example). So maybe it's perfectly normal and this table is supposed to get one datapoint per hour through some process, and the PromQL query targets another table.
Hi @Nagesh Bansal, any news?
I think I have found something
Nagesh Bansal
Hey @Gil, sorry for the delay.
I would recommend using the query builder, because when we upgrade the schema it might break your ClickHouse queries.
But if you still want to use ClickHouse queries and want to understand which table to look at, you can use this doc: https://signoz.io/docs/userguide/write-a-metrics-clickhouse-query/
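Roughly, the idea from that doc is to filter the series by labels in time_series_v4 and join on fingerprint to the samples table for the actual values. A sketch along those lines (column names may differ between versions, so please verify against the doc and your schema):

```sql
-- Per-scrape values for one service: select the matching series
-- from time_series_v4, then join the raw samples on fingerprint.
SELECT
    s.unix_milli,
    s.value,
    JSONExtractString(t.labels, 'http_status_class') AS http_status_class
FROM signoz_metrics.samples_v4 AS s
INNER JOIN
(
    SELECT DISTINCT fingerprint, labels
    FROM signoz_metrics.time_series_v4
    WHERE
        metric_name = 'httpcheck_status'
        AND JSONExtractString(labels, 'service_name') = 'PASSPORT'
        AND JSONExtractString(labels, 'deployment_environment') = 'DEV08'
) AS t ON s.fingerprint = t.fingerprint
WHERE
    s.metric_name = 'httpcheck_status'
    AND s.unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
ORDER BY s.unix_milli DESC
LIMIT 20
```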
Gil
@Nagesh Bansal Thank you for your advice and the link. The issue I have found is this. In a previous discussion, one of your colleagues told me that the *_v2 tables are kept for backward compatibility and are not intended to be used from now on. Like I showed you, the collector sends data every 30s as expected, but in the *_v4 tables we find `unix_milli` rounded down to the last hour. E.g. a ping at 10:41:30 would be written as 10:00:00 (in unix time).
```
Row 1:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    2673516990800643293 -- 2.67 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"4xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true

Row 2:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    5068959940230499294 -- 5.07 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"1xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true

Row 3:
──────
env:            DEV08
temporality:    Cumulative
metric_name:    httpcheck_status
description:    1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:           1
type:           Sum
is_monotonic:   false
fingerprint:    7659024227477066828 -- 7.66 quintillion
unix_milli:     1748851200000 -- 1.75 trillion
labels:         {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"3xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
attrs:          {}
scope_attrs:    {}
resource_attrs: {}
__normalized:   true
```
Using PromQL I had some good results in the past, but I ran into issues a few weeks ago. Since I'm not familiar with the Query Builder (can you show me how to do it here?) nor with the PromQL grammar, I tried a simple SQL command from the dashboard. Then I tried the adapted query on the *_v2 tables:
```
Row 1:
──────
metric_name:  httpcheck_duration
fingerprint:  15655094791583317282 -- 15.66 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_duration","__temporality__":"Unspecified","deployment_environment":"DEV08","host_name":"FRPAR3DXDEV08","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Unspecified
description:  Measures the duration of the HTTP check.
unit:         ms
type:         Gauge
is_monotonic: false

Row 2:
──────
metric_name:  httpcheck_status
fingerprint:  2673516990800643293 -- 2.67 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"4xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Cumulative
description:  1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:         1
type:         Sum
is_monotonic: false

Row 3:
──────
metric_name:  httpcheck_status
fingerprint:  5068959940230499294 -- 5.07 quintillion
timestamp_ms: 1748852108517 -- 1.75 trillion
labels:       {"__name__":"httpcheck_status","__temporality__":"Cumulative","deployment_environment":"DEV08","health_score":"3","host_name":"FRPAR3DXDEV08","http_method":"GET","http_status_class":"1xx","http_status_code":"200","http_url":"<https://xxx/healthcheck>","os_type":"linux","service_name":"PASSPORT"}
temporality:  Cumulative
description:  1 if the check resulted in status_code matching the status_class, otherwise 0.
unit:         1
type:         Sum
is_monotonic: false
```
Here the column's name is `timestamp_ms` but the data is right! So I think something happens on your side, at least for the httpcheck receiver, in the *_v4 tables that rounds the timestamps to the hour.
I totally understand that the schema can change in the future. I'm more experienced in SQL than with the other options, and I can't find (but that's maybe just me) how to use the Query Builder correctly.