Wasn't sure if it was best to post here or to open...
# support
k
Wasn't sure if it was best to post here or to open up a ticket. I am pulling in application request logs and need to get two figures, a total count of requests for a particular URL (MFA setup) and a count of unique users who triggered those. I'ver had no trouble getting the first metric. But the second metric is returning an over inflated number. Let's say user A requests the page 50 minutes ago, 30 minutes ago, and 5 minutes ago. When showing logs for the past 1 hour, the total request count is 3, but when grouping those requests by user_id, instead of showing 1 uniq user, it's showing 3. I think this is because of how the data is aggregated. Signoz seems to count distinct users within each aggregation segment and sums them. If the same user were to request the page 3 times within the same 5 minute segment, then it correctly shows 3 total requests and 1 unique user. But if their requests span multiple segments, they are counted multiple times. There does not appear in the Signoz UI to be a way currently to get a unique count over the entire period of time without aggregation, snd I'm not familiar enough with Clickhouse to write a custom query to get this data. Any idea how I might be able to get the data I'm looking for? For reference, attaching a screenshot of my current query.
s
I created an issue https://github.com/SigNoz/signoz/issues/6336. I will share a ClickHouse query later today.
k
@Srikanth Chekuri Thank you, that would be most appreciated 🙂
s
Please give this a try.
Copy code
SELECT
    countDistinct(attributes_string['user_id']) AS value,
FROM signoz_logs.distributed_logs_v2
WHERE 
    ((timestamp >= {{.start_timestamp_nano}})
    AND (timestamp <= {{.end_timestamp_nano}}))
    AND (ts_bucket_start >= intDiv({{.start_timestamp_ms}}, 1000 * 1800)) * 1800
    AND (ts_bucket_start <= intDiv({{.end_timestamp_ms}}, 1000 * 1800)) * 1800
    AND attributes_string['rails_controller'] = 'MfaController'
    AND attributes_string['rails_action'] = 'create'
    AND (resource_fingerprint GLOBAL IN (
        SELECT fingerprint
        FROM signoz_logs.distributed_logs_v2_resource
        WHERE (simpleJSONExtractString(labels, 'service.name') = $client) AND (simpleJSONExtractString(labels, 'deployment.environment') = $environment)
    ))
k
@Srikanth ChekuriHad to make some tweaks. Changed the last WHERE condition, instead of service.name
=
, it needed to be
IN
. And I needed to remove the two lines referring to ts_bucket_start. Then I was able to get a count working. What is ts_bucket_start? Is it vital?
Was able to simplify the query further, replacing the fingerprint query with checks on the resources_string column. What is the advantage of using fingerprint over resources_string?
Copy code
SELECT
    countDistinct(attributes_string['user_id']) AS value
FROM signoz_logs.distributed_logs_v2
WHERE 
    timestamp >= {{.start_timestamp_nano}}
    AND timestamp <= {{.end_timestamp_nano}}
    AND resources_string['service.name'] IN $client
    AND resources_string['deployment.environment'] IN $environment
    AND attributes_string['rails_controller'] = 'MfaController'
    AND attributes_string['rails_action'] = 'create'
The numbers I'm getting are in line with what our logs show and what DataDog currently says, so that is great news. So it looks like querying ts_bucket_start and fingerprint might not be needed.
s
The fingerprint part makes it faster by skipping unnecessary data. You might not have a lot of data, so you probably won't see any difference.