Darren Smith — 08/08/2024, 4:00 PM

Darren Smith — 08/09/2024, 2:55 PM
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      conditions:
        - IsString(resource.attributes["service.name"])
      statements:
        - set(name, Concat(["aws", ConvertCase(resource.attributes["service.name"], "lower"), ConvertCase(name, "lower")], "."))
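As a worked example of what the Concat/ConvertCase statement above computes, here is a sketch in plain Go with illustrative values ("ECS", "CPUUtilization") — not the OTTL runtime itself:

```go
// Mimics: Concat(["aws", ConvertCase(service, "lower"),
// ConvertCase(name, "lower")], ".") with made-up inputs.
package main

import (
	"fmt"
	"strings"
)

// renamed builds the dotted metric name the OTTL statement would set.
func renamed(service, name string) string {
	return strings.Join([]string{"aws", strings.ToLower(service), strings.ToLower(name)}, ".")
}

func main() {
	fmt.Println(renamed("ECS", "CPUUtilization")) // aws.ecs.cpuutilization
}
```

The dotted name would presumably then be normalized to underscores (the aws_ecs_cpuutilization seen in the next message) by Prometheus-style naming further down the pipeline.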
Darren Smith — 08/09/2024, 2:56 PM
aws_ecs_cpuutilization
Darren Smith — 08/09/2024, 2:57 PM
API responded with 400 - error in builder queries status: error, errors: {}

Srikanth Chekuri — 08/12/2024, 2:52 AM
> "CPUUtilization" for example is a metric with unit Percent and type Summary, looking at the time_series_v4 database table. We also then get things like CPUUtilization_Sum (type Sum) and CPUUtilization_Count (type Sum).
We don't support the Summary metric type because it doesn't make sense to aggregate summary metrics.
> If I take an example - this is an ECS service for a specific cluster and specific service, as an average over a 1-minute period in AWS - it peaks at 17:49 at 16.66%. Whereas visualising the same data in SigNoz shows a spike at 17:53, not 17:49, and it's only 10.5%.
It is well known that metrics consumed from cloud providers come with a delay of 5-10 minutes.
Darren Smith — 08/12/2024, 6:59 AM

Srikanth Chekuri — 08/12/2024, 2:44 PM

Darren Smith — 08/14/2024, 7:42 AM
func addSingleSummaryDataPoint(pt pmetric.SummaryDataPoint, resource pcommon.Resource, metric pmetric.Metric, namespace string,
	tsMap map[string]*prompb.TimeSeries, externalLabels map[string]string) {
	time := convertTimeStamp(pt.Timestamp())
	// sum and count of the summary should append suffix to baseName
	baseName := getPromMetricName(metric, namespace)
	// treat sum as a sample in an individual TimeSeries
	sum := &prompb.Sample{
		Value:     pt.Sum(),
		Timestamp: time,
	}
	if pt.Flags().NoRecordedValue() {
		sum.Value = math.Float64frombits(value.StaleNaN)
	}
	sumlabels := createAttributes(resource, pt.Attributes(), externalLabels, nameStr, baseName+sumStr)
	addSample(tsMap, sum, sumlabels, metric)
	// treat count as a sample in an individual TimeSeries
	count := &prompb.Sample{
		Value:     float64(pt.Count()),
		Timestamp: time,
	}
	if pt.Flags().NoRecordedValue() {
		count.Value = math.Float64frombits(value.StaleNaN)
	}
	countlabels := createAttributes(resource, pt.Attributes(), externalLabels, nameStr, baseName+countStr)
	addSample(tsMap, count, countlabels, metric)
	// process each percentile/quantile
	for i := 0; i < pt.QuantileValues().Len(); i++ {
		qt := pt.QuantileValues().At(i)
		quantile := &prompb.Sample{
			Value:     qt.Value(),
			Timestamp: time,
		}
		if pt.Flags().NoRecordedValue() {
			quantile.Value = math.Float64frombits(value.StaleNaN)
		}
		percentileStr := strconv.FormatFloat(qt.Quantile(), 'f', -1, 64)
		qtlabels := createAttributes(resource, pt.Attributes(), externalLabels, nameStr, baseName, quantileStr, percentileStr)
		addSample(tsMap, quantile, qtlabels, metric)
	}
}
So it takes the incoming Summary metric and creates two metrics suffixed _sum and _count - that is what I was seeing, but I couldn't work out what was doing it at first.
You're then also creating metrics with the same name but with quantile={0|1} labels.
The problem is they're stored as type Sum (is_monotonic: false) - which should mean they act like a gauge, right?
But the SigNoz UI is forcing them to be counters, with Rate and similar functions.
If I manually change them to gauges in the DB with a query:
ALTER TABLE time_series_v4_6hrs UPDATE type = 'Gauge' WHERE metric_name LIKE 'aws_%' AND type = 'Sum';
then the UI works properly.
I'm going to experiment with the metrics processor to try and split them out into gauges - but wanted to post this update, as it looks like it's the ClickHouse exporter that may need a tweak? Or your UI?

Darren Smith — 08/14/2024, 6:12 PM
AWS (metric-stream) -> [firehose] -> [AWS-ELB] -> firehose_receiver:[custom-collector] -> otlp(http) -> [signoz-collectors] -> [clickhouse]
I will probably extend it to allow us to remove the original Summary metric and/or also add the _sum, _count and quantile values - again as gauges for the latter.
If you get a chance, I'd like your view on whether this is nuts and overkill - or whether SigNoz should treat non-monotonic Sums as gauges internally.

Srikanth Chekuri — 08/14/2024, 6:46 PM
SELECT DISTINCT
temporality,
metric_name,
type,
is_monotonic
FROM signoz_metrics.time_series_v4
Darren Smith — 08/14/2024, 7:07 PM
┌─temporality─┬─metric_name──────────────────┬─type────┬─is_monotonic─┐
1. │ Unspecified │ aws_ecs_cpuutilization │ Summary │ false │ <-- This is the original metric from firehose, that's split into the two quantile metrics as per your clickhouseexporter code
2. │ Unspecified │ aws_ecs_cpuutilization_count │ Sum │ false │ <-- This is the new count metric it adds in the clickhouseexporter code.
3. │ Unspecified │ aws_ecs_cpuutilization_sum │ Sum │ false │ <-- This is the new sum metric it adds in the clickhouseexporter code.
└─────────────┴──────────────────────────────┴─────────┴──────────────┘
┌─temporality─┬─metric_name────────────────┬─type──┬─is_monotonic─┐
4. │ Unspecified │ aws_ecs_cpuutilization_avg │ Gauge │ false │ <-- This is my NEW metric that i'm generating as part of my new processor.
└─────────────┴────────────────────────────┴───────┴──────────────┘
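The rule being proposed for the types in the table above can be sketched as a tiny function (plain structs and hypothetical names, not the real SigNoz or pdata types): a non-monotonic Sum can go up and down, so a query UI could treat it with gauge semantics.

```go
// Sketch: deciding how a UI could treat a stored metric type.
package main

import "fmt"

type Metric struct {
	Name        string
	Type        string // "Gauge", "Sum", "Summary", ...
	IsMonotonic bool
}

// effectiveType: only monotonic Sums behave like counters (where Rate
// makes sense); non-monotonic Sums read naturally as gauges.
func effectiveType(m Metric) string {
	if m.Type == "Sum" && !m.IsMonotonic {
		return "Gauge"
	}
	return m.Type
}

func main() {
	m := Metric{Name: "aws_ecs_cpuutilization_sum", Type: "Sum", IsMonotonic: false}
	fmt.Println(effectiveType(m)) // Gauge
}
```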
Darren Smith — 08/14/2024, 7:08 PM

Srikanth Chekuri — 08/16/2024, 12:00 PM
Sum and Count. The source must clarify whether they are cumulative or delta values. The exporter uses whatever it receives.
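The cumulative-vs-delta distinction Srikanth refers to comes down to how each sample relates to the previous one; a small sketch with made-up numbers:

```go
// Illustration: the same activity reported as a cumulative (running
// total) series versus a delta (per-interval increase) series.
package main

import "fmt"

// deltasFromCumulative recovers per-interval increases from a
// cumulative series.
func deltasFromCumulative(cum []float64) []float64 {
	out := make([]float64, len(cum))
	prev := 0.0
	for i, v := range cum {
		out[i] = v - prev
		prev = v
	}
	return out
}

func main() {
	cumulative := []float64{10, 25, 45} // running totals since start
	fmt.Println(deltasFromCumulative(cumulative)) // [10 15 20]
}
```

Without the source declaring which convention it uses, the exporter cannot know whether a value like 25 means "25 so far" or "25 more since last time", which is why it stores whatever it receives.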
Darren Smith — 08/16/2024, 12:53 PM