Do you have any case studies for migrating from Da...
# support
c
Do you have any case studies for migrating from Datadog to SigNoz? I’m interested in migrating a Datadog setup which uses about 400K metrics/month and around 300 synthetics. Datadog works but is pricey, so I’m interested in replacing it to reduce costs. SigNoz came up in my web searches.
a
Currently migrating from Datadog to SigNoz for the same reasons
c
@Alexei Zenin Very nice! Would love to hear your experience (positive/negative) with this exercise. Datadog costs seem to be a common motivator for looking for alternatives
a
Biggest headache so far was setting up ClickHouse and everything else via CloudFormation (we run on ECS so could not use the Kubernetes stuff SigNoz has written)
I would say it's not complete feature parity just yet, and there are still some UX quirks to iron out, but we have migrated a chunk of services, are doing about 250k spans per hour, and it seems to work fine so far
c
Yikes, I hate CloudFormation. I used CF when spinning up EKS clusters, and it is a pain. But I’m not using ECS so that may be one hurdle I can avoid
a
Yeah took me a few weeks to iron out the kinks so would definitely advise going EKS if you can
c
What features does it lack compared to Datadog? Are these things that are “good enough” if you don’t have them?
a
At the moment there is no easy way to drill down to a per-endpoint level in the service page for errors and browse certain traces. You could go to the traces tab and filter on the specific service and endpoints though, I think
I guess overall will be a learning curve to get used to new habits for debugging 😅
c
Main thing: despite the quirks, are you confident that this is “good enough” to replace DD for your usage?
a
At the moment it's the best thing we've found so far for an OpenTelemetry backend that's open source
c
what are your thoughts on ClickHouse? that is one thing I don’t have experience with
a
We wanted to limit the tools needed so went for SigNoz, Prometheus, Grafana
Biggest areas of concern are the single-node ClickHouse deployment option and the backend service which sends alerts being down. Both can only run with 1 instance atm to my understanding. I would say we aren't using SigNoz for everything, only for traces (due to the maturity of other tools at the moment)
c
Are you self-hosting everything (SigNoz, Prometheus, Grafana), or are you using any cloud-hosted versions of these services?
a
Self-hosting most things. Grafana is simple to run on Fargate so costs like 50 cents a day. Thinking of using managed AWS Prometheus to avoid needing to scale/operate that
c
Thanks for your answers, and good luck in your efforts. Maybe I’ll follow your footsteps!!
a
Thanks! Good luck as well with your migration, definitely not an easy task
c
Well if it was easy, it wouldn’t be fun. 😉
a
@Craig Rodrigues
I’m interested in migrating a Datadog setup which uses about 400K metrics/month
400K metrics should be easy ... we have users using >1M metrics
Deploying to EKS should be pretty straightforward following https://signoz.io/docs/install/kubernetes/aws/
a
@Ankit Nayan ah yeah thanks. I forgot you could do that; I was used to seeing the number of errors per endpoint right on the services page in Datadog
a
Biggest areas of concern are only the single node ClickHouse deployment option and the backend service which sends alerts being down. Both can only run with 1 instance atm to my understanding.
We are working to make both ClickHouse and the query service horizontally scalable. That should be out in a month or so. I didn't get this part:
the backend service which sends alerts being down.
Would love to hear more about it. It would be great if you could open a GitHub issue about it so we can track it publicly
p
Biggest headache so far was setting up ClickHouse and everything else via CloudFormation (we run on ECS so could not use the Kubernetes stuff SigNoz has written)
@Alexei Zenin would you be able to share any snippets/docs on how you ran it in ECS/CloudFormation? We don't have official docs for it yet, but some members of the community may be interested in it
a
Yeah, talked to Jason and saw the open issue. Will try to find some time to share some templates. The collectors were the trickiest so thinking of sharing those first
Ankit, I think your work will solve what I am describing. The point I was trying to make is that SigNoz does not currently provide a highly available setup, because there is only 1 instance each of ClickHouse and the query service. If ClickHouse goes down, there is no ingestion or querying/alerting. If the query service goes down, there is no alerting for that time period.
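A rough sketch of the failure modes described in this message, assuming the dependencies are exactly as stated here. The component and capability names are illustrative labels, not SigNoz identifiers, and a real deployment may have further dependencies.

```python
# Hypothetical dependency map, taken only from the failure modes described above.
# Keys are capabilities; values are the single-instance components each one needs.
DEPENDS_ON = {
    "ingestion": {"clickhouse"},
    "querying": {"clickhouse"},
    "alerting": {"clickhouse", "query_service"},
}


def lost_capabilities(down):
    """Return the capabilities that stop working when the given components are down."""
    down = set(down)
    return sorted(cap for cap, deps in DEPENDS_ON.items() if deps & down)


# With one instance of each component, any single failure removes at least alerting.
print(lost_capabilities(["clickhouse"]))
print(lost_capabilities(["query_service"]))
```

The first call lists all three capabilities and the second lists only alerting, which is the gap that horizontal scaling of both components would close.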
j
I would love a copy of those ECS blueprints.