# general
s
Hello @everyone, I'm Shuvam and I take care of design at SigNoz. We're currently in the process of revamping a lot of the product flows and experiences, and I'd love to hear feedback from you. If you have already set up your self-hosted instance or use SigNoz Cloud, and are having trouble getting something done, or have suggestions on how we can improve, I'm all ears.
๐Ÿ‘ 12
๐Ÿ‘‹ 23
๐Ÿ‘ 13
t
Making the import of Grafana dashboards work (or creating standard SigNoz dashboards with good documentation for popular use cases) would be the most desired feature. It is tedious to set up dashboards yourself, and from my POV this is the main thing holding back SigNoz adoption.
👀 2
☝️ 5
s
Hello Shuvam! My main request is on the logs experience. I wish I had a “Datadog-like” experience:
• Left sidebar with all the filters
  ◦ Adding filters in a search bar is not very intuitive
  ◦ Weirdly enough, you do it for Traces but not for Logs 🙂
• If filters are automatically detected from my logs, that would be great too
👍 5
💯 1
s
Hey @Thomas Lutz, I hear you. We have been working on making it easier to get started with dashboards, and the team is actively working on this. You should start seeing improvements, including a set of good default dashboards, shortly.
👍 1
@Saad Bahir I have been exploring adding similar sidebar filters to logs too, but my main concern has been that the sidebar eats up space needed to actually see the logs. We're working on some layout tweaks to find where we can afford the space and plug that in. Besides that, what do you think is lacking in the Datadog experience that we can refine further for you?
d
It would be nice for the query builder for logs and traces to support text-based input, so queries can easily be copied and pasted to and from SigNoz.
👍 2
👀 1
s
@Shuvam Manna that’s understandable, but an option to “hide” the sidebar could do the trick for me 😄 Honestly that’s the only thing I miss from Datadog.
👍 1
👀 1
s
Let me see what I can do about that.
🙌 1
But thanks a ton for the feedback!
✅ 1
d
Another thing that we encountered is long SigNoz URLs. We have log-based alerting configured for our Slack channel, which provides a link to the relevant query. The query can become big, but the generated URL becomes even bigger, to the point that the URL is too big for a Slack message. This could be just an "us" issue with queries that are too big, but we never had this issue with our previous log tooling.
s
> Another thing that we encountered is long SigNoz URLs.
This shouldn't be too hard to fix. Let me check where we can plug this into our roadmap @Danik Raikhlin
🙏 2
d
Maybe give alerts access to saved views? This would let us quickly see in the UI what an alert sees.
👀 1
e
hey Shuvam, that's exciting to hear! I have a few pieces of feedback:
Logs explorer search filter
1. I don't like how the return key tokenizes the query, then you have to press the "Stage & Run Query" button to actually search. I recently discovered cmd+return runs the query, but leaves the last filter untokenized if you didn't press return first. Why not just have return tokenize+search?
2. If you press "Stage & Run Query" with an untokenized query, it deletes it unrecoverably.
3. If you delete a tokenized query (keyboard or the x button), it would be great if cmd+z could recover it.
Logs explorer "Log details" pane
1. I'm not able to quickly navigate between log lines when the details pane is open, because you have to click out to close it, then click again on another line to open the details, waiting for the animation. It's not massive, but when trying to view the details of multiple lines, it's not the fastest experience. My ideal UX would be if the lines remained clickable when the pane is open, esc or the close button closed the pane, and the up/down arrow keys navigated between log lines.
2. I also miss Mezmo's in-line field view (see screenshot). Maybe this could be a setting to either open the details pane or an in-line view when clicking a log line, or a separate button on the log line?
👀 1
m
Hey, we've been using SigNoz for almost a year at noah labs 🙂 I think it's really nice. I love the simplicity of the architecture, since we need to maintain this self-hosted. Other open source vendors are far more complicated and harder to maintain. Here are the things I think are missing:
• Ignore certain exceptions
  ◦ Sentry offers this, which is super cool. You can basically say this exception is fine, I don't mind seeing it. So it won't alert us, and it also won't appear on the "exception" page. Idk what's the most elegant way of doing it, but maybe through a filter based on the event?
• Input-based views?
  ◦ Queries can get too complicated sometimes. For example, I want to see a certain event for a user. So what I need to do is hardcode a user ID, and then ask the person who's using SigNoz to replace the ID with the wanted ID, which is kinda annoying.
• Alerting system for "monitoring" purposes
  ◦ If I don't get OTel data for a certain service for quite some time, then I can assume the service is down. Being able to set up alerting for this would be very cool. Maybe a better and more elegant way of doing this is to integrate Grafana.
• Automatic anomaly detection
  ◦ Quite some work, but it could be really interesting.
👀 1
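The "no telemetry for a while means the service is probably down" idea above can be sketched as a simple staleness check. This is a minimal illustration, not an existing SigNoz feature: `silent_services`, the `last_seen` mapping, and the service names are hypothetical, and a real alert would read the last-seen times from the telemetry store.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def silent_services(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Return the services whose newest telemetry is older than max_silence."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


# Hypothetical last-seen timestamps per service.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "checkout": now - timedelta(minutes=2),
    "billing": now - timedelta(minutes=45),
}
print(silent_services(last_seen, now, timedelta(minutes=15)))
```

A service that has never reported does not appear in `last_seen` at all, so a real check would also need a list of expected services to compare against.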
j
+1 on Datadog-esque log filters on the left sidebar, especially if they could be pregenerated for you. Here are some really useful ones that Datadog pregenerates and builds histograms for (screenshot). 90% of the time when I’m searching I really just want to drill into a specific namespace or service, and having to handwrite this query every time is super annoying.
👀 1
I would also love it if my log filters were “stickier”. I find myself frequently having to rewrite the same log filters when I navigate away for a second. I know there are saved views, but they're usually overkill when I’m just trying to look at something for 10 mins.
👀 1
c
+100 on the Datadog-esque filters. IMO the lack of these is kind of a blocker for us to migrate to SigNoz rn, as we have that in our current platform (CrowdStrike LogScale).
👀 1
Some more feedback:
1. The log histogram in log search should be a stacked bar chart that groups by log level and makes the "error" portion of each bar red, instead of treating every log message the same. Datadog and Loki have that.
2. When I search for "foo" in my logs (e.g. in the body), I want the "foo" substring to be highlighted in yellow in the search results (again, Datadog and Loki have that).
3. The formatting of structured logs in Datadog is a lot nicer and more readable. Hard to describe, but basically it uses the space more efficiently and has better color coding for the keys and values of a log line.
➕ 1
👀 1
4. Having a search history (personal and team) would be great. Honeycomb.io does a great job at that.
👀 1
5. This is more than just design, but having some kind of "log patterns" feature like Datadog or uptrace.dev has would be extremely valuable as well.
👀 1
6. When I go to the line details, I want to be able to click an attribute and see the value distribution of that attribute across all logs in the current timeframe (i.e. draw a chart that does a group by on that attribute).
👀 1
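Items 1 and 6 above both come down to grouping logs before charting them. The sketch below shows the two aggregations under stated assumptions: the log record shape (`timestamp` in epoch seconds, `level`, `attributes`) and both function names are hypothetical and do not reflect SigNoz's data model.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def histogram_by_level(logs: List[dict], bucket_seconds: int) -> Dict[int, Dict[str, int]]:
    """Count logs per time bucket and per level (data for a stacked bar chart)."""
    buckets = defaultdict(Counter)
    for log in logs:
        # Round the timestamp down to the start of its bucket.
        start = log["timestamp"] - log["timestamp"] % bucket_seconds
        buckets[start][log["level"]] += 1
    return {start: dict(counts) for start, counts in sorted(buckets.items())}


def value_distribution(logs: List[dict], attribute: str) -> Counter:
    """Group by one attribute and count how often each value occurs."""
    return Counter(log["attributes"].get(attribute, "<missing>") for log in logs)


# Hypothetical log records.
logs = [
    {"timestamp": 0, "level": "info", "attributes": {"service": "api"}},
    {"timestamp": 30, "level": "error", "attributes": {"service": "api"}},
    {"timestamp": 61, "level": "info", "attributes": {"service": "worker"}},
]
print(histogram_by_level(logs, 60))
print(value_distribution(logs, "service"))
```

With per-level counts for each bucket, the UI can stack the segments and color the `error` segment red, and the same group-by result can feed the per-attribute chart.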
d
Sometimes we have quite big traces. Today one made the browser run out of memory, which prevents us from opening the span itself. It would be nice to have at least an option to open the span (maybe with just the parent and one child?).
👀 1
p
I know this is unrelated to design, but thank you for the alert-to-email feature.
🙌 2
s
I faced a lot of issues setting up Winston logs with Node.js, so I ended up using Datadog because the setup wasn’t straightforward with SigNoz.
👀 1
s
Thanks for all the pointers, folks. Let me go through them and get back 🤟
d
@Shuvam Manna the logs UI. It would be great if we could have something like CloudWatch and see the full long log lines in the list view itself. Clicking on each log to see the full message is painful while debugging.
👀 1
p
Don't use the ClickHouse operator for the Helm chart. It doesn't feel idiomatic, and it complicates upgrading and uninstalling. It would make sense to use a ClickHouse subchart to be more in line with how Helm works.
👀 2
n
I'm quite new to SigNoz, so please take this with a grain of salt. I'm looking to see how we can create SLOs based on metric/trace data for availability and performance.
👀 1
d
@Shuvam Manna Yesterday I encountered a couple of issues during an upgrade of SigNoz (again) that gave me ideas for suggestions. We are using Helm to install SigNoz. Within the Helm chart there is a condition: if it's a fresh installation, it runs an init database job, and if it's a Helm upgrade, it runs an upgrade job. I've encountered two issues with that approach.
1. We are dealing with persistent storage here. We ran into a situation where we had already been running SigNoz with an existing database in place, but were forced to reinstall our Helm chart from scratch. The init job fails because it tries to create a data structure that already exists. The only way out is to delete the ClickHouse PVC and lose all our telemetry data. So the suggestion here is not to have two types of jobs based on Helm state, but to merge them and auto-detect whether the database is already in place, then do either a create or a migrate.
2. The second scenario is related to the first: the dependency on Jobs having run. Helm doesn't have a particular order in which it starts deployments versus jobs, so the query service and collectors have an init container that checks whether the job has run. However, we often find that the job won't run because the pods are in an init state, while the pods stay in the init state because they are waiting on the job. A chicken-and-egg problem. This behavior is random: sometimes the job comes first and everything is fine, but if a pod comes first the installation gets stuck. To recover, we have to manually scale the pods down to 0 replicas so the job will start, and scale back up afterwards. So the suggestion here is to think of means other than jobs to run the database create/migrate, OR to move the check on the job out of an init container and into the main container, so the pod is in a Running state and the Job is sure to run.
👀 2
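The first suggestion above (one merged job that auto-detects whether the database already exists) could look roughly like the sketch below. It is only an illustration, not the SigNoz chart's actual logic: `bootstrap_database`, the `FakeDatabase` stand-in, and the `schema_migrations` marker table are all hypothetical.

```python
from typing import Callable


def bootstrap_database(
    schema_exists: Callable[[], bool],
    create_schema: Callable[[], None],
    run_migrations: Callable[[], None],
) -> str:
    """Decide from the database itself, not from Helm state, what to run."""
    if schema_exists():
        run_migrations()
        return "migrate"
    create_schema()
    return "create"


class FakeDatabase:
    """In-memory stand-in for the real database (assumption for the demo)."""

    def __init__(self) -> None:
        self.tables = set()
        self.version = 0

    def has_schema(self) -> bool:
        return "schema_migrations" in self.tables

    def create(self) -> None:
        self.tables.add("schema_migrations")
        self.version = 1

    def migrate(self) -> None:
        self.version += 1


db = FakeDatabase()
print(bootstrap_database(db.has_schema, db.create, db.migrate))  # fresh volume
print(bootstrap_database(db.has_schema, db.create, db.migrate))  # existing data
```

Because the decision depends only on what is in the persistent volume, reinstalling the chart over existing data takes the migrate path instead of failing on create.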
s
@Danik Raikhlin I get the issue. I'll pass this feedback on to the team right away.