How Cedana Leverages SigNoz to Ensure Uptime in Real-Time Compute Operations
About Cedana
Cedana, a Y Combinator-backed startup, offers an automated system for saving, migrating, and resuming compute workloads to ensure operational reliability during hardware failures. By enabling seamless recovery and continuity, Cedana optimizes real-time computational tasks across various environments. To ensure operational stability and continuous performance, Cedana uses SigNoz to monitor its infrastructure comprehensively.
The Challenge: Monitoring a Low-Level, Critical Infrastructure
Cedana operates at a low level in the computing stack, meaning any errors or failures could have significant impacts on customer operations. Ensuring the reliability of their daemon processes across multiple nodes, as well as tracking performance metrics, became a top priority.
"We work at such a low level that anything can go wrong, and system reliability is key. Monitoring helps us track what happens when things fail, which is crucial for our platform’s success."
— Niranjan Ravichandra, Co-founder & CTO, Cedana
Cedana needed an observability solution that was easy to set up, could provide in-depth metrics on their daemons, and would integrate seamlessly with their existing infrastructure.
Why SigNoz?
"Getting started with SigNoz was incredibly easy. We were able to set up the OpenTelemetry collector quickly and start monitoring our systems almost immediately. The platform’s ease of use and fast setup allowed us to focus more on optimizing our infrastructure without any hassle."
— Niranjan Ravichandra, Co-founder & CTO, Cedana
When looking for an observability platform, Cedana chose SigNoz for several important reasons:
Easy Setup with OpenTelemetry:
Niranjan discovered SigNoz while transitioning from BetterStack to OpenTelemetry. The ease of setting up SigNoz with OpenTelemetry allowed Cedana to quickly instrument their system and start collecting logs, metrics, and traces with minimal overhead.
Superior Performance Monitoring:
SigNoz provided the deep visibility Cedana needed to monitor the behavior and performance of their daemons across customer nodes. They could track failures, monitor uptime, and analyze performance metrics effectively using SigNoz's APM.
Custom Data Analysis with ClickHouse Integration:
SigNoz’s backend, powered by ClickHouse, allowed Cedana to export performance data to Jupyter notebooks for advanced analysis. This enabled the team to generate custom reports for customers and gain deeper insights into how their product was being used across hundreds of nodes.
Implementation: Observability Across Production and Staging Environments
Cedana implemented SigNoz across both production and staging environments to ensure consistent monitoring throughout their development and deployment pipelines. This allowed them to catch potential issues early in staging and optimize their system before pushing updates to production.
"We use SigNoz in both production and staging environments to ensure that we can monitor performance and catch issues early. This helps us keep our infrastructure stable across all environments."
— Niranjan Ravichandra, Co-founder & CTO, Cedana
Real-Time Insights and Custom Analysis of Collected Data
SigNoz’s distributed tracing and APM allowed Cedana to track the behavior of their gRPC calls and control plane connections, ensuring seamless communication between their nodes. Additionally, the ability to collect logs and set up alerts helped Cedana ensure system uptime and detect failures early.
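To make the idea of alerting on failed calls concrete, here is a minimal sketch of a threshold rule over a window of gRPC call outcomes. It is not how SigNoz evaluates alerts, and it is not Cedana's configuration: the `CallRecord` type, the method names, the 5% threshold, and the minimum sample size are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    """One observed gRPC call (hypothetical shape, for illustration only)."""
    method: str
    ok: bool


def error_rate(calls: List[CallRecord]) -> float:
    """Return the fraction of failed calls, or 0.0 for an empty window."""
    if not calls:
        return 0.0
    failures = sum(1 for call in calls if not call.ok)
    return failures / len(calls)


def should_alert(calls: List[CallRecord],
                 threshold: float = 0.05,
                 min_calls: int = 20) -> bool:
    """Fire only when the window has enough calls and the error rate
    is strictly above the threshold (both values are assumptions)."""
    return len(calls) >= min_calls and error_rate(calls) > threshold


# Example window: 18 successful calls and 2 failures (10% error rate).
window = [CallRecord("Checkpoint", True)] * 18 + [CallRecord("Restore", False)] * 2
alert = should_alert(window)
```

The minimum sample size keeps a single failed call on a quiet node from triggering a page, which is a common design choice for rate-based alerts.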
One of the most powerful use cases for Cedana was the ability to export performance data from SigNoz and analyze it in custom environments like Jupyter notebooks. This allowed them to dig deeper into their data, identify optimization opportunities, and create detailed reports for customers.
"We pull data into Jupyter notebooks from SigNoz’s ClickHouse backend to analyze performance trends across hundreds of nodes. This has been invaluable for optimizing our infrastructure and making business decisions."
— Niranjan Ravichandra, Co-founder & CTO, Cedana
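The kind of notebook analysis described above can be sketched in a few lines. The example below assumes span data has already been exported into a list of rows. The field names (`node`, `duration_ms`) and the sample values are hypothetical and do not reflect SigNoz's ClickHouse schema or Cedana's actual data. It groups durations by node and reports the count, mean, and a nearest-rank 95th percentile.

```python
import math
from collections import defaultdict

# Hypothetical exported rows; in a real notebook these would come from a
# query against the tracing backend. Field names are illustrative only.
rows = [
    {"node": "node-a", "duration_ms": 10.0},
    {"node": "node-a", "duration_ms": 20.0},
    {"node": "node-a", "duration_ms": 30.0},
    {"node": "node-a", "duration_ms": 40.0},
    {"node": "node-b", "duration_ms": 100.0},
]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of
    the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_by_node(rows):
    """Group span durations by node and compute count, mean, and p95."""
    durations = defaultdict(list)
    for row in rows:
        durations[row["node"]].append(row["duration_ms"])
    return {
        node: {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "p95_ms": percentile(values, 95),
        }
        for node, values in durations.items()
    }


summary = summarize_by_node(rows)
# Node with the highest p95 latency in this sample.
slowest = max(summary, key=lambda node: summary[node]["p95_ms"])
```

A per-node summary like this is a natural starting point for customer reports, since it highlights which nodes deserve a closer look before digging into individual traces.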
Looking Ahead: More Detailed Dashboards
SigNoz has become an integral part of Cedana’s observability stack, providing real-time insights, performance monitoring, and deep traceability across its complex infrastructure.
Currently using SigNoz for tracing and logs, Cedana aims to set up more detailed dashboards to enhance real-time visibility into performance metrics. As Cedana scales, they plan to leverage SigNoz’s capabilities to monitor resource utilization across hundreds of nodes, ensuring optimal performance for their compute workloads.
With its easy integration, powerful data analysis features, and robust alerting capabilities, SigNoz has empowered Cedana to ensure the reliability and performance of its platform as it scales.
"SigNoz has been great for us—easy to set up, powerful, and flexible. It’s helping us optimize performance and ensure reliability as we grow."
— Niranjan Ravichandra, Co-founder & CTO, Cedana
SigNoz Cloud is the easiest way to run SigNoz. You can sign up for a free account and get 30 days of unlimited access to all features.