How Generative AI Company Wombo Delivers a Great Experience to Its 5 Million Monthly Active Users Using SigNoz for Observability
I sat down with Abhinav Ramana, Sr. Software Engineer at Wombo, to understand how they use SigNoz. Abhinav is an experienced software engineer with 7+ years of experience working at companies like LinkedIn and Google. Here are a few snippets from our conversation (edited for legibility).
Can you share what Wombo does? What are your key products?
Wombo, at its core, creates AI-generated content for the mass public. We make AI-generated content that is easy to use rather than very technical so that everyday users can play with it, have fun, and create content for their needs.
We have two key products:
- Wombo Song - It is a generative video platform where users can upload images of their faces, choose a song, and get a generated video that looks like they are singing that song.
- Dream - It is an AI-generated art product similar to Stable Diffusion but much simpler to use. You can enter prompts and choose styles to create unique art that you can share with others.
How is the engineering team at Wombo organized?
Ours is a small team. We have five back-end engineers, and they also work as data engineers.
We handle everything from coding and reviewing to deployment, debugging, and ensuring the system's health.
What business problems were you trying to solve when you started exploring SigNoz?
As a product, we want to get state-of-the-art features around AI content generation to our users fast and always keep building new features so that users are engaged with the platform. They should like to spend time on the platform and find it helpful. So, the user experience of the product is super important for us.
To give you an idea of scale, we have over 5 million monthly active users. We were the number one app in the US a few times, and users have already generated over a billion artifacts. We needed a system that lets us understand what's happening in our tech infrastructure so that we can better understand the user experience.
For example, how fast is the product in users' experience, and if some issues happen - how can we quickly find out what's happening and debug it?
Being a small team, we don't have dedicated test engineers or a huge number of automated integration tests. So we mostly test core flows and the changes that we believe are relevant. This leaves gaps where unexpected failures can show up when we deploy to production.
Our observability system should be good enough to address any issues quickly. This helps us be fast, and ship features to users faster.
Since we are developing new features all the time, both from the application perspective and the AI model perspective, the first thing we wanted was to deploy as soon as possible and see whether we were breaking anything. Hence, monitoring was essential for us.
The most critical thing for me was safe deployments and the system's health because we want to know immediately if something has gone wrong so we can take corrective actions as soon as possible.
What are the key use cases you had for SigNoz?
The key metric was the image generation time. But other than that, finding the logs across multiple containers is also crucial.
Although we are not completely service-oriented, we still have Celery queues. For example, we have machine learning models that run on NVIDIA cards on AWS. They are in separate containers from the API that the front end, like the mobile or web app, talks to in order to get a response. So, getting a complete view there was significant.
From an end user's perspective, we would like to see a person's life cycle.
How did that person interact with the product? Simply searching for, let's say, the user ID in a search bar and seeing: okay, this user went through these logs in the back end, and then through these logs. So this is what happened to them in the AI, and this is where the problem happened.
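As a rough illustration of the lookup described above, here is a minimal sketch that gathers one user's log records from several services and orders them by time. It is not Wombo's code or a SigNoz API; the `LogRecord` fields, the service names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical structured log record; field names are assumptions."""
    timestamp: float  # seconds, any monotonic ordering works for the sketch
    service: str      # e.g. "api" or "ml-worker" (made-up service names)
    user_id: str
    message: str


def user_timeline(records, user_id):
    """Return one user's log records from all services, oldest first."""
    matching = [r for r in records if r.user_id == user_id]
    return sorted(matching, key=lambda r: r.timestamp)


# Made-up records, deliberately out of order and from two containers.
records = [
    LogRecord(3.0, "ml-worker", "u42", "image generation failed"),
    LogRecord(1.0, "api", "u42", "request received"),
    LogRecord(2.0, "api", "u7", "request received"),
    LogRecord(2.5, "ml-worker", "u42", "job picked up from queue"),
]

timeline = user_timeline(records, "u42")
for record in timeline:
    print(f"{record.timestamp:>5} [{record.service}] {record.message}")
```

In practice an observability backend does this filtering over indexed logs; the sketch only shows why attaching a user ID to every log line makes the life-cycle view possible.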
Logs were important, along with exceptions. For example, right after we deploy a new release, we might see an exception that we have never seen before or didn't expect. Why is this happening? We need the ability to step through traces and see, for instance, which SQL statement was called there.
Stepping through all those calls, we can ask: okay, I'm hitting the graph DB, so why is it taking time? Or is it the SQL taking time? Is it the cache taking time?
We have multiple data sources, so the ability to see which one is taking time, and why, is crucial for us.
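To make the idea of "which data source is taking time" concrete, here is a minimal sketch that times each step of a request and reports the slowest one. It uses only the Python standard library and stands in for what tracing spans provide; it is not the SigNoz or OpenTelemetry API. The step names and the `time.sleep` calls that simulate each backend are assumptions.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> accumulated seconds


@contextmanager
def timed(step):
    """Record how long the wrapped block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[step] = timings.get(step, 0.0) + elapsed


def handle_request():
    """Hypothetical request handler; sleeps stand in for real backend calls."""
    with timed("sql"):
        time.sleep(0.005)
    with timed("graph_db"):
        time.sleep(0.05)
    with timed("cache"):
        time.sleep(0.001)


handle_request()
slowest = max(timings, key=timings.get)
print(f"slowest step: {slowest} ({timings[slowest]:.3f}s)")
```

A tracing tool records the same kind of per-step timing automatically and attaches it to a single request, which is what lets you step through the calls as described above.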
How did SigNoz compare to other solutions you evaluated?
We tried a few other tools, like DataDog.
The setup time for SigNoz was low. I got the basic traces and exceptions setup working within one and a half days, and I could see this data for our development environment.
For the other tools, it took about two weeks. With SigNoz, I could at least get to that stage in a single day without any help, just by reading the docs and following the steps.
Even the AWS setup was easy. I never actually had to go to any website to do anything. Following the instructions and copy-pasting pretty much worked.
Whereas for DataDog, you have to go to either AWS CLI or AWS console and manually change some very specific things. This was pretty cumbersome in our experience.
The fact that I got SigNoz running in a single day, at least the traces and exceptions part, was huge.
Why does the team at Wombo prefer open-source software?
It's very simple to find solutions to problems without depending on contacting someone at a company. There's just a bigger community, so we find answers quicker, and more people look at whatever issues we create.
There will be community contributors and dedicated contributors, which makes getting help from the community much more straightforward. We also like contributing back to the open source community and engaging with them; hence, it's a win-win.
Any advice for new teams setting up their observability systems?
The topmost criterion is, does the tool have all the essential features? Logs, traces, and metrics: does it have all of those? The second thing is how easy it is to set up and maintain. Even if setup takes a bit of time, it should be easy to maintain.
So every time you change something in your infrastructure, do you also have to change something in your observability platform? And even if you have to, how easy or how hard is it to make the required changes?
The top two things are the completeness of the feature set and less maintenance.
Thank you for taking the time to read this case study. If you have any feedback or want to share your story using SigNoz, please feel free to reach out to [email protected] with "Case Study" as the subject.
Sharing stories of how different teams use SigNoz helps the community learn about the use cases and problems SigNoz can solve, and it also showcases how you are solving issues in a unique way.
Feel free to join our Slack community and say hi! 👋