You have nested JSON in your logs. When you query body.event.type = "user_action", it fails. Here's why and how to fix it.
The Query That Should Work But Doesn't
Here's a log entry that breaks queries:
{
  "body": {
    "message": "API request processed",
    "event.type": "user_action",
    "kubernetes.pod.name": "api-server-123",
    "user": {
      "session": {
        "id": "sess_abc123"
      }
    }
  }
}
Now try these queries:
body.event.type = "user_action" ❌ Fails
body.kubernetes.pod.name = "api-server-123" ❌ Fails
body.user.session.id = "sess_abc123" ✅ Works
Why do the first two fail?
When you query body.event.type, the system looks for the nested structure body → event → type. But your data has the flat key "event.type". No nested event object exists, so the query fails.
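A minimal Python sketch of the lookup semantics makes the failure concrete (the field names come from the example above; this is an illustration, not the query engine's actual code):

def lookup(obj, dotted_path):
    """Resolve a dotted query path by walking nested objects one segment at a time."""
    for segment in dotted_path.split("."):
        if not isinstance(obj, dict) or segment not in obj:
            return None  # path breaks: this segment is not a nested object key
        obj = obj[segment]
    return obj

body = {
    "message": "API request processed",
    "event.type": "user_action",  # flat key that only looks nested
    "user": {"session": {"id": "sess_abc123"}},
}

print(lookup(body, "event.type"))       # None -> fails, there is no nested "event" object
print(lookup(body, "user.session.id"))  # "sess_abc123" -> works, the structure really is nested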
This dot notation ambiguity breaks log exploration workflows and forces users to remember which fields are "really nested" versus which ones just "look nested."
Building JSON Flattening to Solve the Ambiguity
JSON flattening fixes this. It transforms nested structures into flat, dot-notated keys:
// Before flattening
{
  "user": {
    "profile": {
      "settings": {
        "theme": "dark"
      }
    }
  }
}
// After flattening
{
  "user.profile.settings.theme": "dark"
}
Now body.user.profile.settings.theme = "dark" always works.
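Conceptually, the transformation is simple. Here is a minimal Python sketch of the idea (illustrative only, not the processor's actual implementation):

def flatten(obj, parent_key=""):
    """Recursively collapse nested dicts into a single level of dot-notated keys."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat

print(flatten({"user": {"profile": {"settings": {"theme": "dark"}}}}))
# {'user.profile.settings.theme': 'dark'}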
How It Works in Practice
Enable JSON flattening in your JSON Parser processor with these options:
- Enable Flattening: Transform nested objects into dot-notation keys
- Enable Paths: Keep the full path hierarchy (user.profile.settings.theme) vs. leaf names only (theme)
- Path Prefix: Add prefixes to avoid naming conflicts
- Enable Mapping: Automatically map common fields to standard attributes
Flatten specific data (Kubernetes metadata, app config, user sessions) while keeping other log parts unchanged. Flattening happens at ingestion time, so queries stay fast.
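To make the options concrete, here is how each one changes the output, sketched in Python by reusing the flatten helper above (the option names belong to the processor configuration; the code is only an illustration):

nested = {"kubernetes": {"pod": {"name": "api-server-123"}}}

# Enable Paths: keep the full hierarchy in each key
print(flatten(nested))
# {'kubernetes.pod.name': 'api-server-123'}

# Leaf names only: shorter keys, but different paths can collide on the same leaf
print({k.rsplit(".", 1)[-1]: v for k, v in flatten(nested).items()})
# {'name': 'api-server-123'}

# Path Prefix: namespace the flattened keys to avoid naming conflicts
print({f"k8s.{k}": v for k, v in flatten(nested).items()})
# {'k8s.kubernetes.pod.name': 'api-server-123'}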
The Hidden Problem: Transformation Anti-Patterns
While building JSON flattening, we analyzed common log transformation patterns.
The Data Duplication Pattern
Here's a common transformation pattern we observed:
{
  "body": {
    "user": "alice",
    "action": "login",
    "session": {
      "id": "sess_123",
      "duration": 1847
    },
    "metadata": {
      "timestamp": "2024-01-15T10:30:00Z",
      "ip": "192.168.1.100"
    }
  },
  "attributes": {
    "user": "alice",
    "action": "login",
    "session.id": "sess_123",
    "session.duration": 1847,
    "metadata.timestamp": "2024-01-15T10:30:00Z",
    "metadata.ip": "192.168.1.100"
  }
}
Every piece of data exists twice: once in body and again in attributes. You're storing duplicate data.
Why This Happens
The JSON Parser extracts data from body to attributes but doesn't remove the original. You extract for querying but forget to clean up, leaving duplicates.
This causes:
- 2x storage costs
- Slower queries from redundant data
- Larger indexes
- Poor compression
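A quick way to check whether your own pipeline has this problem is to compare each record's flattened body against its attributes. Reusing the flatten helper from the sketch above:

def find_duplicates(record):
    """Return attribute keys whose values also appear (flattened) inside body."""
    body_flat = flatten(record.get("body", {}))
    return [key for key, value in record.get("attributes", {}).items()
            if body_flat.get(key) == value]

# For the duplicated record above, this returns every attribute key:
# all six extracted values are also still sitting in body.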
The Extract and Remove Pattern
Solution: extract data, then remove duplicates.
Step 1: Extract - Parse JSON and move queryable fields to attributes
Step 2: Remove - Clean up the extracted data from the original body
Here's how it should look:
// After proper extract and remove
{
  "body": "User login processed successfully",  // Clean, minimal body
  "attributes": {
    "user": "alice",
    "action": "login",
    "session.id": "sess_123",
    "session.duration": 1847,
    "ip": "192.168.1.100"
  }
}
Implementation in SigNoz
Use built-in remove operations to clean up after extraction:
processors:
  - json_parser:
      from: body
      to: attributes
      enable_flattening: true
  - remove:
      fields:
        - body.user
        - body.action
        - body.session
        - body.metadata
Result: body keeps the context, attributes hold the queryable data.
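If you transform logs in your own code before they reach the pipeline, the same extract-and-remove idea looks roughly like this Python sketch (not SigNoz's processor; the field list mirrors the example above):

def extract_and_remove(record, fields):
    """Move selected dotted fields from body into attributes, deleting them from body."""
    body = record["body"]
    attributes = record.setdefault("attributes", {})
    for field in fields:
        *parents, leaf = field.split(".")
        target = body
        for part in parents:                      # walk down to the parent object
            target = target.get(part, {})
        if isinstance(target, dict) and leaf in target:
            attributes[field] = target.pop(leaf)  # extract, and remove the original
    return record

record = {"body": {"user": "alice", "action": "login",
                   "session": {"id": "sess_123", "duration": 1847}}}
extract_and_remove(record, ["user", "action", "session.id", "session.duration"])
# attributes now holds the queryable fields; body no longer duplicates them
# (empty parent objects left behind, like "session": {}, can be pruned in a second pass).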
What to Extract and What to Keep
Extract for querying:
- User identifiers (user_id, session_id, request_id)
- Status indicators (http.status_code, error.type)
- Performance metrics (duration, response_time)
- Business dimensions (product, feature, region)
Keep in body for context:
- Stack traces and error details
- Full request/response payloads
- Debugging information
- Large text fields
Always remove after extracting:
- Don't duplicate data
- Use remove operations
- Body = context, attributes = queries
Field Naming and Organization
Use consistent naming patterns; they make queries predictable:
// Good: Consistent, hierarchical naming
{
  "attributes": {
    "http.method": "POST",
    "http.status_code": 200,
    "http.url": "/api/users",
    "http.response_time_ms": 45,
    "user.id": "user_123",
    "user.role": "admin"
  }
}
// Avoid: Inconsistent naming patterns
{
  "attributes": {
    "httpMethod": "POST",          // Different convention
    "status": 200,                 // Missing namespace
    "request_url": "/api/users",   // Mixed convention
    "responseTime": 45,            // No units
    "userId": "user_123"           // Different pattern
  }
}
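If you can't fix naming at the source, a small normalization step in your own pipeline can map legacy keys onto the consistent scheme. A rough Python sketch (the rename table is illustrative, not an exhaustive standard):

# Illustrative mapping from inconsistent legacy keys to namespaced, dot-notated names.
RENAMES = {
    "httpMethod": "http.method",
    "status": "http.status_code",
    "request_url": "http.url",
    "responseTime": "http.response_time_ms",
    "userId": "user.id",
}

def normalize(attributes):
    """Rename known legacy keys; leave everything else untouched."""
    return {RENAMES.get(key, key): value for key, value in attributes.items()}

print(normalize({"httpMethod": "POST", "status": 200, "userId": "user_123"}))
# {'http.method': 'POST', 'http.status_code': 200, 'user.id': 'user_123'}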
ClickHouse Optimization Tips
- Keep consistent data types for same fields
- Avoid high-cardinality fields (millions of unique values)
- Use similar attribute sets for similar log entries
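One way to spot high-cardinality attributes before they hurt ClickHouse is to count distinct values over a sample of recent logs. A rough sketch (the sample source and threshold are assumptions you'd tune for your volume):

from collections import defaultdict

def high_cardinality(sample_records, threshold=10_000):
    """Flag attribute keys whose distinct-value count exceeds the threshold in a sample."""
    values = defaultdict(set)
    for record in sample_records:
        for key, value in record.get("attributes", {}).items():
            values[key].add(str(value))
    return {key: len(vals) for key, vals in values.items() if len(vals) > threshold}

# Example: request_id will almost certainly be flagged; http.method will not.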
Storage Impact Analysis
We analyzed the storage impact of different transformation patterns:
Proper transformation (extract + remove):
- Storage: Baseline
- Query performance: Optimal
- Index size: Minimal
Data duplication (extract only):
- Storage: 2x cost increase
- Query performance: 50% slower due to redundant data processing
- Index size: 2x larger
No transformation (raw logs only):
- Storage: Baseline but poor queryability
- Query performance: Very slow without indexed attributes
- Index size: Minimal but ineffective
The difference between proper transformation and data duplication can literally double your storage bill.
Performance Impact
- CPU: Less than 5% increase in processing time
- Memory: Proportional to nesting depth (typically less than 10MB)
- Throughput: No measurable impact on log ingestion rates
Implementation Guidelines
When implementing JSON flattening and data transformation:
- Start with selective flattening - Target specific paths like kubernetes.* or application.*
- Always use remove operations - Clean up extracted data to prevent duplication
- Establish naming conventions - Use consistent patterns like service.field or namespace.field
- Monitor storage impact - Track attribute counts and storage usage after transformation changes
- Test query patterns - Ensure flattened fields work with your common queries
Conclusion
JSON flattening fixes dot notation query failures. The bigger win is using extract-and-remove patterns to cut storage costs and speed up queries.
JSON flattening is available now in SigNoz. Check out our changelog for details and join our community to discuss log transformation strategies.