Send your VPC logs to SigNoz
Overview
This documentation provides a detailed walkthrough on how to set up an AWS Lambda function to collect Virtual Private Cloud (VPC) logs stored in an AWS S3 bucket and forward them to SigNoz. Once set up, your VPC logs are sent to SigNoz automatically, so you can visualize and monitor the network traffic in your VPC.
Here’s a quick summary of what we’ll be doing in this detailed documentation.
- Creating / Configuring your S3 bucket
- Understanding how Lambda functions work
- Creating a lambda function
- The Lambda Function Code
- Visualize the logs in SigNoz
Prerequisites
- An AWS account with administrative privileges.
Creating / Configuring your S3 bucket
To accomplish the task described, please follow these steps:
- Creating an S3 Bucket:
- Sign in to the AWS Management Console.
- Navigate to the Amazon S3 service.
- Click on Create bucket.
- Enter a unique bucket name, select the region, and configure any additional settings if needed (such as versioning, logging, etc.).
- Click Create bucket to finalize the creation process.
Uploading Data to the S3 Bucket:
- After creating the bucket, navigate to the S3 Management Console.
- Click on the Upload button.
- Select the files you wish to upload from your local machine.
- Optionally, configure settings like encryption, permissions, etc.
- Click Upload to upload the selected files to the bucket.
File Formats:
- You can upload files in various formats such as .json, .csv, .log, .gz, .zip, etc.
- If you have configured Virtual Private Cloud (VPC) logging, VPC logs are automatically saved in .gzip format in your S3 bucket.
Ensure that you have appropriate permissions and follow AWS best practices for security and cost optimization when creating and using S3 buckets. Refer to the AWS documentation to learn more about setting up automatic VPC flow log collection to an S3 bucket.
Please be advised that all logs will undergo conversion to JSON format before transmission. Consequently, it may be necessary to perform supplementary preprocessing of the logs as part of this conversion process. Here, preprocessing means fetching the logs from the S3 bucket, splitting each log line on its delimiter (whitespace for VPC logs; a comma or quotes for other formats), and assigning each part to its respective key, producing key-value pairs before sending them to SigNoz.
To move forward, we assume that you already have some data in your S3 bucket.
For the scope of this documentation, we assume that all the data in the S3 bucket is in the same format. For example, if one file is in .csv format, then all files within the bucket will be in .csv format. For files in different formats, you will have to use different parsing functions for each format or update the existing function accordingly.
The general header(table) format of VPC logs is:
vpc_headers= ["version","account-id","interface-id","srcaddr","dstaddr","srcport",
"dstport","protocol","packets","bytes","start","end","action","log-status"]
Note that these header names are only labels. You can rename them if you wish, but it is not advisable.
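As noted above, a bucket that mixes formats needs one parsing function per format. The sketch below shows one possible way to pick a parser from the file extension. The names `PARSERS`, `parse_space_delimited`, `parse_csv`, and `parse_by_extension` are hypothetical and not part of the original setup, and the CSV parser assumes the first row of the file holds the column names.

```python
import csv
import io
import shlex

VPC_HEADERS = ["version", "account-id", "interface-id", "srcaddr", "dstaddr", "srcport",
               "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log-status"]


def parse_space_delimited(text):
    """Parse space-delimited VPC log lines into a list of dictionaries."""
    return [dict(zip(VPC_HEADERS, shlex.split(line, posix=False)))
            for line in text.strip().splitlines() if line.strip()]


def parse_csv(text):
    """Parse CSV text, assuming the first row contains the column names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical mapping from file extension to parsing function
PARSERS = {".log": parse_space_delimited, ".csv": parse_csv}


def parse_by_extension(file_name, text):
    """Choose a parser based on the (already decompressed) file's extension."""
    for extension, parser in PARSERS.items():
        if file_name.endswith(extension):
            return parser(text)
    raise ValueError(f"No parser registered for {file_name}")
```

A file whose extension is not registered raises a `ValueError`, which makes unexpected formats visible instead of silently sending badly parsed data.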
Understanding how Lambda functions work
Once you attach your Lambda function to the S3 bucket and configure it correctly, any matching request made to the bucket (object creation, deletion, copy, PUT, and so on) triggers the Lambda function, and the code written in the function is executed.
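When S3 triggers the function, it passes an event describing the objects involved. The sketch below shows one way to read the bucket name and object key from that event. The helper name `extract_s3_objects` and the values in `sample_event` are made up for illustration, and the sample event is trimmed down to the fields the code reads.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded (for example, spaces become '+')
            pairs.append((bucket, unquote_plus(key)))
    return pairs


def lambda_handler(event, context):
    for bucket, key in extract_s3_objects(event):
        print(f"Triggered by s3://{bucket}/{key}")


# Trimmed-down example event (hypothetical bucket and key)
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-vpc-logs"},
                "object": {"key": "AWSLogs/flow+log.log.gz"},
            },
        }
    ]
}

lambda_handler(sample_event, None)
```

Reading the key from the event lets a function process only the object that caused the trigger, rather than scanning the whole bucket.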
Creating a lambda function
Follow these steps to create the lambda function:
Step 1: Go to your AWS console and search for AWS Lambda, go to Functions and click on Create Function.
Step 2: Choose the Author from scratch option and fill in the function name.
Step 3: Choose Python 3.x as the Runtime version and x86_64 as the Architecture (preferably), and keep the other settings at their defaults. Select Create a new role with basic Lambda permissions for now. We will need more permissions later and will add them in the next section.
Step 4: Once you are done configuring these settings, your Lambda function is created.
Configuring Policies for Lambda function
As mentioned in Step 3 above, the Lambda code needs extra permissions to access the S3 bucket. Follow along to set them up.
Step 1: Scroll down on your Lambda page and you'll see a few tabs. Go to Configuration and select Permissions from the left sidebar.
Step 2: Click on the Execution Role name link just under Role name. It will take you to the AWS IAM page, where we will add policies to get full S3 access. Once there, click on the Add permissions button and select Attach policies from the drop-down list.
Step 3: Search for “S3” and you'll see a policy named AmazonS3FullAccess. Select it and proceed.
It's advisable to proceed with caution when granting full S3 access, particularly in a production environment. Before deploying your Lambda function with such extensive permissions, it's essential to consult with your system administrator or designated authority to ensure compliance with security protocols and best practices. This step helps mitigate potential risks and ensures that access permissions align with organizational guidelines and requirements.
Please refer to the image above as a comprehensive guide to the policy names that you may consider adding to your Lambda function. Failure to include these policies could result in insufficient privileges, potentially hindering the function's ability to perform necessary operations within the AWS environment.
Congrats, you have cleared one of the major hurdles in running your code. Now, let’s add a trigger.
Adding Triggers
You need to use the Lambda console to build a trigger so that your function can be called immediately by another AWS service (S3, in our case). A trigger is a resource you set up to enable your function to be called by another AWS service upon the occurrence of specific events or conditions.
A function may have more than one trigger. Every trigger functions as a client, independently calling your method, and Lambda transfers data from a single trigger to each event it passes to your function.
To set up the trigger, follow these steps:
Step 1: Click on the + Add trigger button from the Lambda console.
Step 2: Select S3 from the first drop-down of the AWS services list. Pick your S3 bucket for the second field.
Step 3: For the Event types field, you can select any number of options you wish. The trigger will fire depending on the option(s) you choose here. By default, All object create events will be selected.
Verify the settings and click on the Add button at the bottom right to add this trigger.
Adding Request Layer
We will be using Python's requests module, which is not included by default in Lambda.
To use the requests module within a Lambda function, you need to add it explicitly as a layer. While there may be alternative approaches, it's advisable to adhere to established practices that have been thoroughly tested and proven effective. Therefore, we will add requests as a layer to ensure reliable functionality within the Lambda environment.
Step 1: Follow the steps below to create a zip of the requests module and add it as a layer so that it works on AWS Lambda.
The commands you’d need:
# make a new directory named python (Lambda expects Python layer packages under python/)
mkdir python
# install the requests module into that directory
pip install --target python requests
# zip the python directory under the name dependencies.zip
zip -r dependencies.zip python
Step 2: To upload your zip file, go to AWS Lambda > Layers and click on Create Layer. [Not inside your specific Lambda function, just the landing page of AWS Lambda].
Step 3: You’ll be redirected to the Layer configuration page. Here, give your layer a name and an optional description, select Upload a .zip file, click on Upload, and locate the dependencies.zip file.
Step 4: Select your desired architecture and pick Python 3.x as your runtime. Hit Create. Your layer has now been created. Now let's connect it to the Lambda function we created to send logs to SigNoz.
Step 5: Go to your Lambda function, scroll down to the Layers section, and click the Add a layer button on its right.
Step 6: Pick the Custom layers option, select your custom layer from the drop-down below it, and then click on the Add button.
Congratulations, the requests module is now available to your Lambda code. By adding this layer, you have resolved the request module not found error that would otherwise have occurred.
The Lambda Function
Now, we come to the pivotal section of this document: the code implementation.
The Python script's primary function revolves around retrieving gzipped log files stored within an Amazon S3 bucket. Subsequently, it decompresses these files, transforms individual log entries into JSON objects, and transmits the resultant JSON data to a predetermined HTTP endpoint.
Below is the comprehensive code along with detailed comments for clarity:
import gzip
import boto3
import requests
import shlex

# Create an S3 client
s3 = boto3.client('s3')

# Function to convert a log line into a JSON object
def convert_log_line_to_json(line):
    # Define the headers to be used for the JSON keys
    headers = ["version","account-id","interface-id","srcaddr","dstaddr","srcport","dstport","protocol","packets","bytes","start","end","action","log-status"]
    # Split the log line using shell-like syntax (keeping quotes, etc.)
    res = shlex.split(line, posix=False)
    # Create a dictionary by zipping headers and log line parts
    return dict(zip(headers, res))

# Lambda function handler
def lambda_handler(event, context):
    # S3 bucket name
    bucket_name = '<name_of_your_bucket>'
    # List all objects in the specified S3 bucket
    response = s3.list_objects_v2(Bucket=bucket_name)
    # Iterate through each object in the bucket (an empty bucket has no 'Contents' key)
    for obj in response.get('Contents', []):
        # Check if the object is a gzipped log file
        if obj['Key'].endswith('.log.gz'):
            file_key = obj['Key']
            # Download the gzipped file content
            file_obj = s3.get_object(Bucket=bucket_name, Key=file_key)
            file_content = file_obj['Body'].read()
            # Decompress the gzipped content
            decompressed_content = gzip.decompress(file_content)
            # Convert bytes to string
            log_text = str(decompressed_content, encoding='utf-8')
            # Split the string into lines
            lines = log_text.strip().split('\n')
            # Convert each log line string into a JSON object
            json_data = [convert_log_line_to_json(line) for line in lines]
            req_headers = {
                'signoz-ingestion-key': '<SIGNOZ_INGESTION_KEY>',
                'Content-Type': 'application/json'
            }
            # Specify the HTTP endpoint for sending the data
            http_url = 'https://ingest.in.signoz.cloud:443/logs/json'  # Replace with your actual URL
            # Send the JSON data to the specified HTTP endpoint
            response = requests.post(http_url, json=json_data, headers=req_headers)
            # Print information about the sent data and the response received
            print(f"Sent data to {http_url}. Response: {response.status_code}")
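One detail worth knowing about the listing step: a single `list_objects_v2` call returns at most 1,000 keys, so a bucket with more objects needs pagination. The sketch below uses boto3's built-in paginator for that. The helper name `iter_log_keys` is hypothetical, and the S3 client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def iter_log_keys(s3_client, bucket_name, suffix=".log.gz"):
    """Yield every object key in the bucket that ends with the given suffix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages for an empty bucket or prefix carry no 'Contents' key
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj["Key"]


# Example usage (requires boto3 and AWS credentials):
# import boto3
# for key in iter_log_keys(boto3.client("s3"), "<name_of_your_bucket>"):
#     print(key)
```

Inside the handler, the loop over `response['Contents']` could be replaced with a loop over `iter_log_keys(s3, bucket_name)`.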
Here’s what a raw, unprocessed VPC log line looks like:
2 842928376651 eni-0660f9815e9e8d140 192.3.0.225 192.3.0.154 43276 6319 6 1 52 1701285848 1701215902 ACCEPT OK
In the code, each field corresponds to a header. The purpose of the code above is to transmit Virtual Private Cloud (VPC) logs to the SigNoz endpoint.
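To make the field-to-header mapping concrete, the short sketch below runs the same conversion on the sample line above and prints the resulting JSON object. The constant name `VPC_HEADERS` is introduced here only for illustration.

```python
import json
import shlex

VPC_HEADERS = ["version", "account-id", "interface-id", "srcaddr", "dstaddr", "srcport",
               "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log-status"]


def convert_log_line_to_json(line):
    """Pair each whitespace-separated field with its header name."""
    return dict(zip(VPC_HEADERS, shlex.split(line, posix=False)))


sample_line = ("2 842928376651 eni-0660f9815e9e8d140 192.3.0.225 192.3.0.154 "
               "43276 6319 6 1 52 1701285848 1701215902 ACCEPT OK")

record = convert_log_line_to_json(sample_line)
print(json.dumps(record, indent=2))
```

Note that every value stays a string after the split. If you need numeric fields such as packets or bytes as numbers, convert them before sending.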
The provided code is functional, but exercise caution when copying and pasting it in its entirety. Incorrect configuration could result in the unintentional ingestion of a large volume of data.
Beyond the explanation above and the code comments, here is what this code does in a nutshell:
It sends the parsable content of the ENTIRE S3 bucket whenever the Lambda function is triggered. It is triggered by the condition you set above. Let’s mention that again here.
Step 3: For the Event types field, you can select any number of options you wish. The trigger will fire depending on the option(s) you choose here. By default, All object create events will be selected.
Say you add something to your S3 bucket: depending on the event types you selected, it may or may not trigger this Lambda function. If you have set up your S3 bucket to automatically store all your ELB/VPC logs, segregated into different folders, the function is triggered whenever a new log is added and sends all the S3 data.
This is obviously not what everyone expects. The ideal case would be a mass log transfer once the first connection is made to the SigNoz otel-collector (the logs are later stored in the gp2/gp3 storageClass of EBS), followed by sending only the newly added log lines.
To achieve this functionality, you need to add a few conditions to the code.
- Assume that all standard log lines have a timestamp field.
- Parse the timestamp field from the log line and add an if/else condition before the response = requests.post(http_url, json=json_data, headers=req_headers) line so that only logs no older than x days (say 3 days) are sent.
So, the function will now first check the log timestamp and only send those logs that are at most 3 days old (say), or even only a few hours old.
Let’s consider the below pseudo code for better understanding:
from datetime import datetime, timedelta, timezone

# Your given timestamp
given_timestamp_str = "2024-01-01T23:58:02.231919Z"
given_timestamp = datetime.fromisoformat(given_timestamp_str.replace('Z', '+00:00'))
# Current time (timezone-aware, so it can be subtracted from the parsed timestamp)
current_time = datetime.now(timezone.utc)
# Calculate the time difference
time_difference = current_time - given_timestamp
# Check if the time difference is less than 3 days
if time_difference < timedelta(days=3):
    # Run your specific function here
    print("Running the specific function.")
    # ADD THE response = requests.post(http_url, json=json_data, headers=req_headers) LINE HERE
else:
    print("Time difference exceeds 3 days. Function will not run.")
Feel free to modify any part of the code according to your requirements.
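For VPC logs specifically, the start and end fields hold Unix timestamps in seconds (as in the sample line 1701285848 1701215902) rather than ISO strings. The sketch below adapts the same check to that format. It is one possible approach, not part of the original code: the helper names `is_recent` and `filter_recent` are hypothetical, and it assumes records are dictionaries keyed by the headers used in this guide.

```python
from datetime import datetime, timedelta, timezone


def is_recent(record, max_age=timedelta(days=3), now=None):
    """Return True if the record's 'end' time is less than max_age before now."""
    now = now or datetime.now(timezone.utc)
    try:
        end_time = datetime.fromtimestamp(int(record["end"]), tz=timezone.utc)
    except (KeyError, ValueError):
        # Header rows or malformed lines have no usable timestamp
        return False
    return now - end_time < max_age


def filter_recent(records, max_age=timedelta(days=3), now=None):
    """Keep only the records that pass the is_recent check."""
    return [record for record in records if is_recent(record, max_age, now)]
```

In the handler, `json_data = filter_recent(json_data)` could be placed just before the `requests.post` call, skipping the request entirely when the filtered list is empty.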
Running the code locally
If you want to run the entire setup locally on your laptop for testing purposes, here’s the reference code:
import os
import gzip
import requests
import shlex

# Convert a single log line into a dictionary keyed by the VPC headers
def convert_log_line_to_json(line):
    headers = ["version","account-id","interface-id","srcaddr","dstaddr","srcport","dstport","protocol","packets","bytes","start","end","action","log-status"]
    res = shlex.split(line, posix=False)
    return dict(zip(headers, res))

# Read one gzipped log file, convert its lines, and send them to SigNoz
def process_log_file(file_path):
    with gzip.open(file_path, 'r') as f:
        log_data = f.read().decode('utf-8')
    lines = log_data.strip().split('\n')
    json_data = [convert_log_line_to_json(line) for line in lines]
    req_headers = {
        'signoz-ingestion-key': '<SIGNOZ_INGESTION_KEY>',
        'Content-Type': 'application/json'
    }
    http_url = 'https://ingest.in.signoz.cloud:443/logs/json'  # Replace with your actual URL
    response = requests.post(http_url, json=json_data, headers=req_headers)
    print(f"Sent {file_path} to {http_url}. Response: {response.status_code}")

# Walk the folder (including sub-folders) and process every .log.gz file
def main():
    root_folder = '<folder_name>'
    for root, _, files in os.walk(root_folder):
        for file in files:
            if file.endswith('.log.gz'):
                file_path = os.path.join(root, file)
                process_log_file(file_path)

if __name__ == '__main__':
    main()
The code walks through the given folder, including all nested sub-folders, to find files ending with the .log.gz extension. If you want to target different file types, modify the code accordingly.
Testing your Lambda function
Once you've finished writing your code, it's crucial to deploy it and conduct thorough testing to ensure its functionality. Before proceeding with deployment and testing, it's important to consider adjusting the timeout setting for your Lambda function. This adjustment is necessary because the process of transferring data from S3 to an external endpoint may take several minutes, exceeding the default Lambda timeout of 3 seconds.
To extend the timeout duration, follow these steps:
- Navigate to the Lambda function configuration.
- Access the "General Configuration" section.
- Click on the "Edit" button to modify settings.
- Increase the timeout value to about 10 minutes (Lambda allows a maximum of 15 minutes). Typically, the code execution completes within 1-4 minutes at most.
By adjusting the timeout setting, you ensure that your Lambda function has sufficient time to complete the data transfer process without encountering timeouts. This proactive measure enhances the reliability and effectiveness of your deployed solution.
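If you prefer to script this change instead of using the console, the timeout can also be set through boto3's Lambda client. The sketch below is a minimal example under a few assumptions: the helper name `set_function_timeout` is hypothetical, the client is passed in as an argument, and the 900-second cap reflects Lambda's 15-minute maximum.

```python
MAX_LAMBDA_TIMEOUT_SECONDS = 900  # Lambda's upper limit (15 minutes)


def set_function_timeout(lambda_client, function_name, timeout_seconds):
    """Update the timeout of an existing Lambda function."""
    if not 1 <= timeout_seconds <= MAX_LAMBDA_TIMEOUT_SECONDS:
        raise ValueError(
            f"Timeout must be between 1 and {MAX_LAMBDA_TIMEOUT_SECONDS} seconds"
        )
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )


# Example usage (requires boto3 and AWS credentials):
# import boto3
# set_function_timeout(boto3.client("lambda"), "<your_function_name>", 600)
```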
Once you've finished adjusting the timeout setting, navigate to the code editor for your Lambda function. Locate the 'test' button, and from the dropdown menu, select the option labeled 'Configure test events.' Create a new test case by specifying it as an S3 PUT event, then save your configuration.
You're now prepared to proceed. Whenever you make alterations to the code and wish to evaluate them, follow these steps: Deploy the code first (equivalent to pressing 'Save'), and once it's fully deployed, proceed to click on the 'Test' button.
Below is an image showing the process of transmitting VPC logs to the SigNoz endpoint.
Test Case and Output
If the logs are sent successfully, here's how they'll be transmitted. The following output displays the JSON-formatted data as we've printed it to visualize the sent information.
Visualize the logs in SigNoz
Upon accessing the SigNoz logs section, you will notice a considerable influx of logs. You have the option to seamlessly transition to live monitoring of logs as well. Simply click on any log line to view its detailed information.