AWS Lambda Logs via S3 - Forwarder Setup Guide

Overview

This documentation provides a detailed walkthrough on how to set up an AWS Lambda function to collect AWS Lambda logs stored in an AWS S3 bucket and forward them to SigNoz. By integrating with SigNoz, you can efficiently collect, monitor, and analyze your Lambda function logs for better insights into your serverless applications.

SigNoz Cloud users: You can also use the One-Click AWS Integration for automated setup with pre-built dashboards. See the comparison guide to choose the best approach.

OTel Collector Extension Layer: For a code-free approach to collect logs, use the OTel Collector Extension Layer.

Application Logs: If you want to send application logs from Node.js Lambda functions directly (to correlate with traces), refer to the AWS Lambda Node.js Guide.

Prerequisites

AWS account with administrative privilege.

Most steps are identical. To adapt this guide, update the endpoint and remove the ingestion key header as shown in Cloud → Self-Hosted.

Creating / Configuring your S3 bucket

To accomplish the task described, please follow these steps:

Creating an S3 Bucket:

Sign in to the AWS Management Console.
Navigate to the Amazon S3 service.
Click on Create bucket.
Enter a unique bucket name, select the region, and configure any additional settings if needed (such as versioning, logging, etc.).
Click Create bucket to finalize the creation process.

Uploading Data to the S3 Bucket:

See the AWS Lambda extensions blog post to know how to automatically collect Lambda logs in S3.

After creating the bucket, navigate to the S3 Management Console.
Click on the Upload button.
Select the files you wish to upload from your local machine.
Optionally, configure settings like encryption, permissions, etc.
Click Upload to upload the selected files to the bucket.

File Formats:

You can upload files in various formats such as .json, .csv, .log, .gz, .zip, etc.
If you have configured Lambda logging, Lambda logs are automatically saved in .log format in your S3 bucket.

To move forward, we assume that you already have some data in your S3 bucket.

For the scope of this documentation, we assume that all the data in S3 bucket is in the same format. For example, if one file is in .csv format (say), then all files within the bucket will be in .csv format. For files in different formats, you will have to use different parsing functions for each format or update the existing function accordingly.

The general header(table) format of Lambda logs is:

lambda_headers= ["time","type","record"]

There can be nested log data as described below. Modify the code below according to your needs to capture and send only the required info.

// a single lambda log record
{
  "time": "2020-11-12T14:55:06.780Z",
  "type": "platform.report",
  "record": {
    "requestId": "49e64413-fd42-47ef-b130-6fd16f30148d",
    "metrics": {
      "durationMs": 4.96,
      "billedDurationMs": 100,
      "memorySizeMB": 128,
      "maxMemoryUsedMB": 87,
      "initDurationMs": 792.41
    },
    "tracing": {
      "type": "X-Amzn-Trace-Id",
      "value": "Root=1-5fad4cc9-70259536495de84a2a6282cd;Parent=67286c49275ac0ad;Sampled=1"
    }
  }
}

Understanding how lambda function work

When you successfully attach your lambda function with the S3 bucket and configure it correctly, any new addition / deletion / copy / PUT etc, requests made to the S3 bucket will trigger the lambda function and the code written in the lambda function will get executed.

Creating a lambda function

The below attached screenshots are repurposed here (previously for load balancer logs), lambda function creation processes remains the same, adjust the name values according to your needs.

Follow these steps to create the lambda function:

Step 1: Go to your AWS console and search for AWS Lambda, go to Functions and click on Create Function.

*Create Lambda function from AWS Console*

Step 2: Choose the Author from scratch checkbox and proceed to fill in the function name.

*Choose the Author from scratch and fill up other details*

Step 3: Choose Python 3.x as the Runtime version, x86_64 as Architecture (preferably), and keep other settings as default. Select Create a new role with basic Lambda permissionsfor now, we’ll requiring more permissions down the lane. So for now, select this option.

Step 4: Once you are done configuring the lambda function, your Lambda function is created.

*Your barebones Lambda function is created now*

Configuring Policies for Lambda function

As said in Step 3 previously, we need extra permissions in order to access the S3 Bucket for execution of our Lambda code, follow along to set it up.

Step 1: Scroll down from your Lambda page, you’ll see a few tabs there. Go to Configurations and select Permissions from the left sidebar.

*Choosing execution role from Configurations tab*

Step 2: Click on the Execution Role name link just under Role name, it will take us to AWS IAM page. Here we will add policies to get full S3 access. Once here, click on the Add permissions button and select Attach policies from the drop down list.

*Attach policies to your Lambda function*

Step 3: Search “S3” and you’ll a policy named AmazonS3FullAccessselect that and proceed.

*Policies you'll need to run your Lambda function*

It's advisable to proceed with caution when granting full S3 access, particularly in a production environment. Before deploying your Lambda function with such extensive permissions, it's essential to consult with your system administrator or designated authority to ensure compliance with security protocols and best practices. This step helps mitigate potential risks and ensures that access permissions align with organizational guidelines and requirements.

*Extra policies you might require to execute your Lambda function*

Please refer to the image above as a comprehensive guide to the policy names that you may consider adding to your Lambda function. Failure to include these policies could result in insufficient privileges, potentially hindering the function's ability to perform necessary operations within the AWS environment.

Congrats, you are just done with one of the major hurdles in running your code. Now, let’s add a trigger.

Adding Triggers

You need to use the Lambda console to build a trigger so that your function can be called immediately by another AWS service (S3, in our case). A trigger is a resource you set up to enable your function to be called by another AWS service upon the occurrence of specific events or conditions.

A function may have more than one trigger. Every trigger functions as a client, independently calling your method, and Lambda transfers data from a single trigger to each event it passes to your function.

To setup the trigger, follow these steps:

Step 1: Click on the + Add trigger button from the Lambda console.

Step 2: Select S3 from the first drop down of AWS services list. Pick your S3 bucket for the second field.

Step 3: For the Event types field, you can select any number of options you wish. The trigger will occur depending upon what option(s) you choose here. By default, the All object create events will be selected.

*Choose event types from the drop down menu*

Verify the settings and click on Add button at bottom right to add this trigger.

Adding Request Layer

We will be using python's request module which is not included by default in Lambda.

To utilize Python's requests module within a Lambda function, it's necessary to explicitly add it as a layer. While there may be alternative approaches, it's advisable to adhere to established practices that have been thoroughly tested and proven effective. Therefore, we will proceed with adding requests as a layer to ensure reliable functionality within the Lambda environment.

Step 1: Follow the steps below to create a zip of the request module and add it as a layer to make it work on AWS lambda.

The commands you need to run:

# make a new directory
mkdir python
# move into that directory
cd python

# install requests module
pip install --target . requests
# zip the contents under the name dependencies.zip
zip -r dependencies.zip ../python

Step 2: To upload your zip file, go to AWS Lambda > Layers and click on Create Layer. [Not inside your specific Lambda function, just the landing page of AWS Lambda].

Step 3: You’ll be redirected to Layer configurations page, here, give a name to your layer, an optional description, select Upload a .zip file , click on Upload and locate the requirements.zip file.

Step 4: Select your desired architecture and pick Python 3.x as your runtime. Hit Create. Your layer has now been created. Now lets connect it to our Lambda function which we created to send logs to SigNoz.

*Uploading the requirements.zip file to later add it as a layer*

Step 5: Go to your Lambda function, scroll down to Layers section and on the right of it, you’ll find a button that says Add a layer to click on.

Step 6: Pick Custom layers from the checkbox and select your custom layer from the given drop down below and then click on the button Add.

Congratulations, the requests module has been successfully integrated into your code area. By adding this layer, you have resolved the request module not found error that would have otherwise occurred.

The Lambda Function

Now, we come to the pivotal section of this document: the code implementation.

The Python script's primary function revolves around retrieving log files stored within an Amazon S3 bucket and send data to a predetermined HTTP endpoint.

The provided code is functional, but exercise caution when copying and pasting it in its entirety. Incorrect configuration could result in the unintentional ingestion of a large volume of data.

Below is the comprehensive code along with detailed comments for clarity:

import boto3
import requests

# Create an S3 client
s3 = boto3.client('s3')

# Lambda function handler
def lambda_handler(event, context):
    # S3 bucket name
    bucket_name = '<name_of_your_bucket>'

    # List all objects in the specified S3 bucket
    response = s3.list_objects_v2(Bucket=bucket_name)

    # Iterate through each object in the bucket
    for obj in response['Contents']:
        # Check if the object is a gzipped log file
        if obj['Key'].endswith('.log'):
            file_key = obj['Key']

            # Get the file content
            file_obj = s3.get_object(Bucket=bucket_name, Key=file_key)
            file_content = file_obj['Body'].read()
            log_data = file_content.decode('utf-8').replace("'", '"')

            req_headers = {
                     'signoz-ingestion-key': '<your-ingestion-key>',
                     'Content-Type': 'application/json'
                }
            # Specify the HTTP endpoint for sending the data
            http_url = 'https://ingest.<region>.signoz.cloud:443/logs/json'  # Replace <region> with us, eu, or in

            # Send the JSON data to the specified HTTP endpoint
            response = requests.post(http_url, data=log_data, headers=req_headers)

            # Print information about the sent data and the response received
            print(f"Sent data to {http_url}. Response: {response.status_code}")

Replace <region> in the URL with your SigNoz Cloud region. You can find ingestion details (REGION and SIGNOZ_INGESTION_KEY) in the SigNoz dashboard.

Other than the above explanation and the code comments, in a nutshell, what this code does is:

Sends the parsable content of ENTIRE S3 bucket whenever the lambda function gets triggered. It gets triggered by the condition you set above. Let’s mention that again here.

Lets say you add something to your S3 bucket, it may / may not trigger this lambda function or if you have setup your s3 as if it automatically stores all your Lambda logs, segregated in different folders, so whenever any new log gets added, the function will get triggered and send all the S3 data.

This is obviously not what everyone expects, ideal case would be to have a mass log transfer once the first connection is made to SigNoz otel-collector (which then they later get stored in gp2/gp3 storageClass of EBS), and then send logs lines of only the recently logged one.

To achieve this functionality, you need to add few conditions to the code.

Assuming all standard log lines have a timestamp field.
Parse and select the timestamp field from the log line and add it before the response = requests.post(http_url, data=log_data, headers=req_headers) line as a if else condition to only send logs which are x days older (say 3 days).

So, the function now will first check the log timestamp and only send those logs which are 3 days older (say) or even a few hours old.

Let’s consider the below pseudo code for better understanding:

   from datetime import datetime, timedelta

   # Your given timestamp
   given_timestamp_str = "2024-01-01T23:58:02.231919Z"
   given_timestamp = datetime.fromisoformat(given_timestamp_str.replace('Z', '+00:00'))

   # Current time
   current_time = datetime.utcnow()

   # Calculate the time difference
   time_difference = current_time - given_timestamp

   # Check if the time difference is less than 3 days
   if time_difference < timedelta(days=3):
       # Run your specific function here
       print("Running the specific function.")
   # ADD THE response = requests.post(http_url, json=json_data) LINE HERE

   else:
       print("Time difference exceeds 3 days. Function will not run.")

Feel free to modify any part of the code according to your requirements.

Running the code locally

If you want to run the entire setup locally in your laptop for testing purposes. Here’s the reference code for you:

import os
import requests

def send_log(file_path, url):
    with open(file_path, 'r') as file:
        data = file.read()
        log_data = data.replace("'", '"')

    headers = {'Content-Type': 'application/json'}
    requests.post(url, data=log_data, headers=headers)

def main():
    root_folder = 'lambda'
    url = 'http://example.com/upload'  # Replace with your actual URL

    for root, _, files in os.walk(root_folder):
        for file in files:
            if file.endswith('.log'):
                file_path = os.path.join(root, file)
                send_log(file_path, url)

if __name__ == '__main__':
    main()

To incorporate log files into the folder, including nested directories, the code systematically examines all sub-folders to identify files ending with the .log extension. Should you wish to target different file types, you have the flexibility to modify the code accordingly.

Testing your Lambda function

Once you've finished writing your code, it's crucial to deploy it and conduct thorough testing to ensure its functionality. Before proceeding with deployment and testing, it's important to consider adjusting the timeout setting for your Lambda function. This adjustment is necessary because the process of transferring data from S3 to an external endpoint may take several minutes, exceeding the default Lambda timeout of 3 seconds.

To extend the timeout duration, follow these steps:

Navigate to the Lambda function configuration.
Access the "General Configuration" section.
Click on the "Edit" button to modify settings.
Increase the timeout value to a duration exceeding 10 minutes. Typically, the code execution completes within 1-4 minutes at most.

By adjusting the timeout setting, you ensure that your Lambda function has sufficient time to complete the data transfer process without encountering timeouts. This proactive measure enhances the reliability and effectiveness of your deployed solution.

*Configuring execution timeout of Lambda function*

Once you've finished adjusting the timeout setting, navigate to the code editor for your Lambda function. Locate the 'test' button, and from the dropdown menu, select the option labeled 'Configure test events.' Create a new test case by specifying it as an S3 PUT event, then save your configuration.

You're now prepared to proceed. Whenever you make alterations to the code and wish to evaluate them, follow these steps: Deploy the code first (equivalent to pressing 'Save'), and once it's fully deployed, proceed to click on the 'Test' button.

Test Case and Output

If the logs are sent successfully, here's how they'll be transmitted. The following output displays the JSON-formatted data as we've printed it to visualize the sent information.

*Visual representation of code output of Lambda function (SAMPLE ONLY)*

Validate

Upon accessing the SigNoz logs section, you will notice a considerable influx of logs. You have the option to seamlessly transition to live monitoring of logs as well. Simply click on any log line to view its detailed information.

*A sample log line of the logs sent from AWS Lambda (SAMPLE ONLY)*

Send your AWS Lambda logs to SigNoz

Overview

Prerequisites

Creating / Configuring your S3 bucket

Understanding how lambda function work

Creating a lambda function

Configuring Policies for Lambda function

Adding Triggers

Adding Request Layer

The Lambda Function

Running the code locally

Testing your Lambda function

Test Case and Output

Validate

Was this page helpful?

Is this page helpful?

Send your AWS Lambda logs to SigNoz

Overview

Prerequisites

Creating / Configuring your S3 bucket

Understanding how lambda function work

Creating a lambda function

Configuring Policies for Lambda function

Adding Triggers

Adding Request Layer

The Lambda Function

Running the code locally

Testing your Lambda function

Test Case and Output

Validate

Related Resources

Was this page helpful?

Is this page helpful?