Tracing in GKE

Overview

Google Kubernetes Engine (GKE) is a managed Kubernetes service provided by Google that simplifies the deployment, management, and operation of Kubernetes clusters. This guide will help you send traces from your GKE cluster to SigNoz.

Prerequisites

  • A GKE cluster
  • kubectl installed to access the GKE cluster
  • Helm installed
  • A SigNoz Cloud account. For this demonstration, we'll be using SigNoz Cloud. You'll need your Ingestion Key and Ingestion URL, which you can find by signing in to your SigNoz Cloud account and navigating to Settings > Ingestion Settings.

Quick Start

Before we begin, let's verify the cluster status with the following command:

kubectl get nodes

Make sure all the nodes are in the Ready state.

GKE Cluster Status

Step 1: Add the SigNoz Helm repo using this command:

helm repo add signoz https://charts.signoz.io

Step 2: Install the OTel agent and deployment using the signoz/k8s-infra Helm chart

helm install -n signoz kubelet-otel signoz/k8s-infra \
  --set signozApiKey=<ingestionKey> \
  --set otelCollectorEndpoint="ingest.<region>.signoz.cloud:443" \
  --set otelInsecure=false \
  --create-namespace

After applying the above commands, check whether the pods in the signoz namespace are running successfully.
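
You can list them with:

kubectl get pods -n signoz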

Resources Status

The OTel Collector configuration can be found in the OTel agent's ConfigMap.
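
You can inspect it with:

kubectl get configmap kubelet-otel-k8s-infra-otel-agent -n signoz -o yaml

The configuration looks as follows: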


apiVersion: v1
kind: ConfigMap
metadata:
  name: kubelet-otel-k8s-infra-otel-agent
data:
  otel-agent-config.yaml: |2-

    exporters:
      otlp:
        endpoint: "ingest.{region}.signoz.cloud:443"
        headers:
          signoz-access-token: "<SIGNOZ_INGESTION_KEY>"
        tls:
          insecure: false
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      pprof:
        endpoint: localhost:1777
      zpages:
        endpoint: localhost:55679
    processors:
      batch:
        send_batch_size: 10000
        timeout: 200ms
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
          - k8s.deployment.name
          - k8s.node.name
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      resourcedetection:
        detectors:
        - system
        override: true
        system:
          hostname_sources:
          - dns
          - os
        timeout: 2s
      resourcedetection/internal:
        detectors:
        - env
        override: true
        timeout: 2s
    receivers:
      filelog/k8s:
        exclude:
        - /var/log/pods/signoz_kubelet-otel*-signoz-*/*/*.log
        - /var/log/pods/signoz_kubelet-otel*-k8s-infra-*/*/*.log
        - /var/log/pods/kube-system_*/*/*.log
        - /var/log/pods/*_hotrod*_*/*/*.log
        - /var/log/pods/*_locust*_*/*/*.log
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: get-format
          routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
          type: router
        - id: parser-crio
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: "2006-01-02T15:04:05.000000000-07:00"
            layout_type: gotime
            parse_from: attributes.time
          type: regex_parser
        - id: parser-containerd
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: regex_parser
        - id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: json_parser
        - id: extract_metadata_from_filepath
          output: add_cluster_name
          parse_from: attributes["log.file.path"]
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          type: regex_parser
        - field: resource["k8s.cluster.name"]
          id: add_cluster_name
          output: move_stream
          type: add
          value: EXPR(env("K8S_CLUSTER_NAME"))
        - from: attributes.stream
          id: move_stream
          output: move_container_name
          to: attributes["log.iostream"]
          type: move
        - from: attributes.container_name
          id: move_container_name
          output: move_namespace
          to: resource["k8s.container.name"]
          type: move
        - from: attributes.namespace
          id: move_namespace
          output: move_pod_name
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          id: move_pod_name
          output: move_restart_count
          to: resource["k8s.pod.name"]
          type: move
        - from: attributes.restart_count
          id: move_restart_count
          output: move_uid
          to: resource["k8s.container.restart_count"]
          type: move
        - from: attributes.uid
          id: move_uid
          output: move_log
          to: resource["k8s.pod.uid"]
          type: move
        - from: attributes.log
          id: move_log
          to: body
          type: move
        start_at: beginning
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          load: {}
          memory: {}
          network: {}
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 30s
        endpoint: ${K8S_HOST_IP}:10250
        extra_metadata_labels:
        - container.id
        - k8s.volume.type
        insecure_skip_verify: true
        metric_groups:
        - container
        - pod
        - node
        - volume
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    service:
      extensions:
      - health_check
      - zpages
      - pprof
      pipelines:
        logs:
          exporters:
          - otlp
          processors:
          - k8sattributes
          - batch
          receivers:
          - otlp
          - filelog/k8s
        metrics:
          exporters:
          - otlp
          processors:
          - k8sattributes
          - batch
          receivers:
          - otlp
        metrics/internal:
          exporters:
          - otlp
          processors:
          - resourcedetection/internal
          - resourcedetection
          - k8sattributes
          - batch
          receivers:
          - hostmetrics
          - kubeletstats
        traces:
          exporters:
          - otlp
          processors:
          - k8sattributes
          - batch
          receivers:
          - otlp
      telemetry:
        logs:
          encoding: json
        metrics:
          address: 0.0.0.0:8888

This should start sending signals to SigNoz.
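
To verify that the agent is exporting data, you can tail the OTel agent logs and watch for export errors:

kubectl logs -f -n signoz -l app.kubernetes.io/component=otel-agent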

eBPF Tracing

There are solutions that collect metrics and traces without modifying the application code. These come under the category of eBPF tracing. They are relatively new and still in the early stages of development.

However, there are some open source projects that export metrics and traces to OpenTelemetry.

📝 Note

These solutions may not be suitable for all use cases and may not yet be production-ready. It is recommended to evaluate the available options and choose the one that best fits your needs.

In this guide, we will deploy Pixie in the Kubernetes cluster to fetch traces from it.

Step 1: Deploy Pixie in the Kubernetes cluster to fetch traces from the cluster.

For eBPF tracing, we can use Pixie, which can be configured by following the instructions in this documentation.
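
As a rough sketch, assuming you have a Pixie account and the px CLI installed, the deployment typically looks like this:

# authenticate the px CLI with your Pixie account
px auth login

# deploy Pixie into the currently configured Kubernetes cluster
px deploy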

Step 2: Use a custom data retention script in the OpenTelemetry Pixie plugin to send the data to SigNoz.

You can refer to this GitHub repository for custom scripts covering different use cases.

Use the appropriate export URL in the script. The export URL will receive all the traces captured by Pixie.

url='otel-collector.kubelet-otel.svc.cluster.local:4317',
insecure=True

Pixie Configuration

Step 3: To test the configuration, deploy a sample application in the cluster to view its traces in SigNoz.

kubectl run -n signoz my-hello --image=nginx --port=80
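
To generate some traffic for this pod, one option (assuming you are testing from your workstation) is to port-forward it and send a few requests; run the curl in a separate terminal:

kubectl port-forward -n signoz pod/my-hello 8080:80
curl http://localhost:8080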

Visualize the Traces in SigNoz Cloud

To visualize the traces, log in to your SigNoz Cloud account and navigate to the Traces section.

Traces in SigNoz Dashboard

APM and Distributed Tracing

For application-level tracing, you can use the OpenTelemetry SDKs integrated with your application. These SDKs automatically collect and forward traces to the central collector.

Please refer to our SigNoz Tutorials or Blog for information on how to instrument applications built with Spring, FastAPI, Next.js, LangChain, Node.js, Flask, Django, and more.

Sample Python Application

We will use a sample Flask app. You may need to add the following packages to enable instrumentation.

requirements.txt

Flask==3.0.0 
pymongo==3.12.1 
requests==2.26.0
opentelemetry-api==1.22.0 
opentelemetry-distro==0.43b0 
opentelemetry-instrumentation==0.43b0
opentelemetry-exporter-otlp==1.22.0

Install these packages in your Dockerfile:


# install dependencies
RUN pip install -r requirements.txt
RUN opentelemetry-bootstrap --action=install
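
Then start the application through the auto-instrumentation wrapper provided by opentelemetry-distro. A minimal sketch, assuming the app's entrypoint is app.py (adjust to your actual entrypoint):

# run the app via the OpenTelemetry auto-instrumentation wrapper
CMD ["opentelemetry-instrument", "python", "app.py"]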

Example YAML

To run this sample Flask app as a pod:

sample-flask-app.yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    app: sample-flask-app
  name: sample-flask-app
spec:
  containers:
  - image: mongo:latest
    name: mongo
    ports:
    - name: mongo
      containerPort: 27017
  - image: signoz/sample-flask-app:latest
    name: sample-app
    ports:
    - name: flask
      containerPort: 5002
    env:
    - name: MONGO_HOST
      value: localhost
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: service.name=sample-app
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://<app-fqdn>:4317
    - name: OTEL_EXPORTER_OTLP_PROTOCOL
      value: grpc
  dnsPolicy: ClusterFirst
  restartPolicy: Always

Deploy and expose the port

kubectl apply -f sample-flask-app.yaml

You can expose the application as a service via NodePort or LoadBalancer.

kubectl expose pod sample-flask-app --port=5002 --name=sample-flask-svc --type=NodePort
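
If you chose NodePort, you can look up the assigned port to verify the service is reachable:

kubectl get svc sample-flask-svc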

Update OTel config map configuration

Make sure your SigNoz Cloud credentials are set in the OTel ConfigMap:

kubectl get configmap kubelet-otel-k8s-infra-otel-agent -n signoz -o yaml
apiVersion: v1
data:
  otel-agent-config.yaml: |2-
    exporters:
      otlp:
        endpoint: "ingest.{region}.signoz.cloud:443"
        headers:
          signoz-access-token: "<SIGNOZ_INGESTION_KEY>"
        tls:
          insecure: false
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133

Visualize the traces in SigNoz Cloud

Application in SigNoz Dashboard

Application-level traces in SigNoz

Troubleshooting

If you encounter any issues while setting up tracing for your GKE cluster, follow these troubleshooting steps:

  1. Check the logs of the OpenTelemetry Collector:

    kubectl logs -f -n signoz -l app.kubernetes.io/component=otel-agent
    

    Review the logs for any error messages or indications of misconfiguration.

  2. Verify the rendered configuration:

    kubectl get configmap/kubelet-otel-k8s-infra-otel-agent -n signoz -o yaml
    

    Ensure the configuration matches your expected settings, including the SigNoz API key and the OpenTelemetry Collector endpoint.

  3. Confirm that the necessary Kubernetes resources are created:

    kubectl get pods,services,configmaps -n signoz
    

    Check if the required pods, services, and config maps are running and in a healthy state.

  4. Verify network connectivity:

    • Ensure that the GKE cluster has network access to the SigNoz ingestion endpoint (ingest.<region>.signoz.cloud:443); a quick connectivity check is sketched after this list.
    • Check if there are any network security groups or firewalls blocking the required ports.
  5. Double-check the SigNoz API key:

    • Confirm that the provided signozApiKey is correct and has the necessary permissions to ingest data.
  6. Check that the Pixie deployment is configured correctly:

    • Verify that the configurations and deployments are applied in the correct order so that Pixie can forward all the scraped traces to the configured endpoint.
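
As referenced in step 4, a quick connectivity check (a sketch, assuming the curlimages/curl image can be pulled by your cluster) is to run a one-off pod that attempts a TLS connection to the ingestion endpoint:

kubectl run netcheck -n signoz -it --rm --image=curlimages/curl --restart=Never --command -- \
  curl -v https://ingest.<region>.signoz.cloud:443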
