When Prometheus fails to start using systemd, the issue typically stems from incorrect permissions, configuration errors, or service unit file problems. You can fix this by checking systemd logs, verifying permissions, and ensuring proper configuration settings. This guide provides step-by-step solutions to get your Prometheus service running.

Understanding Prometheus Systemd Service Failures

Prometheus, when run as a systemd service, depends on systemd for startup, configuration, and maintenance throughout its lifecycle. Understanding how systemd manages Prometheus is essential to diagnose and resolve service failures, which can stem from a variety of issues ranging from permissions to file dependencies.

How systemd Manages Prometheus Services

Systemd is the system and service manager on many Linux distributions, handling the lifecycle of services, like Prometheus, which includes starting, stopping, monitoring, and restarting them as necessary. Prometheus, when installed as a service, is typically controlled by a systemd service unit file. A systemd unit file defines how the service should run, what dependencies it requires, and under what conditions it should restart. This file ensures that Prometheus starts at boot and is restarted in the event of unexpected failures.

Key points to understand:

  • Unit Files: These are configuration files that tell systemd how to manage services, including when to start or stop the service and what dependencies are required.
  • Service Commands: Commands like systemctl start prometheus, systemctl stop prometheus, and systemctl restart prometheus are used to manage the service based on the configurations in its unit file.

Common Error Patterns in systemd Logs

Systemd records service activity and errors in logs, which can be accessed using the journalctl command. Common errors for Prometheus include:

  • Service Not Starting: This can occur due to incorrect configuration or missing files. Look for messages such as:
    • prometheus.service: Failed to start Prometheus
    • execvp failed: No such file or directory (indicating an incorrect path to the Prometheus binary)
  • Permission Denied Errors: If the Prometheus service does not have appropriate permissions to access necessary files or directories, systemd will log “permission denied” errors.
  • Configuration File Access Issues: Incorrect permissions or missing configuration files can prevent Prometheus from starting, which systemd logs can help identify. Errors such as error loading configuration file 'prometheus.yml' can indicate syntax issues or missing files in the configuration.
  • Storage Directory Problems: Prometheus requires access to specific directories for data storage. Problems such as missing directories or improper permissions can cause the service to fail.
  • Service Dependency Failures: Systemd manages dependencies, and if a required service or target (like network availability) is not met, Prometheus may fail to start.

Permission and Ownership Requirements

Prometheus needs specific permissions and ownership settings to function properly:

  • Prometheus Binary: The Prometheus binary must be executable by the user or group specified in the systemd service unit file (typically a non-root user for security reasons, e.g., prometheus).

  • Configuration Files: The prometheus.yml configuration file and any other files Prometheus needs (e.g., alerting rules, TLS certificates) should be readable only by Prometheus or privileged users. Using chown and chmod commands can help set the appropriate permissions and ownership on critical files and directories.

    sudo chown -R prometheus:prometheus /etc/prometheus
    sudo chmod -R 755 /etc/prometheus
    
  • Data Directory: Prometheus stores its time-series data on disk, usually under /var/lib/prometheus/. The Prometheus user needs write permissions to this directory. If the ownership or permissions are incorrect, Prometheus may fail to write data, resulting in errors.

Service Unit File Structure and Dependencies

The Prometheus systemd unit file defines how systemd should start, stop, and restart the service. It contains:

  • [Unit] Section: Describes the service and lists its dependencies.
  • [Service] Section: Contains the commands to start Prometheus, restart options, and user permissions.
  • [Install] Section: Defines whether the service should start at boot.

Here's what a typical systemd service file for Prometheus looks like:

[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
Service File
Service File

This structure allows systemd to manage Prometheus effectively, handling automatic restarts and managing dependencies with other system services, ensuring Prometheus runs smoothly within the system environment.

Quick Troubleshooting Checklist

This checklist offers a step-by-step approach to diagnose and resolve common issues with the Prometheus service running under systemd. By following these steps, you can quickly identify potential causes of service failures and address them effectively.

  1. Check Service Status: Start by verifying whether the Prometheus service is active and running. Using systemctl status, you can check the current status of the Prometheus service and see if it’s in an active, failed, or inactive state. This command also provides insights into any immediate issues, such as failed dependencies or permission errors.

    sudo systemctl status prometheus
    
    Checking Service Status
    Checking Service Status
  2. View Detailed Logs: The journalctl command provides detailed logs for services managed by systemd. By checking the logs for Prometheus, you can find specific error messages or warnings that help pinpoint issues, such as file access errors or missing dependencies. The -f flag allows you to monitor the logs in real-time as changes occur

    sudo journalctl -u prometheus -f
    
    Detailed logs for Prometheus
    Detailed logs for Prometheus
  3. Verify File Permissions: Prometheus requires correct permissions on its configuration files to operate. Incorrect permissions may prevent the service from reading or writing to necessary directories, leading to failures. The following commands check permissions for both the configuration file (prometheus.yml) and the storage directory (/var/lib/prometheus/).

    ls -l /etc/prometheus/prometheus.yml
    ls -l /var/lib/prometheus/
    
    Verifying File permissions
    Verifying File permissions
  4. Validate Service Configuration: Review the Prometheus configuration file (prometheus.yml) to check for any syntax errors or misconfigurations, as these can prevent the service from starting or cause unexpected behavior. After making any changes, remember to reload or restart the service to apply them.

    To verify the systemd service configuration for Prometheus, you can use:

    sudo systemd-analyze verify prometheus.service
    

Common Error Messages Explained

Understanding common Prometheus error messages can save valuable troubleshooting time. Here, we break down frequent issues and provide clear solutions to help you keep your monitoring running smoothly.

  1. Failed to Start Prometheus Server: This error usually means that the Prometheus service encountered a problem during startup, commonly caused by syntax errors in the configuration file or missing dependencies. To troubleshoot this issue, start by checking the syntax of your configuration file.

    Solution: Check Configuration File Syntax Use the promtool command to validate the prometheus.yml file for any syntax errors. This tool will highlight specific mistakes, helping you make corrections before restarting the service.

    promtool check config /etc/prometheus/prometheus.yml
    
    Checking config syntax
    Checking config syntax

    Explanation: The promtool command checks for syntax issues in prometheus.yml. If there are any, this command will specify them, enabling you to fix errors promptly and restart the service without issues.

  2. Permission Denied: This error indicates that Prometheus does not have the necessary permissions to access certain files or directories, such as its configuration file or data directories. Without the proper permissions, Prometheus cannot read or write to the resources it requires to run.

    Solution: Adjust Permissions To resolve this issue, ensure that the Prometheus user owns the configuration and data directories. The following commands adjust ownership and permissions so that Prometheus can access its files as needed:

    # Fix ownership for Prometheus directories
    sudo chown -R prometheus:prometheus /etc/prometheus
    sudo chown -R prometheus:prometheus /var/lib/prometheus
    
    Fix file ownership
    Fix file ownership

    Explanation: These commands assign ownership of critical Prometheus directories to the Prometheus user. This gives Prometheus the permissions it needs to read configuration files and write to data directories, preventing “permission denied” errors.

  3. Service Start Request Repeated too Quickly: This error occurs when systemd detects multiple restart attempts within a short time. If Prometheus repeatedly fails to start, systemd places the service in a “failed” state to prevent continuous, unsuccessful restarts. You’ll need to reset this state before you can attempt to start Prometheus again.

    Solution: Reset Failed State: To clear the service’s failed state, use the reset-failed command and then attempt to start Prometheus anew.

    # Reset the failed state and start Prometheus
    sudo systemctl reset-failed prometheus
    sudo systemctl start prometheus
    

    Explanation: The reset-failed command clears the failed state in systemd, enabling a fresh start attempt for the Prometheus service. This is especially helpful after addressing any underlying issues that were causing the repeated failures.

  4. Exit Code Failures: Exit codes provide information on why the Prometheus service stopped. For example, Exit Code 1 often indicates a configuration or permissions issue, while Exit Code 2 may suggest a syntax error within the prometheus.yml file.

    Solution: Check Logs for Exit Code Details To understand the specific reason for an exit code failure, examine the Prometheus logs with journalctl, which provides a more detailed error message:

    sudo journalctl -u prometheus -f
    

    Explanation: Reviewing logs in journalctl gives context for the exit codes and helps pinpoint the exact problem, such as syntax issues or missing dependencies, so you can apply targeted fixes to restart Prometheus successfully.

Step-by-Step Solutions

For production-grade Prometheus setups, it’s essential to address common issues related to permissions, storage directories, and service configurations. Here’s a guide to troubleshooting and configuring Prometheus step-by-step:

  1. Fix Permission Issues: Permissions are crucial for Prometheus to access its configuration and data directories securely. These steps will ensure Prometheus has the necessary user and group setup and correct permissions.
    1. Create a dedicated Prometheus user: This user will run the Prometheus service without unnecessary system access.

      sudo useradd --no-create-home --shell /bin/false prometheus
      

      Explanation: The -no-create-home and -shell /bin/false flags restrict this user from logging into the server, enhancing security.

    2. Lock the Prometheus User Account: This prevents login access to the Prometheus user.

      sudo usermod -L prometheus
      
    3. Set permissions for Prometheus directories: Ensure that Prometheus has read-write access to the necessary directories.

      sudo chmod 755 /etc/prometheus
      sudo chmod 755 /var/lib/prometheus
      

      Explanation: Setting directory permissions to 755 allows the Prometheus user access while restricting other users from making changes.

  2. Resolve Storage Directory Problems: Prometheus requires a dedicated storage directory to save time series data. If this directory isn’t set up correctly, Prometheus may fail to start or encounter issues.
    1. Create the required storage directory:

      sudo mkdir -p /var/lib/prometheus
      
    2. Set Directory Ownership: Assign the Prometheus user as the owner of the storage directory.

      sudo chown prometheus:prometheus /var/lib/prometheus
      

      Explanation: These commands create the storage directory with Prometheus ownership, ensuring the service has access to store its data reliably.

  3. Update Service Configuration: If the Prometheus service repeatedly fails to start, systemd may prevent it from restarting quickly. You can adjust settings to control restart frequency and delays for smoother operations.
    1. Edit the Prometheus Service Configuration:

      sudo systemctl edit prometheus
      
      Editing Service File
      Editing Service File
    2. Add the following Settings to Configure Service Start Limits and Delays:

      [Service]
      StartLimitInterval=5min
      StartLimitBurst=10
      RestartSec=30s
      
      Add more Service Configs
      Add more Service Configs

      Explanation: These settings limit the number of restart attempts within a 5-minute period (10 attempts) and add a 30-second delay between restart attempts, preventing rapid retries that could cause issues.

Advanced Configuration Tips

For optimized performance and reliability, consider these additional systemd configuration options.

  1. Set Proper Systemd Limits: Prometheus benefits from increased file and process limits, especially in larger deployments. Adjusting these values can prevent resource constraints from affecting performance.
    1. Edit the Prometheus service file

      sudo systemctl edit prometheus
      
    2. Add Systemd Limits under the [Service] section:

      [Service]
      LimitNOFILE=65535
      LimitNPROC=4096
      

      Explanation: LimitNOFILE=65535 increases the maximum number of open files, and LimitNPROC=4096 raises the process limit. These values support high-traffic environments and prevent Prometheus from hitting file or process limits.

  2. Enable Core Dumps for Debugging: Core dumps help capture program states at the moment of failure, useful for debugging crashes or unexpected behavior in Prometheus.
    1. Add the following settings to the [Service] section in the Prometheus service file:

      [Service]
      LimitCORE=infinity
      

      Explanation: LimitCORE=infinity allows core dumps to be generated without size restrictions. For debugging, ensure core dumps are stored securely as they may contain sensitive data.

By following these steps, you can resolve common Prometheus issues, set up secure permissions, and optimize your Prometheus deployment for production reliability and performance.

SigNoz: A Single-Pane Alternative to your Observability Needs

If managing Prometheus with systemd has proven difficult, SigNoz offers a powerful alternative with numerous advantages, including easier setup, automatic recovery, and integration with OpenTelemetry for unified observability. Designed for teams seeking a solution with minimal configuration, built-in alerting, and seamless integration of metrics, traces, and logs, SigNoz provides a modern and efficient approach to monitoring.

Elevate your monitoring game with SigNoz—an open-source observability tool that simplifies your strategy. With seamless OpenTelemetry integration, SigNoz effortlessly collects and correlates metrics, traces, and logs, providing a comprehensive view of your system’s performance. Below, are key advantages that make SigNoz an essential solution for modern observability requirements:

  • Zero Configuration Setup: Say goodbye to complex systemd configurations. SigNoz offers a hassle-free installation process that automatically sets up monitoring services for you.
  • Automatic Failure Recovery: Never miss a beat with built-in recovery features that automatically restart your services, ensuring high availability and reliability.
  • Smart Storage Management: Forget about manual retention policies; SigNoz intelligently manages storage for you, optimizing performance without additional overhead.
  • Pre-configured Alerts and Dashboards: Get started quickly with out-of-the-box alerts and dashboards, allowing you to monitor critical metrics right away.

SigNoz cloud is the easiest way to run SigNoz. Sign up for a free account and get 30 days of unlimited access to all features.

Get Started - Free CTA

You can also install and self-host SigNoz yourself since it is open-source. With 19,000+ GitHub stars, open-source SigNoz is loved by developers. Find the instructions to self-host SigNoz.

Best Practices for Production Deployment

Deploying Prometheus in a production environment requires careful configuration and security settings to ensure performance, data integrity, and reliability. Here are some essential practices to follow:

  1. Security Settings: Securing Prometheus is critical to protect both the monitoring data and configuration settings. A key step in securing Prometheus is to limit access to sensitive files, especially the configuration file (prometheus.yml), which may contain important settings and credentials. Restrict File Permissions: Set strict permissions to restrict access, so only the Prometheus service can read its configuration file.

    sudo chmod 600 /etc/prometheus/prometheus.yml
    
    Restricting File Permissions
    Restricting File Permissions

    Explanation: Setting permissions to 600 ensures that only the owner (Prometheus) has read and write permissions. This prevents other users or services from accessing sensitive information in the configuration file, enhancing security in multi-user environments.

  2. Backup Configuration: Regular backups of Prometheus configuration files are crucial in production environments to recover quickly from accidental changes or data corruption. You can create a simple backup script to automate this process.

    Create Backup Script: Make regular backups of the configuration file to a secure location to ensure you have recent configurations saved.

    # Create backup script
    sudo cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup
    
    Making Backups from configuration
    Making Backups from configuration

    Explanation: The cp command creates a backup of prometheus.yml, which can be invaluable for restoring configurations if the original file is accidentally modified or deleted. Automating this backup as part of a script or cron job is a best practice in production environments.

  3. Performance Settings: Configuring Prometheus storage to manage resource usage efficiently is important for production environments with large data volumes. You can control the amount of data stored by setting retention policies based on time or storage limits.

    Configure Storage Retention: Adjust retention settings in prometheus.yml to define how much historical data Prometheus should keep.

    storage:
      tsdb:
        retention.time: 15d
        retention.size: 50GB
    
    Managing Performance Settings
    Managing Performance Settings

    Explanation: Here, retention.time limits the stored data to the most recent 15 days, while retention.size caps storage at 50GB. These settings prevent Prometheus from consuming excessive disk space, helping maintain high performance and ensuring that your storage does not fill up unexpectedly. Tailor these values to your organization’s data requirements and storage capacity.

FAQs

Why does Prometheus fail to start after installation?

Prometheus may fail to start due to missing directories, incorrect permissions, or configuration issues. Use sudo journalctl -xe to view systemd logs for specific error details. Common issues include lack of access to /var/lib/prometheus or syntax errors in prometheus.yml.

How do I reset a failed Prometheus service?

To reset a failed state and restart Prometheus, use:

sudo systemctl stop prometheus
sudo systemctl reset-failed prometheus
sudo systemctl start prometheus

This sequence stops the service, clears the failure state, and then starts Prometheus fresh.

What permissions does Prometheus need?

Prometheus requires read-write access to its storage directory (/var/lib/prometheus) and read access to configuration files. Ensure the Prometheus user owns these directories:

sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus

How can I automate Prometheus service recovery?

Add restart options to the Prometheus service file to allow automatic recovery:

[Service]
Restart=on-failure
RestartSec=5s

This setup restarts Prometheus automatically on failure with a 5-second delay, reducing downtime.

Was this page helpful?