YAML Arbitrary Code Execution

YAML, short for ‘YAML Ain’t Markup Language,’ is a human-readable data serialization format often used for configuration files. It is valued for its simplicity and readability compared to formats like XML or JSON. However, what many developers might not realize is that YAML can pose serious security risks when improperly parsed. One of the most critical vulnerabilities associated with YAML is arbitrary code execution, which can lead to complete system compromise. Understanding this issue is crucial for developers, system administrators, and anyone working with applications that rely on YAML for configuration or data exchange.

What Is YAML Arbitrary Code Execution?

Definition and Overview

YAML arbitrary code execution refers to a situation where a YAML file, when parsed using unsafe methods, can execute malicious code on the host system. This happens because some YAML parsers allow objects to be instantiated during deserialization. If not properly handled, a malicious YAML payload can exploit this feature to run arbitrary commands.

Why YAML Is Vulnerable

The vulnerability stems from how certain YAML parsers, particularly in dynamic programming languages like Python and Ruby, deserialize data. For instance, in Python, calling yaml.load() from the PyYAML library without a safe loader (the default behavior before PyYAML 5.1) can result in the execution of arbitrary Python code embedded in the YAML file.
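To make the mechanism concrete, here is a harmless sketch of how a tag causes Python code to run during parsing. The Greeting class, the !greeting tag, and the DemoLoader subclass are all invented for this illustration. Only add_constructor, construct_scalar, SafeLoader, and yaml.load come from PyYAML.

```python
import yaml


class Greeting:
    """Hypothetical application class used only for this illustration."""

    def __init__(self, name):
        self.name = name


def construct_greeting(loader, node):
    # PyYAML calls this function whenever it meets the `!greeting` tag,
    # so ordinary Python code runs in the middle of parsing.
    return Greeting(loader.construct_scalar(node))


class DemoLoader(yaml.SafeLoader):
    """SafeLoader subclass that additionally understands `!greeting`."""


# Register the tag on the subclass only; yaml.SafeLoader itself is unchanged.
DemoLoader.add_constructor("!greeting", construct_greeting)

document = "message: !greeting world"
parsed = yaml.load(document, Loader=DemoLoader)
print(type(parsed["message"]).__name__, parsed["message"].name)
```

An unsafe loader behaves as if constructors like this were registered for every importable Python callable, which is why the choice of loader matters so much.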

Common Scenarios Where YAML Is Used

  • Application configuration files (e.g., Docker Compose, Ansible, Kubernetes)
  • Data storage for internal or local application logic
  • CI/CD pipelines configuration (e.g., GitHub Actions, GitLab CI)
  • Cloud service definitions and deployment scripts

Each of these scenarios involves potentially sensitive environments, which makes the impact of YAML vulnerabilities even more dangerous.

How Arbitrary Code Execution Happens

Example in Python

Here is a simplified example using the PyYAML library in Python:

import yaml
# UnsafeLoader honors python/object/apply tags, so loading this string runs `ls`.
data = yaml.load("!!python/object/apply:os.system ['ls']", Loader=yaml.UnsafeLoader)

In this case, when the YAML string is loaded, the parser calls Python’s os.system() function to execute a shell command. If this command were something like rm -rf /, the consequences could be catastrophic.

Example in Ruby

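Ruby’s YAML parser, Psych, is also unsafe by default in versions before 4.0, where YAML.load instantiates whatever class a !ruby/object tag names:

YAML.load("--- !ruby/object:SomeClass\nattribute: value\n")

Here SomeClass is a placeholder for any class already loaded in the application. Psych allocates the object and fills in its state from the mapping, typically without calling the class’s constructor. YAML itself does not evaluate backticks; real-world exploits instead chain "gadget" classes whose own methods end up passing attacker-controlled values to a shell command or to eval.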

Real-World Impact

Security Breaches and CVEs

There have been several real-world examples where YAML vulnerabilities were exploited or patched as critical security flaws. Notable CVEs include:

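  • CVE-2017-18342: A vulnerability in PyYAML before version 5.1, where yaml.load() could execute arbitrary code when given untrusted data.
  • CVE-2020-1747: A flaw in PyYAML before version 5.3.1 that allowed arbitrary code execution when untrusted YAML was processed with full_load() or the FullLoader.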

These examples show how serious the consequences can be when YAML parsing is not handled securely.

Targets and Consequences

Any application that allows user-supplied YAML input or processes YAML files uploaded from untrusted sources is at risk. Consequences include:

  • Unauthorized access to sensitive data
  • System compromise and malware installation
  • Privilege escalation
  • Destruction or alteration of critical files

Preventing YAML Arbitrary Code Execution

Use Safe Loaders

Most YAML libraries provide a safe loading function. For example, in Python, replace yaml.load() with yaml.safe_load():

import yaml
safe_data = yaml.safe_load(user_input)

This ensures that only basic data types such as dictionaries, lists, and strings are parsed.
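As a rough illustration of the difference, the sketch below feeds a tagged payload to yaml.safe_load() and shows it being refused. The parse_untrusted wrapper and the sample payload are assumptions made for this example, not part of PyYAML.

```python
import yaml

# Stand-in for a hostile document: it asks the parser to call os.system.
MALICIOUS = "!!python/object/apply:os.system ['echo pwned']"


def parse_untrusted(text):
    """Parse YAML from an untrusted source; raise ValueError if it is rejected."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # safe_load has no constructor for python/* tags, so it raises
        # before os.system can ever be called.
        raise ValueError(f"rejected YAML input: {type(exc).__name__}") from exc


print(parse_untrusted("name: demo\nports: [80, 443]"))

try:
    parse_untrusted(MALICIOUS)
except ValueError as err:
    print(err)
```

Wrapping the call this way gives the rest of the application a single exception type to handle, whether the input was malformed or hostile.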

Never Trust User Input

Do not allow users to upload or submit YAML files that are later parsed by the server. If this is unavoidable, use strict validation and sanitization mechanisms. Assume that all user input is potentially dangerous.
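One way to approach strict validation is to check the size of the raw text first and then check the shape of the safely parsed result against an allow-list. The field names, types, and size cap below are hypothetical and would need to match the real application.

```python
# Hypothetical schema: the only keys this application accepts, and their types.
EXPECTED_FIELDS = {"name": str, "replicas": int, "tags": list}
MAX_DOCUMENT_BYTES = 64 * 1024  # assumed size cap; tune for the application


def check_size(raw_text):
    """Refuse oversized documents before any parsing happens."""
    if len(raw_text.encode("utf-8")) > MAX_DOCUMENT_BYTES:
        raise ValueError("document too large")
    return raw_text


def validate_config(data):
    """Validate data that has already been parsed with a safe loader."""
    if not isinstance(data, dict):
        raise ValueError("top level must be a mapping")
    unknown = set(data) - set(EXPECTED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(map(str, unknown))}")
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        value = data[key]
        # bool is a subclass of int in Python; no field here expects a bool,
        # so reject it explicitly rather than let True pass as an integer.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{key} must be {expected_type.__name__}")
    return data


print(validate_config({"name": "web", "replicas": 3, "tags": ["prod"]}))
```

Validation of this kind complements a safe loader. It does not make an unsafe loader acceptable, because the dangerous code runs during parsing, before any check on the result.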

Keep Libraries Updated

Older YAML parsing libraries may not include the latest security patches. Always ensure that dependencies are up to date and audit them regularly using tools like pip-audit, npm audit, or bundler-audit.

Limit Execution Permissions

Run your application in a sandboxed or limited-permission environment. Even if an attacker manages to execute code, the scope of their access can be limited by restricting file system or network privileges.

Secure YAML Parsing by Language

Python (PyYAML)

  • Use safe_load instead of load
  • Update to the latest version of PyYAML

Ruby (Psych)

  • Upgrade Ruby and the Psych library to the latest version
  • Avoid using YAML.load on untrusted input; use YAML.safe_load instead

JavaScript (yaml npm package)

  • Use the latest stable version
  • Enable schema restrictions where available

Go (gopkg.in/yaml.v3)

  • Ensure that input is only parsed if it’s trusted
  • Unmarshal into statically typed structs rather than generic maps, so that type mismatches surface as errors

Developer Best Practices

  • Use static code analysis tools to detect unsafe YAML parsing
  • Write unit tests that simulate malicious YAML payloads and ensure the app responds safely
  • Include YAML parsing in threat modeling discussions
  • Document any instance where YAML is used in configuration, especially with third-party modules
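The static analysis item above can be approximated with Python’s standard ast module. The following is a minimal sketch, not a replacement for a dedicated linter such as Bandit: the function name and the list of loaders treated as safe are assumptions, and it only recognizes calls spelled literally as yaml.load(...), so aliased imports slip through.

```python
import ast

# Loader class names treated as safe in this sketch (an assumption).
SAFE_LOADERS = {"SafeLoader", "CSafeLoader", "BaseLoader", "CBaseLoader"}


def find_unsafe_yaml_loads(source, filename="<string>"):
    """Return line numbers of yaml.load(...) calls without a safe Loader."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        # The loader may be passed as Loader=... or as the second positional argument.
        loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
        if loader is None and len(node.args) > 1:
            loader = node.args[1]
        # Handles both `yaml.SafeLoader` (attribute) and bare `SafeLoader` (name).
        loader_name = getattr(loader, "attr", getattr(loader, "id", None))
        if loader_name not in SAFE_LOADERS:
            findings.append(node.lineno)
    return sorted(findings)


sample = '''
import yaml
a = yaml.load(text)
b = yaml.load(text, Loader=yaml.SafeLoader)
c = yaml.load(text, Loader=yaml.UnsafeLoader)
d = yaml.safe_load(text)
'''
print(find_unsafe_yaml_loads(sample))
```

A check like this could run in CI alongside the unit tests, failing the build whenever a new unsafe call appears.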

YAML arbitrary code execution is a dangerous vulnerability that can impact applications of all sizes and types. While YAML itself is not inherently insecure, the way it is used and parsed determines the risk. Developers must take active steps to mitigate these risks by using safe loaders, avoiding the deserialization of untrusted input, and staying informed about security updates in their libraries. With proper precautions, YAML can continue to be a reliable and user-friendly tool in modern software development without compromising security.