Secure code execution
[!TIP] If you're new to building agents, make sure to first read the intro to agents and the guided tour of smolagents.
Code agents
Multiple research papers have shown that having the LLM write its actions (the tool calls) in code is much better than the current standard format for tool calling, which across the industry amounts to different shades of "writing actions as a JSON of tool names and arguments to use".
Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, this package would have been written in JSON snippets and the devil would be laughing at us.
Code is just a better way to express actions on a computer. It has better:
Composability: could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a Python function?
Object management: how do you store the output of an action like generate_image in JSON?
Generality: code is built to simply express anything you can have a computer do.
Representation in LLM training corpus: why not leverage the blessing that plenty of high-quality code actions are already included in LLM training corpora?
This is illustrated in the figure below, taken from Executable Code Actions Elicit Better LLM Agents.
This is why we put emphasis on proposing code agents, in this case Python agents, which meant putting higher effort into building secure Python interpreters.
Local code execution??
By default, the CodeAgent runs LLM-generated code in your environment.
This is inherently risky: LLM-generated code could be harmful to your environment.
Malicious code execution can occur in several ways:
Plain LLM error: LLMs are still far from perfect and may unintentionally generate harmful commands while attempting to be helpful. While this risk is low, instances have been observed where an LLM attempted to execute potentially dangerous code.
Supply chain attack: Running an untrusted or compromised LLM could expose a system to harmful code generation. While this risk is extremely low when using well-known models on secure inference infrastructure, it remains a theoretical possibility.
Prompt injection: an agent browsing the web could arrive on a malicious website that contains harmful instructions, thus injecting an attack into the agent's memory.
Exploitation of publicly accessible agents: Agents exposed to the public can be misused by malicious actors to execute harmful code. Attackers may craft adversarial inputs to exploit the agent's execution capabilities, leading to unintended consequences. Once malicious code is executed, whether accidentally or intentionally, it can damage the file system, exploit local or cloud-based resources, abuse API services, and even compromise network security.
One could argue that on the spectrum of agency, code agents give much higher agency to the LLM on your system than other less agentic setups: this goes hand-in-hand with higher risk.
So you need to be very mindful of security.
To improve safety, we propose a range of measures that provide elevated levels of security, at a higher setup cost.
We advise you to keep in mind that no solution will be 100% safe.
Our local Python executor
To add a first layer of security, code execution in smolagents is not performed by the vanilla Python interpreter. We have re-built a more secure LocalPythonExecutor from the ground up.
To be precise, this interpreter works by loading the Abstract Syntax Tree (AST) from your code and executing it operation by operation, making sure to always follow certain rules:
By default, imports are disallowed unless they have been explicitly added to an authorization list by the user.
Furthermore, access to submodules is disabled by default, and each must be explicitly authorized in the import list as well. Alternatively, you can pass for instance numpy.* to allow both numpy and all its subpackages, like numpy.random or numpy.a.b. Note that some seemingly innocuous packages like random can give access to potentially harmful submodules, as in random._os.
The total count of elementary operations processed is capped to prevent infinite loops and resource bloating.
Any operation that has not been explicitly defined in our custom interpreter will raise an error.
You could try these safeguards as follows:
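For instance, the snippet below is a minimal sketch of exercising these safeguards directly through LocalPythonExecutor; the exact import path and constructor signature may differ between smolagents versions.

```python
# A minimal sketch, assuming the LocalPythonExecutor API of recent smolagents
# versions; exact import paths and signatures may differ between releases.
from smolagents.local_python_executor import LocalPythonExecutor

# Authorize only numpy and random on top of the default safe builtins.
custom_executor = LocalPythonExecutor(["numpy", "random"])
custom_executor.send_tools({})  # register built-in tools (no custom tools here)

def run_capture_exception(code: str):
    try:
        custom_executor(code)
    except Exception as e:
        print("ERROR:", e)

# Non-authorized imports are rejected...
run_capture_exception("import os; os.listdir('/')")

# ...and so is access to dangerous submodules of authorized packages.
run_capture_exception("import random; random._os.system('touch /tmp/owned')")
```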
These safeguards make our interpreter safer. We have used it on a diversity of use cases without ever observing any damage to the environment.
[!WARNING] It's important to understand that no local Python sandbox can ever be completely secure. While our interpreter provides significant safety improvements over the standard Python interpreter, it is still possible for a determined attacker or a fine-tuned malicious LLM to find vulnerabilities and potentially harm your environment. For example, if you've allowed packages like Pillow to process images, the LLM could generate code that creates thousands of large image files to fill your hard drive. Other advanced escape techniques might exploit deeper vulnerabilities in authorized packages. Running LLM-generated code in your local environment always carries some inherent risk. The only way to run LLM-generated code with truly robust security isolation is to use remote execution options like E2B or Docker, as detailed below.
The risk of a malicious attack is low when using well-known LLMs from trusted inference providers, but it is not zero. For high-security applications or when using less trusted models, you should consider using a remote execution sandbox.
Sandbox approaches for secure code execution
When working with AI agents that execute code, security is paramount. There are two main approaches to sandboxing code execution in smolagents, each with different security properties and capabilities:
Running individual code snippets in a sandbox: This approach (left side of diagram) only executes the agent-generated Python code snippets in a sandbox while keeping the rest of the agentic system in your local environment. It's simpler to set up using executor_type="e2b" or executor_type="docker", but it doesn't support multi-agents and still requires passing state data between your environment and the sandbox.
Running the entire agentic system in a sandbox: This approach (right side of diagram) runs the entire agentic system, including the agent, model, and tools, within a sandbox environment. This provides better isolation but requires more manual setup and may require passing sensitive credentials (like API keys) to the sandbox environment.
This guide describes how to set up and use both types of sandbox approaches for your agent applications.
E2B setup
Installation
Create an E2B account at e2b.dev
Install the required packages:
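For example (the exact extra name is an assumption; check the smolagents installation docs):

```bash
# Assumes smolagents ships an "e2b" extra; otherwise install
# smolagents and e2b-code-interpreter separately.
pip install 'smolagents[e2b]'
```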
Running your agent in E2B: quick start
We provide a simple way to use an E2B Sandbox: simply add executor_type="e2b" to the agent initialization, as follows:
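A minimal sketch, assuming an InferenceClientModel-style model class and a valid HF token in your environment (class names may vary by smolagents version):

```python
from smolagents import CodeAgent, InferenceClientModel

# The generated code runs in an E2B sandbox instead of your local environment.
agent = CodeAgent(
    model=InferenceClientModel(),  # assumes HF_TOKEN is set in your environment
    tools=[],
    executor_type="e2b",
)
agent.run("Give me the 100th Fibonacci number.")
```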
This solution sends the agent state to the server at the start of each agent.run(). Then the models are called from the local environment, but the generated code will be sent to the sandbox for execution, and only the output will be returned.
This is illustrated in the figure below.
However, any call to a managed agent would require additional model calls, and since we do not transfer secrets to the remote sandbox, those model calls would lack credentials. Hence this solution does not work (yet) with more complicated multi-agent setups.
Running your agent in E2B: multi-agents
To use multi-agents in an E2B sandbox, you need to run your agents completely from within E2B.
Here is how to do it:
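A hedged sketch of this pattern, assuming the e2b_code_interpreter package's v1 API (Sandbox.run_code, Sandbox.commands.run) and an HF_TOKEN environment variable; adapt names to your setup:

```python
import os

from e2b_code_interpreter import Sandbox

# Create the sandbox and install smolagents inside it.
sandbox = Sandbox()
sandbox.commands.run("pip install smolagents")

# The full agentic system (model, tools, managed agents) is defined in this
# code string and executed remotely, so credentials must be forwarded.
agent_code = """
from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(model=InferenceClientModel(), tools=[])
print(agent.run("What is the 20th Fibonacci number?"))
"""

execution = sandbox.run_code(
    agent_code,
    envs={"HF_TOKEN": os.getenv("HF_TOKEN")},  # remote model calls need credentials
)
if execution.error:
    raise RuntimeError(execution.error.traceback)
print("".join(execution.logs.stdout))
```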
Docker setup
Installation
Install the required packages:
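For example (again, the exact extra name is an assumption):

```bash
# Assumes smolagents ships a "docker" extra; otherwise install
# smolagents and the docker SDK separately.
pip install 'smolagents[docker]'
```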
Running your agent in Docker: quick start
Similar to the E2B Sandbox above, to quickly get started with Docker, simply add executor_type="docker" to the agent initialization, like:
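A minimal sketch, mirroring the E2B quick start above (the model class name may vary by smolagents version):

```python
from smolagents import CodeAgent, InferenceClientModel

# The generated code runs in a local Docker container instead of your host.
agent = CodeAgent(
    model=InferenceClientModel(),
    tools=[],
    executor_type="docker",
)
agent.run("Give me the 100th Fibonacci number.")
```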
Advanced docker usage
If you want to run multi-agent systems in Docker, you'll need to set up a custom interpreter in a sandbox.
Here is how to set up a Dockerfile:
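A hypothetical Dockerfile sketch; the base image, tag, and layout are placeholders, not the project's official image:

```dockerfile
# A hypothetical sandbox image; base image and tag are placeholders.
FROM python:3.12-slim

# Install smolagents inside the image.
RUN pip install smolagents

WORKDIR /app

# Keep the container alive so a manager can exec agent code into it.
CMD ["tail", "-f", "/dev/null"]
```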
Create a sandbox manager to run code:
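A hedged sketch of such a manager, assuming the docker Python SDK and the image above built and tagged as agent-sandbox (the tag and resource limits are placeholders):

```python
import os

import docker  # the Docker SDK for Python


class DockerSandbox:
    def __init__(self, image: str = "agent-sandbox"):
        self.client = docker.from_env()
        # Run with capped resources; forward only the secrets the agent needs.
        self.container = self.client.containers.run(
            image,
            detach=True,
            mem_limit="512m",
            cpu_quota=50000,  # roughly half a CPU
            environment={"HF_TOKEN": os.getenv("HF_TOKEN")},
        )

    def run_code(self, code: str) -> str:
        # Execute the agent code inside the running container.
        result = self.container.exec_run(["python", "-c", code])
        return result.output.decode()

    def cleanup(self):
        self.container.stop()
        self.container.remove()


sandbox = DockerSandbox()
try:
    print(sandbox.run_code(
        "from smolagents import CodeAgent, InferenceClientModel\n"
        "agent = CodeAgent(model=InferenceClientModel(), tools=[])\n"
        "print(agent.run('What is the 20th Fibonacci number?'))"
    ))
finally:
    sandbox.cleanup()
```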
Best practices for sandboxes
These key practices apply to both E2B and Docker sandboxes:
Resource management
Set memory and CPU limits
Implement execution timeouts
Monitor resource usage
Security
Run with minimal privileges
Disable unnecessary network access
Use environment variables for secrets
Environment
Keep dependencies minimal
Use fixed package versions
If you use base images, update them regularly
Cleanup
Always ensure proper cleanup of resources, especially for Docker containers, to avoid leaving dangling containers that eat up resources.
✨ By following these practices and implementing proper cleanup procedures, you can ensure your agent runs safely and efficiently in a sandboxed environment.
Comparing security approaches
As illustrated in the diagram earlier, both sandboxing approaches have different security implications:
Approach 1: Running just the code snippets in a sandbox
Pros:
Easier to set up with a simple parameter (executor_type="e2b" or executor_type="docker")
No need to transfer API keys to the sandbox
Better protection for your local environment
Cons:
Doesn't support multi-agents (managed agents)
Still requires transferring state between your environment and the sandbox
Limited to executing specific code snippets
Approach 2: Running the entire agentic system in a sandbox
Pros:
Supports multi-agents
Complete isolation of the entire agent system
More flexible for complex agent architectures
Cons:
Requires more manual setup
May require transferring sensitive API keys to the sandbox
Potentially higher latency due to more complex operations
Choose the approach that best balances your security needs with your application's requirements. For most applications with simpler agent architectures, Approach 1 provides a good balance of security and ease of use. For more complex multi-agent systems where you need full isolation, Approach 2, while more involved to set up, offers better security guarantees.