We've observed that current methods (ocker/Cloud Functions) often struggle with two things: security for multi-tenant code and statefulness (the environment resets after every run). To solve this, we're architecting a platform using Firecracker microVMs on bare metal (for high performance/low cost) to provide VM-level isolation. This ensures that when an agent runs code like import pandas as pd; pd.read_csv(...), it's secure and fast.
We need to validate if statefulness is the killer feature. Our questions for those building or deploying agents are:
- Statefulness: For an agent working on a multi-step task (e.g., coding, iterating on a dataset), how critical is the ability to 'pause and resume' the environment with the filesystem intact? Is the current work-around of manual file management (S3/B) good enough, or is it a major bottleneck?
- Compatibility vs. Speed: Is full NumPy/Pandas/Python library compatibility (which Firecracker provides) more important than the potential microsecond startup speeds of a pure WASM environment that often breaks C-extensions?
- The Cost-Security Trade-Off: Given the security risk, would your team tolerate the higher operational complexity of a bare-metal Firecracker solution to achieve VM-level security and a massive cost reduction compared to standard cloud providers?
Thanks for your time, all technical insights are deeply appreciated. We're not selling anything, just validating a strong technical hypothesis.