The First Java Harness Framework Is Here | AgentScope Brings OpenClaw to Enterprise Distributed Scenarios

Editor’s Note: This is an exact English translation of the original Chinese article published by Liu Jun on the Alibaba Cloud Developer WeChat account. It announces the AgentScope Java 1.1.0 milestone release and explains how the Harness Framework brings OpenClaw-style engineering to enterprise distributed scenarios.

Introduction

This article formally announces the release of the AgentScope Java 1.1.0 milestone version, with a focus on how this version fully delivers on the “Harness Framework” concept from an engineering-practice standpoint.

Picking up from where we left off: in a previous article I did a deep dive into OpenClaw and the Harness Engineering practices behind it, and sketched out a “Harness Framework” to explain how to apply these ideas to enterprise-grade agent development.

The good news is that AgentScope Java 1.1.0 has been officially released, and in this milestone version we have fully implemented that “Harness Framework” plan. Developers can use version 1.1 to put Harness into practice quickly, building local apps such as personal-productivity XxxClaw or Coding Agents, as well as enterprise-grade applications like DataAgents and SRE Agents designed for distributed scenarios.

AgentScope Java 1.1.0 delivers four core capabilities:

Workspace-driven Agent runtime environment: The agent’s persona, knowledge, skills, memory, and sub-agent specs are all consolidated in a structured workspace. On every run the context is automatically loaded from the workspace, and memory is automatically written back when the run ends — the agent’s capabilities continuously evolve over time.
Pluggable abstract filesystem: The physical storage backing the workspace can be freely swapped — local disk, remote shared storage, or isolated sandbox — all accessed through the same interface. The same agent logic requires no modifications to adapt from a personal development environment to enterprise distributed deployment.
Out-of-the-box context management: Built-in conversation compaction, a two-layer memory-distillation mechanism, and full-text search address two stubborn problems: long-conversation context bloat and cross-session memory loss. A background maintenance mechanism ensures the memory store does not grow out of control over time.
Sub-agent orchestration and isolated execution: Supports declarative sub-agent definitions and synchronous or asynchronous task delegation. Tool execution can be configured to run inside an isolated sandbox, with sandbox state resumable across multiple conversation turns, while maintaining session- and user-level isolation for multi-tenant scenarios.

OpenClaw / Hermes Are Great — But Unusable in Enterprise Agent Scenarios?

Over the past year, agent products such as OpenClaw, Hermes, and Claude Code have triggered a wave of excitement and popularized the Harness Engineering philosophy behind them: using structured workspaces, context management, and tool conventions to replace the primitive “every conversation starts from scratch” approach. More and more teams have begun transplanting this thinking into their own agent development.

However, people who actually try to implement it often discover that the path gets stuck when they reach “enterprise-grade.” We distilled the five most commonly cited obstacles from front-line developers:

1. Multi-user, multi-replica — what happens to the workspace?

OpenClaw uses a local directory as its workspace, which works perfectly for single-machine, single-user scenarios. But once you expose the service externally, multiple users’ workspaces need to be isolated, and when the agent horizontally scales to multiple machines, the same user’s workspace must be shared across replicas — the local-directory assumption breaks down entirely.

2. Tools and Skill Scripts can’t run on the host machine — how do you isolate execution?

Having the agent invoke a shell or run user-provided code is harmless on a trusted local dev machine, but once it’s a live service, executing user-supplied commands directly on the host machine is a serious security risk. Sandboxing is mandatory, but “having a sandbox” is only the first step: the tools inside the sandbox still need access to the full context, and the same sandbox instance must be resumable across multiple conversation turns rather than starting from zero each time.

3. How do you move the “workspace + filesystem” combination to a distributed environment?

A filesystem-driven workspace is the most intuitive and effective pattern in Harness Engineering, but the prerequisite of that pattern is “a filesystem.” In distributed scenarios there is no unified local disk; remote storage, KV services, and object storage each have their own interfaces. Rewriting from scratch couples the agent logic tightly to the infrastructure.

4. How should Multi-Agent be done correctly?

Sub-task dispatch, context isolation, async execution, result collection, timeout cancellation — each item is not hard in isolation, but assembling them into a manageable orchestration layer causes code complexity to escalate rapidly. Most frameworks only provide primitives; the engineering questions of “how to declare a sub-agent, when to spawn it, how to manage state” are left entirely to the developer to figure out.

5. Is there an out-of-the-box implementation of context compression and layered memory?

Harness Engineering describes these two things very clearly, but actually implementing them involves a great many details: compression timing, compression strategy, fact extraction before compression, retrievability of history, recovery after cross-process restarts… Most frameworks only provide an abstract short/long memory interface; the concrete implementation is still up to you.

The root cause of these five problems is the same: personal-assistant agents and enterprise-grade agents are two different engineering forms. Applying the same set of assumptions to both scenarios will inevitably hit a wall.

From a deployment topology standpoint: A personal assistant is single-user, single-process, with all state on one machine. An enterprise agent must scale horizontally, support multi-tenancy, stay continuously available — state must be storable and recoverable in a distributed manner.
From a security boundary standpoint: Tool execution on a trusted local machine poses little risk; arbitrary shell execution on a production server is a severe attack surface. Sandboxing and permission boundaries are not “optional optimizations” but “preconditions for going live.”
From an observability standpoint: If a personal tool breaks, you check the log yourself. Enterprise services require memory persistence to disk, auditable sessions, and trackable state changes.
From a token-economy standpoint: Individual users are relatively insensitive to latency and cost; in enterprise scenarios every needless context re-injection is a real cost.

So — is there a framework that lets you “write one set of logic and switch deployment form on demand”?

AgentScope Java 1.1.0’s Harness module (entry class HarnessAgent) is designed exactly around that goal. It does not replace the ReActAgent inference loop; instead it inserts Hooks at critical points in the loop, supplements a set of tool and workspace conventions, packages the engineering answers to the five problems above, and lets you focus on the agent’s business logic rather than the infrastructure.

AgentScope Harness Design Philosophy: Why Can It Solve These Problems?

The design philosophy of AgentScope Java Harness can be summarized in one sentence: package the engineering answers to “what to do next turn, what to do the next day, what to do when context overflows, what to do when state is lost” — rather than letting every agent project reinvent them.

At the implementation level, two core pillars support the entire framework.

Core Pillar 1: Workspace as the Single Source of Truth

Harness introduces the concept of a workspace for each agent — a structured directory that holds all persistent content the agent needs to operate: persona definition (AGENTS.md), long-term memory (MEMORY.md), domain knowledge (knowledge/), reusable skills (skills/), sub-agent specs (subagents/), and session history (agents/<agentId>/).

This is not a new idea — OpenClaw and Hermes have both found in practice that giving an agent a stable “workbench” is far more effective than re-initializing from scratch each time. Harness systematizes that intuition: the workspace is the agent’s single source of truth (Source of Truth); all state reads and writes revolve around the workspace rather than being scattered across code, databases, and memory.

In actual operation: before each inference begins, WorkspaceContextHook automatically injects AGENTS.md, MEMORY.md, knowledge/, and other key files into the system prompt, ensuring the agent’s persona and knowledge are fully present in every turn. After the agent run ends, MemoryFlushHook distills new facts from the conversation and writes them to the memory file; the background MemoryConsolidator then periodically merges the running log into refined long-term memory. The workspace continuously evolves through conversations — every run knows the user and the task a little better than the last.
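
To make the injection step concrete, here is a minimal sketch using only the JDK. It is not the real WorkspaceContextHook: the class name WorkspacePromptSketch, the method buildSystemPrompt, and the section-title format are all hypothetical, and the sketch assumes the workspace is a plain local directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: NOT the real WorkspaceContextHook, only an illustration of the idea.
final class WorkspacePromptSketch {

    // Builds a system prompt from AGENTS.md, MEMORY.md, and every regular file under knowledge/.
    static String buildSystemPrompt(Path workspace) {
        StringBuilder prompt = new StringBuilder();
        appendSection(prompt, "Persona", workspace.resolve("AGENTS.md"));
        appendSection(prompt, "Long-term memory", workspace.resolve("MEMORY.md"));
        Path knowledge = workspace.resolve("knowledge");
        if (Files.isDirectory(knowledge)) {
            try (Stream<Path> files = Files.walk(knowledge)) {
                // Sort so that identical workspaces always yield identical prompts.
                List<Path> sorted = files.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
                for (Path file : sorted) {
                    appendSection(prompt, "Knowledge: " + knowledge.relativize(file), file);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return prompt.toString();
    }

    // Appends one titled section; files that do not exist are silently skipped.
    private static void appendSection(StringBuilder out, String title, Path file) {
        if (!Files.isRegularFile(file)) {
            return;
        }
        try {
            out.append("## ").append(title).append('\n')
               .append(Files.readString(file).strip()).append("\n\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling WorkspacePromptSketch.buildSystemPrompt(Paths.get(".agentscope/workspace")) before each model call would reproduce the "persona and knowledge present in every turn" behavior in miniature. A real implementation would also need a token budget, which this sketch ignores.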

Core Pillar 2: AbstractFilesystem — Making the Workspace Run Anywhere

The workspace concept is compelling, but there is one practical constraint: a local disk directory does not work in distributed scenarios. Multiple pods each have their own local disk — where does MEMORY.md get written? Which replica’s version is the “real” one?

AgentScope Java Harness solves this with an AbstractFilesystem abstraction layer. From the upper layer’s perspective, the agent only needs to call unified read/write/ls/grep interfaces without caring where the “files” actually live. From the lower layer’s perspective, it can adapt to local disk, remote object storage (OSS), KV databases (Redis), sandbox filesystems, or any other medium — or route different paths to different backends via CompositeFilesystem.
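
The idea can be illustrated with a small, self-contained sketch. Everything here is a simplified stand-in: SketchFilesystem, InMemoryFilesystem, and PrefixRoutingFilesystem are hypothetical names, the three-method interface is an assumption, and the real AbstractFilesystem and CompositeFilesystem APIs may look quite different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the abstraction; the real API may differ.
interface SketchFilesystem {
    String read(String path);                 // returns null when the path does not exist
    void write(String path, String content);
    List<String> ls(String prefix);           // paths starting with the prefix, sorted
}

// In-memory backend; a remote KV or object-store backend would implement the same interface.
final class InMemoryFilesystem implements SketchFilesystem {
    private final TreeMap<String, String> files = new TreeMap<>();

    public String read(String path) { return files.get(path); }

    public void write(String path, String content) { files.put(path, content); }

    public List<String> ls(String prefix) {
        List<String> out = new ArrayList<>();
        for (String p : files.keySet()) {
            if (p.startsWith(prefix)) out.add(p);
        }
        return out;
    }
}

// Routes each path to the backend registered with the longest matching prefix.
final class PrefixRoutingFilesystem implements SketchFilesystem {
    private final Map<String, SketchFilesystem> routes = new LinkedHashMap<>();
    private final SketchFilesystem fallback;

    PrefixRoutingFilesystem(SketchFilesystem fallback) { this.fallback = fallback; }

    PrefixRoutingFilesystem route(String prefix, SketchFilesystem backend) {
        routes.put(prefix, backend);
        return this;
    }

    private SketchFilesystem pick(String path) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? fallback : routes.get(best);
    }

    public String read(String path) { return pick(path).read(path); }

    public void write(String path, String content) { pick(path).write(path, content); }

    // Delegates to the single backend owning the prefix; listings spanning backends are out of scope here.
    public List<String> ls(String prefix) { return pick(prefix).ls(prefix); }
}
```

With this shape, code that calls fs.write("MEMORY.md", ...) stays the same whether the route for MEMORY.md points at a local backend or a shared one; only the wiring changes.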

Based on the AbstractFilesystem interface, AgentScope Java provides three built-in implementations corresponding to three usage modes: Local, Remote, and Sandbox.

In AgentScope Java 1.1, the workspace is the core abstraction for the agent. AbstractFilesystem serves as the physical implementation carrier of the workspace; all file operations, command execution, and memory management tools use AbstractFilesystem as the standard operation entry point.

Based on this filesystem abstraction layer, the AgentScope Java framework directly delivers three major engineering capabilities:

Security and Isolation

Shell/Code/Skill execution is isolated through the sandbox backend; user-input-driven commands no longer run directly on the host machine.
The workspace itself can run inside a sandbox, achieving isolation at the file read/write level.
Tool registration and exposure is managed uniformly by the framework; the execute tool only appears when the backend has implemented a sandbox interface.
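
The last point, exposing the execute tool only when the backend supports isolated execution, can be sketched as a simple capability check. The interface CommandCapable, the two backend classes, and the tool names are hypothetical and only illustrate the pattern; they are not the framework's actual types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for backends able to run commands in isolation.
interface CommandCapable {
    String execute(String command);
}

// Hypothetical plain storage backend with no execution capability.
final class StorageOnlyBackend { }

// Hypothetical sandbox backend; here it only echoes the command instead of running it.
final class FakeSandboxBackend implements CommandCapable {
    public String execute(String command) {
        return "sandboxed: " + command;
    }
}

final class ToolExposureSketch {
    // File tools are always offered; "execute" is offered only when the backend can isolate commands.
    static List<String> exposedTools(Object backend) {
        List<String> tools = new ArrayList<>(List.of("read", "write", "ls", "grep"));
        if (backend instanceof CommandCapable) {
            tools.add("execute");
        }
        return tools;
    }
}
```

The design point is that the model never sees a tool it should not call: the decision is made once at build time from the backend's capabilities, not on every call.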

Distributed Deployment

Agents can be deployed as peer replicas; MEMORY.md, session logs, and other critical files are routed to shared storage through the Remote backend, naturally achieving cross-node synchronization.
By combining IsolationScope (SESSION / USER / AGENT / GLOBAL) with RuntimeContext, session-level isolation, user-level sharing, and other multi-tenancy policies are achieved without changing the code.
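
One way to picture how a scope plus the identities in RuntimeContext could select a storage location is the sketch below. The four scope names come from the text above, but the enum ScopeSketch, the class StorageNamespaceSketch, and the path layout are assumptions made for illustration only.

```java
// Hypothetical sketch; names mirror the IsolationScope values, but the mapping logic is an assumption.
enum ScopeSketch { SESSION, USER, AGENT, GLOBAL }

final class StorageNamespaceSketch {

    // Returns the storage path under which a workspace-relative file would live for the given scope.
    static String namespacedPath(ScopeSketch scope, String agentId, String userId,
                                 String sessionId, String relativePath) {
        switch (scope) {
            case SESSION:
                // Private to one conversation.
                return "agents/" + agentId + "/users/" + userId
                        + "/sessions/" + sessionId + "/" + relativePath;
            case USER:
                // Shared by all sessions of the same user.
                return "agents/" + agentId + "/users/" + userId + "/" + relativePath;
            case AGENT:
                // Shared by every user of this agent.
                return "agents/" + agentId + "/" + relativePath;
            case GLOBAL:
                // Shared by all agents.
                return "global/" + relativePath;
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }
}
```

Under this assumed layout, switching MEMORY.md from SESSION to USER scope changes only the computed prefix, which is why such a policy can live in configuration rather than in agent logic.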

Sub-agent and Async Task Support

Sub-agents’ workspaces, filesystems, and session state are inherited from the parent agent or independently configured; orchestration policy is declared in specs, no manual assembly required.
The async task state machine (PENDING/RUNNING/COMPLETED/FAILED/CANCELLED) and result-collection mechanism are out-of-the-box, with support for swapping to cross-process implementations.
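
As a rough illustration of the five-state machine, the sketch below encodes one plausible transition table. The state names come from the text; the types TaskStateSketch and TaskHandleSketch and the specific allowed transitions are assumptions, not the framework's documented behavior.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the five states named above; the allowed transitions are an assumption.
enum TaskStateSketch {
    PENDING, RUNNING, COMPLETED, FAILED, CANCELLED;

    // Terminal states never change again.
    boolean isTerminal() {
        return this == COMPLETED || this == FAILED || this == CANCELLED;
    }

    // Assumed table: PENDING -> RUNNING | CANCELLED; RUNNING -> COMPLETED | FAILED | CANCELLED.
    Set<TaskStateSketch> next() {
        switch (this) {
            case PENDING:
                return EnumSet.of(RUNNING, CANCELLED);
            case RUNNING:
                return EnumSet.of(COMPLETED, FAILED, CANCELLED);
            default:
                return EnumSet.noneOf(TaskStateSketch.class);
        }
    }
}

final class TaskHandleSketch {
    private TaskStateSketch state = TaskStateSketch.PENDING;

    synchronized TaskStateSketch state() {
        return state;
    }

    // Applies a transition atomically, rejecting anything outside the table.
    synchronized void transitionTo(TaskStateSketch target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

Keeping the table in one place makes it straightforward to back the same state machine with a cross-process store later: only the storage of the current state changes, not the rules.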

Typical AgentScope Harness Use Cases

The following three scenarios cover the typical development forms from personal to enterprise. They are not mutually exclusive options — they represent three different complexity paths. You can start with the simplest one and migrate incrementally as requirements evolve.

Personal Agent — Exemplified by OpenClaw-Type Applications

Characteristics: Single-user, runs locally, needs to operate local files or execute scripts. Typical products: personal assistants, note-taking bots, local Coding Agents.

The core need in this scenario is “let the agent truly know me and remember me,” not just a stateless Q&A machine. Harness’s value here: AGENTS.md in the workspace defines the agent’s persona and behavioral preferences; after a conversation ends, new facts are automatically distilled and written into memory; the next time you open the app the agent still recognizes you and remembers where you left off. Skills and domain knowledge also live in the workspace — editable and adjustable at any time without touching code.

In a local deployment you can also enable shell execution, letting the agent run scripts and operate the filesystem directly — which is the most attractive aspect of OpenClaw-type products. Harness adds “continuous evolution” on top of that: the workspace is the agent’s brain, growing more experienced with every conversation.

Core capabilities Harness provides in this scenario:

Persistent memory: New facts are automatically distilled and written to the workspace after each conversation; the next launch requires no re-briefing of background context; long-term memory accumulates with use.
Local shell execution: In a trusted local environment, the agent can run scripts and operate files directly, reproducing the core experience of OpenClaw-type products.
Workspace as configuration: Modify AGENTS.md to adjust persona; add new skills in the skills/ directory — changing one file upgrades the agent without recompiling or redeploying.
Cross-process session resume: Close and reopen — as long as the sessionId hasn’t changed, the entire state of the last conversation is restored; it does not start from zero.

Enterprise Data Service — Exemplified by DataAgent

Characteristics: Serves multiple users, needs to execute SQL / Python / Shell, tasks are long-running, input comes from untrusted external users; multi-turn conversation state must be resumable, and multi-replica deployment must deliver a consistent user experience.

The biggest risk in this scenario is execution security — user-driven code must not run without restriction on the server. Harness’s sandbox mechanism confines the agent’s file operations and command execution to an isolated environment, leaving the server process unaffected. More critically, the sandbox is not “use-and-discard” — after each conversation turn the sandbox state is persisted, and on the next turn it is resumed where it left off; users won’t lose their work progress due to service restarts or node switches.

In multi-replica deployments, the user’s long-term memory (the agent’s accumulated understanding of that user) can be stored in shared storage. Regardless of which node the request lands on, the agent sees the same memory. Long analysis tasks can be split into multiple sub-agents running in parallel; the main agent only coordinates and aggregates without blocking the entire time.

Core capabilities Harness provides in this scenario:

Isolated sandbox execution: All code and commands run inside an isolated environment; the host service process is unaffected by user input; security boundaries are clear.
Multi-turn sandbox state resume: Sandbox state is automatically saved after each conversation turn and restored in-place at the start of the next turn or service restart; the user’s work context is not lost.
Distributed memory sharing: User long-term memory is stored in shared storage; all nodes in a multi-node deployment read the same “understanding of this user” — consistent experience.
Sub-agent parallel orchestration: Long tasks can be decomposed into multiple sub-agents running concurrently; the main agent only coordinates; overall efficiency is higher and timeout/failure management is easier.
Multi-tenant isolation: Workspaces and execution environments are isolated at the session or user dimension; multiple online users do not interfere with each other.

Enterprise Online Service — Exemplified by Taobao/Tmall Transaction Agent

Characteristics: Primarily completes tasks by calling business APIs (placing orders, querying, approving, etc.); no need to execute shell on the server; requires multi-instance operation, persistent session state, and cross-user knowledge sharing.

The core need in this scenario is stability and safety — an online service cannot afford an incident caused by the agent invoking an unintended shell command. Harness’s value here: when sandbox execution is not configured, the framework does not expose shell tools by default; the agent can only interact externally through explicitly defined business tools. The security boundary is determined by configuration, not developer discipline.

Session state and memory can be stored in remote storage, shared across multiple service instances; when a user starts a new conversation through a different entry point, the agent can still pick up where the last context left off. When multiple sub-tasks need to run in parallel (e.g., simultaneously querying inventory, calculating discounts, generating summaries), the sub-agent mechanism applies equally.

Core capabilities Harness provides in this scenario:

Default security boundary: Without enabling sandbox execution, the framework does not expose shell tools; the agent can only interact externally through explicitly registered business tools; security policy is determined by configuration.
Multi-instance shared memory: Session state and user memory are persisted to remote storage; any service instance can read the same context; users switch between instances transparently.
Cross-request session continuity: Each request carries the same user identifier; the agent automatically resumes the last conversation state, delivering a truly continuous multi-turn dialogue experience.
Parallel sub-task support: When multiple business steps need to be handled simultaneously, sub-tasks can be delegated to sub-agents for parallel execution; results are aggregated before a unified reply.
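
The fan-out and aggregate pattern behind parallel sub-tasks can be sketched with the JDK's CompletableFuture. This is not the Harness sub-agent API: ParallelSubTaskSketch and runAll are hypothetical, each sub-task is reduced to a plain Supplier<String>, and the timeout handling is one possible choice.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch: each Supplier stands in for one delegated sub-agent call.
final class ParallelSubTaskSketch {

    // Runs every sub-task concurrently and returns the results in input order.
    static List<String> runAll(List<Supplier<String>> subTasks, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subTasks.size()));
        try {
            // Start all tasks first so they overlap, each with its own timeout (Java 9+).
            List<CompletableFuture<String>> futures = subTasks.stream()
                    .map(task -> CompletableFuture.supplyAsync(task, pool)
                            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS))
                    .collect(Collectors.toList());
            // join() rethrows a failed or timed-out sub-task as CompletionException.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For the transaction example, the three suppliers would query inventory, calculate discounts, and generate a summary; the main agent would then compose a single reply from the returned list.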

AgentScope Harness Deep Dive

Quick Start

Getting started with Harness takes three steps: add the dependency, prepare the workspace, and build and call the agent.

1. Add the dependency

<dependency>
    <groupId>io.agentscope</groupId>
    <artifactId>agentscope-harness</artifactId>
    <version>${agentscope.version}</version>
</dependency>

2. Prepare the workspace

Choose a directory on disk as the workspace and create AGENTS.md inside it. This is not an “optional initialization step” — it is the core entry point for Harness. The agent’s persona, memory, skills, and sub-agent specs all revolve around this directory.

3. Build HarnessAgent and call it

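// `model` is assumed to be a chat model instance configured elsewhere.
HarnessAgent agent = HarnessAgent.builder()
    .name("my-agent")
    .model(model)
    // The workspace directory prepared in step 2 (contains AGENTS.md).
    .workspace(Paths.get(".agentscope/workspace"))
    .compaction(CompactionConfig.builder()
        .triggerMessages(50)
        .keepMessages(20)
        .build())
    .build();

// Identity context for this call; keep sessionId stable to resume the same conversation.
RuntimeContext ctx = RuntimeContext.builder()
    .sessionId("user-session-001")
    .userId("alice")
    .build();

// `userMessage` is the incoming user message; block() waits for the reply.
Msg reply = agent.call(userMessage, ctx).block();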

Core Concepts

Concept	Definition	Problem Solved	Usage Guidance
HarnessAgent	Engineering wrapper built on ReActAgent; assembles Hooks, built-in tools, skills, and session persistence at build time	“Don’t want to assemble compaction, memory, session, sub-tasks, filesystem from scratch”	Business code only interacts with `HarnessAgent.builder()` and `agent.call(msg, ctx)`
workspace	The agent’s working directory hosting all persistent content	“Where to put persona, knowledge, memory, state”	Plan workspace structure before writing prompts; treat it as a versionable asset
filesystem	Unified file read/write interface; abstraction layer between agent tools and physical storage	“How the same logic switches between local, shared storage, and sandbox”	Choose from three declarative modes (Local / Remote / Sandbox)
RuntimeContext	Identity context for a single `call()`; passed fresh each call, not persisted	“Who is in this turn, where to read/write state, how to isolate multi-tenancy”	Always pass a stable `sessionId`; pass `userId` in multi-tenant scenarios
sandbox	Isolated execution environment; state persisted after each turn and resumed on the next	“Safe execution with untrusted input while keeping multi-turn state continuous”	Enable when code execution is needed; choose isolation granularity per business needs
memory	Two-layer system: daily running log + background-maintained long-term memory with full-text search	“Long conversations don’t lose facts, context doesn’t overflow, history is searchable”	Enable conversation compaction; use search tool to retrieve older facts

Key Features

Workspace: The Agent’s Single Source of Truth

workspace/
├── AGENTS.md              ← Persona and conventions; auto-injected into system prompt
├── MEMORY.md              ← Refined long-term memory; auto-maintained by background process
├── knowledge/             ← Domain knowledge; injected alongside AGENTS.md
├── skills/                ← Reusable skills; auto-assembled into agent toolset
├── subagents/             ← Sub-agent spec declarations; auto-discovered and loaded
└── agents/<agentId>/
    ├── context/           ← Session state snapshots (for process-restart recovery)
    ├── sessions/          ← Conversation JSONL and compressed context for audit
    └── memory/            ← Daily memory running log

Memory Management: Two-Layer Approach

Layer 1 — Daily running log: After each conversation, new facts are distilled by LLM and appended to memory/YYYY-MM-DD.md. Append-only, never modified.

Layer 2 — Long-term memory: A background scheduler periodically reads recent daily logs, merges them with MEMORY.md using LLM, deduplicates and refines — outputting a token-budget-compliant “injectable version.”
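
Layer 1 is simple enough to sketch directly. The helper below appends one fact per line to a date-named file and never rewrites existing lines. DailyMemoryLogSketch and its bullet format are hypothetical; in the real system the facts come from an LLM distillation step that is not modeled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

// Hypothetical sketch of the append-only daily log; not the framework's implementation.
final class DailyMemoryLogSketch {

    // Appends one distilled fact as a bullet to <memoryDir>/<YYYY-MM-DD>.md and returns that file.
    static Path append(Path memoryDir, LocalDate day, String fact) {
        try {
            Files.createDirectories(memoryDir);
            // LocalDate.toString() is ISO-8601, for example 2025-01-31.
            Path log = memoryDir.resolve(day + ".md");
            // CREATE + APPEND: existing content is never truncated or modified.
            Files.writeString(log, "- " + fact.strip() + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return log;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the daily files are append-only, the Layer 2 consolidation can re-read them at any time without coordinating with in-flight conversations.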

Compaction config:

.compaction(CompactionConfig.builder()
    .triggerMessages(50)
    .keepMessages(20)
    .flushBeforeCompact(true)
    .build())
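
What such a configuration might do at runtime can be sketched as follows. The semantics are an assumption: this sketch treats triggerMessages as the history size above which compaction runs and keepMessages as the number of most recent messages kept verbatim. CompactionSketch is a hypothetical name, the summarizer stands in for an LLM call, and flushBeforeCompact is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of threshold-based compaction; the real semantics may differ.
final class CompactionSketch {

    // If history exceeds triggerMessages, replaces all but the newest keepMessages with one summary entry.
    static List<String> compact(List<String> history, int triggerMessages, int keepMessages,
                                Function<List<String>, String> summarizer) {
        if (keepMessages < 0 || keepMessages > triggerMessages) {
            throw new IllegalArgumentException("keepMessages must be between 0 and triggerMessages");
        }
        if (history.size() <= triggerMessages) {
            return history; // below the threshold: nothing to do
        }
        int cut = history.size() - keepMessages;
        List<String> compacted = new ArrayList<>();
        // Older messages collapse into a single summary (an LLM call in a real system).
        compacted.add(summarizer.apply(history.subList(0, cut)));
        // The newest messages are kept verbatim.
        compacted.addAll(history.subList(cut, history.size()));
        return compacted;
    }
}
```

With the values shown above and under these assumed semantics, a 51-message history would shrink to one summary plus the 20 newest messages.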

Filesystem: Three Modes

Mode 1: Local + Shell (default) — Workspace is a local directory; shell commands can be executed. Suitable for personal/dev environments.

Mode 2: Remote Shared Storage — Memory and session logs route to remote KV (e.g., Redis). Shell tools not registered by default. For multi-replica online services.

Mode 3: Sandbox Execution — All file ops and command execution in isolated sandbox. Host process unaffected. For DataAgent / Coding Agent.

Sub-Agent Orchestration

Four declaration styles (from lowest to highest flexibility):

Built-in general-purpose agent (mirrors main agent config)
Workspace-file-driven (Markdown in workspace/subagents/)
Code declaration via builder.subagent(spec)
Custom factory for full construction control

Sub-agents can be invoked synchronously or asynchronously; protection against infinite recursion is built in.

Summary

AgentScope Java 1.1 converges the capabilities most wanted from Harness Engineering — yet hardest to assemble on your own — into HarnessAgent + workspace conventions + pluggable filesystem + Hook pipeline:

In personal scenarios: an enhanced ReAct Agent with memory, compaction, and sub-tasks.
In enterprise scenarios: infrastructure that turns isolation, multi-tenancy, distributed memory, and sub-agent orchestration into configuration items.

If you are evaluating how to evolve from a personal assistant prototype to a deployable enterprise agent, start by running through the quick start in the Harness overview, choose a declarative filesystem mode, then enable compaction, sandbox, and sub-agents as needed — every step has documentation and runnable examples.

References:

Original article by Liu Jun, published on the Alibaba Cloud Developer WeChat account.


Written by Cui