logo
  • 환경
  • 엔터프라이즈
  • 요금제
Blogs
튜토리얼|Nov 8, 2025

Run Enterprise Agents with Eigent & Gemini 3 Pro

Eigent’s Real-World Enterprise Browser Automation with Gemini 3 Pro

EigentEigent
Share to
Run Enterprise Agents with Eigent & Gemini 3 Pro
  • Abstract
  • Background: What is Eigent and How it supports Gemini 3 Pro
  • Github Repository & how to setup Eigent
  • Clone the repository
  • Install frontend dependencies
  • Return to root and run dev mode
  • Under the Hood: Eigent full stack and CAMEL Workforce Architecture
  • Browser Automation Architecture in Eigent
  • Test Gemini 3 Pro in **Real-World Enterprise Tasks** with Eigent browser automation
  • How Gemini 3 Pro Improves Task Performance
  • Conclusion & Next Steps
Automate Everything with
AI Workforce on Desktop
Download Eigent

Abstract

In real enterprise environments, many internal tools, dashboards, and legacy systems operate entirely in the browser, forming the backbone of daily business operations.To automate these complex systems, we introduce Eigent, an open-source multi-agent workforce application that runs locally and can be fully set up from source, with a strong focus on browser automation — essentially serving as your Eigent open source cowork for enterprise workflows.

In this post, we’ll explore how Eigent leverages CAMEL’s Workforce architecture and browser automation to handle complex, multi-step enterprise tasks. We’ll also take a closer look at Gemini 3 Pro, analyzing its performance on three real-world enterprise tasks and examining the architectural features that enable it to perform effectively in long-horizon, agentic browser automation scenarios.

Background: What is Eigent and How it supports Gemini 3 Pro

Eigent is an open-source, multi-agent workforce product that runs on your desktop. It is built with a multi-agent workforce architecture, supported by general abilities such as browser automation, terminal automation and MCPs. This design enables agents in Eigent to perform tasks much like human workers — operating in real desktop environments, without the need for deep API integrations or constant workflow reconfiguration.

As foundation models continue to advance, integrating them with Eigent’s open-source multi-agent system allows developers and enterprise users to apply LLM capabilities directly to real-world use cases quickly and effectively. That’s why Eigent integrated Gemini 3 Pro immediately after the release.

To get started in Cloud Mode, simply select Gemini 3 Pro from the top dropdown. Alternatively, if you prefer to Bring Your Own Key, navigate to the Model Settings page in Eigent, locate the Gemini section, and input your API key. Once the model name is set to Gemini 3 Pro, you are ready to begin. need help? Check out our guide on [configuring your Google Gemini API key].

For a step-by-step walkthrough, check out the video tutorial below.

Your browser does not support the video tag.

Github Repository & how to setup Eigent

GitHub Repository: https://github.com/eigent-ai/eigent

Quick Start: Setting Up the Environment

You have two ways to run Eigent: using the pre-compiled desktop app for immediate usage, or setting up the development environment to inspect the code and customize the agents.

Option A: The "Zero-Config" Desktop App

For users who want to start automating tasks immediately without touching code:

  1. Download the client from the Official Website.
  2. Install the .dmg (macOS) or .exe (Windows).
  3. Launch the app—the local backend starts automatically.

Option B: Developer Setup

To access the source code and run the system locally for development, follow these steps:

1. Prerequisites Ensure you have Node.js (v18-22) and Python installed.

2. Clone and Install

# Clone the repository
git clone https://github.com/eigent-ai/eigent.git
cd eigent

# Install frontend dependencies
npm install

3. Run the Application

# Return to root and run dev mode
npm run dev

Once running, you can configure your LLM providers (Gemini 3 Pro, etc.) directly in the settings. For more detailed information on configuration, advanced features, and troubleshooting, please refer to our Official Documentation.

Under the Hood: Eigent full stack and CAMEL Workforce Architecture

Eigent System Overview

Eigent constitutes a local-first desktop application with multi-agent orchestration, powered by the CAMEL Workforce as its core engine. The system implements a decoupled, full-stack architecture that operates entirely on the user's local infrastructure. This design strictly ensures data sovereignty, eliminating the privacy risks associated with cloud-resident agent execution.

1. The Frontend

The user interface serves as the control plane for agent configuration and workflow monitoring. Built on React and TypeScript within an Electron framework.

Key technical components include:

  • State Management: Zustand is employed for handling transient application state, ensuring efficient reactivity.
  • Visual Orchestration: React Flow is integrated to visualize agent workspace to track real-time agent execution.
  • Communication: The frontend communicates with the backend via secure local HTTP requests.

2. The Backend

The core logic resides in a local Python server utilizing FastAPI and Uvicorn, which acts as the host environment for the CAMEL multi-agent framework.

  • Runtime Environment: The backend runs on Python 3.10+, managed by uv for high-performance dependency resolution and environment isolation.
  • Persistence Layer: PostgreSQL, interfaced via SQLModel/SQLAlchemy ORM, provides robust structured data storage for audit logs, workflow history, and agent states.
  • Multi-agent Framework: The CAMEL framework handles agent orchestration logic (e.g., workforce), interfacing with Large Language Models (LLMs) whether remote (e.g., Gemini) or local (e.g.,via vLLM) for agent running. The CAMEL framework also offers a rich set of toolkits such as browser toolkit, terminal toolkit, document generation toolkit.

CAMEL Workforce: A Multi-Agent System Inspired by Organizational Structures

At the heart of Eigent lies CAMEL Workforce, a multi-agent system architected to resolve complex, real-world tasks through decentralized cooperation. The system utilizes a strict Producer-Consumer pattern, mediated by an asynchronous message channel to manage dependency graphs efficiently.

1. Agent Roles

  • Coordinator Agent: Functions as the primary dispatcher. It maintains the global state and allocates subtasks to specific workers based on availability and capability.
  • Task Agent: Taking responsibility for the semantic decomposition of high-level objectives into executable, atomic units.
  • Worker Agent: Serves as the specialized execution unit. Worker agents consume atomic subtasks and execute them using domain-specific tools.

2. Asynchronous Communication: The TaskChannel

Decoupling between the coordination layer and the execution layer is achieved via the TaskChannel. This asynchronous message queue manages task distribution without blocking the main execution thread.

Execution Flow:

  1. Workforce initiates a task.
  2. Worker nodes poll for assignments.
  3. Upon completion, results are pushed back.

3. Dynamic DAG Construction

Enterprise workflows are rarely linear. CAMEL Workforce implements a dynamic Directed Acyclic Graph (DAG) construction mechanism. When a high-level prompt is received (e.g., "Create Travel Plan"), the Task Agent decomposes this objective into discrete nodes.

The system explicitly maps dependencies, allowing the scheduler to:

  • Execute independent nodes in parallel (e.g., Search Flight Ticket and Search Hotel run concurrently).
  • Block dependent nodes until their predecessors reach a DONE state.

4. Fault-tolerant Mechanism

Given the non-deterministic nature of LLMs, Eigent treats failures as expected state transitions rather than fatal exceptions. The architecture implements a robust recovery mechanism utilizing the following strategies:

  • RETRY: Re-executes the sub-task on the same worker to handle transient errors.
  • REPLAN: The Task Agent modifies the original sub-task based on the failure log before re-queueing the sub-task.
  • REASSIGN: The sub-task is migrated from the current worker to a different agent with a compatible skill set.
  • DECOMPOSE: If a task fails due to excessive complexity, it is recursively broken down into smaller subtasks.

CAMEL Workforce.png

Browser Automation Architecture in Eigent

Yet, a multi-agent workforce architecture can only unlock real enterprise automation when paired with the growing strength of general-purpose capabilities such as browser automation. This is why we emphasize building agents that can operate directly within real business environments rather than relying solely on rigid API integrations.

Eigent adopts a two-layer architecture that separates browser control from agent orchestration:

  • The TypeScript layer is responsible for all browser interactions. It leverages native Playwright APIs to perform DOM operations, capture structured snapshots, generate SoM screenshots, detect occlusions, and handle advanced browser logic directly within the JavaScript runtime. As Playwright is natively built in TypeScript, this layer gains access to cutting-edge features like _snapshotForAI() and ensures better performance, reliability, and developer ergonomics.
  • The Python layer handles AI orchestration. It manages LLM calls, agent decision-making, and task planning. This separation allows Python to focus on agent logic, where the Python ecosystem excels in AI and workflow orchestration.
  • The two layers communicate asynchronously via WebSocket, enabling non-blocking operations. Python sends browser operation requests, TypeScript executes them and returns results. The interaction is transparent to the end user and supports concurrent task execution.

This architecture improves performance, enhances the precision of element interactions, and enables advanced capabilities like dynamic DOM filtering, viewport-aware snapshots, and in-browser SoM rendering. It avoids the limitations of Python-only implementations, such as high latency, limited access to browser internals, and complex image processing logic. By delegating browser tasks to the native execution context, Eigent ensures a robust foundation for agent-based enterprise automation.

During multi-agent execution in enterprise automation scenarios, browser-based automation offers a natural advantage in process visibility. Every step is transparent, inspectable, and easy to debug, making it far more practical for complex and evolving workflows.

CAMEL Browser.png

Test Gemini 3 Pro in Real-World Enterprise Tasks with Eigent browser automation

We have tested Eigent with Gemini 3 Pro to automate sales processes using Eigent browser automation capabilities. The tasks for agents are to automate various stages of the real-world sales cycle, including Lead Capture & Creation, Qualification & Pipeline Management, Quotation, Negotiation, Closing, and Product Management.

Across experimental runs, Gemini 3 Pro consistently shows three key strengths:

  1. Handles complex page structures well, including iframes and nested elements: It can reliably find the right content and buttons, even in complex layouts.
  2. Checks its own actions to stay accurate and short steps: It uses a feedback loop to correct mistakes and make sure the task is really done right.
  3. Uses tools efficiently and flexibly: It avoids unnecessary steps and knows how to combine tools smartly when needed.

Sample task 1:

Identify all B2B companies in the Y Combinator Winter and Summer 2025 batch whose industry focus is related to Marketing. After you obtain the full company list, independently investigate each company’s product information in detail and consolidate all findings into a clean, well-structured CSV file.

This task demonstrates the agent's ability to handle iterative navigation and dynamic data extraction. Unlike simple single-page scraping, this workflow requires the agent to first interact with the Y Combinator directory to apply specific filters (Batch, Industry, B2B tag), and then execute a "List-to-Detail" pattern.

The challenge here is maintaining context: the agent must dive into individual company profiles to extract specific product details, then return to the main list without losing its place or duplicating entries. Gemini 3 Pro successfully orchestrates this loop, parsing diverse landing page layouts and normalizing the unstructured information into a clean CSV format without manual intervention.

Your browser does not support the video tag.

Sample task 2:

The salesforce.com - 200 Widgets deal is progressing well. Move it from 'Needs Analysis' to 'Proposal' stage, and click “Mark as Current Stage’ and go click "Contact Roles" and give me the contact name and Phone number. Back to Opportunities page edit this Next Step as “book a meeting with + the contact name and phone number.”

This browser automation task presents a challenge for standard models. First, it must locate the relevant Opportunity on the Salesforce homepage. Second, it needs to update the Opportunity's stage, navigate to specific page and retrieve contact information, and finally, modify the 'Next Step' field within that information.

Therefore, for this task, it is required to demonstrate stable long-horizon task performance, a deep understanding of complex tasks, logical task planning, and the capacity to execute stable, cross-page operations within the Salesforce environment.

Your browser does not support the video tag.

Furthermore, through a quantitative breakdown (e.g., mapping each step's action to page regions, tracking failure/retry counts), we can align the browser actions reference with the elements in the snapshots item by item. This allows for a deeper analysis of Gemini 3 Pro's performance regarding the accuracy and effectiveness of tool-call and browser actions:

RunTotal Browser ActionsOther Actions UsedSequence CharacteristicsImplication
1st run23NoneCompact sequence (open → type → repeated clicks → snapshot → click → visit_page → click …).No repeated clicking/typing on the same control. One-way progression with low redundancy.
2nd run18note / screenshotsMultiple click/snapshot steps after open to reach the target area; later introduces append_note/create_note/browser_get_page_snapshot for state logging and confirmation.Fewer browser actions; Auxiliary tools are used as external memory and for validation.
3rd run15NoneStarts with multiple open/visit_page, then continuous clicks and one final type.No auxiliary tools; no retries/rollback. Most streamlined action chain.

Based on the three running results, we can see that Gemini 3 Pro demonstrates high robustness and auditability in long-horizon browser task scenarios.

  • Execution Flow: It shows reliable planning of the execution path from the parsed task objective and environment state.
  • Stability & Robustness: The current task browser page contained up to 13 layers of nesting, yet Gemini 3 Pro maintained low retries, no infinite loops during task execution.
  • Efficiency: The three sets of logs showed almost no redundant tool calls, and never displayed multiple clicks or repeated inputs. This efficiency, combined with flexible auxiliary tools (Note/Screenshot), resulted in fewer and more stable browser actions.

Sample task 3:

“ I’m preparing for my monthly sales review. Please go into my forecast, find the opportunity under the Global Media account that’s in the Commit stage, and update its Close Date to November 26th. ”

What if we increase the complexity of the webpage in the task?

The following task is set on a Salesforce Forecast page. The Forecast page is used by sales teams for statistics and overview; its browser page is extremely complex, with approximately 4,763 elements in one snapshot. After multi-level decoding, the snapshot contains 1,222 lines with a maximum nesting depth of 18 layers and an average depth of approximately 14.33. This means the page does not just have a high element count, but also features a deeply nested hierarchy: multi-layer structures such as List → List Item → Link/Button → Icon → Paragraph/Grid Row.

Salesforce Forecasts Webpage

Salesforce Forecasts Webpage

In our tests, Gemini 3 executed this task flawlessly all three times. It needs the ability to correctly update fields within such a dense, highly-nested page while still successfully locking onto the target opportunity (Global Media 180 Widgets) and completing the Close Date update, serves as proof of Gemini 3 Pro's robust parsing, path planning, and stable execution capabilities in browser-use scenarios.

How Gemini 3 Pro Improves Task Performance

Gemini 3 Pro stands out as a well-balanced choice for autonomous enterprise agents. In our real-world tasks, it consistently handles long-horizon, browser-based workflows with a high degree of reliability. Combined with its favorable cost-performance ratio, it presents a practical option for scaling agent-based automation in enterprise environments.

The "State Continuity" Advantage (Thought Signatures) The primary technical differentiator we observed is Gemini 3 Pro's implementation of Thought Signatures. In traditional LLM interactions, the model relies entirely on the conversation history text to reconstruct its context between turns. In complex, long-horizon workflows, this can occasionally lead to "context drift," where an agent loses track of the original intent after multiple browser interactions. Gemini 3 Pro addresses this by returning a thoughtSignature, which is an encrypted representation of its internal reasoning state, after each step.

Impact on Eigent

When our agents execute sequential tasks (e.g., "Check flight status" followed by "Book Taxi"), this signature is passed back to the model. In our tests, this mechanism helped the agent maintain better logical continuity during multi-step function calls, reducing the rate of logic errors in later stages of the workflow compared to peer models.

Robustness in Long-Horizon Planning

Enterprise automation often requires navigating through uncertainty—handling login screens, loading states, or unexpected pop-ups.

Gemini 3 Pro showed high resilience in longer sessions (10+ steps). This aligns with its performance on benchmarks like Vending-Bench 2, which evaluates long-horizon planning.

For standard inquiries, the performance gap between top-tier models might be negligible. However, in Agentic Automation where state maintenance and error recovery are paramount, Gemini 3 Pro currently offers reliable foundation for the Eigent platform. Its ability to preserve "reasoning state" via Thought Signatures makes it the pragmatic choice for complex, multi-step enterprise workflows.

Conclusion & Next Steps

Through this blog, we explored how Eigent, powered by CAMEL’s multi-agent workforce architecture and browser-level capabilities, creates a production-grade environment for deploying AI agents that can actually operate inside enterprise systems. By combining tool-level autonomy with user-overridable workflows, the system remains controllable, observable, and auditable which are the critical properties for any B2B-facing deployment.

We also demonstrated how Gemini 3 Pro, when integrated into Eigent, offers an ideal balance of reasoning ability, stability, and cost effectiveness. Its architectural alignment with multi agent execution, especially through features such as Thought Signatures, makes it particularly well suited for high stakes, long horizon workflows typical of enterprise use cases.

Looking forward, we’re continuing to:

  • Surface failure cases in real-world enterprise deployments, identifying task patterns where current foundation models struggle with state tracking, error recovery, or tool grounding.
  • Establish a standardized enterprise browser automation benchmark, collecting enterprise automation tasks realistic scenarios including email clients, messaging systems, documents, browser UIs, and ERP/CRM platforms.
  • Build a reinforcement learning environment for browser-based enterprise workflows, enabling reinforcement learning for agents through task-based rewards, trajectory tracking, and long-horizon behavior analysis.

Eigent is fully open-source, and we invite developers, researchers, and enterprise teams to explore, extend, and contribute:

👉 GitHub: https://github.com/eigent-ai/eigent

👉 Join our Discord community: https://discord.camel-ai.org

Recent Posts

Best Legal AI Agents in 2026: Top Platforms Compared (+ a Free Alternative)
산업Jun 19, 2026

Best Legal AI Agents in 2026: Top Platforms Compared (+ a Free Alternative)

The best legal AI agents in 2026 compared: Harvey, CoCounsel, Lexis+ Protégé, Kira, and Spellbook — plus Eigent, the free, open-source legal AI you can self-host.

Douglas LaiDouglas Lai
CoCounsel Alternative (Free & Open Source): Why Teams Choose Eigent
산업Jun 19, 2026

CoCounsel Alternative (Free & Open Source): Why Teams Choose Eigent

Looking for a free CoCounsel alternative? Compare CoCounsel Legal with Eigent, the open-source legal AI platform you can self-host, plus a full contract workflow.

Douglas LaiDouglas Lai
Eudia Alternative (Free & Open Source): Why Teams Choose Eigent
산업Jun 19, 2026

Eudia Alternative (Free & Open Source): Why Teams Choose Eigent

Looking for a free Eudia alternative? Compare Eudia's augmented intelligence platform with Eigent, the open-source legal AI you can self-host, plus a full workflow.

Douglas LaiDouglas Lai
Automate everything with AI workforce on desktop
Download Eigent

오늘 Eigent를 사용해보세요

오픈 소스 데스크톱 앱을 다운로드하세요. 여러분의 AI 워크포스가 여러분의 기기에서 실행됩니다.

Eigent 다운로드
Eigent

AI 워크포스 자동화에 대한 최신 소식, 튜토리얼, 출시 정보를 받아보세요.

제품Eigent환경요금엔터프라이즈
둘러보기솔루션활용 사례스킬플러그인블로그
개발자문서GitHubCAMEL-AI오픈소스 펀드파트너
다운로드오픈 소스용
회사회사 소개브랜드채용이용약관개인정보처리방침보안 및 신뢰쿠키 정책환불 및 체험 정책

모든 권리 보유 © 2026 EIGENT UK LTD

Eigent 1.0 새 버전 출시!download