Silicon Agents Console
Product documentation and workflow reference

Silicon Agents documentation for pilot users, delivery teams, and sponsors.

Silicon Agents is an AI workflow copilot for semiconductor engineering. It does not replace EDA tools. It sits above existing verification, test, and yield workflows and reduces the time engineers spend interpreting raw outputs, prioritizing findings, and deciding what to do next.

Agent 01
Verification workflow copilot

Coverage closure and regression triage with evidence-backed next actions.

Agent 02
Yield intelligence copilot

ATE anomaly review, binning validation, and SPC-driven yield prioritization.

Trust posture
Human in the loop

Recommendations are visible, ranked, evidence-cited, and explicitly accepted or rejected.

Next MVP phase
EDA/API integration

Workflow embedding into enterprise toolchains is the next scale-up phase after pilot readiness.

Platform pattern

What the product already proves

  • Raw artifacts can be ingested, parsed, reasoned over, and converted into reviewable actions.
  • Run history, parser trust, provenance, and exports make the workflow auditable instead of purely conversational.
  • Enterprise policy and run profiles let the same system adapt across chip programs and review styles.

What the product is not yet

  • It is not a simulator, compiler, testcase generator, or RTL modification engine.
  • It is not yet embedded into client EDA stacks through production-grade APIs.
  • It does not yet provide full RBAC or multi-tenant SaaS; that is later-stage platform work.

The current MVP is best pitched as a verification-first and yield-adjacent workflow wedge with scalable platform architecture, not as a finished end-state product. That is the honest and strongest positioning for pilot and Seeder conversations.

Primary users and use cases

DV engineer

Uses Agent 01 to turn coverage logs and regression outputs into ranked closure actions, evidence, and focused review paths.

Yield / product engineer

Uses Agent 02 to detect mis-bins, leakage anomalies, SPC shifts, and lot-level review priorities without manual spreadsheet triage.

Principal / lead engineer

Defines enterprise policy, review governance, evidence expectations, and escalation rules that shape how the agents respond.

High-level design

Input artifact
  -> Parser layer
  -> Retrieval / RAG layer
  -> Orchestration layer
  -> LLM analysis layer
  -> Decision generation
  -> Human review loop
  -> Run history + exports + pilot metrics

Supporting layers:
  - Enterprise policy store
  - Run profile presets
  - Retrieval documents + embeddings
  - Benchmark / live scorecard engine
  - Pilot access gate
  - Structured export adapters
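
A minimal sketch of the layered flow above, assuming hypothetical stage and field names; the real logic lives behind the FastAPI routes and layers described later in this document.

  from dataclasses import dataclass, field

  @dataclass
  class Run:
      artifact: str
      parsed: dict = field(default_factory=dict)       # structured items + parser confidence/warnings
      evidence: list = field(default_factory=list)     # retrieved prior runs and notes
      plan: dict = field(default_factory=dict)         # orchestration plan (policy + profile + context)
      decisions: list = field(default_factory=list)    # ranked, evidence-cited actions

  # Placeholder stages standing in for the parser, retrieval, orchestration,
  # LLM analysis, and persistence layers. Names are illustrative, not the product API.
  def parse(run: Run) -> Run:
      run.parsed = {"items": [], "confidence": 1.0, "warnings": []}
      return run

  def retrieve(run: Run) -> Run:
      run.evidence = []
      return run

  def orchestrate(run: Run) -> Run:
      run.plan = {"chip_focus": "", "output_emphasis": "", "instruction_priorities": []}
      return run

  def analyze(run: Run) -> Run:
      run.decisions = []
      return run

  def persist(run: Run) -> Run:
      return run

  def execute(artifact_text: str) -> Run:
      run = Run(artifact=artifact_text)
      for stage in (parse, retrieve, orchestrate, analyze, persist):
          run = stage(run)
      return run    # decisions then enter the human review loop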

Low-level workflow

Verification path

  1. User uploads or pastes a coverage report or regression log.
  2. Artifact provenance is captured as bundled sample, uploaded file, or pasted text.
  3. Parser extracts structured items and computes parser confidence and warnings.
  4. RAG retrieves prior closure notes, waiver context, regression decisions, and similar run history.
  5. Orchestration layer merges enterprise policy, active run profile, artifact context, and retrieved evidence.
  6. LLM produces five-step reasoning, ranked actions, and cited sources.
  7. Run history persists raw artifact, retrieved evidence, orchestration, analysis trace, decisions, and scorecard.

Yield path

  1. User provides ATE parametric CSV or SPC trend input.
  2. Parser extracts chips, bins, pass/fail state, and anomaly signals.
  3. RAG retrieves prior lot reviews, yield notes, SPC playbooks, and similar run history.
  4. Orchestration emphasizes revenue risk, containment, operational review style, and retrieved evidence.
  5. LLM generates lot-level review steps, ranked mis-bin or anomaly actions, and cited sources.
  6. Human reviewer accepts or rejects actions and exports results into workflow targets.

Visual flow chart: after upload and Analyse

The runtime flow below shows what the product actually does after a user uploads an artifact and clicks Analyse. The same pattern applies to both Agent 01 and Agent 02, with different parser and prompt specializations per workflow.

Frontend role

Read artifacts, assemble payloads, render the streamed review experience, and hold deployment settings for split frontend/backend pilots.

Backend role

Parse artifacts, retrieve prior evidence, orchestrate context, call the LLM, persist the run, and expose RAG, exports, pilot metrics, and replay APIs.

Reviewer role

Interpret ranked actions, approve or reject them, and route accepted outputs into engineering workflows through history and export adapters.

System layers

1. Presentation layer

HTML dashboards for landing, agents, configuration, RAG console, history, pitch, pilot dashboard, and docs.

2. API layer

FastAPI routes for verification, yield, RAG search/ingest/reindex, feedback, config, benchmark, history, exports, pilot metrics, and access-code generation.

3. Parser layer

Coverage, regression, ATE, and SPC parsers normalize artifacts into structured summaries and items.

4. Retrieval / RAG layer

Run history and manual notes become metadata-filtered retrieval documents with Gemini-ready embeddings, evidence excerpts, and cited sources.

5. Orchestration layer

Builds run-specific prompt plans using enterprise policy, run profile, artifact metadata, workflow mode, and retrieved evidence.

6. LLM reasoning layer

Primary/fallback provider path for multi-step analysis, action ranking, and source-aware recommendations.

7. Persistence layer

SQLite stores enterprise config, decisions, feedback, run history, retrieval documents, raw artifacts, parser trust, and export history, with PostgreSQL/pgvector-ready columns.

Orchestration model

Orchestration is the policy and context synthesis step before the main LLM analysis call. It is important because raw artifacts alone are not enough to produce enterprise-appropriate actions. Different teams care about different failure modes, evidence styles, escalation policies, and workflow outputs.

What the orchestrator consumes

  • Enterprise policy for the selected agent
  • Active run profile
  • Artifact type and parser metadata
  • Chip focus and client workflow context
  • Optional historical reference notes

What the orchestrator emits

  • Chip focus label
  • Output emphasis
  • Instruction priorities
  • Reference priorities
  • Plan source/provider details
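
As an illustration, the emitted plan can be pictured as a small structured object. The field names below mirror the list above; the exact schema and values are illustrative, not the product's authoritative format.

  example_plan = {
      "chip_focus": "USB Controller v2.3",
      "output_emphasis": "coverage closure with escalation-first ordering",
      "instruction_priorities": [
          "apply the enterprise evidence policy before ranking actions",
          "match the active run profile's review style",
      ],
      "reference_priorities": ["historical closure notes", "prior regression cluster review"],
      "plan_source": {"provider": "primary", "run_profile": "USB Controller DV Team · Coverage"},
  }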

Agent 01: Verification Workflow Copilot

Primary workflows

  • Functional coverage closure
  • Regression triage and clustering
  • Protocol escape-risk prioritization
  • Evidence-backed review queue creation

Typical users

  • DV engineers preparing closure actions
  • Verification leads reviewing high-risk gaps
  • Program owners tracking first-pass review efficiency

Agent 01 is the current lead wedge for pilot and sponsor demos because verification carries the largest share of schedule pressure in many fabless organizations.

Agent 02: Yield Intelligence Copilot

Primary workflows

  • ATE anomaly review
  • Mis-bin detection and premium-bin recovery suggestions
  • Leakage and performance containment prioritization
  • SPC trend interpretation

Typical users

  • Product engineering
  • Yield engineering
  • Lot review boards
  • Operations teams escalating excursion risk

Decision review loop

1. Generate

The agent creates ranked actions with evidence, rationale, priority, confidence, and effort.

2. Review

Users inspect findings in the decision queue and the analysis trace separately so reasoning does not crowd action review.

3. Accept or reject

Reviewer feedback is captured per decision and linked to the persisted run record for trust and evaluation.
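
A hedged sketch of how a reviewer verdict could be recorded through the documented POST /api/v1/feedback endpoint; the base URL and payload field names below are assumptions for illustration, not the confirmed request schema.

  import requests

  feedback = {
      "run_id": "run_0123",        # persisted run identifier from run history
      "decision_index": 0,         # which ranked action is being reviewed (assumed field)
      "accepted": True,            # reviewer verdict captured per decision
      "comment": "Gap confirmed; queued for a directed test.",
  }
  requests.post("http://localhost:8000/api/v1/feedback", json=feedback, timeout=30)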

Run history and traceability

Every run is stored with raw artifact replay, parser confidence, orchestration details, generated decisions, feedback, and export history. This is the foundation for pilot evidence, sponsor reporting, and future workflow integration.

Why it matters

  • Reproducibility for engineering review
  • Trust and auditability for AI recommendations
  • Pilot metrics for Seeder or client reporting
  • Workflow export trace for Jira, email, and future Confluence adapters

What is stored

  • Artifact identity and source
  • Raw artifact text
  • Parser format, confidence, and warnings
  • Orchestration plan
  • Analysis trace and ranked decisions
  • Feedback summary and export history

Enterprise configuration

Enterprise configuration is not intended to be tuned on every run. It is a durable governance layer that should be set by high-level engineers, review owners, L3/L4 specialists, or delivery leaders. It defines how the agent should speak, escalate, and justify outputs within a given organization.

Run profiles

Run profiles are the day-to-day operational presets used by engineers. They are different from enterprise policy. A run profile chooses the active workflow style for a specific chip program or review type, while enterprise policy sets the organization-wide guardrails.

Run profile examples

  • USB Controller DV Team · Coverage
  • USB Regression Triage Team
  • Mobile SoC Yield Team · ATE
  • PMIC SPC Operations

Why this separation matters

  • Principal engineers set policy once
  • Everyday reviewers pick workflow presets without changing governance
  • Clients can scale across many teams without duplicating policy input

Why enterprise policy matters

Without enterprise policy, AI recommendations may still be technically useful, but they will not be consistent with the organization’s review culture, escalation cadence, evidence requirements, or decision format. Policy is what turns a generic assistant into an enterprise workflow copilot.

Verification request fields

Each field is listed with its purpose and a typical use case.

  • report_text: Raw coverage report or regression log input. Typical use: paste or upload a VCS/Xcelium artifact for Agent 01 analysis.
  • format: Parser hint for auto, vcs, or xcelium. Typical use: force a known format when the client artifact style is predictable.
  • mode: Selects coverage closure or triage behavior. Typical use: coverage for uncovered bins, triage for regression clustering.
  • design_name: Human-readable runtime label shown in analysis and history. Typical use: “USB Controller v2.3” or “NoC QoS Block”.
  • project_id: Project/workspace grouping key for history and feedback. Typical use: aggregate runs across the same client stream.
  • context: Extra operational framing for the run. Typical use: “Protocol escape review before tape-in signoff.”
  • artifact_name: Stored artifact identifier. Typical use: benchmark matching and run replay clarity.
  • artifact_source: Tells whether input came from a bundled sample, upload, or pasted text. Typical use: pilot provenance and parser-risk discussions.
  • run_profile_id / run_profile_name: Operational preset identity. Typical use: distinguish USB DV coverage from LPDDR refresh or secure boot triage.
  • chip_type: Chip or IP family focus. Typical use: guides orchestration emphasis and terminology.
  • client_profile: Human-readable team or client workflow label. Typical use: “Fabless verification team” or “InSemi DV pod”.
  • custom_instructions: Run-specific override instructions. Typical use: tell the agent to prefer speed, risk containment, or formal review style for one run.
  • reference_data: Historical notes or prior artifact context. Typical use: feed closure notes, repeated gaps, or known scenario limitations.
  • reference_data_label: Reference-data identity label. Typical use: “Historical closure notes” or “Prior regression cluster review”.
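
A hedged example of an Agent 01 request assembled from the fields above and sent to the documented POST /api/v1/verify endpoint. The base URL, file name, and example values are illustrative assumptions for a local deployment, not a confirmed integration recipe.

  import requests

  payload = {
      "report_text": open("usb_coverage_report.log").read(),   # hypothetical artifact file
      "format": "auto",
      "mode": "coverage",
      "design_name": "USB Controller v2.3",
      "project_id": "usb-pilot",
      "context": "Protocol escape review before tape-in signoff.",
      "artifact_name": "usb_coverage_report.log",
      "artifact_source": "upload",
      "run_profile_name": "USB Controller DV Team · Coverage",
      "chip_type": "USB controller IP",
      "client_profile": "Fabless verification team",
      "custom_instructions": "Prefer escalation-first ordering over effort ranking.",
      "reference_data": "Closure notes: link power-state bins repeatedly uncovered.",
      "reference_data_label": "Historical closure notes",
  }
  with requests.post("http://localhost:8000/api/v1/verify", json=payload, stream=True) as resp:
      for line in resp.iter_lines():
          if line:
              print(line.decode())   # streamed analysis and decision events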

Yield request fields

Each field is listed with its purpose and a typical use case.

  • csv_data: Raw ATE parametric or SPC-like input. Typical use: paste or upload lot-level measurement data into Agent 02.
  • lot_id: Runtime label for the lot or SPC population. Typical use: “LOT_004” or an excursion review slice.
  • mode: Selects the ATE or SPC reasoning path. Typical use: ate for per-die parametrics and binning, spc for trend monitoring.
  • project_id: Project/workstream grouping key. Typical use: group all product-yield runs under a pilot stream or product line.
  • context: Business or engineering framing for the lot review. Typical use: “Revenue containment review before customer shipment.”
  • artifact_name: Source identity for persistence and scorecard matching. Typical use: “ate_parametric_sample.csv” or a real client export name.
  • artifact_source: Sample/upload/paste provenance. Typical use: pilot trust reporting and debugging ingest behavior.
  • run_profile_id / run_profile_name: Operational preset identity. Typical use: Mobile SoC yield review vs PMIC SPC operations.
  • chip_type: Product category or chip family. Typical use: guides yield reasoning priorities and terminology.
  • client_profile: Team/workflow label. Typical use: “Product and yield engineering” or “Lot escalation committee”.
  • custom_instructions: Run-specific focus. Typical use: ask the agent to prioritize revenue recovery or reliability containment.
  • reference_data: Historical lot notes or threshold context. Typical use: earlier lot excursions, premium-bin limits, leakage correlations.
  • reference_data_label: Identity label for historical input. Typical use: “Historical lot review notes” or “Premium-bin thresholds”.
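
A shorter companion sketch for Agent 02 against the documented POST /api/v1/yield endpoint, under the same assumptions (local base URL, illustrative values and file name).

  import requests

  payload = {
      "csv_data": open("ate_parametric_sample.csv").read(),   # hypothetical ATE export
      "lot_id": "LOT_004",
      "mode": "ate",
      "project_id": "mobile-soc-yield-pilot",
      "context": "Revenue containment review before customer shipment.",
      "artifact_name": "ate_parametric_sample.csv",
      "artifact_source": "sample",
      "run_profile_name": "Mobile SoC Yield Team · ATE",
      "chip_type": "Mobile SoC",
      "client_profile": "Product and yield engineering",
      "custom_instructions": "Prioritize premium-bin recovery over marginal retest.",
  }
  with requests.post("http://localhost:8000/api/v1/yield", json=payload, stream=True) as resp:
      for line in resp.iter_lines():
          if line:
              print(line.decode())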

Run history fields

Each field is listed with its meaning and why it matters.

  • run_id: Unique persisted run identifier. Why it matters: links history, exports, and feedback.
  • status: Completed or failed. Why it matters: needed for pilot reliability reporting.
  • provider / model: LLM provider and resolved model. Why it matters: supports pilot observability and provider comparisons.
  • raw_artifact: Stored artifact body. Why it matters: makes runs replayable and auditable.
  • parser_format: Detected parser family. Why it matters: useful when debugging format mismatches.
  • parser_confidence: Confidence score from the parser layer. Why it matters: prevents silent trust in weak parses.
  • parser_warnings: Warnings about uncertain or partial parse quality. Why it matters: helps pilots surface reliability risk honestly.
  • orchestration: Run-specific plan context. Why it matters: shows how enterprise policy and run profile shaped the result.
  • analysis_log: Five-step analysis trace. Why it matters: human reviewers can audit reasoning separately from actions.
  • decisions: Structured recommended actions. Why it matters: the core user-facing value output.
  • feedback_summary: Aggregate accept/reject outcome. Why it matters: critical pilot trust signal.
  • export_history: Workflow-export event trail. Why it matters: shows the product is fitting into real workflows.
  • benchmark_title / benchmark_score: Benchmark or live scorecard result. Why it matters: measures quality for sponsor walkthroughs and internal reviews.
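
A hedged sketch of replaying a stored run through the documented GET /api/v1/runs and GET /api/v1/runs/{run_id} endpoints; the base URL and the shape of the list response are assumptions.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  runs = requests.get(f"{base}/api/v1/runs").json()
  run_id = runs[0]["run_id"]                     # assumes each list entry carries run_id
  run = requests.get(f"{base}/api/v1/runs/{run_id}").json()

  print(run["parser_format"], run["parser_confidence"], run["parser_warnings"])
  for decision in run["decisions"]:              # ranked, evidence-cited actions
      print(decision)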

Enterprise config fields

Agent 01 policy fields

Each field is listed with its use.

  • org_name: Names the owning business or engineering unit for review framing.
  • review_board: Specifies the body receiving or judging recommendations.
  • output_style: Controls how the agent formats recommendations conceptually.
  • escalation_policy: Defines what should be escalated before optimization tasks.
  • evidence_policy: Forces evidence-first reasoning standards.
  • instruction_addendum: Stores additional durable org guidance.

Agent 02 policy fields

Each field is listed with its use.

  • org_name: Names the yield/product organization using the agent.
  • review_board: Specifies the escalation committee or yield board context.
  • output_style: Guides tone toward revenue-aware and operational lot review.
  • escalation_policy: Prioritizes what must be raised immediately.
  • risk_policy: Explains how the agent should balance revenue and containment concerns.
  • instruction_addendum: Captures durable local review expectations.
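
Policy is read back through the documented config endpoints. A minimal sketch, assuming a local deployment URL and that the responses expose the policy fields listed above.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  agent01_policy = requests.get(f"{base}/api/v1/config/agent01").json()
  agent02_policy = requests.get(f"{base}/api/v1/config/agent02").json()
  print(agent01_policy.get("escalation_policy"))
  print(agent02_policy.get("risk_policy"))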

Pilot operations

How to run a pilot safely

  • Enable PILOT_ACCESS_TOKEN before sharing a URL; a minimal token-generation sketch follows this checklist.
  • Use sanitized artifacts first.
  • Track parser confidence and warnings during early client ingestion.
  • Push reviewers to accept or reject findings so the feedback loop becomes measurable.
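
A minimal sketch of the first checklist item, using the documented access-code endpoint; the response shape is not confirmed, so the suggestion is printed as-is and must be set manually as PILOT_ACCESS_TOKEN in the server environment before sharing a URL.

  import requests

  suggestion = requests.post("http://localhost:8000/api/v1/pilot/access-code/generate").json()
  print(suggestion)   # copy the suggested value into PILOT_ACCESS_TOKEN on the server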

What to show a sponsor

  • Pilot dashboard with total runs, acceptance rate, exports, and parser confidence
  • Run History showing raw artifact replay and decision audit trail
  • Agent page with artifact summary, scorecard, orchestration, and human review loop

Parser confidence and parser trust

Parser confidence exists because a wrong structured interpretation can look deceptively polished once passed through an LLM. Silicon Agents treats parser trust as a first-class signal and stores it alongside the run.

If parser confidence is weak or warnings appear, the right product behavior is not “silently continue.” The right behavior is “continue visibly, with uncertainty exposed.”
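
A minimal sketch of "continue visibly": the run still proceeds, but weak parser trust is surfaced to the reviewer rather than swallowed. The threshold and message wording are illustrative, not product constants.

  def surface_parser_trust(parser_confidence: float, parser_warnings: list[str]) -> list[str]:
      banners = []
      if parser_confidence < 0.7:    # illustrative threshold
          banners.append(
              f"Low parser confidence ({parser_confidence:.2f}); "
              "review extracted items before accepting actions."
          )
      banners.extend(f"Parser warning: {w}" for w in parser_warnings)
      return banners                 # shown alongside decisions, never blocking the run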

Exports and workflow fit

HTML brief

Executive-readable artifact for sponsor reviews, pilot summaries, and email circulation.

Jira export

Structured payload for issue-creation or review-queue integration in engineering workflows.

Email export

Formatted payload that helps teams circulate review actions without opening the UI.
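
Both export payloads for a saved run can be pulled through the documented export endpoints. A hedged sketch, assuming a local base URL and a placeholder run_id.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  run_id = "run_0123"                            # placeholder run identifier
  jira_payload = requests.get(f"{base}/api/v1/runs/{run_id}/export/jira").json()
  email_payload = requests.get(f"{base}/api/v1/runs/{run_id}/export/email").json()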

Current API surface

Each endpoint is listed with its purpose and why it matters.

  • POST /api/v1/verify: Run Agent 01 analysis as a streaming workflow. Why it matters: core verification workflow entrypoint for future enterprise integration.
  • POST /api/v1/yield: Run Agent 02 analysis as a streaming workflow. Why it matters: core yield workflow entrypoint.
  • GET /api/v1/runs: List saved runs. Why it matters: supports history, audit, and pilot dashboards.
  • GET /api/v1/runs/{run_id}: Retrieve one saved run in detail. Why it matters: enables replay, review, and export orchestration.
  • POST /api/v1/rag/search: Search retrieval documents using metadata filters and embedding-aware ranking. Why it matters: supports auditable retrieval over saved runs and ingested notes.
  • POST /api/v1/rag/ingest-note: Chunk and embed a sanitized engineering note. Why it matters: adds pilot knowledge without broad document-ingestion risk.
  • POST /api/v1/rag/reindex: Re-embed existing retrieval documents. Why it matters: lets teams switch embedding providers or refresh vectors without re-ingesting source content.
  • POST /api/v1/feedback: Persist accept/reject review signals. Why it matters: critical for human-in-the-loop trust measurement.
  • GET /api/v1/runs/{run_id}/export/jira: Generate a Jira-ready payload for a saved run. Why it matters: proves workflow fit beyond the UI.
  • GET /api/v1/runs/{run_id}/export/email: Generate an email-ready payload for a saved run. Why it matters: lets teams circulate findings immediately.
  • GET /api/v1/pilot/metrics: Aggregate pilot evidence across all saved runs. Why it matters: provides sponsor-grade usage and trust signals.
  • POST /api/v1/pilot/access-code/generate: Generate a suggested pilot access token value. Why it matters: supports controlled sharing without building full auth.
  • GET /api/v1/config/agent01 and /api/v1/config/agent02: Read server-backed enterprise policy for each agent. Why it matters: separates governance from day-to-day run selection.

RAG status: retrieval is active, hardening is next

Silicon Agents now has an initial retrieval-augmented pipeline for Agent 01 and Agent 02. It converts saved run history and manually ingested engineering notes into retrieval documents, embeds them, applies project/profile/workflow filters, and injects retrieved evidence into the verification and yield analysis paths with source metadata, visible citations, evidence excerpts, and run-history replay.

Implemented now

  • Run history converted into retrieval documents
  • Manual note ingestion with deterministic chunking
  • Gemini embedding provider with local fallback
  • Metadata-filtered search across project, agent, mode, profile, and source type
  • Agent 01 and Agent 02 retrieved-source metadata on decisions
  • Retrieved-source citations on decision cards
  • Retrieved-evidence panels with excerpts in agents and run history
  • RAG console for browser search, manual note ingestion, and reindexing

Still to harden

  • Production pgvector deployment and index tuning
  • Full document upload formats beyond text notes
  • Retrieval thresholds, reranking, and quality evals
  • Separation between retrieved fact and model inference

Agent 01 retrieval targets

Historical coverage closure notes, waiver rationales, regression debug notes, prior run-history decisions, and methodology guidance for the same IP block.

Agent 02 retrieval targets

Lot review archives, mis-bin investigation notes, SPC playbooks, product-engineering escalation guidance, and yield-containment patterns.

Current RAG tools

/rag provides browser search, note ingestion, and reindex controls over /api/v1/rag/search, /api/v1/rag/ingest-note, and /api/v1/rag/reindex.

Current implementation path: run history/manual notes -> retrieval documents -> Gemini embeddings -> metadata-filtered retrieval -> Agent 01 and Agent 02 cited decisions. Storage also includes a pgvector-ready column for PostgreSQL deployments.
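
A hedged example of a metadata-filtered retrieval query against the documented POST /api/v1/rag/search endpoint; the filter field names in the body are assumptions based on the filters described above, not the confirmed request schema.

  import requests

  query = {
      "query": "repeated uncovered bins in USB link power states",
      "project_id": "usb-pilot",        # assumed filter field
      "agent": "agent01",               # assumed filter field
      "mode": "coverage",               # assumed filter field
      "source_type": "run_history",     # assumed filter field
  }
  hits = requests.post("http://localhost:8000/api/v1/rag/search", json=query).json()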

Detailed design reference: RAG_ROADMAP.md in the repository root.

Next MVP phase: enterprise API integration for EDA workflows

The next phase is about making Silicon Agents a pilot-accessible integration layer rather than just a self-contained workflow console. That means exposing artifact ingestion APIs, workflow-trigger APIs, and result-return paths so enterprises can call Silicon Agents from verification dashboards, test operations, and review tooling.

Why this is next

  • The current MVP already proves the workflow value.
  • Pilot clients will next ask how this fits their toolchain, not just whether the dashboard looks good.
  • API exposure is the bridge from demo to embedded pilot usage.

What Phase 2 API work should enable

  • Artifact submission from external tools
  • Run status polling or callback integration
  • Structured results consumption
  • Workflow embedding into EDA-adjacent review systems