Silicon Agents documentation for pilot users, delivery teams, and sponsors.
Silicon Agents is an AI workflow copilot for semiconductor engineering. It does not replace EDA tools. It sits above existing verification, test, and yield workflows and reduces the time engineers spend interpreting raw outputs, prioritizing findings, and deciding what to do next.
- Coverage closure and regression triage with evidence-backed next actions.
- ATE anomaly review, binning validation, and SPC-driven yield prioritization.
- Recommendations are visible, ranked, evidence-cited, and explicitly accepted or rejected.
- Workflow embedding into enterprise toolchains is the next scale-up phase after pilot readiness.
Platform pattern
What the product already proves
- Raw artifacts can be ingested, parsed, reasoned over, and converted into reviewable actions.
- Run history, parser trust, provenance, and exports make the workflow auditable instead of purely conversational.
- Enterprise policy and run profiles let the same system adapt across chip programs and review styles.
What the product is not yet
- It is not a simulator, compiler, testcase generator, or RTL modification engine.
- It is not yet embedded into client EDA stacks through production-grade APIs.
- It does not yet provide full RBAC or multi-tenant SaaS; that is later-stage platform work.
Primary users and use cases
DV engineer
Uses Agent 01 to turn coverage logs and regression outputs into ranked closure actions, evidence, and focused review paths.
Yield / product engineer
Uses Agent 02 to detect mis-bins, leakage anomalies, SPC shifts, and lot-level review priorities without manual spreadsheet triage.
Principal / lead engineer
Defines enterprise policy, review governance, evidence expectations, and escalation rules that shape how the agents respond.
High-level design
Input artifact -> Parser layer -> Retrieval / RAG layer -> Orchestration layer -> LLM analysis layer -> Decision generation -> Human review loop -> Run history + exports + pilot metrics

Supporting layers:
- Enterprise policy store
- Run profile presets
- Retrieval documents + embeddings
- Benchmark / live scorecard engine
- Pilot access gate
- Structured export adapters
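The flow above can be sketched as a chain of stage functions passing one run record forward. This is an illustrative sketch only; the stage and field names here are assumptions, not identifiers from the actual codebase.

```python
from dataclasses import dataclass, field

# Hypothetical run record passed through the pipeline stages.
@dataclass
class Run:
    artifact: str
    parsed: dict = field(default_factory=dict)
    evidence: list = field(default_factory=list)
    plan: dict = field(default_factory=dict)
    decisions: list = field(default_factory=list)

def parse(run: Run) -> Run:
    # Parser layer: structured items plus a confidence signal.
    run.parsed = {"items": run.artifact.splitlines(), "confidence": 0.9}
    return run

def retrieve(run: Run) -> Run:
    # Retrieval / RAG layer: prior evidence (placeholder here).
    run.evidence = ["prior closure note"]
    return run

def orchestrate(run: Run) -> Run:
    # Orchestration layer: merge policy, profile, and evidence into a plan.
    run.plan = {"emphasis": "coverage closure", "evidence": run.evidence}
    return run

def analyze(run: Run) -> Run:
    # LLM analysis layer stand-in: ranked actions from parsed items.
    run.decisions = [{"action": f"Review {item}", "rank": n}
                     for n, item in enumerate(run.parsed["items"], 1)]
    return run

def run_pipeline(artifact: str) -> Run:
    run = Run(artifact)
    for stage in (parse, retrieve, orchestrate, analyze):
        run = stage(run)
    return run
```

The same chain shape serves both agents; only the parser and prompt specializations differ per workflow.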
Low-level workflow
Verification path
- User uploads or pastes a coverage report or regression log.
- Artifact provenance is captured as bundled sample, uploaded file, or pasted text.
- Parser extracts structured items and computes parser confidence and warnings.
- RAG retrieves prior closure notes, waiver context, regression decisions, and similar run history.
- Orchestration layer merges enterprise policy, active run profile, artifact context, and retrieved evidence.
- LLM produces five-step reasoning, ranked actions, and cited sources.
- Run history persists raw artifact, retrieved evidence, orchestration, analysis trace, decisions, and scorecard.
Yield path
- User provides ATE parametric CSV or SPC trend input.
- Parser extracts chips, bins, pass/fail state, and anomaly signals.
- RAG retrieves prior lot reviews, yield notes, SPC playbooks, and similar run history.
- Orchestration emphasizes revenue risk, containment, operational review style, and retrieved evidence.
- LLM generates lot-level review steps, ranked mis-bin or anomaly actions, and cited sources.
- Human reviewer accepts or rejects actions and exports results into workflow targets.
Visual flow chart: after upload and Analyse
The runtime flow below shows what the product actually does after a user uploads an artifact and clicks Analyse. The same pattern applies to both Agent 01 and Agent 02, with different parser and prompt specializations per workflow.
Frontend role
Read artifacts, assemble payloads, render the streamed review experience, and hold deployment settings for split frontend/backend pilots.
Backend role
Parse artifacts, retrieve prior evidence, orchestrate context, call the LLM, persist the run, and expose RAG, exports, pilot metrics, and replay APIs.
Reviewer role
Interpret ranked actions, approve or reject them, and route accepted outputs into engineering workflows through history and export adapters.
System layers
1. Presentation layer
HTML dashboards for landing, agents, configuration, RAG console, history, pitch, pilot dashboard, and docs.
2. API layer
FastAPI routes for verification, yield, RAG search/ingest/reindex, feedback, config, benchmark, history, exports, pilot metrics, and access-code generation.
3. Parser layer
Coverage, regression, ATE, and SPC parsers normalize artifacts into structured summaries and items.
4. Retrieval / RAG layer
Run history and manual notes become metadata-filtered retrieval documents with Gemini-ready embeddings, evidence excerpts, and cited sources.
5. Orchestration layer
Builds run-specific prompt plans using enterprise policy, run profile, artifact metadata, workflow mode, and retrieved evidence.
6. LLM reasoning layer
Primary/fallback provider path for multi-step analysis, action ranking, and source-aware recommendations.
7. Persistence layer
SQLite stores enterprise config, decisions, feedback, run history, retrieval documents, raw artifacts, parser trust, and export history, with PostgreSQL/pgvector-ready columns.
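A minimal sketch of what the SQLite persistence might look like, assuming a single run-history table. The table and column names are guesses based on the fields documented in this section, not the product's actual schema.

```python
import sqlite3

# Illustrative-only schema derived from the run-history fields in this doc.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE run_history (
        run_id            TEXT PRIMARY KEY,
        status            TEXT,
        raw_artifact      TEXT,
        parser_format     TEXT,
        parser_confidence REAL,
        decisions         TEXT,   -- JSON blob of ranked actions
        export_history    TEXT    -- JSON trail of export events
    )
""")
conn.execute(
    "INSERT INTO run_history VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("run-001", "completed", "...", "vcs", 0.92, "[]", "[]"),
)
row = conn.execute(
    "SELECT parser_confidence FROM run_history WHERE run_id = ?",
    ("run-001",),
).fetchone()
```

Keeping parser confidence as a first-class column (rather than buried in a JSON blob) is what makes it queryable for pilot dashboards.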
Orchestration model
Orchestration is the policy and context synthesis step before the main LLM analysis call. It is important because raw artifacts alone are not enough to produce enterprise-appropriate actions. Different teams care about different failure modes, evidence styles, escalation policies, and workflow outputs.
What the orchestrator consumes
- Enterprise policy for the selected agent
- Active run profile
- Artifact type and parser metadata
- Chip focus and client workflow context
- Optional historical reference notes
What the orchestrator emits
- Chip focus label
- Output emphasis
- Instruction priorities
- Reference priorities
- Plan source/provider details
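The consume/emit contract above might be sketched as a single plan-building function. All field names here are illustrative assumptions mirroring the lists above, not the real orchestrator interface.

```python
def build_plan(policy: dict, profile: dict, artifact_meta: dict,
               reference_notes=None) -> dict:
    """Merge enterprise policy, run profile, and artifact context
    into one run-specific prompt plan (hypothetical shape)."""
    return {
        "chip_focus": profile.get("chip_type", "unspecified"),
        "output_emphasis": policy.get("output_style", "evidence-first"),
        "instruction_priorities": [
            policy.get("escalation_policy"),
            profile.get("custom_instructions"),
        ],
        "reference_priorities": [reference_notes] if reference_notes else [],
        "plan_source": "enterprise-policy+run-profile",
    }
```

The key design point is that raw artifact metadata never reaches the LLM alone; it is always wrapped in policy and profile context first.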
Agent 01: Verification Workflow Copilot
Primary workflows
- Functional coverage closure
- Regression triage and clustering
- Protocol escape-risk prioritization
- Evidence-backed review queue creation
Typical users
- DV engineers preparing closure actions
- Verification leads reviewing high-risk gaps
- Program owners tracking first-pass review efficiency
Agent 02: Yield Intelligence Copilot
Primary workflows
- ATE anomaly review
- Mis-bin detection and premium-bin recovery suggestions
- Leakage and performance containment prioritization
- SPC trend interpretation
Typical users
- Product engineering
- Yield engineering
- Lot review boards
- Operations teams escalating excursion risk
Decision review loop
1. Generate
The agent creates ranked actions with evidence, rationale, priority, confidence, and effort.
2. Review
Users inspect findings in the decision queue and the analysis trace separately so reasoning does not crowd action review.
3. Accept or reject
Reviewer feedback is captured per decision and linked to the persisted run record for trust and evaluation.
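The per-decision capture in step 3 might produce a record like the following. The field names (run_id, decision_id, verdict, reviewer_note) are assumptions; the source only names the feedback endpoint, not its payload shape.

```python
import json

# Hypothetical feedback record as it might be sent to POST /api/v1/feedback.
feedback = {
    "run_id": "run-001",
    "decision_id": 3,
    "verdict": "accepted",   # or "rejected"
    "reviewer_note": "Matches known FIFO overflow gap.",
}
payload = json.dumps(feedback)
```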
Run history and traceability
Every run is stored with raw artifact replay, parser confidence, orchestration details, generated decisions, feedback, and export history. This is the foundation for pilot evidence, sponsor reporting, and future workflow integration.
Why it matters
- Reproducibility for engineering review
- Trust and auditability for AI recommendations
- Pilot metrics for Seeder or client reporting
- Workflow export trace for Jira, email, and future Confluence adapters
What is stored
- Artifact identity and source
- Raw artifact text
- Parser format, confidence, and warnings
- Orchestration plan
- Analysis trace and ranked decisions
- Feedback summary and export history
Enterprise configuration
Enterprise configuration is not intended to be tuned on every run. It is a durable governance layer that should be set by high-level engineers, review owners, L3/L4 specialists, or delivery leaders. It defines how the agent should speak, escalate, and justify outputs within a given organization.
Run profiles
Run profiles are the day-to-day operational presets used by engineers. They are different from enterprise policy. A run profile chooses the active workflow style for a specific chip program or review type, while enterprise policy sets the organization-wide guardrails.
Run profile examples
- USB Controller DV Team · Coverage
- USB Regression Triage Team
- Mobile SoC Yield Team · ATE
- PMIC SPC Operations
Why this separation matters
- Principal engineers set policy once
- Everyday reviewers pick workflow presets without changing governance
- Clients can scale across many teams without duplicating policy input
Why enterprise policy matters
Enterprise policy keeps escalation rules, evidence expectations, and output style consistent across every run, so individual engineers never re-specify governance when they pick a run profile.
Verification request fields
| Field | Purpose | Typical use case |
|---|---|---|
| report_text | Raw coverage report or regression log input. | Paste or upload a VCS/Xcelium artifact for Agent 01 analysis. |
| format | Parser hint for auto, vcs, or xcelium. | Force a known format when the client artifact style is predictable. |
| mode | Selects coverage closure or triage behavior. | Use coverage for uncovered bins, triage for regression clustering. |
| design_name | Human-readable runtime label shown in analysis and history. | “USB Controller v2.3” or “NoC QoS Block”. |
| project_id | Project/workspace grouping key for history and feedback. | Used to aggregate runs across the same client stream. |
| context | Extra operational framing for the run. | “Protocol escape review before tape-in signoff.” |
| artifact_name | Stored artifact identifier. | Needed for benchmark matching and run replay clarity. |
| artifact_source | Tells whether input came from a bundled sample, upload, or pasted text. | Important for pilot provenance and parser-risk discussions. |
| run_profile_id / run_profile_name | Operational preset identity. | Distinguishes USB DV coverage from LPDDR refresh or secure boot triage. |
| chip_type | Chip or IP family focus. | Guides orchestration emphasis and terminology. |
| client_profile | Human-readable team or client workflow label. | “Fabless verification team” or “InSemi DV pod”. |
| custom_instructions | Run-specific override instructions. | Tell the agent to prefer speed, risk containment, or formal review style for one run. |
| reference_data | Historical notes or prior artifact context. | Feed closure notes, repeated gaps, or known scenario limitations. |
| reference_data_label | Reference-data identity label. | “Historical closure notes” or “Prior regression cluster review”. |
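Assembled from the fields above, a verification request body might look like this. The values are invented sample data; only the field names come from the table, and the exact serialization accepted by POST /api/v1/verify is an assumption.

```python
import json

# Illustrative Agent 01 request body built from the documented fields.
verify_body = {
    "report_text": "COVERGROUP cg_fifo ... 72.4% covered",
    "format": "auto",
    "mode": "coverage",
    "design_name": "USB Controller v2.3",
    "project_id": "pilot-usb",
    "context": "Protocol escape review before tape-in signoff.",
    "artifact_name": "usb_cov_report.txt",
    "artifact_source": "upload",
    "run_profile_name": "USB Controller DV Team · Coverage",
    "chip_type": "USB controller",
    "custom_instructions": "Prefer risk containment over speed.",
}
serialized = json.dumps(verify_body)
```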
Yield request fields
| Field | Purpose | Typical use case |
|---|---|---|
| csv_data | Raw ATE parametric or SPC-like input. | Paste or upload lot-level measurement data into Agent 02. |
| lot_id | Runtime label for the lot or SPC population. | “LOT_004” or an excursion review slice. |
| mode | Selects ATE or SPC reasoning path. | Use ate for per-die parametrics and binning, spc for trend monitoring. |
| project_id | Project/workstream grouping key. | Group all product-yield runs under a pilot stream or product line. |
| context | Business or engineering framing for the lot review. | “Revenue containment review before customer shipment.” |
| artifact_name | Source identity for persistence and scorecard matching. | “ate_parametric_sample.csv” or a real client export name. |
| artifact_source | Sample/upload/paste provenance. | Used for pilot trust reporting and debugging ingest behavior. |
| run_profile_id / run_profile_name | Operational preset identity. | Mobile SoC yield review vs PMIC SPC operations. |
| chip_type | Product category or chip family. | Guides yield reasoning priorities and terminology. |
| client_profile | Team/workflow label. | “Product and yield engineering” or “Lot escalation committee”. |
| custom_instructions | Run-specific focus. | Ask the agent to prioritize revenue recovery or reliability containment. |
| reference_data | Historical lot notes or threshold context. | Earlier lot excursions, premium-bin limits, leakage correlations. |
| reference_data_label | Identity for historical input. | “Historical lot review notes” or “Premium-bin thresholds”. |
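A corresponding Agent 02 request might carry inline CSV plus the documented metadata. The CSV columns and values below are invented for illustration; only the request field names come from the table above.

```python
# Illustrative Agent 02 request body for POST /api/v1/yield.
# chip_id/bin/idd_ua columns are sample data, not a required schema.
csv_data = "chip_id,bin,idd_ua\nD001,1,12.4\nD002,3,88.1\n"

yield_body = {
    "csv_data": csv_data,
    "lot_id": "LOT_004",
    "mode": "ate",
    "project_id": "pilot-yield",
    "context": "Revenue containment review before customer shipment.",
    "artifact_name": "ate_parametric_sample.csv",
    "artifact_source": "paste",
    "chip_type": "Mobile SoC",
    "custom_instructions": "Prioritize revenue recovery.",
}
```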
Run history fields
| Field | Meaning | Why it matters |
|---|---|---|
| run_id | Unique persisted run identifier. | Links history, exports, and feedback. |
| status | Completed or failed. | Needed for pilot reliability reporting. |
| provider / model | LLM provider and resolved model. | Supports pilot observability and provider comparisons. |
| raw_artifact | Stored artifact body. | Makes runs replayable and auditable. |
| parser_format | Detected parser family. | Useful when debugging format mismatches. |
| parser_confidence | Confidence score from parser layer. | Prevents silent trust in weak parses. |
| parser_warnings | Warnings about uncertain or partial parse quality. | Helps pilots surface reliability risk honestly. |
| orchestration | Run-specific plan context. | Shows how enterprise policy and run profile shaped the result. |
| analysis_log | Five-step analysis trace. | Human reviewers can audit reasoning separately from actions. |
| decisions | Structured recommended actions. | Core user-facing value output. |
| feedback_summary | Aggregate accept/reject outcome. | Critical pilot trust signal. |
| export_history | Workflow-export event trail. | Shows the product is fitting into real workflows. |
| benchmark_title / benchmark_score | Benchmark or live scorecard result. | Measures quality for sponsor walkthroughs and internal reviews. |
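One way these fields feed pilot reporting is an acceptance-rate rollup across completed runs. The record shapes below are assumptions based on the fields in this table, not the stored format.

```python
# Sketch: computing a pilot acceptance rate from run-history records.
runs = [
    {"run_id": "r1", "status": "completed",
     "feedback_summary": {"accepted": 4, "rejected": 1}},
    {"run_id": "r2", "status": "completed",
     "feedback_summary": {"accepted": 2, "rejected": 3}},
    {"run_id": "r3", "status": "failed", "feedback_summary": None},
]

# Failed runs carry no feedback and are excluded from the trust signal.
completed = [r for r in runs if r["status"] == "completed"]
accepted = sum(r["feedback_summary"]["accepted"] for r in completed)
total = sum(r["feedback_summary"]["accepted"] + r["feedback_summary"]["rejected"]
            for r in completed)
acceptance_rate = accepted / total
```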
Enterprise config fields
Agent 01 policy fields
| Field | Use |
|---|---|
| org_name | Names the owning business or engineering unit for review framing. |
| review_board | Specifies the body receiving or judging recommendations. |
| output_style | Controls how the agent formats recommendations conceptually. |
| escalation_policy | Defines what should be escalated before optimization tasks. |
| evidence_policy | Forces evidence-first reasoning standards. |
| instruction_addendum | Stores additional durable org guidance. |
Agent 02 policy fields
| Field | Use |
|---|---|
| org_name | Names the yield/product organization using the agent. |
| review_board | Specifies escalation committee or yield board context. |
| output_style | Guides tone toward revenue-aware and operational lot review. |
| escalation_policy | Prioritizes what must be raised immediately. |
| risk_policy | Explains how the agent should balance revenue and containment concerns. |
| instruction_addendum | Captures durable local review expectations. |
Pilot operations
How to run a pilot safely
- Enable PILOT_ACCESS_TOKEN before sharing a URL.
- Use sanitized artifacts first.
- Track parser confidence and warnings during early client ingestion.
- Push reviewers to accept or reject findings so the feedback loop becomes measurable.
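A minimal sketch of what gating on PILOT_ACCESS_TOKEN could look like, assuming a shared-secret check on each request. The environment variable name comes from this doc; the function and its constant-time comparison are illustrative choices, not the product's implementation.

```python
import hmac
import os

def check_pilot_access(supplied: str) -> bool:
    """Return True only if a token is configured and matches.

    Uses hmac.compare_digest for a constant-time comparison, and
    fails closed when PILOT_ACCESS_TOKEN is unset or empty.
    """
    expected = os.environ.get("PILOT_ACCESS_TOKEN", "")
    return bool(expected) and hmac.compare_digest(supplied, expected)
```

Failing closed when the token is unset keeps an unconfigured deployment from being accidentally public.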
What to show a sponsor
- Pilot dashboard with total runs, acceptance rate, exports, and parser confidence
- Run History showing raw artifact replay and decision audit trail
- Agent page with artifact summary, scorecard, orchestration, and human review loop
Parser confidence and parser trust
Parser confidence exists because a wrong structured interpretation can look deceptively polished once passed through an LLM. Silicon Agents treats parser trust as a first-class signal and stores it alongside the run.
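The gating this implies might look like the sketch below. The 0.6 threshold and the block/flag/pass outcomes are illustrative choices, not product defaults.

```python
# Sketch: gating downstream analysis on parser trust.
LOW_CONFIDENCE = 0.6  # illustrative threshold, not a product default

def gate(parser_confidence: float, warnings: list):
    """Decide how much to trust a parse before LLM analysis."""
    if parser_confidence < LOW_CONFIDENCE:
        return "block", "confidence below threshold; request cleaner artifact"
    if warnings:
        return "flag", f"{len(warnings)} parser warning(s) surfaced to reviewer"
    return "pass", "parser output trusted for analysis"
```

Surfacing the warning count to the reviewer, rather than silently proceeding, is the point: a weak parse should never look as polished as a strong one.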
Exports and workflow fit
HTML brief
Executive-readable artifact for sponsor reviews, pilot summaries, and email circulation.
Jira export
Structured payload for issue-creation or review-queue integration in engineering workflows.
Email export
Formatted payload that helps teams circulate review actions without opening the UI.
Current API surface
| Endpoint | Purpose | Why it matters |
|---|---|---|
| POST /api/v1/verify | Run Agent 01 analysis as a streaming workflow. | Core verification workflow entrypoint for future enterprise integration. |
| POST /api/v1/yield | Run Agent 02 analysis as a streaming workflow. | Core yield workflow entrypoint. |
| GET /api/v1/runs | List saved runs. | Supports history, audit, and pilot dashboards. |
| GET /api/v1/runs/{run_id} | Retrieve one saved run in detail. | Enables replay, review, and export orchestration. |
| POST /api/v1/rag/search | Search retrieval documents using metadata filters and embedding-aware ranking. | Supports auditable retrieval over saved runs and ingested notes. |
| POST /api/v1/rag/ingest-note | Chunk and embed a sanitized engineering note. | Adds pilot knowledge without broad document-ingestion risk. |
| POST /api/v1/rag/reindex | Re-embed existing retrieval documents. | Lets teams switch embedding providers or refresh vectors without re-ingesting source content. |
| POST /api/v1/feedback | Persist accept/reject review signals. | Critical for human-in-the-loop trust measurement. |
| GET /api/v1/runs/{run_id}/export/jira | Generate Jira-ready payload for a saved run. | Proves workflow fit beyond the UI. |
| GET /api/v1/runs/{run_id}/export/email | Generate email-ready payload for a saved run. | Lets teams circulate findings immediately. |
| GET /api/v1/pilot/metrics | Aggregate pilot evidence across all saved runs. | Provides sponsor-grade usage and trust signals. |
| POST /api/v1/pilot/access-code/generate | Generate a suggested pilot access token value. | Supports controlled sharing without building full auth. |
| GET /api/v1/config/agent01 and /api/v1/config/agent02 | Read server-backed enterprise policy for each agent. | Separates governance from day-to-day run selection. |
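For a client integrating against this surface, the path templates can be centralized in one place. The base URL below is a made-up example; only the endpoint paths come from the table above, and any auth header a deployment requires is out of scope here.

```python
# Hypothetical endpoint helper for a Silicon Agents pilot client.
BASE = "https://silicon-agents.example.com"  # assumed deployment URL

def url(path: str, **params) -> str:
    """Fill path parameters like {run_id} into a documented endpoint."""
    return BASE + path.format(**params)

runs_url = url("/api/v1/runs")
detail_url = url("/api/v1/runs/{run_id}", run_id="run-001")
jira_url = url("/api/v1/runs/{run_id}/export/jira", run_id="run-001")
```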
RAG status: retrieval is active, hardening is next
Silicon Agents now has an initial retrieval-augmented pipeline for Agent 01 and Agent 02. It converts saved run history and manually ingested engineering notes into retrieval documents, embeds them, applies project/profile/workflow filters, and injects retrieved evidence into the verification and yield analysis paths with source metadata, visible citations, evidence excerpts, and run-history replay.
Implemented now
- Run history converted into retrieval documents
- Manual note ingestion with deterministic chunking
- Gemini embedding provider with local fallback
- Metadata-filtered search across project, agent, mode, profile, and source type
- Agent 01 and Agent 02 retrieved-source metadata on decisions
- Retrieved-source citations on decision cards
- Retrieved-evidence panels with excerpts in agents and run history
- RAG console for browser search, manual note ingestion, and reindexing
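The metadata-filtered, embedding-ranked search in this list can be illustrated with a toy in-memory version. The document shapes, two-dimensional vectors, and cosine ranking below are purely illustrative; the real pipeline uses Gemini-ready embeddings with a local fallback.

```python
import math

# Toy retrieval corpus: filters mirror the documented metadata axes.
docs = [
    {"text": "FIFO overflow waiver rationale", "project": "pilot-usb",
     "agent": "agent01", "vec": [0.9, 0.1]},
    {"text": "LOT_003 mis-bin investigation", "project": "pilot-yield",
     "agent": "agent02", "vec": [0.1, 0.9]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_vec, project=None, agent=None, top_k=3):
    # Apply metadata filters first, then rank the survivors by similarity.
    pool = [d for d in docs
            if (project is None or d["project"] == project)
            and (agent is None or d["agent"] == agent)]
    return sorted(pool, key=lambda d: cosine(query_vec, d["vec"]),
                  reverse=True)[:top_k]
```

Filtering before ranking is what keeps retrieval scoped to the right project, agent, and workflow, so a yield note never leaks into a verification run's evidence.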
Still to harden
- Production pgvector deployment and index tuning
- Full document upload formats beyond text notes
- Retrieval thresholds, reranking, and quality evals
- Separation between retrieved fact and model inference
Agent 01 retrieval targets
Historical coverage closure notes, waiver rationales, regression debug notes, prior run-history decisions, and methodology guidance for the same IP block.
Agent 02 retrieval targets
Lot review archives, mis-bin investigation notes, SPC playbooks, product-engineering escalation guidance, and yield-containment patterns.
Current RAG tools
/rag provides browser search, note ingestion, and reindex controls over /api/v1/rag/search, /api/v1/rag/ingest-note, and /api/v1/rag/reindex.
Detailed design reference: RAG_ROADMAP.md in the repository root.
Next MVP phase: enterprise API integration for EDA workflows
The next phase is about making Silicon Agents a pilot-accessible integration layer rather than just a self-contained workflow console. That means exposing artifact ingestion APIs, workflow-trigger APIs, and result-return paths so enterprises can call Silicon Agents from verification dashboards, test operations, and review tooling.
Why this is next
- The current MVP already proves the workflow value.
- Pilot clients will next ask how this fits their toolchain, not just whether the dashboard looks good.
- API exposure is the bridge from demo to embedded pilot usage.
What Phase 2 API work should enable
- Artifact submission from external tools
- Run status polling or callback integration
- Structured results consumption
- Workflow embedding into EDA-adjacent review systems