Silicon Agents Console
Product documentation and workflow reference

Silicon Agents documentation for pilot users, delivery teams, and sponsors.

Silicon Agents is an AI workflow copilot for semiconductor engineering. It does not replace EDA tools. It sits above existing verification, test, and yield workflows and reduces the time engineers spend interpreting raw outputs, prioritizing findings, and deciding what to do next.

Agent 01
Verification workflow copilot

Coverage closure and regression triage with evidence-backed next actions.

Agent 02
Yield intelligence copilot

ATE anomaly review, binning validation, and SPC-driven yield prioritization.

Trust posture
Human in the loop

Recommendations are visible, ranked, evidence-cited, and explicitly accepted or rejected.

Next MVP phase
EDA/API integration

Workflow embedding into enterprise toolchains is the next scale-up phase after pilot readiness.

Platform pattern

What the product already proves

  • Raw artifacts can be ingested, parsed, reasoned over, and converted into reviewable actions.
  • Run history, parser trust, provenance, and exports make the workflow auditable instead of purely conversational.
  • Enterprise policy and run profiles let the same system adapt across chip programs and review styles.

What the product is not yet

  • It is not a simulator, compiler, testcase generator, or RTL modification engine.
  • It is not yet embedded into client EDA stacks through production-grade APIs.
  • It does not yet provide full RBAC or multi-tenant SaaS; that is later-stage platform work.

The current MVP is best pitched as a verification-first and yield-adjacent workflow wedge with scalable platform architecture, not as a finished end-state product. That is the honest and strongest positioning for pilot and Seeder conversations.

Primary users and use cases

DV engineer

Uses Agent 01 to turn coverage logs and regression outputs into ranked closure actions, evidence, and focused review paths.

Yield / product engineer

Uses Agent 02 to detect mis-bins, leakage anomalies, SPC shifts, and lot-level review priorities without manual spreadsheet triage.

Principal / lead engineer

Defines enterprise policy, review governance, evidence expectations, and escalation rules that shape how the agents respond.

High-level design

Input artifact
  -> Parser layer
  -> Retrieval / RAG layer
  -> Orchestration layer
  -> LLM analysis layer
  -> Decision generation
  -> Human review loop
  -> Run history + exports + pilot metrics

Supporting layers:
  - Enterprise policy store
  - Run profile presets
  - Retrieval documents + embeddings
  - Benchmark / live scorecard engine
  - Pilot access gate
  - Structured export adapters
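
A minimal sketch of the layered flow above, assuming hypothetical stage and field names; the real logic lives behind the FastAPI routes and layers described later in this document.

  from dataclasses import dataclass, field

  @dataclass
  class Run:
      artifact: str
      parsed: dict = field(default_factory=dict)       # structured items + parser confidence/warnings
      evidence: list = field(default_factory=list)     # retrieved prior runs and notes
      plan: dict = field(default_factory=dict)         # orchestration plan (policy + profile + context)
      decisions: list = field(default_factory=list)    # ranked, evidence-cited actions

  # Placeholder stages standing in for the parser, retrieval, orchestration,
  # LLM analysis, and persistence layers. Names are illustrative, not the product API.
  def parse(run: Run) -> Run:
      run.parsed = {"items": [], "confidence": 1.0, "warnings": []}
      return run

  def retrieve(run: Run) -> Run:
      run.evidence = []
      return run

  def orchestrate(run: Run) -> Run:
      run.plan = {"chip_focus": "", "output_emphasis": "", "instruction_priorities": []}
      return run

  def analyze(run: Run) -> Run:
      run.decisions = []
      return run

  def persist(run: Run) -> Run:
      return run

  def execute(artifact_text: str) -> Run:
      run = Run(artifact=artifact_text)
      for stage in (parse, retrieve, orchestrate, analyze, persist):
          run = stage(run)
      return run    # decisions then enter the human review loop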

Low-level workflow

Verification path

  1. User uploads or pastes a coverage report or regression log.
  2. Artifact provenance is captured as bundled sample, uploaded file, or pasted text.
  3. Parser extracts structured items and computes parser confidence and warnings.
  4. RAG retrieves prior closure notes, waiver context, regression decisions, and similar run history.
  5. Orchestration layer merges enterprise policy, active run profile, artifact context, and retrieved evidence.
  6. LLM produces five-step reasoning, ranked actions, and cited sources.
  7. Run history persists raw artifact, retrieved evidence, orchestration, analysis trace, decisions, and scorecard.

Yield path

  1. User provides ATE parametric CSV or SPC trend input.
  2. Parser extracts chips, bins, pass/fail state, and anomaly signals.
  3. RAG retrieves prior lot reviews, yield notes, SPC playbooks, and similar run history.
  4. Orchestration emphasizes revenue risk, containment, operational review style, and retrieved evidence.
  5. LLM generates lot-level review steps, ranked mis-bin or anomaly actions, and cited sources.
  6. Human reviewer accepts or rejects actions and exports results into workflow targets.

Visual flow chart: after upload and Analyse

The runtime flow below shows what the product actually does after a user uploads an artifact and clicks Analyse. The same pattern applies to both Agent 01 and Agent 02, with different parser and prompt specializations per workflow.

Frontend role

Read artifacts, assemble payloads, render the streamed review experience, and hold deployment settings for split frontend/backend pilots.

Backend role

Parse artifacts, retrieve prior evidence, orchestrate context, call the LLM, persist the run, and expose RAG, exports, pilot metrics, and replay APIs.

Reviewer role

Interpret ranked actions, approve or reject them, and route accepted outputs into engineering workflows through history and export adapters.

System layers

1. Presentation layer

HTML dashboards for landing, agents, configuration, RAG console, history, pitch, pilot dashboard, and docs.

2. API layer

FastAPI routes for verification, yield, RAG search/ingest/reindex, feedback, config, benchmark, history, exports, pilot metrics, and access-code generation.

3. Parser layer

Coverage, regression, ATE, and SPC parsers normalize artifacts into structured summaries and items.

4. Retrieval / RAG layer

Run history and manual notes become metadata-filtered retrieval documents with Gemini-ready embeddings, evidence excerpts, and cited sources.

5. Orchestration layer

Builds run-specific prompt plans using enterprise policy, run profile, artifact metadata, workflow mode, and retrieved evidence.

6. LLM reasoning layer

Primary/fallback provider path for multi-step analysis, action ranking, and source-aware recommendations.

7. Persistence layer

SQLite stores enterprise config, decisions, feedback, run history, retrieval documents, raw artifacts, parser trust, and export history, with PostgreSQL/pgvector-ready columns.

Orchestration model

Orchestration is the policy and context synthesis step before the main LLM analysis call. It is important because raw artifacts alone are not enough to produce enterprise-appropriate actions. Different teams care about different failure modes, evidence styles, escalation policies, and workflow outputs.

What the orchestrator consumes

  • Enterprise policy for the selected agent
  • Active run profile
  • Artifact type and parser metadata
  • Chip focus and client workflow context
  • Optional historical reference notes

What the orchestrator emits

  • Chip focus label
  • Output emphasis
  • Instruction priorities
  • Reference priorities
  • Plan source/provider details
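
As an illustration, the emitted plan can be pictured as a small structured object. The field names below mirror the list above; the exact schema and values are illustrative, not the product's authoritative format.

  example_plan = {
      "chip_focus": "USB Controller v2.3",
      "output_emphasis": "coverage closure with escalation-first ordering",
      "instruction_priorities": [
          "apply the enterprise evidence policy before ranking actions",
          "match the active run profile's review style",
      ],
      "reference_priorities": ["historical closure notes", "prior regression cluster review"],
      "plan_source": {"provider": "primary", "run_profile": "USB Controller DV Team · Coverage"},
  }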

Agent 01: Verification Workflow Copilot

Primary workflows

  • Functional coverage closure
  • Regression triage and clustering
  • Protocol escape-risk prioritization
  • Evidence-backed review queue creation

Typical users

  • DV engineers preparing closure actions
  • Verification leads reviewing high-risk gaps
  • Program owners tracking first-pass review efficiency

Agent 01 is the current lead wedge for pilot and sponsor demos because verification carries the largest share of schedule pressure in many fabless organizations.

Agent 02: Yield Intelligence Copilot

Primary workflows

  • ATE anomaly review
  • Mis-bin detection and premium-bin recovery suggestions
  • Leakage and performance containment prioritization
  • SPC trend interpretation

Typical users

  • Product engineering
  • Yield engineering
  • Lot review boards
  • Operations teams escalating excursion risk

Decision review loop

1. Generate

The agent creates ranked actions with evidence, rationale, priority, confidence, and effort.

2. Review

Users inspect findings in the decision queue and the analysis trace separately so reasoning does not crowd action review.

3. Accept or reject

Reviewer feedback is captured per decision and linked to the persisted run record for trust and evaluation.
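
A hedged sketch of how a reviewer verdict could be recorded through the documented POST /api/v1/feedback endpoint; the base URL and payload field names below are assumptions for illustration, not the confirmed request schema.

  import requests

  feedback = {
      "run_id": "run_0123",        # persisted run identifier from run history
      "decision_index": 0,         # which ranked action is being reviewed (assumed field)
      "accepted": True,            # reviewer verdict captured per decision
      "comment": "Gap confirmed; queued for a directed test.",
  }
  requests.post("http://localhost:8000/api/v1/feedback", json=feedback, timeout=30)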

Run history and traceability

Every run is stored with raw artifact replay, parser confidence, orchestration details, generated decisions, feedback, and export history. This is the foundation for pilot evidence, sponsor reporting, and future workflow integration.

Why it matters

  • Reproducibility for engineering review
  • Trust and auditability for AI recommendations
  • Pilot metrics for Seeder or client reporting
  • Workflow export trace for Jira, email, and future Confluence adapters

What is stored

  • Artifact identity and source
  • Raw artifact text
  • Parser format, confidence, and warnings
  • Orchestration plan
  • Analysis trace and ranked decisions
  • Feedback summary and export history

Enterprise configuration

Enterprise configuration is not intended to be tuned on every run. It is a durable governance layer that should be set by high-level engineers, review owners, L3/L4 specialists, or delivery leaders. It defines how the agent should speak, escalate, and justify outputs within a given organization.

Run profiles

Run profiles are the day-to-day operational presets used by engineers. They are different from enterprise policy. A run profile chooses the active workflow style for a specific chip program or review type, while enterprise policy sets the organization-wide guardrails.

Run profile examples

  • USB Controller DV Team · Coverage
  • USB Regression Triage Team
  • Mobile SoC Yield Team · ATE
  • PMIC SPC Operations

Why this separation matters

  • Principal engineers set policy once
  • Everyday reviewers pick workflow presets without changing governance
  • Clients can scale across many teams without duplicating policy input

Why enterprise policy matters

Without enterprise policy, AI recommendations may still be technically useful, but they will not be consistent with the organization’s review culture, escalation cadence, evidence requirements, or decision format. Policy is what turns a generic assistant into an enterprise workflow copilot.

Verification request fields

Each field is listed with its purpose and a typical use case.

  • report_text: Raw coverage report or regression log input. Typical use: paste or upload a VCS/Xcelium artifact for Agent 01 analysis.
  • format: Parser hint for auto, vcs, or xcelium. Typical use: force a known format when the client artifact style is predictable.
  • mode: Selects coverage closure or triage behavior. Typical use: coverage for uncovered bins, triage for regression clustering.
  • design_name: Human-readable runtime label shown in analysis and history. Typical use: “USB Controller v2.3” or “NoC QoS Block”.
  • project_id: Project/workspace grouping key for history and feedback. Typical use: aggregate runs across the same client stream.
  • context: Extra operational framing for the run. Typical use: “Protocol escape review before tape-in signoff.”
  • artifact_name: Stored artifact identifier. Typical use: benchmark matching and run replay clarity.
  • artifact_source: Tells whether input came from a bundled sample, upload, or pasted text. Typical use: pilot provenance and parser-risk discussions.
  • run_profile_id / run_profile_name: Operational preset identity. Typical use: distinguish USB DV coverage from LPDDR refresh or secure boot triage.
  • chip_type: Chip or IP family focus. Typical use: guides orchestration emphasis and terminology.
  • client_profile: Human-readable team or client workflow label. Typical use: “Fabless verification team” or “InSemi DV pod”.
  • custom_instructions: Run-specific override instructions. Typical use: tell the agent to prefer speed, risk containment, or formal review style for one run.
  • reference_data: Historical notes or prior artifact context. Typical use: feed closure notes, repeated gaps, or known scenario limitations.
  • reference_data_label: Reference-data identity label. Typical use: “Historical closure notes” or “Prior regression cluster review”.
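
A hedged example of an Agent 01 request assembled from the fields above and sent to the documented POST /api/v1/verify endpoint. The base URL, file name, and example values are illustrative assumptions for a local deployment, not a confirmed integration recipe.

  import requests

  payload = {
      "report_text": open("usb_coverage_report.log").read(),   # hypothetical artifact file
      "format": "auto",
      "mode": "coverage",
      "design_name": "USB Controller v2.3",
      "project_id": "usb-pilot",
      "context": "Protocol escape review before tape-in signoff.",
      "artifact_name": "usb_coverage_report.log",
      "artifact_source": "upload",
      "run_profile_name": "USB Controller DV Team · Coverage",
      "chip_type": "USB controller IP",
      "client_profile": "Fabless verification team",
      "custom_instructions": "Prefer escalation-first ordering over effort ranking.",
      "reference_data": "Closure notes: link power-state bins repeatedly uncovered.",
      "reference_data_label": "Historical closure notes",
  }
  with requests.post("http://localhost:8000/api/v1/verify", json=payload, stream=True) as resp:
      for line in resp.iter_lines():
          if line:
              print(line.decode())   # streamed analysis and decision events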

Yield request fields

Each field is listed with its purpose and a typical use case.

  • csv_data: Raw ATE parametric or SPC-like input. Typical use: paste or upload lot-level measurement data into Agent 02.
  • lot_id: Runtime label for the lot or SPC population. Typical use: “LOT_004” or an excursion review slice.
  • mode: Selects the ATE or SPC reasoning path. Typical use: ate for per-die parametrics and binning, spc for trend monitoring.
  • project_id: Project/workstream grouping key. Typical use: group all product-yield runs under a pilot stream or product line.
  • context: Business or engineering framing for the lot review. Typical use: “Revenue containment review before customer shipment.”
  • artifact_name: Source identity for persistence and scorecard matching. Typical use: “ate_parametric_sample.csv” or a real client export name.
  • artifact_source: Sample/upload/paste provenance. Typical use: pilot trust reporting and debugging ingest behavior.
  • run_profile_id / run_profile_name: Operational preset identity. Typical use: Mobile SoC yield review vs PMIC SPC operations.
  • chip_type: Product category or chip family. Typical use: guides yield reasoning priorities and terminology.
  • client_profile: Team/workflow label. Typical use: “Product and yield engineering” or “Lot escalation committee”.
  • custom_instructions: Run-specific focus. Typical use: ask the agent to prioritize revenue recovery or reliability containment.
  • reference_data: Historical lot notes or threshold context. Typical use: earlier lot excursions, premium-bin limits, leakage correlations.
  • reference_data_label: Identity label for historical input. Typical use: “Historical lot review notes” or “Premium-bin thresholds”.
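
A shorter companion sketch for Agent 02 against the documented POST /api/v1/yield endpoint, under the same assumptions (local base URL, illustrative values and file name).

  import requests

  payload = {
      "csv_data": open("ate_parametric_sample.csv").read(),   # hypothetical ATE export
      "lot_id": "LOT_004",
      "mode": "ate",
      "project_id": "mobile-soc-yield-pilot",
      "context": "Revenue containment review before customer shipment.",
      "artifact_name": "ate_parametric_sample.csv",
      "artifact_source": "sample",
      "run_profile_name": "Mobile SoC Yield Team · ATE",
      "chip_type": "Mobile SoC",
      "client_profile": "Product and yield engineering",
      "custom_instructions": "Prioritize premium-bin recovery over marginal retest.",
  }
  with requests.post("http://localhost:8000/api/v1/yield", json=payload, stream=True) as resp:
      for line in resp.iter_lines():
          if line:
              print(line.decode())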

Run history fields

Each field is listed with its meaning and why it matters.

  • run_id: Unique persisted run identifier. Why it matters: links history, exports, and feedback.
  • status: Completed or failed. Why it matters: needed for pilot reliability reporting.
  • provider / model: LLM provider and resolved model. Why it matters: supports pilot observability and provider comparisons.
  • raw_artifact: Stored artifact body. Why it matters: makes runs replayable and auditable.
  • parser_format: Detected parser family. Why it matters: useful when debugging format mismatches.
  • parser_confidence: Confidence score from the parser layer. Why it matters: prevents silent trust in weak parses.
  • parser_warnings: Warnings about uncertain or partial parse quality. Why it matters: helps pilots surface reliability risk honestly.
  • orchestration: Run-specific plan context. Why it matters: shows how enterprise policy and run profile shaped the result.
  • analysis_log: Five-step analysis trace. Why it matters: human reviewers can audit reasoning separately from actions.
  • decisions: Structured recommended actions. Why it matters: the core user-facing value output.
  • feedback_summary: Aggregate accept/reject outcome. Why it matters: critical pilot trust signal.
  • export_history: Workflow-export event trail. Why it matters: shows the product is fitting into real workflows.
  • benchmark_title / benchmark_score: Benchmark or live scorecard result. Why it matters: measures quality for sponsor walkthroughs and internal reviews.
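
A hedged sketch of replaying a stored run through the documented GET /api/v1/runs and GET /api/v1/runs/{run_id} endpoints; the base URL and the shape of the list response are assumptions.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  runs = requests.get(f"{base}/api/v1/runs").json()
  run_id = runs[0]["run_id"]                     # assumes each list entry carries run_id
  run = requests.get(f"{base}/api/v1/runs/{run_id}").json()

  print(run["parser_format"], run["parser_confidence"], run["parser_warnings"])
  for decision in run["decisions"]:              # ranked, evidence-cited actions
      print(decision)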

Enterprise config fields

Agent 01 policy fields

Each field is listed with its use.

  • org_name: Names the owning business or engineering unit for review framing.
  • review_board: Specifies the body receiving or judging recommendations.
  • output_style: Controls how the agent formats recommendations conceptually.
  • escalation_policy: Defines what should be escalated before optimization tasks.
  • evidence_policy: Forces evidence-first reasoning standards.
  • instruction_addendum: Stores additional durable org guidance.

Agent 02 policy fields

Each field is listed with its use.

  • org_name: Names the yield/product organization using the agent.
  • review_board: Specifies the escalation committee or yield board context.
  • output_style: Guides tone toward revenue-aware and operational lot review.
  • escalation_policy: Prioritizes what must be raised immediately.
  • risk_policy: Explains how the agent should balance revenue and containment concerns.
  • instruction_addendum: Captures durable local review expectations.
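
Policy is read back through the documented config endpoints. A minimal sketch, assuming a local deployment URL and that the responses expose the policy fields listed above.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  agent01_policy = requests.get(f"{base}/api/v1/config/agent01").json()
  agent02_policy = requests.get(f"{base}/api/v1/config/agent02").json()
  print(agent01_policy.get("escalation_policy"))
  print(agent02_policy.get("risk_policy"))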

Pilot operations

How to run a pilot safely

  • Enable PILOT_ACCESS_TOKEN before sharing a URL; a minimal token-generation sketch follows this checklist.
  • Use sanitized artifacts first.
  • Track parser confidence and warnings during early client ingestion.
  • Push reviewers to accept or reject findings so the feedback loop becomes measurable.
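
A minimal sketch of the first checklist item, using the documented access-code endpoint; the response shape is not confirmed, so the suggestion is printed as-is and must be set manually as PILOT_ACCESS_TOKEN in the server environment before sharing a URL.

  import requests

  suggestion = requests.post("http://localhost:8000/api/v1/pilot/access-code/generate").json()
  print(suggestion)   # copy the suggested value into PILOT_ACCESS_TOKEN on the server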

What to show a sponsor

  • Pilot dashboard with total runs, acceptance rate, exports, and parser confidence
  • Run History showing raw artifact replay and decision audit trail
  • Agent page with artifact summary, scorecard, orchestration, and human review loop

Parser confidence and parser trust

Parser confidence exists because a wrong structured interpretation can look deceptively polished once passed through an LLM. Silicon Agents treats parser trust as a first-class signal and stores it alongside the run.

If parser confidence is weak or warnings appear, the right product behavior is not “silently continue.” The right behavior is “continue visibly, with uncertainty exposed.”
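
A minimal sketch of "continue visibly": the run still proceeds, but weak parser trust is surfaced to the reviewer rather than swallowed. The threshold and message wording are illustrative, not product constants.

  def surface_parser_trust(parser_confidence: float, parser_warnings: list[str]) -> list[str]:
      banners = []
      if parser_confidence < 0.7:    # illustrative threshold
          banners.append(
              f"Low parser confidence ({parser_confidence:.2f}); "
              "review extracted items before accepting actions."
          )
      banners.extend(f"Parser warning: {w}" for w in parser_warnings)
      return banners                 # shown alongside decisions, never blocking the run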

Exports and workflow fit

HTML brief

Executive-readable artifact for sponsor reviews, pilot summaries, and email circulation.

Jira export

Structured payload for issue-creation or review-queue integration in engineering workflows.

Email export

Formatted payload that helps teams circulate review actions without opening the UI.
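
Both export payloads for a saved run can be pulled through the documented export endpoints. A hedged sketch, assuming a local base URL and a placeholder run_id.

  import requests

  base = "http://localhost:8000"                 # assumed local deployment
  run_id = "run_0123"                            # placeholder run identifier
  jira_payload = requests.get(f"{base}/api/v1/runs/{run_id}/export/jira").json()
  email_payload = requests.get(f"{base}/api/v1/runs/{run_id}/export/email").json()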

Current API surface

Each endpoint is listed with its purpose and why it matters.

  • POST /api/v1/verify: Run Agent 01 analysis as a streaming workflow. Why it matters: core verification workflow entrypoint for future enterprise integration.
  • POST /api/v1/yield: Run Agent 02 analysis as a streaming workflow. Why it matters: core yield workflow entrypoint.
  • GET /api/v1/runs: List saved runs. Why it matters: supports history, audit, and pilot dashboards.
  • GET /api/v1/runs/{run_id}: Retrieve one saved run in detail. Why it matters: enables replay, review, and export orchestration.
  • POST /api/v1/rag/search: Search retrieval documents using metadata filters and embedding-aware ranking. Why it matters: supports auditable retrieval over saved runs and ingested notes.
  • POST /api/v1/rag/ingest-note: Chunk and embed a sanitized engineering note. Why it matters: adds pilot knowledge without broad document-ingestion risk.
  • POST /api/v1/rag/reindex: Re-embed existing retrieval documents. Why it matters: lets teams switch embedding providers or refresh vectors without re-ingesting source content.
  • POST /api/v1/feedback: Persist accept/reject review signals. Why it matters: critical for human-in-the-loop trust measurement.
  • GET /api/v1/runs/{run_id}/export/jira: Generate a Jira-ready payload for a saved run. Why it matters: proves workflow fit beyond the UI.
  • GET /api/v1/runs/{run_id}/export/email: Generate an email-ready payload for a saved run. Why it matters: lets teams circulate findings immediately.
  • GET /api/v1/pilot/metrics: Aggregate pilot evidence across all saved runs. Why it matters: provides sponsor-grade usage and trust signals.
  • POST /api/v1/pilot/access-code/generate: Generate a suggested pilot access token value. Why it matters: supports controlled sharing without building full auth.
  • GET /api/v1/config/agent01 and /api/v1/config/agent02: Read server-backed enterprise policy for each agent. Why it matters: separates governance from day-to-day run selection.

RAG status: retrieval is active, hardening is next

Silicon Agents now has an initial retrieval-augmented pipeline for Agent 01 and Agent 02. It converts saved run history and manually ingested engineering notes into retrieval documents, embeds them, applies project/profile/workflow filters, and injects retrieved evidence into the verification and yield analysis paths with source metadata, visible citations, evidence excerpts, and run-history replay.

Implemented now

  • Run history converted into retrieval documents
  • Manual note ingestion with deterministic chunking
  • Gemini embedding provider with local fallback
  • Metadata-filtered search across project, agent, mode, profile, and source type
  • Agent 01 and Agent 02 retrieved-source metadata on decisions
  • Retrieved-source citations on decision cards
  • Retrieved-evidence panels with excerpts in agents and run history
  • RAG console for browser search, manual note ingestion, and reindexing

Still to harden

  • Production pgvector deployment and index tuning
  • Full document upload formats beyond text notes
  • Retrieval thresholds, reranking, and quality evals
  • Separation between retrieved fact and model inference

Agent 01 retrieval targets

Historical coverage closure notes, waiver rationales, regression debug notes, prior run-history decisions, and methodology guidance for the same IP block.

Agent 02 retrieval targets

Lot review archives, mis-bin investigation notes, SPC playbooks, product-engineering escalation guidance, and yield-containment patterns.

Current RAG tools

/rag provides browser search, note ingestion, and reindex controls over /api/v1/rag/search, /api/v1/rag/ingest-note, and /api/v1/rag/reindex.

Current implementation path: run history/manual notes -> retrieval documents -> Gemini embeddings -> metadata-filtered retrieval -> Agent 01 and Agent 02 cited decisions. Storage also includes a pgvector-ready column for PostgreSQL deployments.
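
A hedged example of a metadata-filtered retrieval query against the documented POST /api/v1/rag/search endpoint; the filter field names in the body are assumptions based on the filters described above, not the confirmed request schema.

  import requests

  query = {
      "query": "repeated uncovered bins in USB link power states",
      "project_id": "usb-pilot",        # assumed filter field
      "agent": "agent01",               # assumed filter field
      "mode": "coverage",               # assumed filter field
      "source_type": "run_history",     # assumed filter field
  }
  hits = requests.post("http://localhost:8000/api/v1/rag/search", json=query).json()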

Detailed design reference: RAG_ROADMAP.md in the repository root.

Next MVP phase: enterprise API integration for EDA workflows

The next phase is about making Silicon Agents a pilot-accessible integration layer rather than just a self-contained workflow console. That means exposing artifact ingestion APIs, workflow-trigger APIs, and result-return paths so enterprises can call Silicon Agents from verification dashboards, test operations, and review tooling.

Why this is next

  • The current MVP already proves the workflow value.
  • Pilot clients will next ask how this fits their toolchain, not just whether the dashboard looks good.
  • API exposure is the bridge from demo to embedded pilot usage.

What Phase 2 API work should enable

  • Artifact submission from external tools
  • Run status polling or callback integration
  • Structured results consumption
  • Workflow embedding into EDA-adjacent review systems