Chapter 21: Annotated bibliography

This book is explicit about its relationship to existing sources. Where canonical work exists, this book cites and cross-references it rather than re-deriving it. This bibliography is the single point of departure for readers who want the full hands-on, academic, or framework-specific treatment of any pattern the book covers in compressed form.

Reading order, for a reader evaluating whether to invest. The recommended sequence positions this book relative to its sources rather than replacing them. Read Anthropic’s Building Effective Agents first (an afternoon) for the canonical workflow vocabulary; then Andrew Ng’s four-pattern overview (a few hours) for the foundational taxonomy; then Gulli’s Agentic Design Patterns (a project’s worth of reading) for hands-on code per pattern. Use this book as the architectural complement that ties them together with the disciplines that make production systems work: bounded autonomy, governance, the ingestion pipeline, the trace, and the harness. The book’s value is in the integration and the production discipline, not in re-deriving the patterns those sources already cover; a reader who has not read them will get more from reading them alongside this book than from this book alone.

Entries are grouped by category. Each entry includes a one-paragraph annotation that names what the source covers, what its strengths are, and where in this book it is cited.

Given the pace of the field, URLs may drift. The titles and authors given here are sufficient to locate the current home of each resource through a standard search.

Canonical pattern catalogs

Antonio Gulli, Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems (Springer Nature, 2025). ISBN 978-3-032-01401-6 / eBook 978-3-032-01402-3. 424–470 pages. 21 patterns, one per chapter. Each pattern receives a detailed overview, practical applications, hands-on code examples in LangChain/LangGraph/CrewAI/Google ADK, and key takeaways. The most thorough single-author treatment available, written by a Google senior director with a deep AI background. A free PDF is also distributed alongside the published edition. Used in this book as the principal cross-reference for cognitive patterns (Chapter 4) and as the source the book defers to for hands-on code throughout. Link: https://link.springer.com/book/10.1007/978-3-032-01402-3

Anthropic, “Building Effective Agents” (Anthropic Engineering Blog, December 2024). The canonical short treatment of workflow patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer, and the distinction between workflows (LLM calls in deterministic structures) and agents (open-ended loops with tools). The most-cited single piece in the agentic literature, terse and architecturally clear. Used in this book as the canonical workflow vocabulary cited throughout Chapter 9. Link: https://www.anthropic.com/research/building-effective-agents

Andrew Ng, “Agentic Design Patterns” (DeepLearning.AI Newsletter / Course, 2024). Four foundational patterns: reflection, tool use, planning, multi-agent collaboration. Short, accessible, widely cited as the introductory taxonomy. The patterns are not all of agentic design, but they are the patterns most students and practitioners encounter first. Used in this book for foundational vocabulary; cited in Chapter 4 for reflection and tool use. Link: https://www.deeplearning.ai/the-batch/issue-241/ (and related course materials at deeplearning.ai)

Liu, Lu, Zhu, Xu, et al., “Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model Based Agents,” Journal of Systems and Software, vol. 220, p. 112278 (February 2025). CSIRO/Data61. The peer-reviewed academic catalog. Treats agentic patterns with the rigor of the software-architecture literature, with explicit taxonomy, forces, and consequences for each pattern. The closest analog in form to Gang of Four for agentic systems. Used in this book as the academic reference standard; cross-referenced in Chapter 4 and Chapter 9.

“Agentic Design Patterns: A System-Theoretic Framework,” arXiv:2601.19752 (2026). A 12-pattern system-theoretic treatment, formalizing patterns in the language of systems theory rather than software architecture. Useful for readers who want a more abstract framing of pattern composition. Used in this book as a cross-reference where systems-theoretic framing illuminates structural commitments. Link: https://arxiv.org/abs/2601.19752

“A Reference Architecture for Autonomous Networks: An Agent-Based Approach,” arXiv:2503.12871 (2025). A reference architecture for autonomous networks, with hierarchical agents at Resource, Service, and Business layers. Specific to networking but the layering principle generalizes to other domains. Used in this book as a cross-reference for hierarchical multi-agent shapes (Chapter 9) and reference-architecture practice. Link: https://arxiv.org/abs/2503.12871

Augment Code, “What Are Agentic Design Patterns? 2026 Pattern Catalog” (industry guide). A consolidation of Andrew Ng’s foundational patterns, Anthropic’s workflow patterns, and a growing set of emergent reliability and memory patterns from 2025–2026 into a 12+13 pattern catalog with maturity ratings. Self-described as orientation, not deep dissection. Useful as a quick scan of the modern pattern landscape. Used in this book as a vocabulary reference and a pointer to the breadth of current industry thinking. Link: https://www.augmentcode.com/guides/agentic-design-patterns

SitePoint, “Agentic Design Patterns: The 2026 Guide to Building Autonomous Systems” (industry guide, 2026). A free, web-available guide to the current pattern landscape with practical examples and decision rules. Less rigorous than Gulli but accessible. Link: https://www.sitepoint.com/the-definitive-guide-to-agentic-design-patterns-in-2026/

Foundational pattern papers

Yao, Zhao, Yu, et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” arXiv:2210.03629 (2022). The original ReAct paper. The pattern is now partially absorbed into reasoning models but the paper remains the canonical exposition. Used in this book in Chapter 4.

Wang, Wei, Schuurmans, et al., “Self-Consistency Improves Chain-of-Thought Reasoning in Language Models,” arXiv:2203.11171 (2022). The original self-consistency paper. Cited as the canonical reference for variance reduction via multiple reasoning traces.

Yao, Yu, Zhao, et al., “Tree of Thoughts: Deliberate Problem Solving with Large Language Models,” arXiv:2305.10601 (2023). The Tree-of-Thought paper. Treats structured search-based reasoning as a pattern. Heavy machinery; used less often in production than its citation count would suggest.

Du, Li, Torralba, et al., “Improving Factuality and Reasoning in Language Models through Multiagent Debate,” arXiv:2305.14325 (2023). The multi-agent debate paper. Cited as the canonical reference for the debate coordination pattern.

Zheng, Chiang, Sheng, et al., “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena,” arXiv:2306.05685 (2023). The foundational paper formalizing the use of strong LLMs to evaluate the outputs of other LLMs. Required reading for the statistical limitations and biases (position bias, verbosity bias, self-preference) inherent in Layer 3 quality testing (Chapter 12).

Skills, agents, and runtime standards

Anthropic, “Equipping agents for the real world with Agent Skills” (Anthropic Engineering, 2025). Anthropic’s announcement and design discussion for the Skills standard. Introduces the format and the progressive disclosure mechanism. Required reading for anyone adopting the Skills layer. Used in this book as the primary source for Chapter 10. Link: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills

agentskills.io, the open standard hub. The open specification of the Agent Skills format. Lists the ecosystem of adopting tools (Claude Code, OpenAI Codex, Gemini CLI, Cursor, GitHub Copilot, Goose, OpenHands, Letta, and many others). Also hosts the quickstart, specification, and reference materials. Used in this book in Chapter 10; consult directly for current specification. Link: https://agentskills.io/

Anthropic, Claude Agent Skills documentation (platform.claude.com). Anthropic-specific Skills documentation for Claude Code, the Claude Agent SDK, and the Claude Developer Platform. Link: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview

Model Context Protocol (MCP) specification. The transport protocol for tools, resources, and prompts between agents and external services. Distinct from Skills (Chapter 10) but often used in concert.

OpenAI Agents SDK; Claude Agent SDK (Anthropic). The framework SDKs from the principal model providers, both of which treat handoff as a primitive. Cited in Chapter 9.

Memory, RAG, and context engineering

Lewis, Perez, Piktus, et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” arXiv:2005.11401 (2020). The RAG paper. Pre-dates much of the agentic discussion but established the retrieval-augmented architecture this book treats in Chapter 7.

Letta (formerly MemGPT) documentation and papers. A platform and a line of research on stateful agents with long-term memory. Useful exposition of the engineering of episodic memory and persistent agent identity. Link: https://www.letta.com/

Anthropic, context engineering documentation and engineering blog posts (2025). Anthropic’s treatment of context engineering: prompt structure, caching, conversation compaction, progressive disclosure via Skills. Practical and architecture-relevant.

Anthropic, “Project Vend: Can Claude run a small shop?” (Anthropic Research, June 2025). A month-long field experiment in which a Claude Sonnet 3.7 instance (“Claudius”) operated a small in-office retail shop with pricing, inventory, supplier search, and customer-interaction tools. Documents long-horizon economic drift — below-cost sales, employee-directed discounting, a hallucinated Venmo payment address — over an extended run with no architectural in-flight governance gate beyond configuration. Cited in Chapter 5 as a primary-sourced parallel to aggregate-exposure failures (economic, not only token-cost). Link: https://www.anthropic.com/research/project-vend-1

Agent evaluation

Yehudai, Eden, Li, et al., “Survey on Evaluation of LLM-based Agents,” arXiv:2503.16416 (2025). Maps the evaluation landscape across planning, tool use, reflection, memory, web tasks, software engineering, and conversational and generalist agents. Flags gaps in cost-efficiency, safety, stability, and fine-grained scalable evaluation — the same gaps that make the trace-driven, continuous-evaluation architecture of Chapter 12 necessary rather than optional. Used in this book as the academic reference for the evaluation landscape; cited in Chapter 12. Link: https://arxiv.org/abs/2503.16416

Software architecture foundations

Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1994). The original Gang of Four. The book this manuscript draws its formal-pattern tone from, despite operating in a different domain.

Martin Fowler, Patterns of Enterprise Application Architecture (Addison-Wesley, 2002). The closest precedent in form: architectural patterns for a class of systems (enterprise applications), with deep treatment, tradeoffs, and structural reasoning.

Martin Kleppmann, Designing Data-Intensive Applications (O’Reilly, 2017). Cited for its tone and its model of an architectural reference that respects the engineering reality of probabilistic systems (data systems with eventual consistency, partial failure, etc.).

Chris Richardson, Microservices Patterns (Manning, 2018). The Saga pattern is from this lineage; cited in Chapter 9 and Chapter 16.

Operations, security, and reliability

Simon Willison, “Lethal Trifecta” writings (blog posts, 2024–2025). The most-cited articulation of the lethal-trifecta vulnerability class: untrusted content + sensitive data access + external action capability. Catalogs the exfiltration pattern across products; for the June 2024 GitHub Copilot Chat disclosure, see Johann Rehberger (Embrace The Red), who researched and published the primary finding. Required reading on agent security. Link: https://simonwillison.net/tags/llmsecurity/

Johann Rehberger, “GitHub Copilot Chat: From Prompt Injection to Data Exfiltration” (Embrace The Red, June 2024). Primary disclosure of prompt-injection-driven data exfiltration in GitHub Copilot Chat: private repository context, attacker-influenced content in the same session, and Markdown image rendering combined to leak chat history via outbound image URLs. Cited in Chapter 5 and Chapter 11. Link: https://embracethered.com/blog/posts/2024/github-copilot-chat-prompt-injection-data-exfiltration/

IBM and Ponemon Institute, Cost of a Data Breach Report (annual, 2024 edition cited in this book). Industry benchmark on breach lifecycle duration, post-incident response cost bands, and the economics of detection and containment. Cited in Chapter 6 for remediation-magnitude grounding in the worked cost model. Link: https://www.ibm.com/reports/data-breach

U.S. Bureau of Labor Statistics, Occupational Employment and Wage Statistics (OEWS) (May 2024 release, software developers SOC 15-1252). Median hourly wage data used to parameterize the loaded labor rate in Chapter 6’s worked cost model. Link: https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm

OWASP, Top 10 for LLM Applications (OWASP, ongoing). Industry-consensus list of the most critical LLM-application security risks. Required reading for production deployment. Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Site Reliability Engineering and operational literature (Beyer et al., Google SRE books, 2016 and later). The discipline of running production systems with measured reliability. Cited as the tone reference for Chapter 18.

OpenTelemetry, Semantic Conventions for Generative AI (OpenTelemetry project, ongoing). The industry-consensus standard for structuring GenAI traces. Defines the span attributes (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, and the like) that make the replay and correlation architecture of Chapter 12 portable across vendors. Link: https://opentelemetry.io/docs/specs/semconv/gen-ai/
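As a rough illustration of what those attributes look like on a single model-call span, the sketch below builds a plain dictionary keyed by the attribute names listed in the entry above. It deliberately avoids the OpenTelemetry SDK. The helper names `genai_span_attributes` and `total_input_tokens`, the example provider and model strings, and the inclusion of `gen_ai.usage.output_tokens` as the counterpart of the input-token attribute are assumptions made for illustration; consult the current specification for authoritative attribute names.

```python
from typing import Dict, Iterable, Union

AttrValue = Union[str, int]


def genai_span_attributes(system: str, model: str,
                          input_tokens: int, output_tokens: int) -> Dict[str, AttrValue]:
    """Return attributes for one model call, keyed by GenAI semantic-convention names.

    Hypothetical helper: a real tracer would attach these to a span object.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        # Assumed counterpart of the input-token attribute named in the entry.
        "gen_ai.usage.output_tokens": output_tokens,
    }


def total_input_tokens(spans: Iterable[Dict[str, AttrValue]]) -> int:
    """Sum input tokens across spans, regardless of which vendor produced them."""
    return sum(int(span["gen_ai.usage.input_tokens"]) for span in spans)


# Example provider and model strings are placeholders, not recommendations.
spans = [
    genai_span_attributes("vendor_a", "model-small", 120, 40),
    genai_span_attributes("vendor_b", "model-large", 300, 95),
]
print(total_input_tokens(spans))
```

Because both spans share one attribute vocabulary, the aggregation needs no vendor-specific branches, which is the portability property the entry describes.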

NVIDIA NeMo Guardrails; Microsoft Presidio. Reference implementations of deterministic policy gating and input/output sanitization. NeMo Guardrails provides a DSL for programmable rails; Presidio provides regex- and NER-based PII detection and redaction. Together they exemplify the deterministic substrate behind the governance layer (Chapter 6), policy enforced in code, not in prompts. Link: https://github.com/NVIDIA/NeMo-Guardrails; https://github.com/microsoft/presidio
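To make "policy enforced in code, not in prompts" concrete, here is a minimal sketch of a deterministic gate using only the Python standard library. It does not use the NeMo Guardrails or Presidio APIs. The two regular expressions, the tool allowlist, and the function names `redact` and `gate_tool_call` are hypothetical and far simpler than what either project provides.

```python
import re

# Illustrative patterns only; production PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical allowlist: the gate decides in code, independent of any prompt.
ALLOWED_TOOLS = frozenset({"search_docs", "summarize"})


def redact(text: str) -> str:
    """Replace each pattern match with a placeholder naming the entity type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def gate_tool_call(tool_name: str, argument: str) -> str:
    """Reject tools outside the allowlist, then sanitize the argument."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool_name}")
    return redact(argument)


print(gate_tool_call("search_docs", "contact bob@example.com about 123-45-6789"))
```

The gate behaves identically no matter what the model was told or what it generated, which is the property that separates a deterministic substrate from prompt-level instructions.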

Runtime governance architectures

Over 2025–2026, the academic literature has converged on this book’s central governance claim: agent governance is an architectural concern enforced at runtime, not a policy document. MI9, governance-by-design, and Dhanorkar et al. (2026) on human oversight in practice make the argument directly or supply empirical grounding; SAGA supplies the adjacent security architecture the governance layer depends on. All are cross-referenced from Chapter 6.

Dux, Alaimo, Roussière, “Governance by Design: Architecting Agentic AI for Organizational Learning and Scalable Autonomy,” arXiv:2605.20210 (2026). Argues that enterprise agent governance is implemented through architectural choices — what the agent can do, which tools and data it can access, how memory is handled, how improvements are introduced — rather than through policy documents. The same thesis Chapter 6 advances, stated in the academic literature. Used in this book as corroboration that governance-as-architecture is a converging consensus, not this book’s idiosyncratic claim. Link: https://arxiv.org/abs/2605.20210

Wang, Singhal, Kelkar, et al., “MI9: An Integrated Runtime Governance Framework for Agentic AI,” arXiv:2508.03858 (2025). Proposes runtime controls for agentic systems: agency-risk indexing, semantic telemetry, continuous authorization monitoring, conformance engines, goal-conditioned drift detection, and containment. The core claim — that pre-deployment review is insufficient because agent behavior changes at runtime — is the argument this book makes for runtime gates over prompt-based policy (Chapter 6). Used in this book as the academic reference for runtime governance mechanisms. Link: https://arxiv.org/abs/2508.03858

Syros, Suri, Ginesin, et al., “SAGA: A Security Architecture for Governing AI Agentic Systems,” arXiv:2504.21034 (2025). A security architecture for governing agentic systems: agent registration, user-defined access-control policies, inter-agent communication controls, and cryptographic access-control tokens. Adjacent to the governance-as-architecture claim rather than a statement of it — SAGA supplies the identity, delegation, and access-control substrate that a runtime governance layer such as Chapter 6’s rests on, and is directly relevant to multi-agent coordination (Chapter 9). Used in this book as a cross-reference for agent identity, delegation, and access control. Link: https://arxiv.org/abs/2504.21034

Dhanorkar, Passi, and Vorvoreanu, “Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents,” arXiv:2606.05391 (2026). Exploratory qualitative study (17 experienced developers, largely one large tech employer) of how oversight work is actually performed with software agents. Identifies four emergent modes — configuration before prompting, co-planning, real-time monitoring, and post hoc review — and finds co-planning and in-flight monitoring often thinner than configuration and post hoc review. Cross-referenced from Chapter 6 for the temporal placement of oversight and the plan approval gate. Used in this book as empirical grounding for the before-delegation / at-plan-time / in-flight oversight frame. Link: https://arxiv.org/abs/2606.05391

Regulation (EU) 2024/1689 (Artificial Intelligence Act). The EU’s risk-based AI regulation. Article 14 requires high-risk AI systems to be designed for effective human oversight, including the ability to intervene in or interrupt operation. Conformity obligations are staged on a rolling calendar subject to deferral. Cross-referenced from Chapter 6 and Chapter 18. Used in this book as the primary regulatory reference for human-oversight-as-architecture, not as legal guidance. Link: https://artificialintelligenceact.eu/

Surveys and landscape overviews

Luo, Zhang, Yuan, et al., “Large Language Model Agent: A Survey on Methodology, Applications and Challenges,” arXiv:2503.21460 (2025). A methodology-centered taxonomy of LLM-agent systems, linking architectural foundations, collaboration mechanisms, and evolutionary pathways. The broadest academic survey of the field. Used in this book as a cross-reference for the academic taxonomy of agent architectures; cited in Chapter 4. Link: https://arxiv.org/abs/2503.21460

“Understanding the Planning of LLM Agents: A Survey,” arXiv:2402.02716 (2024). A taxonomy of LLM-agent planning: task decomposition, plan selection, external modules, reflection, and memory. Useful for readers who want the academic treatment of the planning patterns this book compresses in Chapter 4. Link: https://arxiv.org/abs/2402.02716

“LLM-Based Agents for Tool Learning: A Survey,” Data Science and Engineering (Springer, 2025). A systematic review of tool-learning agents: the definition of the tool-learning task, the typical architecture, and the open challenges. LLMs do not inherently know user-defined tools or their real-world constraints, which is the gap the governed tool surface of Chapter 5 and Chapter 19 exists to address. Link: https://link.springer.com/article/10.1007/s41019-025-00296-9

Google Cloud Architecture Center, “Choose a design pattern for your agentic AI system” (Google Cloud documentation). Vendor-aligned but useful decision-tree treatment of pattern selection. Link: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system

Google, “Introduction to Agents” (whitepaper, 2025). A vendor-authored but technically substantive treatment of production-grade agent architecture: cognitive architectures, tool use, orchestration, and the building blocks of generative AI agents. Useful as a vendor-perspective companion to this book’s framework-agnostic treatment. Link: https://kaggle.com/whitepapers/introduction-to-agents

Bain & Company, “State of the Art of Agentic AI Transformation” (technology report, 2025). Frames agentic capability in levels and argues that the agentic value pool depends on access to enterprise systems of record and systems of action — an industry restatement of this book’s claim that integration and governance, not model capability, are the binding constraints (Chapter 14). Link: https://www.bain.com/insights/state-of-the-art-of-agentic-ai-transformation-technology-report-2025/

MIT, “2025 AI Agent Index” (2025). Documents the origins, design, capabilities, ecosystem, and safety features of 30 leading production agents in depth — a revised cohort following the project’s 2024 index (67 agents). A rare systematically documented survey of what production agents actually do and what safeguards they ship with — empirical grounding for bounded autonomy (Chapter 5), governance (Chapter 6), and observability (Chapter 12). Link: https://aiagentindex.mit.edu/data/2025-AI-Agent-Index.pdf

How to use this bibliography

The book’s compressed treatment of patterns (Chapter 4 and parts of Chapter 9) defers to Gulli (2025) for hands-on code; to Anthropic’s Building Effective Agents for the workflow vocabulary; to the CSIRO catalog for the academic taxonomy; and to Andrew Ng for the foundational four. The reading-order note at the head of this chapter positions these sources relative to one another and to this book; it is not repeated here.

For readers responsible for deploying agentic systems in regulated, multi-tenant, or otherwise high-stakes environments: the security and operations sources above are not optional reading, and neither is the runtime-governance literature (MI9, governance-by-design, SAGA) or the agent-evaluation survey, which together constitute the academic backing for this book’s governance and testing chapters. The architectural discipline this book describes is necessary but not sufficient; it must be combined with the threat modeling, runtime governance, and operational practice those sources cover.

The bibliography is maintained as the field develops. Entries that become outdated will be marked. New entries will be added where they offer treatments this book defers to or argues against. Current bibliography entries, errata, and updated links for sources whose URLs drift are maintained at architecting-agentic-systems.net.