Wednesday, 14 January 2026

Why trustworthy AI isn’t optional - it’s the foundation

In the rapidly evolving landscape of generative AI, data security and data privacy are not just compliance requirements - they are the bedrock of trust and innovation.

After all, who would want to waste time with a dodgy, fake AI? 😑

While data security focuses on protecting AI models and datasets from breaches, tampering, or unauthorised access (through measures like encryption, access controls, and secure APIs), data privacy ensures that the data powering these models is collected, processed, and used ethically and legally. This distinction becomes particularly critical in grounding techniques, where AI models are anchored to external knowledge bases, APIs, or real-time data sources. The more an AI model is trained on accurate, well-structured data, the fewer inaccuracies and hallucinations it produces.

What is grounding?

In simple terms, grounding connects AI models to external sources of truth - such as sensors, databases, or APIs - to ensure responses are accurate, context-aware, and reliable. Without grounding, an AI model risks hallucinating or relying on outdated or biased data.

For example, imagine grounding an AI customer support chatbot in a SQL database of product manuals and FAQs. The chatbot queries the database in real time to provide accurate, up-to-date answers rather than inventing responses. However, if the database connection isn’t encrypted or the data isn’t anonymised, the grounded AI system could expose sensitive customer information or fall victim to prompt injection attacks.
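
As a minimal sketch of this pattern, assume an in-memory SQLite database stands in for the product database. The snippet looks up FAQ entries with a parameterised query and builds a prompt that contains only the retrieved text. The `faq` table, its columns, the sample rows, and the helper names are all hypothetical, chosen for illustration; the call to an actual LLM is left out.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `faq` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faq (product TEXT, question TEXT, answer TEXT)")
    conn.executemany(
        "INSERT INTO faq VALUES (?, ?, ?)",
        [
            ("router-x1", "How do I reset the device?", "Hold the reset button for 10 seconds."),
            ("router-x1", "What is the warranty period?", "The warranty lasts 24 months."),
            ("camera-z2", "How do I reset the device?", "Use the Reset option in the settings menu."),
        ],
    )
    return conn


def retrieve_facts(conn, product, keyword):
    """Fetch matching FAQ rows; user input is bound as data, never spliced into the SQL."""
    return conn.execute(
        "SELECT question, answer FROM faq WHERE product = ? AND question LIKE ?",
        (product, f"%{keyword}%"),
    ).fetchall()


def build_grounded_prompt(user_question, facts):
    """Build a prompt from retrieved rows only; return None when there is nothing to ground on."""
    if not facts:
        return None  # the caller should decline to answer rather than guess
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in facts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nCustomer question: {user_question}"
    )


conn = create_demo_db()
facts = retrieve_facts(conn, "router-x1", "reset")
prompt = build_grounded_prompt("How can I reset my router?", facts)
print(prompt)
```

Note that parameter binding only guards against SQL injection. Prompt injection hidden inside the retrieved text needs separate handling before that text reaches the model.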

More broadly, without robust security, grounded AI systems risk exposing sensitive data or being manipulated through techniques like prompt injection or data poisoning. Without privacy safeguards, they may inadvertently violate regulations like GDPR or CCPA by misusing or retaining personal data embedded in their knowledge sources.

For data architects, data engineers and AI engineers, grounding introduces unique challenges. When an AI model queries external databases, APIs, or data lakes, every interaction must be both secured (e.g., using TLS 1.3, OAuth 2.0, or zero-trust architectures) and privacy-preserving (e.g., via differential privacy, federated learning, or data anonymisation). For example, if you’re grounding a large language model (LLM) in a relational database or Delta Lake, you MUST ensure that:

Security: The connection is encrypted, access is role-based (e.g., RBAC in Snowflake or Azure Synapse), and queries are logged for audits.

Privacy: The underlying data is scrambled, anonymised or pseudonymised, and the model only retrieves data it’s authorised to use - aligning with the principle of least privilege. Tools like Python’s Faker library (which generates synthetic values to stand in for real ones) or dynamic data masking in the database can help keep PII out of responses, and your grounding framework MUST enforce strict data access policies.
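
The two requirements above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a complete control set: the secret key is a placeholder, the e-mail pattern is deliberately simple, and `pseudonymise` and `mask_emails` are hypothetical helper names. It sets up a client-side TLS context that refuses anything older than TLS 1.3, then pseudonymises an identifier with a keyed hash and masks e-mail addresses in free text.

```python
import hashlib
import hmac
import re
import ssl

# Security: a client-side TLS context that refuses anything older than TLS 1.3.
# It would be passed to whichever client library opens the connection.
tls_context = ssl.create_default_context()
tls_context.minimum_version = ssl.TLSVersion.TLSv1_3

# Privacy: a keyed hash gives a stable pseudonym without exposing the raw value.
# SECRET_KEY is a placeholder; a real deployment would load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

# Deliberately simple pattern for illustration; it will not catch every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise(value: str) -> str:
    """Map a raw identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}"


def mask_emails(text: str) -> str:
    """Replace anything that looks like an e-mail address with a fixed marker."""
    return EMAIL_PATTERN.sub("[email redacted]", text)


record = {
    "customer_email": "jane.doe@example.com",
    "note": "Contact jane.doe@example.com about the refund.",
}
safe_record = {
    "customer_id": pseudonymise(record["customer_email"]),
    "note": mask_emails(record["note"]),
}
print(safe_record)
```

Because the pseudonym is derived with a key, the same customer maps to the same token across queries, while anyone without the key cannot recompute the mapping from a guessed e-mail address.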

The Future: Trustworthy Grounding in AI

The future of AI hinges on trustworthy grounding - where models don’t just perform well but also respect data sovereignty and user consent. As you design AI systems that interact with databases, data lakes, or lakehouses, prioritise:

Privacy-by-Design: Embed consent checks into API calls and database connections, and enforce data minimisation so that only authorised data is retrieved.
Security-by-Default: Encrypt even the vector embeddings in your vector databases, along with all related grounding data assets, and enforce strict access controls.
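
To make the Privacy-by-Design point concrete, here is a small sketch of a consent gate combined with data minimisation. Everything in it is an assumption made for illustration: the `CustomerRecord` shape, the `ai_support` purpose label, and the field allow-list are hypothetical, and a real system would read consent from its own consent store.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    customer_id: str
    email: str
    last_order: str
    consented_purposes: set = field(default_factory=set)


# Hypothetical allow-list: the only fields the grounding layer needs per purpose.
FIELDS_BY_PURPOSE = {"ai_support": ("customer_id", "last_order")}


def records_for_grounding(records, purpose):
    """Release only consented records, reduced to the fields allowed for this purpose."""
    allowed_fields = FIELDS_BY_PURPOSE.get(purpose, ())
    if not allowed_fields:
        return []  # unknown purpose: release nothing by default
    released = []
    for record in records:
        if purpose not in record.consented_purposes:
            continue  # consent check: skip customers who have not opted in
        released.append({name: getattr(record, name) for name in allowed_fields})
    return released


records = [
    CustomerRecord("c-001", "ana@example.com", "router-x1", {"ai_support"}),
    CustomerRecord("c-002", "ben@example.com", "camera-z2", {"billing"}),
]
grounding_rows = records_for_grounding(records, "ai_support")
print(grounding_rows)
```

The e-mail address never leaves the function, and the second customer is skipped entirely because their consent does not cover this purpose.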

If you’re experimenting with grounding, start by auditing your data sources:

  • Are they secure? 
  • Are they privacy-compliant? 

The answers will define whether your use of AI with your database is not just smart, but also trustworthy.


Thursday, 1 January 2026

Exploring C4 Models with Structurizr DSL, VSCode, and Diagramming Tools

Introduction

As a Data Architect, I find that clear and effective diagrams are crucial for communicating and documenting software and data architectures. The C4 model, with its focus on abstraction-first design (a principle I firmly believe is the backbone of software engineering), immediately caught my interest. To explore it further, I recently began experimenting with C4 modelling using Structurizr DSL (DSL = domain-specific language), VSCode, and popular diagramming tools like PlantUML and Mermaid. I used Cairo and Graphviz in the past, but these newer tools require less tinkering. Here’s a look at my journey and the insights I gained along the way while trying the diagrams-as-code approach.

Why C4 Models?

The C4 model is a powerful way to describe software systems at four levels of abstraction: Context, Containers, Components, and Code, often referred to as the "4 Cs." Its simplicity, scalability, and developer-friendly approach make it a perfect fit for both new (greenfield) and existing (brownfield) projects.

Since I prefer to avoid cloud-based tools for a richer experience and more control, I initially set up a local environment using VSCode and Docker on my trusty old but fast Ubuntu laptop. This way, I can create diagrams while keeping everything offline and efficient. Looking at it again, I decided that even Docker was overkill: VSCode alone is enough to code and diagram.

My Setup

I took a quick look at the Structurizr DSL Python wrapper, but I skipped that too: I wanted to dive straight into the native DSL syntax and see my diagrams render with minimal overhead. After all, treating diagrams as code keeps everything clean and reproducible.

While I could have spun up Structurizr Lite in a Docker container (because who doesn’t love local, self-hosted solutions?), I went lighter: just VSCode extensions to get the job done. My philosophy? No unnecessary layers, no cloud dependencies, just code and diagrams, the way it should be.

Diagram-as-code tools like these integrate seamlessly with wiki platforms (like Confluence, Notion, or GitLab/GitHub Wikis) and Git repositories, allowing you to embed dynamic, version-controlled diagrams directly in your documentation.

Tools in Action

  • Structurizr DSL: I wrote the diagrams as code in the DSL in VSCode; for better previews, you can run the Structurizr server on localhost.
  • VSCode: With extensions for PlantUML and Mermaid, I could preview diagrams instantly in VSCode.
  • PlantUML & Mermaid: Both tools integrated seamlessly with VSCode via extensions, though I found Mermaid’s syntax more intuitive for quick sketches and wiki integration. Mermaid has its own markup.
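
To give a feel for what diagrams as code means in practice, here is a small Python sketch that turns a plain description of elements and relationships into Mermaid flowchart markup. It is not the workflow described in this post, where the Mermaid and Structurizr DSL files were written directly; the `to_mermaid` helper and the sample system are hypothetical and serve only to show how a diagram reduces to text that can be diffed and versioned.

```python
def to_mermaid(elements, relationships):
    """Render elements and relationships as Mermaid flowchart markup."""
    lines = ["flowchart LR"]
    for key, label in elements.items():
        # One node per element: an identifier plus a quoted display label.
        lines.append(f'    {key}["{label}"]')
    for source, target, description in relationships:
        # One labelled arrow per relationship.
        lines.append(f'    {source} -->|"{description}"| {target}')
    return "\n".join(lines)


# Hypothetical system-context view of an imaginary web shop.
elements = {
    "customer": "Customer (Person)",
    "shop": "Web Shop (Software System)",
    "payments": "Payment Provider (External System)",
}
relationships = [
    ("customer", "shop", "Places orders"),
    ("shop", "payments", "Requests payment"),
]

diagram = to_mermaid(elements, relationships)
print(diagram)
```

Pasting the printed text into a Mermaid preview should render three boxes joined by two labelled arrows.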

Outcomes

I successfully created Context, Container, and Component diagrams for an imaginary sample project. Generating the diagrams locally gave me full control and flexibility, with no SaaS involved. Here are two examples of what I built:


Figure 1: Output from the Structurizr server running on localhost:8080 in Docker, with the DSL code on the left generating the C4 model diagram on the right



Figure 2: Output from the Mermaid VSCode extension, showing Mermaid code on the left generating the diagram on the right


Final Thoughts

I find that the C4 model and tools like PlantUML and Mermaid are a game-changer for architecture documentation: they shift the process from static, manual diagrams to code-driven, version-controlled clarity. By leveraging Structurizr DSL in VSCode and pairing it with Mermaid/PlantUML, I’ve crafted a workflow that’s both flexible and precise, giving me full control over how my systems are visualised.

There’s something deeply satisfying about coding your diagrams: no misaligned Bézier curves, just clean, maintainable DSL and instant visual feedback. I’m officially done with joining rectangles by hand; from now on, it’s code all the way.