Illegal or Misleading Connections Between Elements

When I first started mentoring junior analysts, one of the most frequent red flags I’d see wasn’t in the logic of the process, but in the connections themselves. A data store linked directly to another data store? An external entity sending data to another external entity? These aren’t just visual glitches—they’re violations of core DFD principles that break the model’s integrity.

These illegal DFD connections aren’t random mistakes. They stem from misunderstanding the fundamental role of processes as transformers of data. Every data flow must have a process on at least one end; the other end may be another process, an external entity, or a data store. A flow that skips the process is invisible to the system’s logic. It’s like a river flowing without a source, or a pipe with no pump.

Understanding why these connections are invalid isn’t about memorizing rules—it’s about grasping the purpose of DFDs: to model *what changes*, not just where data lives. You need to see data flow as a story of transformation, not mere movement. When you think this way, the logic becomes intuitive.

Here, you’ll learn not just what to avoid, but why. We’ll walk through three common illegal DFD connections with real-world examples, corrected versions, and the reasoning behind each fix. By the end, you’ll have the confidence to audit your own models with precision.

Why Certain Data Flows Break the Rules

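Classic DFD theory is clear: every data flow must originate from or terminate at a process. Data stores are passive containers that only hold data. External entities are sources and sinks that sit outside the system boundary, so the diagram does not model anything they exchange among themselves.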

Any flow that goes directly from one data store to another, or from one external entity to another, bypasses this rule. These flows are not just incorrect—they create ambiguity. Who decides when to move data? What triggers the transfer? The model loses its causal logic.

Even more troubling: such flows often hide assumptions. A data store to data store error might imply a behind-the-scenes job, but that job isn’t modeled. This leads to untraceable data movement, which becomes a major source of bugs and audit failures.

Rule of Thumb: Every Flow Must Touch a Process

Ask yourself: does this flow begin or end at a process? If not, it’s an invalid DFD data flow.

Even if the flow appears to “make sense” at first glance—like customer data moving from one database to another—it still violates the principle. The transfer must be triggered by a process. That process could be a batch job, a synchronization routine, or a business rule. But it must exist.
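
The rule of thumb can be expressed as a one-line check. The following is a minimal sketch in Python; the kind names (`process`, `store`, `external`) and the function `touches_process` are illustrative assumptions, not part of any DFD tool.

```python
# Hypothetical element kinds; the names are illustrative only.
PROCESS, STORE, EXTERNAL = "process", "store", "external"


def touches_process(source_kind: str, target_kind: str) -> bool:
    """Return True when at least one end of the flow is a process."""
    return PROCESS in (source_kind, target_kind)


# A direct database-to-database flow is rejected, while the same
# transfer routed through a process is accepted on both legs.
print(touches_process(STORE, STORE))    # False
print(touches_process(STORE, PROCESS))  # True
print(touches_process(PROCESS, STORE))  # True
```

The check looks only at the kinds of the two endpoints, which mirrors the question above: it does not care what the data is, only whether a process is involved.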

Common Anti-Patterns and How to Fix Them

1. Data Store to Data Store Error

Imagine a system where customer data is stored in a “Customer DB” data store, and a backup is kept in a “Legacy Archive” data store. A line connects these two directly. This is a data store to data store error.

Why it fails: Data stores are not active agents. They don’t initiate transfers. Moving data between them requires a process—like “Sync Customer Data Nightly.”

Corrected Version:

Insert a process: “Sync Customer Data Nightly”
Draw input flow from “Customer DB” to the process
Draw output flow from the process to “Legacy Archive”

Now the flow is traceable. The process defines *when* and *why* the data moves. It can be tested, monitored, and audited.
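
If a diagram is kept as data, the same correction can be sketched in code. This is an illustrative sketch only: the `Flow` tuple, the `mediate` helper, and the flow label `customer data` are assumptions made for the example.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    source: str
    target: str
    label: str


def mediate(flow: Flow, process: str) -> List[Flow]:
    """Split one direct flow into an input and an output flow around `process`."""
    return [
        Flow(flow.source, process, flow.label),
        Flow(process, flow.target, flow.label),
    ]


# The illegal store-to-store connection from the example above.
illegal = Flow("Customer DB", "Legacy Archive", "customer data")

# Routing it through the named process yields two legal flows.
for f in mediate(illegal, "Sync Customer Data Nightly"):
    print(f"{f.source} -> {f.target} [{f.label}]")
```

The helper reuses the original label on both legs for simplicity; in a real model the output flow would usually get its own name once the process transforms the data.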

2. External to External Flow DFD

Consider a DFD where a “Vendor System” sends data directly to a “Customer Portal.” This is an external-to-external flow, a classic anti-pattern.

Why it fails: External entities are endpoints. They represent sources or sinks of data. Data can’t move directly between them without a process to mediate the transfer.

Corrected Version:

Introduce a process: “Process Vendor Data for Customer Portal”
Draw input flow from “Vendor System” to the process
Draw output flow from the process to “Customer Portal”

Now the data transformation—formatting, validation, enrichment—becomes explicit. This process can be reviewed, optimized, and versioned.

3. Flows with No Process (Black Box Data)

Some DFDs show a flow entering or leaving a data store with no process in sight. For example, a flow labeled “Send Invoice Data” appears from thin air, ending at a “Payment DB.”

This is a classic invalid DFD data flow. The data seems to appear or disappear by magic, which violates the principle of conservation of data: every data element must be traceable to a process that produces or consumes it.

Corrected Version:

Identify the missing process: “Generate and Send Invoice”
Add it to the diagram
Connect “Invoice Request” to the process as input
Connect “Sent Invoice” from the process to “Payment DB”

This ensures the flow is driven by a real business or technical action. It also makes testing and implementation possible.
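
Flows that come from nowhere can be found mechanically when the diagram is stored as data. The sketch below assumes a hypothetical representation: each flow is a `(source, target, label)` tuple, `None` marks an unattached end, and `elements` maps each declared element to its kind. Only the store-bound flow from the example is modeled.

```python
from typing import Dict, List, Optional, Tuple

# (source, target, label); None marks an end that is attached to nothing.
Flow = Tuple[Optional[str], Optional[str], str]


def dangling_flows(flows: List[Flow], elements: Dict[str, str]) -> List[Flow]:
    """Return flows whose source or target is missing or undeclared."""
    return [f for f in flows if f[0] not in elements or f[1] not in elements]


elements = {
    "Payment DB": "store",
    "Generate and Send Invoice": "process",
}

# Before the fix: the flow ends at the store but starts nowhere.
before = [(None, "Payment DB", "Send Invoice Data")]
# After the fix: the named process is the source of the flow.
after = [("Generate and Send Invoice", "Payment DB", "Sent Invoice")]

print(dangling_flows(before, elements))  # the dangling flow is reported
print(dangling_flows(after, elements))   # []
```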

Underlying Principles Behind the Rules

These rules aren’t arbitrary. They stem from four core principles of DFD modeling:

Transformation > Movement: DFDs model changes in data states, not just physical transfers.
Processes are Agents of Change: Only processes can create, modify, or destroy data.
Traceability Matters: Every flow must have a clear origin and purpose.
Separation of Concerns: External entities and data stores are passive; processes are active.

When you break these, you lose control. A data store to data store error may seem harmless, but it erases the audit trail: who moved the data, when, and why. That’s a compliance risk.

Practical Checklist: Spotting Illegal DFD Connections

Use this checklist when reviewing or creating DFDs:

Does every data flow have a process on at least one end, with the other end at a process, external entity, or data store?
Are there any flows between two data stores? If yes, is there a process mediating the transfer?
Are there any flows between two external entities? If yes, is there a process handling the exchange?
Does every flow have a clearly named source and destination? Can you name the transformation the process performs?
Is there a process that explains how data moves between systems or databases?

Apply this checklist during peer reviews. It’s a simple but powerful way to catch issues early.
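
Parts of the checklist can be automated as a first pass before a peer review. The following is a rough sketch, assuming the diagram is available as a dictionary of element kinds and a list of `(source, target, label)` flows; the `audit` function, the kind names, and the flow labels are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, str, str]  # (source, target, label)


def audit(elements: Dict[str, str], flows: List[Flow]) -> List[str]:
    """Return one finding per flow that breaks a connection rule."""
    findings = []
    for source, target, label in flows:
        kinds = (elements.get(source), elements.get(target))
        if None in kinds:
            findings.append(f"undeclared endpoint: {source} -> {target} [{label}]")
        elif kinds == ("store", "store"):
            findings.append(f"store-to-store: {source} -> {target} [{label}]")
        elif kinds == ("external", "external"):
            findings.append(f"external-to-external: {source} -> {target} [{label}]")
        elif "process" not in kinds:
            findings.append(f"no process on either end: {source} -> {target} [{label}]")
    return findings


elements = {
    "Customer DB": "store",
    "Legacy Archive": "store",
    "Vendor System": "external",
    "Customer Portal": "external",
    "Sync Customer Data Nightly": "process",
}
flows = [
    ("Customer DB", "Legacy Archive", "customer data"),
    ("Vendor System", "Customer Portal", "vendor data"),
    ("Customer DB", "Sync Customer Data Nightly", "customer data"),
]

for finding in audit(elements, flows):
    print(finding)
```

A script like this only catches structural violations. Whether each process has a meaningful name and a real transformation behind it still needs a human reviewer.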

Frequently Asked Questions

Can a process have no input flows?

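In classic DFD theory, no. A process that produces output without any input is often called a “miracle,” because its data has no visible origin. Even a “Generate Daily Report” process reads from at least one data store, so that input flow should be drawn. The reverse also applies: a process with inputs but no outputs, often called a “black hole,” needs review. In both cases the process must still be clearly defined.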
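
To review this question on an actual diagram, it can help to list the processes that lack incoming or outgoing flows so each one can be examined by hand. The sketch below is illustrative only; the function name `unbalanced_processes`, the `(source, target)` flow pairs, and the “Manager” entity are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def unbalanced_processes(
    processes: Set[str], flows: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """List processes that have no input flow or no output flow."""
    receives = {target for _, target in flows}
    sends = {source for source, _ in flows}
    return {
        "no_input": sorted(processes - receives),
        "no_output": sorted(processes - sends),
    }


processes = {"Generate Daily Report", "Update Customer Record"}
flows = [
    ("Generate Daily Report", "Manager"),       # output only
    ("Customer DB", "Update Customer Record"),  # read from the store
    ("Update Customer Record", "Customer DB"),  # write back to the store
]

print(unbalanced_processes(processes, flows))
```

Here “Generate Daily Report” is flagged for having no input, which is a prompt to ask where its data comes from.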

What if data moves between data stores automatically via a script?

Even if automatic, the script must be modeled as a process. Otherwise, it’s invisible to the DFD. Name it “Run Daily Sync Script” and connect it to both data stores.

Why can’t I just draw a flow between data stores to save space?

Because it hides critical logic. A flow without a process is a black box. You can’t test it, monitor it, or explain it. It defeats the entire purpose of modeling.

Are there exceptions to the rule?

Only in rare cases involving system-level integration where the transfer is fully automated and not under business control. Even then, the process should be modeled as a “system event” to maintain traceability.

How do I explain this to stakeholders who prefer simple diagrams?

Frame it as risk reduction. A model with illegal DFD connections may look clean, but it hides where data can be lost, corrupted, or improperly transferred. Transparency builds trust and reduces rework.

Can a DFD have a loop from a data store back to itself?

Only if it’s part of a process that retrieves and updates the same store. For example, “Update Customer Record” pulls data from the store, modifies it, and writes it back. The flow is not direct—it goes through a process.