azure · data factory · for security teams

Your data factory is already an exfiltration engine

A data factory stores the credentials to reach every store it connects to, so an attacker who can author a pipeline gets ready-made exfiltration that looks like normal ETL.

Some tools are dangerous because they do exactly what they are built to do. A data factory moves data at scale, and that is precisely what makes it a perfect exfiltration engine.

How the attack works

An attacker with Data Factory contributor-style access authors a pipeline whose copy activity references an existing source linked service and a new destination linked service that points outside the tenant. The pipeline authenticates to the source with the existing linked-service credential, or simply rides the factory’s managed identity, so the attacker never needs a new grant. The attacker attaches a schedule trigger for recurrence, and each pipeline run reads sensitive datasets from the source store and writes them to the attacker’s sink. Because copying data is the factory’s whole job, the activity looks like ordinary ETL. The pipeline-run and activity-run logs are the authoritative record of what moved where. This maps to MITRE ATT&CK T1530, Data from Cloud Storage, and T1537, Transfer Data to Cloud Account.
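To make the chain concrete, here is a minimal Python sketch that resolves each copy activity to the linked services it reads from and writes to, then lists any sink that is not in a known baseline. The dictionaries are a simplified, hypothetical stand-in for exported factory definitions, not the service's real JSON schema, and every name in them (`nightly-sync`, `ls-blob-new`, and so on) is invented for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class CopyPath(NamedTuple):
    pipeline: str
    activity: str
    source_linked_service: str
    sink_linked_service: str


def resolve_copy_paths(pipeline: dict, dataset_to_linked_service: Dict[str, str]) -> List[CopyPath]:
    """Resolve each Copy activity to the linked services it reads from and writes to."""
    paths = []
    for activity in pipeline.get("activities", []):
        if activity.get("type") != "Copy":
            continue  # only copy activities move data between stores
        for source_ds in activity.get("inputs", []):
            for sink_ds in activity.get("outputs", []):
                paths.append(CopyPath(
                    pipeline=pipeline["name"],
                    activity=activity["name"],
                    source_linked_service=dataset_to_linked_service[source_ds],
                    sink_linked_service=dataset_to_linked_service[sink_ds],
                ))
    return paths


def new_sinks(paths: List[CopyPath], known_linked_services: Set[str]) -> List[str]:
    """Return sink linked services that are absent from the known baseline."""
    return sorted({p.sink_linked_service for p in paths
                   if p.sink_linked_service not in known_linked_services})


# Hypothetical, simplified definitions (assumed shapes, not the real export format).
pipeline = {
    "name": "nightly-sync",
    "activities": [
        {"name": "CopyCustomers", "type": "Copy",
         "inputs": ["CustomersTable"], "outputs": ["OffsiteBlob"]},
        {"name": "Notify", "type": "WebActivity"},
    ],
}
datasets = {"CustomersTable": "ls-sql-prod", "OffsiteBlob": "ls-blob-new"}
baseline = {"ls-sql-prod", "ls-datalake-prod"}

if __name__ == "__main__":
    for path in resolve_copy_paths(pipeline, datasets):
        print(path)
    print("Sinks not in baseline:", new_sinks(resolve_copy_paths(pipeline, datasets), baseline))
```

The source here is a linked service that already existed, so nothing about the read side changes; only the sink is new, which is why the baseline comparison is made on the sink alone.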

Why it works

Anyone who can author a pipeline inherits the factory’s data access through its stored credentials or an over-permissioned managed identity. The copy activity is the tool’s normal function, so detection has to focus on the destination and the recurrence, not the copy itself.
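A detection along those lines could look something like the sketch below, which ignores the copy itself and scores only the sink host and the trigger. It assumes the copy records have already been flattened into dictionaries with hypothetical keys (`pipeline`, `sink_endpoint`, `trigger_type`), and the approved host and all endpoints are made up. The allowlist matches exact hosts rather than a suffix, on the assumption that storage accounts in other tenants share the same public domain suffix.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: exact hosts of stores the tenant owns.
APPROVED_SINK_HOSTS = {"corpdatalake.dfs.core.windows.net"}

# Trigger types treated as recurring in this sketch.
RECURRING_TRIGGER_TYPES = {"ScheduleTrigger", "TumblingWindowTrigger"}


def sink_host(endpoint: str) -> str:
    """Extract the lowercase host from an endpoint URL ('' if it cannot be parsed)."""
    return (urlparse(endpoint).hostname or "").lower()


def flag_copies(copies, approved_hosts=APPROVED_SINK_HOSTS):
    """Flag copies whose sink host is not approved; recurrence raises severity."""
    findings = []
    for copy in copies:
        host = sink_host(copy["sink_endpoint"])
        if host in approved_hosts:
            continue  # destination is a known internal store
        recurring = copy.get("trigger_type") in RECURRING_TRIGGER_TYPES
        findings.append({
            "pipeline": copy["pipeline"],
            "sink_host": host,
            "severity": "high" if recurring else "medium",
        })
    return findings


# Hypothetical flattened records, one per copy activity.
copies = [
    {"pipeline": "daily-load",
     "sink_endpoint": "https://corpdatalake.dfs.core.windows.net/raw",
     "trigger_type": "ScheduleTrigger"},
    {"pipeline": "nightly-sync",
     "sink_endpoint": "https://offsite123.blob.core.windows.net/dump",
     "trigger_type": "ScheduleTrigger"},
    {"pipeline": "adhoc-export",
     "sink_endpoint": "https://partner.example/upload",
     "trigger_type": None},
]

if __name__ == "__main__":
    for finding in flag_copies(copies):
        print(finding)
```

An endpoint that cannot be parsed yields an empty host and is flagged rather than skipped, so a malformed destination fails closed.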

How to fix it

Deleting the destination dataset is a trap: the pipeline and credentials still work, so the attacker re-points the sink and the scheduled copy resumes. The real containment is to disable the rogue pipeline and its trigger, then remove the access it rode by rotating the source linked-service credentials and scoping the factory’s managed identity to least privilege. Because the trigger was scheduled, use the pipeline-run logs to scope every run, not just the first. For a durable fix, keep linked-service secrets in a vault, restrict who can author pipelines and create external linked services, and alert on new external destinations.
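Scoping every run can be as simple as the following sketch, which collects all runs of the rogue pipeline from exported run records and totals what they wrote. The record fields (`run_id`, `pipeline`, `start`, `status`, `bytes_written`) are hypothetical names chosen for illustration, so map them to whatever your run-log export actually contains.

```python
from datetime import datetime


def scope_pipeline_runs(runs, pipeline_name):
    """Summarize every run of one pipeline, ordered by start time.

    Failed runs are included on purpose: a run that failed part-way
    may still have written data to the sink.
    """
    matched = sorted(
        (r for r in runs if r["pipeline"] == pipeline_name),
        key=lambda r: datetime.fromisoformat(r["start"]),
    )
    if not matched:
        return None
    return {
        "pipeline": pipeline_name,
        "run_count": len(matched),
        "first_run": matched[0]["start"],
        "last_run": matched[-1]["start"],
        "total_bytes_written": sum(r.get("bytes_written", 0) for r in matched),
        "run_ids": [r["run_id"] for r in matched],
    }


# Hypothetical run records (field names and values are assumptions).
runs = [
    {"run_id": "r-003", "pipeline": "nightly-sync",
     "start": "2024-05-03T02:00:00+00:00", "status": "Succeeded", "bytes_written": 1500},
    {"run_id": "r-001", "pipeline": "nightly-sync",
     "start": "2024-05-01T02:00:00+00:00", "status": "Succeeded", "bytes_written": 1000},
    {"run_id": "r-010", "pipeline": "daily-load",
     "start": "2024-05-01T01:00:00+00:00", "status": "Succeeded", "bytes_written": 9000},
    {"run_id": "r-002", "pipeline": "nightly-sync",
     "start": "2024-05-02T02:00:00+00:00", "status": "Failed", "bytes_written": 200},
]

if __name__ == "__main__":
    print(scope_pipeline_runs(runs, "nightly-sync"))
```

The summary gives responders a first and last run time to bound the exposure window and a list of run IDs to pull the matching activity-run details for.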

Practice it

We built this as a GraphLattice Range scenario so security teams learn to cut the credential, not just the dataset, and to account for every scheduled run.