Azure ML token theft: when a notebook schedules its own persistence
A notebook on ML compute steals the workspace identity token, then schedules a job that re-mints it on a timer. Deleting one compute does not help. Here is what kills the loop.
Azure Machine Learning compute runs with a managed identity so jobs can reach storage and Key Vault. Any notebook on that compute can ask the metadata endpoint for the token, and that is the opening.
How the attack works
An interactive notebook running on a compute instance shells out to the instance metadata identity endpoint, which returns the managed-identity token for the workspace and compute. That token reaches the workspace default storage account and Key Vault. The attacker then establishes persistence by creating a recurring pipeline schedule that re-acquires the token on every run, and attaches an extra compute target as a second foothold for re-minting. In ATT&CK terms this chains T1651 (Cloud Administration Command), T1552 (Unsecured Credentials), and T1078.004 (Valid Accounts: Cloud Accounts), with persistence under T1525.
Why it works
The workspace identity carried broad role assignments, and compute could be created, attached, and scheduled freely. Normal training uses the SDK and never calls the metadata endpoint directly, yet nothing prevented a notebook from doing so. All the attacker needed to exploit the broad identity was a notebook that asked for its token.
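Because normal training goes through the SDK, a raw metadata-endpoint call from a notebook is a useful signal to hunt for. The sketch below is a minimal illustration, not a production detector: it scans notebook cell text for references to the identity endpoint. The function name and the list-of-strings input are hypothetical. The environment-variable names are an assumption about how managed compute exposes its identity endpoint, and they should be checked against your own compute before you rely on them.

```python
import re

# Indicators of a raw managed-identity token request.
# The link-local address and path are the well-known Azure instance metadata
# identity route. The environment-variable names are an ASSUMPTION about how
# managed compute exposes its identity endpoint.
RAW_TOKEN_PATTERNS = [
    re.compile(r"169\.254\.169\.254/metadata/identity", re.IGNORECASE),
    re.compile(r"\b(MSI_ENDPOINT|IDENTITY_ENDPOINT)\b"),
]


def flag_raw_token_requests(cells):
    """Return (cell_index, matched_pattern) for each cell that hits an indicator.

    `cells` is a list of strings, one per notebook cell (hypothetical input
    shape; adapt it to however you export notebook or process telemetry).
    """
    hits = []
    for index, source in enumerate(cells):
        for pattern in RAW_TOKEN_PATTERNS:
            if pattern.search(source):
                hits.append((index, pattern.pattern))
                break  # one indicator is enough to flag the cell
    return hits


# Hypothetical notebook content for illustration only.
sample_cells = [
    "from azure.ai.ml import MLClient",
    "!curl -s -H 'Metadata: true' 'http://169.254.169.254/metadata/identity/oauth2/token'",
    "import os; endpoint = os.environ['MSI_ENDPOINT']",
]

for index, indicator in flag_raw_token_requests(sample_cells):
    print(f"cell {index} matches {indicator}")
```

A string match like this is easy to evade, so treat a hit as a lead to investigate rather than proof, and treat silence as inconclusive.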
How to fix it
You cannot recall an Entra token that was already issued, and a scheduled job keeps minting fresh ones. The non-obvious move is to reduce the workspace and compute managed identity to least privilege first, so that every token, including freshly minted ones, authorizes nothing beyond what the workload needs. Then delete the rogue scheduled job and the extra attached compute so the re-minting loop dies. Deleting only the one compute the notebook ran on leaves the schedule and the second compute alive. To establish what was actually accessed, work from Azure ML run history, storage and Key Vault diagnostic logs, and the identity’s Entra sign-ins, since role assignments show capability, not activity.
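The containment order can be rehearsed on paper before touching a real workspace. The sketch below models the workspace as plain sets of names and shows why deleting a single compute leaves the loop running. Every name in it (the functions, the action labels, the schedule and compute names) is hypothetical and chosen for illustration. It makes no Azure calls.

```python
def remint_loop_alive(schedules, computes, rogue_schedules, rogue_computes):
    """The loop survives while any rogue schedule or rogue compute still exists."""
    return bool(set(schedules) & set(rogue_schedules)) or bool(
        set(computes) & set(rogue_computes)
    )


def containment_plan(rogue_schedules, rogue_computes):
    """Ordered actions: neutralize the tokens first, then remove what re-mints them."""
    plan = [("scope_identity_to_least_privilege", "workspace-and-compute-identity")]
    plan += [("delete_schedule", name) for name in sorted(rogue_schedules)]
    plan += [("delete_compute", name) for name in sorted(rogue_computes)]
    return plan


# Hypothetical inventory for illustration only.
schedules = {"nightly-train", "token-refresh"}
computes = {"notebook-ci", "train-cluster", "extra-attached"}
rogue_schedules = {"token-refresh"}
rogue_computes = {"extra-attached"}

# Deleting only the compute the notebook ran on leaves the loop running.
after_partial = computes - {"notebook-ci"}
print(remint_loop_alive(schedules, after_partial, rogue_schedules, rogue_computes))  # True

# Removing the rogue schedule and the extra compute ends it.
print(
    remint_loop_alive(
        schedules - rogue_schedules,
        computes - rogue_computes,
        rogue_schedules,
        rogue_computes,
    )
)  # False

for action, target in containment_plan(rogue_schedules, rogue_computes):
    print(action, target)
```

Scoping the identity comes first in the plan because it is the only step that also devalues tokens already in the attacker's hands. The deletions stop new tokens from being minted.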
Practice it
We built this as a GraphLattice Range scenario so responders can rehearse the token theft, the persistence loop, and the scope-and-remove containment that breaks it.