
Invoking Azure Machine Learning Pipelines from Azure Data Factory using DataPath

This is a quick post showing how to call Azure Machine Learning Pipelines from Azure Data Factory, including how to pass data dynamically into the Machine Learning Pipeline using DataPath.

Pipeline Creation

First, let’s create an AzureML Pipeline that we can use for this example. Please note that this code is syntactically correct, but probably won’t run unless you adapt a few parameters, e.g., change the environment, adapt the data path, add a training script, etc.
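A minimal sketch of such a pipeline definition could look like the following; the datastore path, training script, compute target, and endpoint name are placeholders you would replace with your own:

```python
from azureml.core import Workspace
from azureml.data.datapath import DataPath, DataPathComputeBinding
from azureml.pipeline.core import Pipeline, PipelineEndpoint, PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Default data location on the workspace's default datastore (placeholder path)
datastore = ws.get_default_datastore()
data_path = DataPath(datastore=datastore, path_on_datastore="training_data/")

# Wrap the DataPath in a PipelineParameter so callers can override it per run
datapath_parameter = PipelineParameter(name="training_data_path", default_value=data_path)
datapath_input = (datapath_parameter, DataPathComputeBinding(mode="mount"))

# Single training step - script name, compute target and source directory are placeholders
train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    arguments=["--data", datapath_input],
    inputs=[datapath_input],
    compute_target="cpu-cluster",
    source_directory="./scripts",
    allow_reuse=False,
)

pipeline = Pipeline(workspace=ws, steps=[train_step])

# Publish the pipeline and route it through a static PipelineEndpoint
published_pipeline = pipeline.publish(name="training-pipeline")

endpoint_name = "training-pipeline-endpoint"
try:
    # Endpoint already exists: add the new pipeline and make it the default
    pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name=endpoint_name)
    pipeline_endpoint.add_default(published_pipeline)
except Exception:
    # First run: create the endpoint with this pipeline as its default
    pipeline_endpoint = PipelineEndpoint.publish(
        workspace=ws,
        name=endpoint_name,
        pipeline=published_pipeline,
        description="Endpoint for the training pipeline",
    )
```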

Most notably, we publish the pipeline as a PublishedPipeline and then add it to a PipelineEndpoint. A PipelineEndpoint acts as a “router” for multiple PublishedPipelines and presents a static URL to its callers. If we re-run this code, it will simply add our new pipeline behind the existing endpoint and set it as the new default.

Furthermore, we are using DataPath and PipelineParameter to make the data input dynamic. DataPath allows us to specify an arbitrary path on a datastore as an input, and PipelineParameter allows us to pass in the DataPath dynamically when invoking the pipeline.

In the next step, we’ll call the PipelineEndpoint from Azure Data Factory.

Setup in Data Factory

In Data Factory, first create a Linked Service to your Azure Machine Learning Workspace. Then create a new Pipeline and add the Machine Learning Execute Pipeline activity.

Next, we can configure the Machine Learning Execute Pipeline activity:

From the workspace, we first select the pipeline we would like to execute. Here, we select our newly created PipelineEndpoint, as it allows swapping out the active AzureML Pipeline in the backend – without touching Azure Data Factory. Under Experiment name, we pass in the name under which the pipeline should be executed in AzureML. Lastly, we need to pass in the DataPath via a Data path assignment. For this, we put the name of the pipeline parameter for the DataPath into the large text box, then click the small down arrow to its left and add:

  • DataStoreName: the name of your AzureML Datastore
  • RelativePath: the path to your data inside that Datastore

In this example, training_data_path was defined in our pipeline creation code above (datapath_parameter = PipelineParameter(name="training_data_path", default_value=data_path)).
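For comparison, the same Data path assignment can also be passed when invoking the PipelineEndpoint directly from the Python SDK, which is handy for sanity-checking the parameter name outside of Data Factory. The endpoint, experiment, datastore, and path names below are placeholders:

```python
from azureml.core import Workspace, Datastore
from azureml.data.datapath import DataPath
from azureml.pipeline.core import PipelineEndpoint

ws = Workspace.from_config()

# Same endpoint that Data Factory calls (placeholder name)
pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name="training-pipeline-endpoint")

# Equivalent of ADF's Data path assignment: datastore name + relative path
datastore = Datastore.get(ws, "workspaceblobstore")
training_data = DataPath(datastore=datastore, path_on_datastore="mydata/train/")

# Bind the DataPath to the pipeline parameter by name and submit the run
run = pipeline_endpoint.submit(
    experiment_name="adf-triggered-training",
    pipeline_parameters={"training_data_path": training_data},
)
run.wait_for_completion(show_output=True)
```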

Finally, we can publish the ADF pipeline and run it using Add trigger, then Trigger now. Once the run has completed, we should see the results in our experiment in Azure Machine Learning Studio.

Looks good! We can see that the experiment was named properly and that the data was correctly pulled from the location we set in Azure Data Factory.

Hope this quick tip was helpful!
