13

How do we save data inside of an XML payload to blob storage?

input

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>sdfsdfjijOIJOISJDFQPWORPJkjsdlfkjlksdf==</bytes>
</root>

desired result

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>/blob/path/toSavedPayload</bytes>
</root>
  1. save bytes somewhere in blob
  2. replace bytes with URI of where bytes were saved

How do we use data factory to extract a node from XML and save it to blob?

Cœur
  • 37,241
  • 25
  • 195
  • 267
Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062
  • Need a bit more info - define the XML payload? Do you mean it's a file in data lake or is returned in the ADF pipeline from a web or database call? – wBob Jan 17 '20 at 15:08
  • xml blob is stored in a storage account, i'm looking for functionality that is similar to a regular XSLT transform – Alex Gordon Jan 17 '20 at 15:11
  • I used Azure Batch Custom Activity to solve this problem. Here is another SO answer where I mentioned that: https://stackoverflow.com/questions/59938374/converting-xml-files-to-json-or-csv – Joel Cochran Mar 06 '20 at 16:48

3 Answers3

2

Currently, ADF doesn’t support XML natively. But

  1. You may write your own code and then use custom activity of ADF.
  2. SSIS has built-in support for XML as a source. Maybe you could take a look.
Fang Liu
  • 2,325
  • 2
  • 13
  • 18
  • just wondering, why are you mentioning SSIS? are you saying we can export from SSIS and import into ADF? – Alex Gordon May 21 '19 at 14:29
  • 1
    Yes, you can "lift and shift" your existing SSIS packages to a cloud instance of integration services. Read more here: https://learn.microsoft.com/en-us/sql/integration-services/lift-shift/ssis-azure-lift-shift-ssis-packages-overview?view=sql-server-2017 – Martin Esteban Zurita May 24 '19 at 12:02
  • @MartinEstebanZurita would love an example of an SSIS package that does an XML transform and then how that can be lifted and shifted into ADF – Alex Gordon May 28 '19 at 14:22
  • XML is now supported in ADF ... https://learn.microsoft.com/en-us/azure/data-factory/format-xml – SteveC Sep 08 '20 at 10:21
2

For that case you have to use some custom code to do this. I would choose from these options

  • Azure Functions - only for some simple data processing
  • Azure Databricks - in the case you need to process some big XML data
y0j0
  • 3,369
  • 5
  • 31
  • 52
  • XML is now supported in ADF ... https://learn.microsoft.com/en-us/azure/data-factory/format-xml – SteveC Sep 08 '20 at 10:21
2

As Azure Data Factory does not support XML natively, I would suggest you to go for SSIS package.

  1. In the Data flow task, have XML source and read bytes from the xml into a variable of DT_Image datatype.
  2. Create a script task, which uploads the byte array (DT_Image) got in step no.1 to azure blob storage as mentioned in the below. Code slightly modified for the requirement. Reference of SO post
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;    

// Retrieve storage account from connection string.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse("StorageKey");

// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");

// Retrieve reference to a blob named "myblob".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob");

byte[] byteArrayIn = Dts.Variables["User::ImageVariable"].Value;

// Create or overwrite the "myblob" blob with contents from a local file.
using (var memoryStream = new MemoryStream(byteArrayIn);)
{
    blockBlob.UploadFromStream(memoryStream);
}
  1. Now, host this SSIS Package in SSIS Runtime in Azure Data Factory and execute the SSIS package.

SSIS Runtime in Azure DataFactory

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58
  • 1
    XML is now supported in ADF ... https://learn.microsoft.com/en-us/azure/data-factory/format-xml – SteveC Sep 08 '20 at 10:21