0

I have some (complex to me) XML code that I need to convert into CSV, I need absolutely every value added to the CSV for every submission, I have tried a few basic things however I cant get past the deep nesting and the different structures of this file.

Could someone please help me with a powershell script that would, I have started but cannot get the output of all data out I only get Canvas Results

Submissions.xml To large to post here (102KB)

$d=([xml](gc submissions.xml)).CANVASRESULTS | % { 
  foreach ($i in $_.CANVASRESULTS) {
    $o = New-Object Object
    Add-Member -InputObject $o -MemberType NoteProperty -Name Submissions -Value $_.Submission
    Add-Member -InputObject $o -MemberType NoteProperty -Name Submission -Value $i
    $o
  }
}
$d | ConvertTo-Csv -NoTypeInformation -Delimiter ","
  • So here's my first thought... XML supports nested arrays, and CSV does not. Do you know if your XML has nested arrays? If it does you're going to need to iterate through any nested array, and figure out how you want to handle the data from it. – TheMadTechnician Oct 01 '15 at 22:54
  • That makes sense, Sounds like I am better of throwing this data into an SQL Database instead of exporting to CSV? I believe it is heavily nested, the link above will show you the data; I would like to hear your thoughts on how best to import the data so we are able to use it. –  Oct 01 '15 at 23:00
  • Yeah, in PowerShell terms you have objects nested 3 layers deep (at least). How to best access the data is kind of up to what your needs are. What are you really trying to accomplish here, what's the end game? Reporting? Metrics? – TheMadTechnician Oct 01 '15 at 23:11
  • What I am trying to achieve is to have all important data such as customers, Plant Hours, Dates, Names etc entered into a database from where I can then use a portal (yet to be created or designed) to search and query information for the purpose of reporting and reference. –  Oct 01 '15 at 23:21
  • As an example Plant Machine Hours which we use to invoice the customer, hours will be entered in every other day and the most recent hour entry that is the highest will be the referenced value for the report. This tool will be basic but specific and will ultimately be able to do queries on Customer X, for Site X for all or specific machines X and or Y Hours and movement dates. In addition we will use the hours on the plant to plan our service schedules if plant hours are 3000 then in 250 HRs a service is due. –  Oct 01 '15 at 23:21
  • In addition this is all towards pulling information entered via GoCanvas apps, they API however they have no documentation on the API downloads and how to configure them; I can use the API URL’s which is where I am getting my XML currently but I would much prefer to use their API assuming it as able to inject the submissions into a database. However I am at a loss. http://www.gocanvas.com/content/how-to/post/how-to-use-webservices-with-canvas –  Oct 01 '15 at 23:25
  • Ok, you're best off getting this data into SQL I think, but I'm not sure how best to do that. You're going to need to map out what tables you'll need (Users, Sites, Machines, etc), and how to link them as a first step I'm thinking. Then you probably _could_ use PS to load the data, but I'm betting there is a better way to map and import the data. – TheMadTechnician Oct 01 '15 at 23:33
  • How often is this data created? Do you have to store this XML again and again or do you query this one large XML and that's it? If you get this data again and again: will the structure be always the same? SQL Server is pretty good in dealing with XML natively. Could be an idea to take the XML into an XML typed column without any conversions and query this directly... Depends on your needs... – Shnugo Oct 01 '15 at 23:38
  • Data is created Daily / every hour. I can choose to pull the data from date to date, or every 10 minutes or every 24 hours, most likely will pull every 10 minutes, this way there are no duplicate entries to worry about. I believe the data structure will not change. However I do want to have historic data to review –  Oct 01 '15 at 23:42
  • @TroyThomas - you really need to understand how to properly design a relational database before you do this. It's not that difficult to understand the basics, and once you understand it you'll know how to the tables in the database need to be created and the relations between the tables. If you don't understand those concepts, you will be lucky to do it right, and I would highly recommend you get this right as it's really a pain in the butt to do it over again later if they want to add in some more data to it and the table structure will require redesign to accommodate it. – Tony Hinkle Oct 01 '15 at 23:54
  • Thanks Tony and All. I will end this question and go and do some research on relational database basics. Thanks for being understanding and helpful. –  Oct 01 '15 at 23:59

1 Answers1

2

Anytime a complex XML has deeply nested structures and you require migration into a flat file format (i.e., txt, csv, xlsx, sql), consider using XSLT to simplify your XML format. As information, XSLT is a declarative, special-purpose programming language used to style, re-format, re-structure XML/HTML and other SGML markup documents for various end-use purposes. Aside - SQL is also a declarative, special-purpose programming language.

For most softwares to import XML into flat file formats in two dimensions of rows and columns, XML files must follow repeating elements (i.e., rows/records) with one level of children for columns/fields:

<data>
  <row>
     <column1>value</column1>
     <column1>value</column1>
     <column1>value</column1>
     ...
  </row>
  <row>
  ...
</data>

Nearly every programming language maintains an XSLT processor including PowerShell, Java, C#, Perl, PHP, Python, SAS, even VBA with your everyday MS Excel. For your complex XML, below is an example XSLT stylesheet with following output. Do note I manually create nodes based on values from original XML:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>
 <xsl:output method="xml" indent="yes"/>   

 <xsl:template match="CanvasResult">    
   <Data>    
    <xsl:for-each select="//Responses">
     <Submission>  
       <Fitter><xsl:value-of select="Response[contains(Label, 'Fitter Name')]/Value"/></Fitter>
       <Date><xsl:value-of select="Response[Label='Date']/Value"/></Date>
       <Time><xsl:value-of select="Response[Label='Time']/Value"/></Time>
       <Client><xsl:value-of select="Response[Label='Client']/Value"/></Client>
       <Machine><xsl:value-of select="Response[Label='Machine']/Value"/></Machine>
       <Hours><xsl:value-of select="Response[Label='Hours']/Value"/></Hours>
       <Signature><xsl:value-of select="Response[Label='Signature']/Value"/></Signature>
       <SubmissionDate><xsl:value-of select="Response[Label='Submission Date:']/Value"/></SubmissionDate>
       <SubmissionTime><xsl:value-of select="Response[Label='Submission Time:']/Value"/></SubmissionTime>
       <Customer><xsl:value-of select="Response[Label='Customer:']/Value"/></Customer>
       <PlantLocation><xsl:value-of select="Response[Label='Plant Location']/Value"/></PlantLocation>
       <PlantType><xsl:value-of select="Response[Label='Plant Type:']/Value"/></PlantType>
       <PlantID><xsl:value-of select="Response[Label='Plant ID:']/Value"/></PlantID>
       <PlantHours><xsl:value-of select="Response[Label='Plant Hours:']/Value"/></PlantHours>
       <RegoExpiryDate><xsl:value-of select="Response[Label='Rego Expiry Date:']/Value"/></RegoExpiryDate>
       <Comments><xsl:value-of select="Response[Label='Comments:']/Value"/></Comments>        
     </Submission>
    </xsl:for-each>      
  </Data>   
 </xsl:template>

</xsl:stylesheet>

Output

<?xml version='1.0' encoding='UTF-8'?>
<Data>
 ...
 <Submission>
    <Fitter>Damian Stewart</Fitter>
    <Date/>
    <Time/>
    <Client/>
    <Machine/>
    <Hours/>
    <Signature/>
    <SubmissionDate>28/09/2015</SubmissionDate>
    <SubmissionTime>16:30</SubmissionTime>
    <Customer>Dicks Diesels</Customer>
    <PlantLocation/>
    <PlantType>Dozer</PlantType>
    <PlantID>DZ09</PlantID>
    <PlantHours>2213.6</PlantHours>
    <RegoExpiryDate>05/03/2016</RegoExpiryDate>
    <Comments>Moving tomorrow from Daracon BOP to KCE BOP S6A Dam&#13;
Cabbie to operate</Comments>
  </Submission>
  ...
</Data>

From there, you can import the two-dimensional XML into a usable rows/columns format. Below are the same import into an MS Access Database and MS Excel spreadsheet. You will notice gaps in the data due to XML content not populating the created nodes (handled in XSLT). A simple SQL cleanup can render final dataset.

XML Import into Access Database

Database Import

XML Import into Excel spreadsheet

Community
  • 1
  • 1
Parfait
  • 104,375
  • 17
  • 94
  • 125