I use a command to read a JSON file. This works perfectly until the file becomes large.

I currently have a JSON file of about 1.5 GB. I read the file in PowerShell using the following command:

get-content -Path C:\TEMP\largefile.json | out-string | ConvertFrom-Json

It returns the following error:

out-string : Exception of type 'System.OutOfMemoryException' was thrown.
+ ... oices = get-content -Path C:\TEMP\largefile.json | out-string | Conve ...
+                                                        ~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Out-String], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException,Microsoft.PowerShell.Commands.OutStringCommand

I've increased the memory as shown here:

get-item wsman:localhost\Shell\MaxMemoryPerShellMB


WSManConfig: Microsoft.WSMan.Management\WSMan::localhost\Shell

Type            Name                           SourceOfValue   Value
----            ----                           -------------   -----
System.String   MaxMemoryPerShellMB                            8096

Any ideas on how to process this?

Edit (additions based on comments):

When I remove the out-string I get this error:

ConvertFrom-Json : Exception of type 'System.OutOfMemoryException' was thrown.
    + ... oices = get-content -Path C:\TEMP\largefile.json | ConvertFrom-Json ...
    +                                                        ~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Out-String], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException,Microsoft.PowerShell.Commands.OutStringCommand

The PowerShell version that I have is: 5.1.17763.1490

The file contains multiple columns describing PDF files. The files are exported via an API into JSON, so each row contains file metadata such as the owner and creation date, plus the actual PDF file in the Body column, which will later be decoded into an actual PDF file. The structure is as follows:

[{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
]
vulkoek
  • Have you tried removing `| Out-String`? `Get-Content` already returns strings and `ConvertFrom-Json` is able to stitch them back together on its own – Mathias R. Jessen May 03 '21 at 12:23
  • It returns the same error. – vulkoek May 03 '21 at 12:38
  • Obvious idea would be to either not use PowerShell, or use simplified logic with some text preprocessing that doesn't rely on converting the whole thing at once but only parts of the document. The overhead from converting JSON to a dynamic object is considerable (both in memory and in time) and there's only so much increasing the memory limits can do. You can process JSON incrementally by leveraging a library like Json.NET and using `JsonTextReader`, but that's a little more clunky in PowerShell. – Jeroen Mostert May 03 '21 at 13:05
  • "***It returns the same error.***" How can it return the same `out-string : Exception` error if it has been removed? [Please help us to be able to help you](https://stackoverflow.com/help/how-to-ask). Also share details (the whole file, part of it, or its structure) of the JSON file, e.g. is the top-level structure an array (starting with `[`) or a dictionary (starting with `{`)? What PowerShell version are you using? (Note that there are quite some differences between versions with regard to the `json` engine used.) Did you [validate the json](https://jsonformatter.curiousconcept.com/) file? – iRon May 03 '21 at 13:27
  • Hi @iRon, thanks for the feedback, I've added the additions to the main post. Let me know if more information is needed to help me out. – vulkoek May 03 '21 at 13:37

1 Answer

Thanks for the details.
For this issue I would try to convert each line separately and stream that through your process:

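Get-Content C:\TEMP\largefile.json | ForEach-Object {
    # Strip whitespace and the array brackets so each line is a standalone JSON object
    $line = $_.Trim().TrimStart('[').TrimEnd(']')
    # Skip lines that are empty after trimming (such as the closing "]")
    if ($line) { $line | ConvertFrom-Json }
}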

As already suggested, I wouldn't be surprised if these memory issues didn't appear in PowerShell Core. If possible, I recommend giving that a try as well.
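
To illustrate how each streamed row could be handled in a variable and then discarded, here is a minimal sketch. It is not part of the original answer: it assumes the Body column holds Base64 text (the question only says it is "decoded" later), and the sample file, the output folder, and the row contents are all hypothetical stand-ins for the real export.

```powershell
# Hypothetical sample data; Body is ASSUMED to be Base64 (not confirmed by the question).
$body     = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes('%PDF-1.4 sample'))
$jsonPath = Join-Path ([IO.Path]::GetTempPath()) 'sample-largefile.json'
$outDir   = Join-Path ([IO.Path]::GetTempPath()) 'sample-pdf-out'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Write a tiny file with the same one-object-per-line layout as in the question.
@(
    '[{"Id":"1","Name":"a.pdf","Body":"' + $body + '"}'
    '{"Id":"2","Name":"b.pdf","Body":"' + $body + '"}'
    ']'
) | Set-Content -Path $jsonPath

# Stream the file: only one row is converted and held at a time.
$written = Get-Content -Path $jsonPath | ForEach-Object {
    $line = $_.Trim().TrimStart('[').TrimEnd(']')
    if ($line) {
        # $row is overwritten on every iteration, so the previous row can be collected.
        $row    = $line | ConvertFrom-Json
        $target = Join-Path $outDir $row.Name
        [IO.File]::WriteAllBytes($target, [Convert]::FromBase64String($row.Body))
        # Emit only the small path string, not the large row object.
        $target
    }
}
```

Emitting just the target path keeps the collected result small; collecting the full `$row` objects into a variable would bring the memory pressure back.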

iRon
  • It looks like this was helpful, let me try and work with it. Will let you know, thanks – vulkoek May 03 '21 at 14:19
  • Ok, this is getting somewhere. The only problem I have is that I need to put each row into a variable, but I can't seem to get that working. Inside ForEach-Object I need to perform some actions, so I need the row in a variable to process it, and after that processing ForEach-Object has to replace the variable with the next row. – vulkoek May 04 '21 at 08:27
  • Good to hear. I am not sure if I am still on the same page, but as you are dealing with memory issues, it is important to respect the PowerShell pipeline from start to end here. Meaning if you need specific rows, you might do something like `... |Where-Object ID -eq 'MyID' | ...`. If, on the other hand, you need to change a few rows but require them *all* to pass through until the last cmdlet, you might do something like `... |ForEach-Object { if ($_.Id -eq 'MyID') { $_.Name = 'test' }; $_ } | ...`. – iRon May 04 '21 at 10:21
  • If this doesn't help you further, I recommend you [accept](https://stackoverflow.com/help/someone-answers) this answer and open a new question, referring to this one and giving more details on where you currently are ([mcve], actual results, and expected results). – iRon May 04 '21 at 10:21
  • Thanks, I'll create a new question. Again, thanks for your help, you really made the difference! – vulkoek May 04 '21 at 12:47
  • I've created the new question here: https://stackoverflow.com/questions/67385194/powershell-foreach-object-column-variables – vulkoek May 04 '21 at 12:55
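
The two pipeline patterns mentioned in the comments above (filtering rows versus changing some rows while passing all of them through) can be sketched as follows. The three in-memory lines are hypothetical stand-ins for lines read with `Get-Content`; the Id and Name values are made up for illustration.

```powershell
# Hypothetical rows standing in for lines streamed from the JSON file.
$lines = @(
    '{"Id":"A1","Name":"first.pdf"}'
    '{"Id":"MyID","Name":"second.pdf"}'
    '{"Id":"A3","Name":"third.pdf"}'
)

# Pattern 1: keep only the rows you need; everything else is dropped as it streams by.
$selected = @(
    $lines | ForEach-Object { $_ | ConvertFrom-Json } |
        Where-Object Id -eq 'MyID'
)

# Pattern 2: change a few rows, but emit every row so the next cmdlet still sees all of them.
$all = @(
    $lines | ForEach-Object { $_ | ConvertFrom-Json } |
        ForEach-Object { if ($_.Id -eq 'MyID') { $_.Name = 'test' }; $_ }
)
```

The results are collected into variables here only so they can be inspected; with a file as large as the one in the question, the pipeline would normally continue into the next cmdlet instead.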