
I have a Java service writing and reading Parquet files using parquet-avro 1.11.1.

As you probably know, this requires an additional dependency (one that is not bundled with parquet-avro) to provide some Hadoop classes, for example:

org.apache.hadoop.conf.Configuration
org.apache.hadoop.fs.Path
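
For context, here is a minimal sketch of where these two classes typically appear when using the parquet-avro builders. It assumes parquet-avro and some Hadoop artifact providing `Configuration` and `Path` are on the classpath; the `ParquetRoundTrip` class, the `roundTrip` method, and the `User` schema are hypothetical names made up for illustration, not part of the service described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetRoundTrip {
    // Hypothetical schema, used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("User").fields()
            .requiredString("name")
            .requiredInt("age")
            .endRecord();

    /**
     * Writes one record to a new Parquet file, then reads all records back.
     * The target file must not exist yet: the writer's default mode is CREATE.
     */
    static List<GenericRecord> roundTrip(String file) throws IOException {
        // Both Hadoop classes are needed just to address a local file.
        Path path = new Path(file);
        Configuration conf = new Configuration();

        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", "alice");
        user.put("age", 30);

        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(path)
                     .withSchema(SCHEMA)
                     .withConf(conf)
                     .build()) {
            writer.write(user);
        }

        List<GenericRecord> records = new ArrayList<>();
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(path)
                     .withConf(conf)
                     .build()) {
            GenericRecord record;
            // read() returns null once the file is exhausted.
            while ((record = reader.read()) != null) {
                records.add(record);
            }
        }
        return records;
    }
}
```

Calling `ParquetRoundTrip.roundTrip("/tmp/users.parquet")` (an example path) should return the single record that was written. Note that Avro hands string fields back as `Utf8` by default, so compare them via `toString()`.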

I ended up using hadoop-core 1.2.1, which works. However, this library is from 2013, and it's the latest version! Is there a newer alternative? I tried the following:

Both compile but don't work without a native Hadoop installation, which I'd like to avoid. I only want the classes needed for working with Parquet.

Eugene Marin
  • Take a look at the IntelliJ plugin that I wrote to read Parquet files - https://github.com/benwatson528/intellij-avro-parquet-plugin/blob/master/build.gradle. I had to fight with the same problems that you're encountering now. I avoid `Path` entirely by using `LocalInputFormat`. Ignore the local 1.11.1-SNAPSHOT JAR and just pretend I'm using 1.11.0. – Ben Watson Mar 15 '21 at 08:50
  • Also https://stackoverflow.com/questions/59939309/read-local-parquet-file-without-hadoop-path-api might help you as it outlines my process for solving this problem. – Ben Watson Mar 15 '21 at 11:11
