val df = spark.read
             .option("delimiter", "\t")
             .option("header", "false")
             .csv("/mnt/adls/myDb/myTb/s_year_month=201806/s_day=10")

This reads the data for one specific partition (20180610) into a DataFrame. Is there a way to read all of the partitions under the myTb folder into a single DataFrame, so that it can later be queried like this?

SELECT * FROM myDb.myTb WHERE CONCAT(s_year_month, s_day) = '20180610'

If I just did a wildcard read over the whole folder, the partition columns (s_year_month, s_day) would be lost. A sketch of what I think should work is below.
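For reference, here is a minimal sketch of what I believe should work, based on Spark's standard partition discovery for directory names of the form s_year_month=.../s_day=... (the path and reader options are the ones from the snippet above; the view name myTb is only for illustration):

// Read from the table root so Spark's partition discovery parses the
// s_year_month=.../s_day=... directory names into columns.
val df = spark.read
              .option("delimiter", "\t")
              .option("header", "false")
              .csv("/mnt/adls/myDb/myTb")

// The partition columns now appear in the schema, so the data can be
// registered as a view and queried as above (view name is illustrative):
df.createOrReplaceTempView("myTb")
spark.sql("SELECT * FROM myTb WHERE CONCAT(s_year_month, s_day) = '20180610'")

If a wildcard path is still needed for some reason, I understand the basePath read option (e.g. .option("basePath", "/mnt/adls/myDb/myTb")) tells Spark where partition discovery should start, so the partition columns are kept.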

