val df = spark.read
             .option("delimiter", "\t")
             .option("header", "false")
             .csv("/mnt/adls/myDb/myTb/s_year_month=201806/s_day=10")

This reads the data for one specific partition (20180610) into a DataFrame. Is there a way to read all of the partitions under the myTb folder into a single DataFrame, so that it can later be queried like this?

SELECT * FROM myDb.myTb WHERE CONCAT(s_year_month, s_day) = '20180610'

If I just did a wildcard read over the whole folder, the partition columns (s_year_month, s_day) would be lost. A sketch of what I think should work is below.
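For reference, here is a minimal sketch of what I believe should work, based on Spark's standard partition discovery for directory names of the form s_year_month=.../s_day=... (the path and reader options are the ones from the snippet above; the view name myTb is only for illustration):

// Read from the table root so Spark's partition discovery parses the
// s_year_month=.../s_day=... directory names into columns.
val df = spark.read
              .option("delimiter", "\t")
              .option("header", "false")
              .csv("/mnt/adls/myDb/myTb")

// The partition columns now appear in the schema, so the data can be
// registered as a view and queried as above (view name is illustrative):
df.createOrReplaceTempView("myTb")
spark.sql("SELECT * FROM myTb WHERE CONCAT(s_year_month, s_day) = '20180610'")

If a wildcard path is still needed for some reason, I understand the basePath read option (e.g. .option("basePath", "/mnt/adls/myDb/myTb")) tells Spark where partition discovery should start, so the partition columns are kept.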

