
The initial data is in a Dataset<Row>, and I am trying to write it to a CSV file with each cell value placed in quotes.

result.coalesce(1).write()
            .option("delimiter", "|")
            .option("header", "true")
            .option("nullValue", "")
            .option("quoteMode", "ALL_NON_NULL")
            .csv(Location);

Expected output:

null
"London"|"UK"
"Delhi"|"India"
"Moscow"|"Russia"

Current Output:

null
London|UK
Delhi|India
Moscow|Russia

Spark version is 2.3

Ram Grandhi

2 Answers


As @Oli answered, the first option you have is the "quoteAll" option of the CSV writer.

If you need more control, you can use the concat function on all your columns to prefix and suffix each value with a quote. Example below:

import org.apache.spark.sql.functions.{col, concat, lit}
import spark.implicits._ // needed for .toDF on a local Seq (available in spark-shell)

val df = Seq(
  ("1", "a", null, "c"),
  ("3", null, "d", "c"),
  ("4", "a", "b", null)
).toDF("id", "A", "B", "C")

df.show()

+---+----+----+----+
| id|   A|   B|   C|
+---+----+----+----+
|  1|   a|null|   c|
|  3|null|   d|   c|
|  4|   a|   b|null|
+---+----+----+----+

// concat returns null if any input is null, so null cells stay null (and unquoted)
val dfquotes = df.select(df.columns.map(c => concat(lit("\""), col(c), lit("\"")).alias(c)): _*)

dfquotes.show()

+---+----+----+----+
| id|   A|   B|   C|
+---+----+----+----+
|"1"| "a"|null| "c"|
|"3"|null| "d"| "c"|
|"4"| "a"| "b"|null|
+---+----+----+----+
Remis Haroon - رامز

"quoteMode" is an option of databrick's CSV writer. Here you are using spark's built in CSV writer which does not support that option. Have a look at this page for the available options.

In your case, the option you are looking for is .option("quoteAll", true).
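Applied to the writer from the question (reusing the same result Dataset and Location), that would look roughly like this:

result.coalesce(1).write()
            .option("delimiter", "|")
            .option("header", "true")
            .option("nullValue", "")
            .option("quoteAll", "true")
            .csv(Location);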

Oli
  • What if I want quotes only for non-null values? Is there any option in place of quoteAll? – Ram Grandhi Feb 04 '20 at 15:33
  • I am not sure that's possible with the standard csv writer. Just out of curiosity, why do you need quotes everywhere? – Oli Feb 04 '20 at 15:38
  • This is pre-existing code from Spark 1.6, and it generates the quotes. Now, while upgrading to 2.3, it no longer produces the quotes. Our business users don't want any kind of change to the output. I am new to Spark and to this project as well – Ram Grandhi Feb 05 '20 at 07:46
  • 1
    Well, I don't have any easy solution for you. You can either find a way to keep using spark 1.6 (with yarn for instance) or code your own CSV writer (this is not that complicated). – Oli Feb 05 '20 at 11:11
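For the "quotes only for non-null values" case raised in the comments, one possible sketch (not from the answers above, just building on the concat idea from the other answer) is to wrap only the non-null cells in quotes and then keep the built-in writer from adding its own quoting. It reuses the question's result and Location; the empty quote option is an assumption about the 2.3 writer's behaviour and should be verified:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

import java.util.Arrays;
import org.apache.spark.sql.Column;

// Quote only non-null cells; null cells stay null and become "" via nullValue.
Column[] quoted = Arrays.stream(result.columns())
        .map(c -> when(col(c).isNotNull(), concat(lit("\""), col(c), lit("\""))).alias(c))
        .toArray(Column[]::new);

result.select(quoted)
        .coalesce(1)
        .write()
        .option("delimiter", "|")
        .option("header", "true")
        .option("nullValue", "")
        .option("quote", "")   // assumed trick: an empty quote character stops the writer from re-quoting; verify on your Spark version
        .csv(Location);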