
How can I lower-case the column names of a data frame, but not its values, using raw Spark SQL and DataFrame methods?

Input data frame (Imagine I have 100's of these columns in uppercase)

NAME | COUNTRY | SRC        | CITY       | DEBIT
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Target data frame:

name | country | src        | city       | debit
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5
abaghel
user1870400

5 Answers


If you are using Scala, you can simply do the following:

import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)

And if you are using PySpark, the equivalent is:

from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()
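If you only need to rename columns (no other projection), PySpark's toDF also accepts a full list of new names. Here is a minimal, Spark-free sketch of the idea; the column list below is a hypothetical stand-in for df.columns, and the actual Spark call is shown as a comment:

```python
# Hypothetical stand-in for df.columns on the example data frame.
columns = ["NAME", "COUNTRY", "SRC", "CITY", "DEBIT"]

# Lower-casing touches only the names, never the row values.
lowered = [c.lower() for c in columns]

# With a real PySpark DataFrame you would then write:
# df = df.toDF(*lowered)
print(lowered)
```
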
Ramesh Maharjan

A Java 8 solution to convert the column names to lower case:

import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import org.apache.spark.sql.Column;

df.select(Arrays.stream(df.columns())
        .map(x -> col(x).as(x.toLowerCase()))
        .toArray(Column[]::new))
  .show(false);
abaghel

How about this:

Some fake data:

scala> val df = spark.sql("select 'A' as AA, 'B' as BB")
df: org.apache.spark.sql.DataFrame = [AA: string, BB: string]

scala> df.show()
+---+---+
| AA| BB|
+---+---+
|  A|  B|
+---+---+

Now re-select all columns with a new name, which is just their lower-case version:

scala> val cols = df.columns.map(c => s"$c as ${c.toLowerCase}")
cols: Array[String] = Array(AA as aa, BB as bb)

scala> val lowerDf = df.selectExpr(cols:_*)
lowerDf: org.apache.spark.sql.DataFrame = [aa: string, bb: string]

scala> lowerDf.show()
+---+---+
| aa| bb|
+---+---+
|  A|  B|
+---+---+

Note: I use Scala. If you use PySpark and are not familiar with the Scala syntax, then df.columns.map(c => s"$c as ${c.toLowerCase}") becomes [f"{c} as {c.lower()}" for c in df.columns] in Python, and cols:_* becomes *cols. Please note I didn't run this translation.
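One caveat with the selectExpr approach: if a column name contains spaces or other special characters, the expression needs backtick quoting. A Spark-free sketch of building such expressions (the column names here are hypothetical, and the Spark call is shown as a comment):

```python
# Hypothetical column names, one containing a space.
columns = ["NAME", "My Col"]

# Backticks let selectExpr handle names that are not plain identifiers.
exprs = [f"`{c}` as `{c.lower()}`" for c in columns]

# With a real DataFrame you would then write:
# df = df.selectExpr(*exprs)
print(exprs)
```
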

shakedzy
  • Why is this not selected as the answer, since the OP is asking for "raw sql"? I feel this is the closest to what the OP asked. – panc Jun 14 '21 at 19:19

For Java 8 (this also handles nested struct columns):

import static org.apache.spark.sql.functions.col;

import java.util.Locale;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

Dataset<Row> input = ...; // your existing data frame
for (StructField field : input.schema().fields()) {
    String newName = field.name().toLowerCase(Locale.ROOT);
    input = input.withColumnRenamed(field.name(), newName);
    // A nested struct keeps its original field names after the rename,
    // so lower-case the struct's schema too and cast the column to it.
    if (field.dataType() instanceof StructType) {
        StructType newStructType =
                (StructType) StructType.fromJson(field.dataType().json().toLowerCase(Locale.ROOT));
        input = input.withColumn(newName, col(newName).cast(newStructType));
    }
}

You can use df.withColumnRenamed(col_name, col_name.lower()) for a Spark data frame in Python; since it renames a single column per call, loop over df.columns to rename them all.
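The per-column renames can be folded over the column list. A Spark-free sketch of that fold; the column list is a hypothetical stand-in for df.columns, and the real data frame call is shown as a comment:

```python
from functools import reduce

# Hypothetical stand-in for df.columns.
columns = ["NAME", "COUNTRY", "SRC"]

# (old_name, new_name) pairs for each rename call.
pairs = [(c, c.lower()) for c in columns]

# On a real PySpark DataFrame you would then write:
# df = reduce(lambda d, pair: d.withColumnRenamed(*pair), pairs, df)
print(pairs)
```
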

Harshit Mehta