Calculating row percentages in Scala

Time: 2021-03-24 15:25:29

Tags: scala apache-spark apache-spark-sql

I'm still new to Scala. I'm trying to calculate row-wise percentages in Scala. Consider the following df:

val df = Seq(("word1", 25, 75),("word2", 15, 15),("word3", 10, 30)).toDF("word", "author1", "author2")

df.show

+-----+-------+-------+
| word|author1|author2|
+-----+-------+-------+
|word1|     25|     75|
|word2|     15|     15|
|word3|     10|     30|
+-----+-------+-------+

I know I can use code like the following to get the expected output, but I'd like to know whether there is a better way to do this:

val df_2 = df
  .withColumn("total", $"author1" + $"author2")
  .withColumn("author1 pct", $"author1"/$"total")
  .withColumn("author2 pct", $"author2"/$"total")
  .select("word", "author1 pct", "author2 pct")

df_2.show

+-----+-----------+-----------+
| word|author1 pct|author2 pct|
+-----+-----------+-----------+
|word1|       0.25|       0.75|
|word2|        0.5|        0.5|
|word3|       0.25|       0.75|
+-----+-----------+-----------+
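One possible "better way" when there are more than two author columns is to build the row total by folding over a list of column names instead of writing each sum by hand. This is a minimal sketch, assuming a local SparkSession; `RowPercentages` and `rowPct` are hypothetical names, not Spark APIs. Run on the same df, it should produce the same fractions as df_2 above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RowPercentages {
  // Hypothetical helper (not part of Spark): replaces each column in `valueCols`
  // with its fraction of the row total across those same columns.
  def rowPct(df: DataFrame, valueCols: Seq[String]): DataFrame = {
    // Row total, e.g. author1 + author2 (+ ...), built by folding the column expressions.
    val total: Column = valueCols.map(c => col(c)).reduce(_ + _)
    // Columns that are not being converted (here just "word") pass through unchanged.
    val keep: Seq[Column] = df.columns.toSeq.filterNot(c => valueCols.contains(c)).map(c => col(c))
    // In Spark, `/` on integer columns yields a double.
    val pcts: Seq[Column] = valueCols.map(c => (col(c) / total).as(s"$c pct"))
    df.select((keep ++ pcts): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-pct").getOrCreate()
    import spark.implicits._

    val df = Seq(("word1", 25, 75), ("word2", 15, 15), ("word3", 10, 30))
      .toDF("word", "author1", "author2")

    // Adding "author3" etc. only requires extending this list.
    rowPct(df, Seq("author1", "author2")).show()
    spark.stop()
  }
}
```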

Bonus points for giving the result in percentage format, with a "%" and no decimals. Thanks!

1 answer:

Answer 0 (score: 1)

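Maybe you can compute and select the percentages directly instead of using .withColumn, cast them to int to drop the decimals, and use concat to append a % sign at the end: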

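import org.apache.spark.sql.functions.{concat, lit}

// cast("int") drops the fractional part (truncates toward zero)
val df2 = df.select(
    $"word",
    concat(($"author1"*100/($"author1" + $"author2")).cast("int"), lit("%")).as("author1 pct"),
    concat(($"author2"*100/($"author1" + $"author2")).cast("int"), lit("%")).as("author2 pct")
)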

df2.show
+-----+-----------+-----------+
| word|author1 pct|author2 pct|
+-----+-----------+-----------+
|word1|        25%|        75%|
|word2|        50%|        50%|
|word3|        25%|        75%|
+-----+-----------+-----------+
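Because cast("int") truncates rather than rounds, shares that do not divide evenly can lose a point: for a hypothetical row ("word4", 1, 2), the code above yields 33% and 66%. The sketch below rounds half up with Spark's `round` before casting, assuming the same column names; `RoundedPercentStrings`, `pctString` and `withPctStrings` are illustrative names only. For "word4" it should give 33% and 67%.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, round}

object RoundedPercentStrings {
  // Whole-number percentage of `part` relative to `total`, with a "%" suffix.
  // round() rounds half up; cast("int") then only drops the trailing ".0".
  def pctString(part: Column, total: Column): Column =
    concat(round(part * 100 / total).cast("int"), lit("%"))

  def withPctStrings(df: DataFrame): DataFrame = {
    // Compute the row total expression once and reuse it.
    val total = col("author1") + col("author2")
    df.select(
      col("word"),
      pctString(col("author1"), total).as("author1 pct"),
      pctString(col("author2"), total).as("author2 pct")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pct-strings").getOrCreate()
    import spark.implicits._

    // "word4" is a hypothetical row added to show the truncation/rounding difference.
    val df = Seq(("word1", 25, 75), ("word2", 15, 15), ("word3", 10, 30), ("word4", 1, 2))
      .toDF("word", "author1", "author2")

    withPctStrings(df).show()
    spark.stop()
  }
}
```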

If you want to keep a numeric data type, you can do this instead:

val df2 = df.select(
    $"word", 
    ($"author1"*100/($"author1" + $"author2")).cast("int").as("author1 pct"), 
    ($"author2"*100/($"author1" + $"author2")).cast("int").as("author2 pct")
)
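If whole numbers are too coarse but the column should stay numeric, `round(column, scale)` keeps a double with a fixed number of decimal places instead of truncating to int. A minimal sketch, assuming the same column names; `NumericPercentages` and `withNumericPcts` are hypothetical names. With the default scale of 1, the hypothetical row ("word4", 1, 2) should become 33.3 and 66.7.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round}

object NumericPercentages {
  // Percentages stay DoubleType, rounded half up to `scale` decimal places.
  def withNumericPcts(df: DataFrame, scale: Int = 1): DataFrame = {
    val total = col("author1") + col("author2")
    df.select(
      col("word"),
      round(col("author1") * 100 / total, scale).as("author1 pct"),
      round(col("author2") * 100 / total, scale).as("author2 pct")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("numeric-pct").getOrCreate()
    import spark.implicits._

    val df = Seq(("word1", 25, 75), ("word4", 1, 2)).toDF("word", "author1", "author2")
    withNumericPcts(df).show()
    spark.stop()
  }
}
```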