What is the Pythonic way to share a SparkContext/Session between classes?

Date: 2016-09-01 22:48:25

Tags: apache-spark pyspark

Suppose I have two classes, each of which uses Spark. Currently, I initialize the SparkSession in a method of one of the classes. But now I want to write a new class that also calls Spark. What is the Pythonic way to do this?

2 Answers:

Answer 0 (score: 2)

You can pass the Spark context to the __init__ method, for example:

class MySparkCallingClass:
    def __init__(self, sc):
        self.sc = sc
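
This is plain dependency injection: create the session once and hand the same object to every class that needs it. A minimal sketch (using a stand-in object, since the pattern does not depend on Spark itself):

```python
class MySparkCallingClass:
    def __init__(self, sc):
        # Store the shared context/session rather than creating a new one
        self.sc = sc

shared = object()  # stands in for a SparkSession or SparkContext
a = MySparkCallingClass(shared)
b = MySparkCallingClass(shared)
assert a.sc is b.sc  # both instances share the same object
```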

Answer 1 (score: 0)

I have accepted @maxymoo's answer, but for completeness I want to show how I would do it:

from pyspark.sql import SparkSession

class SparkWrapper:
    def __init__(self):
        self._spark = None

    def __del__(self):
        # Guard against a session that was never created
        if self._spark is not None:
            self._spark.stop()

    def __enter__(self):
        self._spark = SparkSession.builder.appName('SparkApp').getOrCreate()
        return self

    # __exit__ must accept the exception triple, or the with-statement raises TypeError
    def __exit__(self, exc_type, exc_value, traceback):
        self._spark.stop()

    @property
    def spark(self):
        return self._spark

class UsesSparkClass:
    def __init__(self, sc):
        self._spark = sc

def main():
    with SparkWrapper() as sc:
        model = UsesSparkClass(sc.spark)

if __name__ == '__main__':
    main()
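
One detail worth stressing in this pattern: Python calls `__exit__` with three arguments (exception type, value, and traceback), so defining it as `__exit__(self)` breaks the `with` block. A hedged sketch of the same wrapper shape with a dummy session (hypothetical `DummySession` class, just to show the lifecycle without needing a Spark installation):

```python
class DummySession:
    """Stand-in for SparkSession, recording whether stop() was called."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

class Wrapper:
    def __init__(self):
        self._session = None

    def __enter__(self):
        self._session = DummySession()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on both normal exit and exceptions
        self._session.stop()

with Wrapper() as w:
    s = w._session
    assert not s.stopped   # session is live inside the block
assert s.stopped           # stopped automatically on exit
```

Note also that `SparkSession.builder.getOrCreate()` returns the already-running session if one exists, so even if another class calls it again, you get the same session rather than a second one.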