Installing distributed TensorFlow on a Mac

Date: 2016-09-22 19:27:33

Tags: tensorflow

Can someone provide a link showing how to do the above? I've looked in all the relevant places but can't find a procedure. If I simply install TF for Mac as described on the TF website, does that give me the distributed version by default?

--- Installed the GPU version of TF and ran the test script given in the answer ---

(tensorflow) acbc32a44fc1:~ z001jly$ python test.py 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.dylib locally
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    import tensorflow as tf
  File "/Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 48, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: dlopen(/Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: @rpath/libcudart.7.5.dylib
  Referenced from: /Users/z001jly/anaconda/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
  Reason: image not found

If I use it with the CPU version of TF, the script runs successfully.
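The ImportError above ("Library not loaded … image not found") means the dynamic loader cannot find the CUDA runtime (`libcudart.7.5.dylib`) at import time; on macOS a common suggestion is to add the CUDA library directory (typically /usr/local/cuda/lib) to DYLD_LIBRARY_PATH, though that fix is not verified here. A minimal stdlib sketch to check whether the library is loadable (the library name is taken from the traceback):

```python
import ctypes

# Try to dlopen the CUDA runtime that the GPU build of TensorFlow links
# against. An OSError here matches the "image not found" in the traceback.
try:
    ctypes.CDLL("libcudart.7.5.dylib")
    print("libcudart is loadable")
except OSError as err:
    print("libcudart is NOT loadable:", err)
```

On a machine without CUDA installed this prints the "NOT loadable" branch, which is exactly the condition the GPU build of TF is tripping over.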

1 answer:

Answer 0: (score: 0)

This is part of the official TensorFlow binaries. You can run the script below to check that it works; you should see "Success".

import subprocess
import tensorflow as tf
import time
import sys

flags = tf.flags
flags.DEFINE_string("port1", "12222", "port of worker1")
flags.DEFINE_string("port2", "12223", "port of worker2")
flags.DEFINE_string("task", "", "internal use")
FLAGS = flags.FLAGS

# setup local cluster from flags
host = "127.0.0.1:"
cluster = {"worker": [host+FLAGS.port1, host+FLAGS.port2]}
clusterspec = tf.train.ClusterSpec(cluster).as_cluster_def()

def run():
  dtype=tf.int32
  params_size = 1

  # Place the accumulator on worker 0 and the increment on worker 1, so the
  # assign_add below has to cross process boundaries over gRPC.
  with tf.device("/job:worker/task:0"):
    params = tf.get_variable("params", [params_size], dtype,
                             initializer=tf.zeros_initializer)
  with tf.device("/job:worker/task:1"):
    update_variable = tf.get_variable("update_variable", [params_size], dtype,
                                      initializer=tf.ones_initializer)
    add_op = params.assign_add(update_variable)

  init_op = tf.initialize_all_variables()

  # Launch the distributed service: relaunch this same script twice as worker
  # processes (tasks 0 and 1), then give the gRPC servers a moment to start.
  def runcmd(cmd): subprocess.Popen(cmd, shell=True, stderr=subprocess.STDOUT)
  runcmd("python "+sys.argv[0]+" --task=0")
  runcmd("python "+sys.argv[0]+" --task=1")
  time.sleep(1)

  sess = tf.Session("grpc://"+host+FLAGS.port1)
  sess.run(init_op)
  print("Adding 1 on %s to variable on %s"%(update_variable.device,
                                            params.device))
  result = sess.run(add_op)
  if result == [1]:
    print("Success")


if __name__=='__main__':
  if not FLAGS.task:
    run()

  else: # Launch TensorFlow server
    server = tf.train.Server(clusterspec,
                             job_name="worker",
                             task_index=int(FLAGS.task),
                             config=tf.ConfigProto(log_device_placement=True))
    server.join()
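Because the script leaves the two worker processes listening after it exits, a rerun can fail if a stale worker still holds a port. A small stdlib sketch to confirm the two worker ports are free before launching (port numbers taken from the flag defaults above):

```python
import socket

def port_free(port, host="127.0.0.1"):
    """Return True when nothing is listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 only when something accepted the connection
        return s.connect_ex((host, port)) != 0

for port in (12222, 12223):  # defaults of --port1 and --port2
    print(port, "free" if port_free(port) else "in use")
```

If a port shows "in use", kill the leftover `python test.py --task=N` process or pass different ports via `--port1`/`--port2`.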