Question

我正在尝试在MacOS Mojave PyCharm for Anaconda 2019.1.2 Pro上设置远程conda解释器，但无法正常工作。我现有的远程conda环境（v4.5.12）在Amazon's Deep Learning AMI实例化的Ubuntu 16 EC2计算机上运行

我尝试了setting up an ssh-interpreter，并将其定向到：/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python，这是我的conda环境。然后，我尝试在此解释器上运行一个简单的Tensorflow GPU测试，并收到以下消息，强烈表明未激活该环境：（故意混淆了服务器的IP地址和公司名称）

ssh://ubuntu@xx.xx.xx.xx:22/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python -u /home/ubuntu/company/DeepLearning_copy/apps/test_gpu.py
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/company/DeepLearning_copy/apps/test_gpu.py", line 1, in <module>
    import tensorflow as tf
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

Process finished with exit code 1

当SSH进入服务器时，先运行conda activate tensorflow_p36，然后再运行python gpu_test.py，该代码可以完美运行。

我希望能使用现有的远程conda环境进行远程调试的任何解决方法。同时，我已经打开an issue with JetBrains，并打开了Anaconda community group。

Answer 1

您可以做的是：

转到“运行/调试配置”
在“环境”下，您可以看到“环境变量”
您必须设置到cuda的正确路径。就我而言，它是：“ LD_LIBRARY_PATH = / usr / local / cuda-9.0 / lib64”

我也很失望，JetBrains团队默认没有这样做。

Answer 2

我认为这是一个cuda错误。 Cuda配置不正确。您在使用tensorflow-gpu吗？

Answer 3

OP，这可能是某人对您的环境所做的事情弄乱了CUDA安装，就像其他人提到的那样。

我刚刚在AWS上配置了一个新的深度学习AMI实例-这对您来说可行吗？

无论如何，在ssh配置到（新配置的）服务器之后，我执行了以下步骤：

初始激活

$ conda activate tensorflow_p36
WARNING: First activation might take some time (1+ min).
Installing TensorFlow optimized for your Amazon EC2 instance......
Env where framework will be re-installed: tensorflow_p36
Instance p2.xlarge is identified as a GPU instance, removing tensorflow-serving-cpu
Installation complete.

方案1：在tensorflow_p36的conda环境中运行GPU测试：

执行此操作以确保Tensorflow在OP的情况下正常运行。

$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]

方案2：停用环境，并像在环境中一样调用相同的python可执行文件。

此应该与配置远程解释器以使用特定的python解释器相同。请注意，与上述情况相比，sess = tf.Session(...)之后的输出要多得多，但是一切仍然可以正常进行。

$ conda deactivate
$ /home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-05-31 07:14:23.840474: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-31 07:14:23.841300: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ec160ca020 executing computations on platform CUDA. Devices:
2019-05-31 07:14:23.841334: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-05-31 07:14:23.843647: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300060000 Hz
2019-05-31 07:14:23.843845: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ec16131af0 executing computations on platform Host. Devices:
2019-05-31 07:14:23.843870: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-31 07:14:23.844965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2019-05-31 07:14:23.844992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-31 07:14:23.845991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-31 07:14:23.846013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-31 07:14:23.846020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-31 07:14:23.846577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-05-31 07:14:23.847176: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7

>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478310: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478383: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478413: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]

方案3：现在尝试在PyCharm Python控制台中使用Jetbrains PyCharm将特定的conda环境解释器用作远程解释器

请注意，输出基本上与上述方案2中的输出相同，但是Tensorflow GPU测试可以正常工作，并且不会引发任何错误。

ssh://ubuntu@XX.XX.XX.XX:22/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python -u /home/ubuntu/.pycharm_helpers/pydev/pydevconsole.py --mode=server

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 6.4.0
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
2019-05-31 07:17:03.883169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-31 07:17:03.883577: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55be28eef280 executing computations on platform CUDA. Devices:
2019-05-31 07:17:03.883609: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-05-31 07:17:03.886035: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300060000 Hz
2019-05-31 07:17:03.886752: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55be28f56d50 executing computations on platform Host. Devices:
2019-05-31 07:17:03.886777: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-31 07:17:03.886983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 508.38MiB
2019-05-31 07:17:03.887009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-31 07:17:03.887658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-31 07:17:03.887681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-31 07:17:03.887697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-31 07:17:03.887881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 283 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-05-31 07:17:03.889133: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890673: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890718: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890750: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]

Answer 4

代替指定“ python”的路径，它对我来说可以这样指定“激活”的路径：

ssh [host] "source ~/anaconda3/bin/activate [name of conda env] ; cd [pick a dir] ; [command]"

对于[command]，请尝试使用“ conda env list”来查看激活了哪个环境。或者，您也可以执行“ python foo.py”。

您可能需要调整路径“〜/ anaconda3 / bin / activate”。

设置PyCharm远程conda解释器

4 个答案: