Switching from NVIDIA to AMD (including tensorflow)

I have been using my GeForce 1060 extensively for deep learning, both with Python and R. But after the always painful dance with the closed-source drivers and kernel updates, paired with the collapse of my computer’s PSU and/or GPU, I decided to finally switch to an AMD graphics card and the open source stack. And you know what, within half a day I had everything running, including Tensorflow. Yay for Open Source!

Preliminaries

So what is the starting point: I am running Debian/unstable with an AMD Radeon RX 5700. First of all I purged all NVIDIA-related packages, and there are a lot of them, I have to say. Be sure to search for nv and nvidia and get rid of all matching packages. For safety I rebooted and checked again that no NVIDIA-related kernel modules were loaded.
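
The last check can also be scripted. The following is only a minimal sketch and not part of what I originally did: it assumes the usual /proc/modules format (module name in the first column), the helper name nvidia_modules is made up for illustration, and treating nouveau as NVIDIA-related is my own assumption.

```python
from pathlib import Path

# Prefixes treated as NVIDIA-related; this choice is an assumption, extend as needed.
NVIDIA_PREFIXES = ("nvidia", "nouveau")


def nvidia_modules(proc_modules_text):
    """Return names of loaded modules that look NVIDIA-related.

    The text is expected in /proc/modules format: one module per line,
    with the module name as the first whitespace-separated field.
    """
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(NVIDIA_PREFIXES):
            found.append(fields[0])
    return found


if __name__ == "__main__":
    proc = Path("/proc/modules")
    text = proc.read_text() if proc.exists() else ""
    leftovers = nvidia_modules(text)
    print("NVIDIA modules still loaded:", leftovers or "none")
```

An empty result after the reboot is what you want to see.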

Firmware

Update: It seems that the current version of Debian’s firmware-amd-graphics package is sufficiently recent, so there is no longer any need to update the firmware manually. The original instructions follow for reference.

When I first set this up, the firmware-amd-graphics package shipped by Debian was not enough for the current kernel and current hardware. In that case it is better to clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git and copy everything from the amdgpu directory to /lib/firmware/amdgpu.

I didn’t do that at first, and booting the kernel then hung during the switch to the AMD framebuffer. If you see this behavior, your firmware files are too old; please update the above-mentioned package, or use the manual method shown.
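
To see which firmware files are actually present for your card, something like the following sketch can help. It is an illustration only: the helper name firmware_files is made up, and the prefix navi10 is my assumption for the RX 5700 (the card reports itself as Navi 10), so adjust it for other GPUs.

```python
from pathlib import Path


def firmware_files(firmware_dir, chip_prefix):
    """List firmware blobs in firmware_dir whose names start with chip_prefix."""
    directory = Path(firmware_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.startswith(chip_prefix)
    )


if __name__ == "__main__":
    # "navi10" is an assumption for the RX 5700; adjust for your card.
    files = firmware_files("/lib/firmware/amdgpu", "navi10")
    print(len(files), "matching firmware files")
    for name in files:
        print(" ", name)
```

If this prints zero files, the hang described above is to be expected.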

Kernel

If you are using the Debian-provided kernels in version 5.7 or 5.8, you should be fine. If you compile your own kernel, make sure that the options listed below are activated.

The advantage of having an open source driver in the kernel is that you don’t have to worry about incompatibilities (with NVIDIA, the driver needs patching every time a new kernel comes out). For recent AMD GPUs you need a rather new kernel; I have 5.6.0 and 5.7.0-rc5 running. Make sure that you have all the necessary kernel config options turned on if you compile your own kernels. In my case these are:

CONFIG_DRM_AMDGPU=m
CONFIG_DRM_AMDGPU_USERPTR=y
CONFIG_DRM_AMD_ACP=y
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
CONFIG_HSA_AMD=y

When installing the kernel, be sure that the firmware is already updated so that the correct firmware is copied into the initrd.
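
If you want to verify the options listed above against an installed kernel, a small script can do it. This is a sketch under assumptions: it expects the kernel config at /boot/config-<release> (as Debian installs it), it accepts both =y and =m as "enabled", and the function names are made up for illustration.

```python
import os
from pathlib import Path

# The options listed in this article.
REQUIRED = (
    "CONFIG_DRM_AMDGPU",
    "CONFIG_DRM_AMDGPU_USERPTR",
    "CONFIG_DRM_AMD_ACP",
    "CONFIG_DRM_AMD_DC",
    "CONFIG_DRM_AMD_DC_DCN",
    "CONFIG_HSA_AMD",
)


def parse_kernel_config(text):
    """Parse KEY=value lines of a kernel .config, skipping comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the required options that are neither built in (y) nor modules (m)."""
    options = parse_kernel_config(text)
    return [name for name in required if options.get(name) not in ("y", "m")]


if __name__ == "__main__":
    # Assumed location of the config file for the running kernel.
    config = Path(f"/boot/config-{os.uname().release}")
    if config.exists():
        print("Missing options:", missing_options(config.read_text()) or "none")
    else:
        print("No config file found at", config)
```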

Support programs and libraries

WARNING: this description is for ROCm 3.3, which is not available anymore. OTOH, AMD now ships ROCm 3.8, but that cannot be installed directly due to a packaging “bug”. See a later blog post on how to fix it.

All the following is more or less an excerpt from the ROCm Installation Guide!

AMD provides a Debian/Ubuntu APT repository for software as well as kernel sources. Put the following into /etc/apt/sources.list.d/rocm.list:

deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main

and also put the public key of the rocm repository into /etc/apt/trusted.gpg.d/rocm.asc.

After that apt-get update should work.

I installed rocm-dev-3.3.0, rocm-libs-3.3.0, hipcub-3.3.0, miopen-hip-3.3.0 (and of course the dependencies), but not rocm-dkms, which is the kernel module. If you have a sufficiently recent kernel (see above), the driver source in the kernel itself is newer.

The libraries and programs are installed under /opt/rocm-3.3.0, and to make the libraries available to Tensorflow (see below) and other programs, I added /etc/ld.so.conf.d/rocm.conf with the following content:

/opt/rocm-3.3.0/lib/

and ran ldconfig as root.
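
Whether the dynamic loader now finds the ROCm libraries can be tested without Tensorflow. The following is a rough sketch, not an official check: the helper can_load is made up, and the library names are simply the ones that show up in the Tensorflow log messages in this post.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library names as they appear in the Tensorflow log output.
    for name in ("libhip_hcc.so", "librocblas.so", "libMIOpen.so"):
        print(name, "loadable" if can_load(name) else "NOT found")
```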

Last but not least, add the udev rule that is normally installed by rocm-dkms: put the following into /etc/udev/rules.d/70-kfd.rules:

SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"

This allows users from the video group to access the GPU.
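
To confirm that the rule took effect for your user, you can check access to the device node directly. This is a minimal sketch assuming the device is /dev/kfd; the helper names are made up, and note that the group member list does not include users who have video as their primary group.

```python
import grp
import os


def group_members(group_name):
    """Return the explicit members of a group, or None if it does not exist."""
    try:
        return list(grp.getgrnam(group_name).gr_mem)
    except KeyError:
        return None


def device_accessible(path="/dev/kfd"):
    """Return True if the current user can read and write the device node."""
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("members of video:", group_members("video"))
    print("/dev/kfd accessible:", device_accessible())
```

Remember that a new group membership only becomes active after logging in again.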


At this point you should be able to boot into the system and have X running on top of the AMD GPU, including OpenGL acceleration and direct rendering:

$ glxinfo
name of display: :0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
...
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
...
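
If you want to check this in a script rather than by eye, the relevant line can be picked out of the glxinfo output. A small sketch, assuming the output format shown above; the function name is made up for illustration.

```python
import subprocess


def direct_rendering_enabled(glxinfo_output):
    """Return True/False from the 'direct rendering' line, None if it is absent."""
    for line in glxinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "direct rendering":
            return value.strip().lower().startswith("yes")
    return None


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["glxinfo"], capture_output=True, text=True, timeout=10
        )
        print("direct rendering:", direct_rendering_enabled(result.stdout))
    except (FileNotFoundError, subprocess.TimeoutExpired):
        print("glxinfo is not installed or did not answer")
```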

Tensorflow

WARNING: Although the example below (addition of integers) worked, floating point computations are still (even at ROCm 3.8) NOT supported. For this reason, I have switched back to using my NVIDIA card for deep learning, and use the AMD card for the graphics output. See this blog for details on how to set up multiple GPU cards.

Thinking about how hard it was to get the correct libraries to get Tensorflow running on GPUs (see here and here), it is a pleasure to see that with open source all this pain is relieved.

There is already work done to make Tensorflow run on ROCm, the tensorflow-rocm project. They provide up-to-date PyPI packages, so a simple

pip3 install tensorflow-rocm

is enough to get Tensorflow running with Python:

>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
2020-05-14 12:07:19.590169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:07:19.711478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
3
>>>
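
Since integer addition is a rather weak test, a slightly more demanding smoke test with floating point numbers is worth running before trusting the setup. The following is only a sketch: the function name is made up, it assumes a Tensorflow version that provides tf.config.list_physical_devices (2.1 or later), and it returns None when Tensorflow is not installed at all.

```python
def gpu_smoke_test():
    """Return (gpu_names, product), or None when Tensorflow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Names of the GPUs Tensorflow can see; an empty list means CPU only.
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    # A small float32 matrix product; the correct result is [[7, 10], [15, 22]].
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    product = tf.matmul(a, a).numpy().tolist()
    return gpus, product


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A crash or a wrong matrix here is a sign that the GPU path is not usable for real training.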

Tensorflow for R

Installation is trivial again since there is a tensorflow package for R; just run (as a user in the group staff, which normally owns /usr/local/lib/R)

$ R
...
> install.packages("tensorflow")
..

Do not call the R function install_tensorflow() since Tensorflow is already installed and functional!

With that done, R can use the AMD GPU for computations:

$ R
...
> library(tensorflow)
> tf$constant("Hellow Tensorflow")
2020-05-14 12:14:24.185609: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:14:24.277736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
tf.Tensor(b'Hellow Tensorflow', shape=(), dtype=string)
> 

AMD Vulkan

From the Vulkan home page:

Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs used in a wide variety of devices from PCs and consoles to mobile phones and embedded platforms.

Several games use the Vulkan API if available, and it is said to be more efficient.

There are Vulkan libraries for Radeon shipped with Mesa, in the Debian package mesa-vulkan-drivers, but my guess is that they are a bit outdated.

The AMDVLK project provides the latest version, and to my surprise it was rather easy to install, again by following the advice in their README. The steps are basically (always follow what is written for Ubuntu):

  • Install the necessary dependencies
  • Install the Repo tool
  • Get the source code
  • Make 64-bit and 32-bit builds
  • Copy driver and JSON files (see below for what I did differently!)

All as described in the linked README. Just to make sure, I removed the JSON files /usr/share/vulkan/icd.d/radeon* shipped by Debian’s mesa-vulkan-drivers package.

Finally I deviated a bit by not editing the file /usr/share/X11/xorg.conf.d/10-amdgpu.conf, but instead copying it to /etc/X11/xorg.conf.d/10-amdgpu.conf and adding the following section there:

Section "Device"
        Identifier "AMDgpu"
        Option  "DRI" "3"
EndSection


To be honest, I did not follow the “Copy driver and JSON files” step literally, since I don’t want to copy self-built files into system directories under /usr/lib. What I did instead:

  • copied the driver files to /opt/amdvlk/lib, so I now have /opt/amdvlk/lib/i386-linux-gnu/amdvlk32.so and /opt/amdvlk/lib/x86_64-linux-gnu/amdvlk64.so
  • adjusted the location of the driver file in the two JSON files /etc/vulkan/icd.d/amd_icd32.json and /etc/vulkan/icd.d/amd_icd64.json (which were installed above under Copy driver and JSON files)
  • added a file /etc/ld.so.conf.d/amdvlk.conf containing the two lines:
    /opt/amdvlk/lib/i386-linux-gnu
    /opt/amdvlk/lib/x86_64-linux-gnu
    

With this in place, I don’t pollute the system directories, and still the new Vulkan driver is available.
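
At least the consistency of the files can be checked mechanically: every ICD manifest should point to a driver file that exists. The sketch below does only that and says nothing about which driver an application ends up using. It assumes the usual manifest layout with an ICD object containing library_path, only absolute paths can be checked meaningfully, and the function name is made up for illustration.

```python
import json
from pathlib import Path


def check_icd_manifests(icd_dir):
    """Map each ICD manifest file name to (library_path, library_exists)."""
    results = {}
    for manifest in sorted(Path(icd_dir).glob("*.json")):
        try:
            data = json.loads(manifest.read_text())
        except (OSError, ValueError):
            # Unreadable or malformed manifest.
            results[manifest.name] = (None, False)
            continue
        icd = data.get("ICD") if isinstance(data, dict) else None
        library = icd.get("library_path") if isinstance(icd, dict) else None
        exists = bool(library) and Path(library).is_file()
        results[manifest.name] = (library, exists)
    return results


if __name__ == "__main__":
    for name, (library, exists) in check_icd_manifests("/etc/vulkan/icd.d").items():
        print(name, "->", library, "(found)" if exists else "(MISSING)")
```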

But honestly, I don’t really know whether it is used and is working, because I don’t know how to check.


With all that in place, I can run my usual set of Steam games (The Long Dark, Shadow of the Tomb Raider, The Talos Principle, Supraland, …) and I haven’t seen any visual problems so far. As a bonus, KDE/Plasma is now running much better, since NVIDIA and KDE have traditionally had some incompatibilities.

The above might sound like a lot of stuff to do, but considering that most of the parts are not really packaged within Debian, and that all this is a rather new open source stack, I was surprised that in half a day I got everything working smoothly.

Thanks to all the developers who have worked hard to make this all possible.

10 Responses

  1. Omer Ferhat says:

    Hello, could you please test tensorflow with basic mnist training. After installation I get SIGABRT 156 error while training.

  2. Omer Ferhat Sarioglu says:

    I have RX 5700 XT and I installed all things according to your guide and I get error.

    Here’s code:
    from __future__ import print_function
    import tensorflow.keras as keras
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, Flatten
    from tensorflow.keras.layers import Conv2D, MaxPooling2D
    from tensorflow.keras import backend as K

    batch_size = 128
    num_classes = 10
    epochs = 12

    # input image dimensions
    img_rows, img_cols = 28, 28

    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    Here’s output:
    (conda-dl) ferhat@ferhat-desktop:~/py$ python main.py
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    2020-08-04 23:14:53.088581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
    2020-08-04 23:14:53.134386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
    pciBusID: 0000:03:00.0 name: Device 731f ROCm AMD GPU ISA: gfx1010
    coreClock: 2.1GHz coreCount: 20 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
    2020-08-04 23:14:53.173682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
    2020-08-04 23:14:53.174508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
    2020-08-04 23:14:53.179126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
    2020-08-04 23:14:53.179363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
    2020-08-04 23:14:53.179476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
    2020-08-04 23:14:53.179753: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
    2020-08-04 23:14:53.184194: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3399905000 Hz
    2020-08-04 23:14:53.184564: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abb0110ec0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-08-04 23:14:53.184578: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
    2020-08-04 23:14:53.186057: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abb0112a00 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
    2020-08-04 23:14:53.186085: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Device 731f, AMDGPU ISA version: gfx1010
    2020-08-04 23:14:53.186211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
    pciBusID: 0000:03:00.0 name: Device 731f ROCm AMD GPU ISA: gfx1010
    coreClock: 2.1GHz coreCount: 20 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
    2020-08-04 23:14:53.186249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
    2020-08-04 23:14:53.186273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
    2020-08-04 23:14:53.186282: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
    2020-08-04 23:14:53.186292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
    2020-08-04 23:14:53.186324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
    2020-08-04 23:14:53.186335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-08-04 23:14:53.186340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
    2020-08-04 23:14:53.186344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
    2020-08-04 23:14:53.186412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7384 MB memory) -> physical GPU (device: 0, name: Device 731f, pci bus id: 0000:03:00.0)
    Segmentation fault (core dumped)

    • Indeed, I also get a segfault. That is not good. Are you sure the code runs correctly on nvidia/cuda, i.e., there are no bugs in the code?

      It could easily be that the amdgpu tensorflow part has some bugs.

    • That is the gdb backtrace, not that I know what to do with it

      Core was generated by `python3 test.py'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x0000000000567b08 in PyErr_SetString ()
      [Current thread is 1 (Thread 0x7fbcb671c740 (LWP 17798))]
      (gdb) bt
      #0  0x0000000000567b08 in PyErr_SetString ()
      #1  0x00007fbc76845f05 in pybind11::detail::translate_exception(std::__exception_ptr::exception_ptr) ()
         from /usr/local/lib/python3.8/dist-packages/tensorflow/python/_tf_stack.so
      #2  0x00007fbc7664dc38 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
         from /usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tfe.so
      #3  0x0000000000520e1c in PyCFunction_Call ()
      
  3. Witold says:

    You can safely remove the part about firmware or kernel configuration. It is unnecessary confusion for less experienced users. The firmware and kernel in Debian testing and unstable have everything built in (and have had for more than a year). No need to compile anything. I use the standard kernel from testing. Just recommend using Linux kernel version 5.7 or 5.8 from Debian, and it will be safe, but even 5.6 works fine.

    • Thanks, I will remove (or comment) the firmware part, but since I compile all my kernels I will leave the kernel part in there, but mention that Debian provided kernels are actually fine.

      Thanks for the suggestions!

  4. Witold says:

    You probably should mention that installing rocm-libs3.8.0 is rather tricky on Debian now, due to llvm-amdgpu3.8.0 depending on libgcc-7-dev. This can be fixed by repackaging this package and removing this dependency, or forcing it by dpkg -i somehow. See https://github.com/RadeonOpenCompute/ROCm/issues/1125

    • Yes, I have worked around this problem in the same way, but this article was written for rocm 3.3 where it worked out of the box. Unfortunately, since then AMD has made completely unreasonable changes. I will try to mention this, too.

  5. Gediz GÜRSU says:

    I worked on a similar transition for two days. First I managed VFIO passthrough with the latest Arch Linux kernel and made an RX 6500 XT work in a virtual machine on both Windows and Linux. I played Apex at 2K and 100 FPS on a FreeSync monitor, then installed ROCm 5.0.2, then the AMDVLK Vulkan driver, then the GLFW libraries and Vulkan Kompute.

    I struggled for days with a dual ROCm install: CuPy on ROCm and Tensorflow 2.8.0 on ROCm. Although I am not fluent in C++, I dug into the Vulkan API and tried to make Vulkan compute work.

    Again I have the feeling there is a gigantic mess. I installed AUR packages and pacman packages, and compiled from source. Every time I get lots of errors and warnings when compiling things using gcc and llvm. I haven’t learned SPIR-V, but I am in the process of deciding whether it is worth it.

    Having the ability to code whatever you want on any GPU and OS platform sounds promising. However, when we have qemu-kvm, ESXi, bandwidth and cloud servers, do we really need to use Vulkan as functional designers and inventor engineers to prototype? Only for mobile (and maybe mobile can get compute from the cloud too, considering 5G speeds).

    I can’t really decide. I thought it would be fascinating to write an indirect encoding neuroevolution agent for realtime AI usage (learns, adapts and acts). However, I am not sure if I should learn all the optimizations for Vulkan to do this. A fast and fluent start would motivate me a lot. Seeing results of some compiled code output would be outstanding. It is such a struggle to set up that I already feel inadequate, even though I have written some CUDA code in C++ before …

    It would be really nice to use vulkan API for compute easily on linux …

