TVM RPC on Android: Pitfalls I Hit

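Conclusions first

System environment: Ubuntu 18.04 LTS

Fix: uninstall OpenJDK 11 and install OpenJDK 8.

Note: VMware cannot properly use a discrete GPU. Don't do deep learning inside a virtual machine, or you will be limited to the CPU.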

When you compile android_rpc, you HAVE TO specify libOpenCL.so in jni/config.mk, using the copy pulled from your Android phone. Then download the CL headers from GitHub!

READ THE OFFICIAL INSTRUCTIONS CAREFULLY!!

Problem details

Day 1: No OpenCL platform

I ran into this error:

tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options …

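I read through a lot of posts and eventually found the official documentation, only to discover that this issue had been reported back in March 2019, confirmed as a bug, and fixed.

It is now March 7, 2020, so I should not be hitting that bug anymore. I looked at the source code and found that it could not get any OpenCL information from my system.

In a terminal, I ran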

$ clinfo

but it printed

clinfo number of platforms 0

I looked up a tutorial and installed a driver (for context, I had already installed packages such as opencl-icd-dev):

sudo apt install mesa-opencl-icd

That fixed it. clinfo now reports a platform (although it still lists 0 devices):

Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 19.2.8
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA

Platform Name Clover
Number of devices 0

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, …) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, …) Clover
clCreateContext(NULL, …) [default] No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
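
To compare runs before and after installing a driver, the two counts that matter can be pulled out of the clinfo text. This is only a minimal sketch, assuming the "Number of platforms" / "Number of devices" line format shown above; summarize_clinfo is a hypothetical helper name, not part of clinfo.

```shell
# Hypothetical helper: read clinfo text on stdin and print
# "platforms=<n> devices=<m>", where <m> is summed over all platforms.
# Assumes the count is the last field on each matching line.
summarize_clinfo() {
    awk '
        /Number of platforms/ { platforms = $NF }
        /Number of devices/   { devices += $NF }
        END { printf "platforms=%d devices=%d\n", platforms, devices }
    '
}

# Usage: clinfo | summarize_clinfo
```

For the output above this prints platforms=1 devices=0: a platform is registered, but it exposes no device.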

Then I ran the GPU test over RPC and hit yet another error:

CommandLine Error: Option 'help-list' registered more than once!

Argh!

After some searching, I found this thread: https://discuss.tvm.ai/t/llvm-error-option-registered-more-than-once-while-loading-libtvm-so/269/10

It contains this passage:

This happens when you compile TVM runtime with both set(USE_OPENCL ON) and set(USE_LLVM ON)
You should enable only ONE option but not both.

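I tried building with OpenCL on and LLVM off, and the errors got even worse.

Then I turned OpenCL off with LLVM and Vulkan on. Still broken.

Next I tried LLVM on, OpenCL off, Vulkan off.

(Isn't that exactly the state I started from???)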
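
When flipping these switches back and forth it is easy to lose track of what a given build actually has enabled. A small sketch that lists the active backends, assuming the flags live in a config.cmake as lines of the form set(USE_OPENCL ON); enabled_backends is a hypothetical helper name, and the path in the usage line is an assumption about where your copy of the file sits.

```shell
# Hypothetical helper: print every one of USE_LLVM / USE_OPENCL / USE_VULKAN
# that is set to something other than OFF in the given config.cmake.
enabled_backends() {
    config_file="$1"
    for flag in USE_LLVM USE_OPENCL USE_VULKAN; do
        # Use the last uncommented set(FLAG value) line, since a later set() overrides an earlier one.
        value=$(sed -n "s/^[ ]*set(${flag}[ ][ ]*\([^ )]*\)[ ]*).*/\1/p" "$config_file" | tail -n 1)
        case "$value" in
            ""|OFF|off|0) ;;                        # unset or disabled: print nothing
            *) printf '%s=%s\n' "$flag" "$value" ;; # ON, or e.g. a path to llvm-config
        esac
    done
}

# Usage (path is an assumption): enabled_backends build/config.cmake
```

If this prints more than one line, the build has the kind of combination the quoted post warns about.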


Day 2: No OpenCL device

After a night's sleep, on the second day I hit a new error:

Using CPU OpenCL device
No OpenCL device

Following tqchen's advice, I ran make clean and then make.

It still didn't work (I had also reinstalled LLVM).

Later, at https://askubuntu.com/questions/809450/installing-opencl-for-svga-ii-adapter

I found this passage: https://askubuntu.com/posts/809481/timeline

According to the top answer to this question, the Intel SDK does not work on VMWare. It suggests instead trying to use the AMD APP SDK

I was still hitting bugs over and over, and then I found this description:

TVM DOES NOT SUPPORT openjdk-11 !!!

TVM DOES NOT SUPPORT openjdk-11 !!!

TVM DOES NOT SUPPORT openjdk-11 !!!

apt install openjdk-8-jdk ONLY!

Then set JAVA_HOME in your environment!
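
Before rebuilding it is worth confirming which JDK the shell actually picks up. A rough sketch, assuming the usual `java -version` first line (for example openjdk version "1.8.0_242" or openjdk version "11.0.6"); java_major is a hypothetical helper name, and the JAVA_HOME path is the location I would expect for the openjdk-8-jdk package on 64-bit Ubuntu, so check it on your machine.

```shell
# Hypothetical helper: read `java -version` text on stdin and print the major version.
# Handles both the legacy "1.8.0_242" scheme (prints 8) and the newer "11.0.6" scheme (prints 11).
java_major() {
    awk -F'"' '/version "/ {
        split($2, parts, "[.]")
        print (parts[1] == "1" ? parts[2] : parts[1])
        exit
    }'
}

# Usage (java -version writes to stderr, hence 2>&1):
#   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed install path
#   java -version 2>&1 | java_major
```

A result of 8 means the OpenJDK 8 install is the one in use.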


Day 3: Solved

When you compile android_rpc, you HAVE TO specify libOpenCL.so in jni/config.mk, using the copy pulled from your Android phone. Then download the CL headers from GitHub!
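
Roughly, that step might look like the sketch below. The variable names USE_OPENCL, ADD_C_INCLUDES and ADD_LDLIBS are as I remember them from the android_rpc config.mk, so verify them against your checkout; opencl_config_mk is a hypothetical helper, the on-device library path is an assumption that varies by vendor and ABI, and the KhronosGroup/OpenCL-Headers repository is my guess at what "CL-headers" refers to.

```shell
# Hypothetical helper: print the OpenCL-related lines for jni/config.mk.
#   $1: directory that contains the CL/ headers
#   $2: path to the libOpenCL.so pulled from the phone
opencl_config_mk() {
    printf 'USE_OPENCL = 1\n'
    printf 'ADD_C_INCLUDES = %s\n' "$1"
    printf 'ADD_LDLIBS = %s\n' "$2"
}

# Possible usage (device path and repository are assumptions):
#   adb pull /system/vendor/lib64/libOpenCL.so ./libOpenCL.so
#   git clone https://github.com/KhronosGroup/OpenCL-Headers.git
#   opencl_config_mk "$PWD/OpenCL-Headers" "$PWD/libOpenCL.so"
```

The helper only prints the lines; merge them into jni/config.mk by hand so the other settings in that file stay untouched.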

READ THE OFFICIAL INSTRUCTIONS CAREFULLY!!