Install Impala ODBC for Cloudera on Arch Linux
For a project, I had to automate requests to an Impala database hosted on a Cloudera VM, which meant installing an ODBC driver. It wasn't as simple as I expected: I had trouble configuring unixODBC correctly, and then installing the driver, which is not available for Arch. In this article I detail the problems I ran into and the solutions I found. If you are in a hurry, there is a TL;DR at the end.
Process
The first issue I encountered was that the drivers provided by Cloudera are not packaged for Arch Linux. They are only available for Red Hat, Debian and SUSE. So the first thing to do was to install debtap, a small program that converts .deb packages to Arch Linux packages. Once that was done, I could convert the Cloudera driver for Debian to an Arch package. This is not very difficult. However, the conversion is apparently not perfect, because I had issues when trying to connect to the database with this driver afterwards.
Running ldd on the driver showed what was wrong:
ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
The output reported that libsasl2.so.2 was not found. This library should be installed with the cyrus-sasl package, so I should have added that package to the dependencies of the Cloudera package I had just converted. Instead, I simply installed it manually. But that was not the end of it: even after installing cyrus-sasl, I still didn't have libsasl2.so.2, only libsasl2.so.3. So I did a bad thing and created libsasl2.so.2 from libsasl2.so.3. And it worked.
sudo cp /usr/lib/libsasl2.so.3 /usr/lib/libsasl2.so.2
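The ldd check can also be scripted. The following is a minimal sketch in Python 3; the helper names are hypothetical and the sample ldd output is illustrative, not captured from a real system. It lists the libraries that ldd reports as not found:

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libsasl2.so.2 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_driver(path):
    """Run ldd on the given shared object and return its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return missing_libraries(result.stdout)


# Illustrative ldd output (an assumption, not taken from a real machine).
sample = """\
\tlinux-vdso.so.1 (0x00007ffd2a5f2000)
\tlibsasl2.so.2 => not found
\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f1c2a000000)
"""
print(missing_libraries(sample))
```

On a real installation, check_driver("/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so") should return an empty list once every dependency resolves.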
Okay, but at this point I had only installed the driver. Before that, I had installed unixodbc, which is required to actually use an ODBC driver. unixODBC needs to be configured for your ODBC drivers and for the databases you want to connect to. In my case, I wanted to connect to an Impala database. Thanks to @manuel_lemaire, who spent a day and a half configuring it correctly, I could reuse his configuration and get it working in about one hour. Below are the configuration files we used (you need to create the cloudera.impalaodbc.ini file manually):
/etc/odbcinst.ini
[ODBC Drivers]
Impala=Installed
[Impala]
Description=Cloudera Impala ODBC Driver (64-bit)
Driver = /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
Don't forget to put your credentials into the configuration file below:
/etc/odbc.ini
[ODBC Data Sources]
Impala=Cloudera Impala ODBC Driver 64-bit
[Impala]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=localhost
PORT=21050
UID=_YOUR-USERNAME_
PWD=_YOUR-PASSWORD_
DATABASE=_YOUR-DATABASE-NAME_
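Before testing a connection, it can help to confirm that the DSN section is complete. Here is a minimal sketch in Python 3 using the standard configparser module; the function name and the list of required keys are my own assumptions, so adjust them to your setup:

```python
import configparser

# Assumed minimum set of keys for a usable DSN; this list is not mandated by unixODBC.
REQUIRED_KEYS = ("driver", "host", "port")


def check_dsn(ini_text, dsn):
    """Return a list of problems found in the DSN section of an odbc.ini text."""
    # interpolation=None so that a '%' in a password is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(dsn):
        return ["missing section [%s]" % dsn]
    problems = []
    for key in REQUIRED_KEYS:
        # configparser lowercases option names, so HOST and host are the same key
        if not parser.get(dsn, key, fallback="").strip():
            problems.append("missing or empty key: %s" % key)
    return problems


sample_ini = """\
[ODBC Data Sources]
Impala=Cloudera Impala ODBC Driver 64-bit
[Impala]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=localhost
PORT=21050
"""
print(check_dsn(sample_ini, "Impala"))
```

To check the real file, read /etc/odbc.ini into a string and pass it as ini_text.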
/etc/cloudera.impalaodbc.ini
[Driver]
## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
## - unixODBC uses UTF-16 by default.
## - If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
## Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
## - SimbaDM can be used with UTF-8 or UTF-16.
## The DriverUnicodeEncoding setting will cause SimbaDM to run in UTF-8 when set to 2 or UTF-16 when set to 1.
DriverManagerEncoding=UTF-32
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
## - Uncomment the ODBCInstLib corresponding to the Driver Manager being used.
## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH (LIBPATH for AIX).
## - Note that AIX has a different format for specifying its shared libraries.
# Generic ODBCInstLib
# iODBC
ODBCInstLib=libiodbcinst.so
# SimbaDM / unixODBC
#ODBCInstLib=libodbcinst.so
# AIX specific ODBCInstLib
# iODBC
#ODBCInstLib=libiodbcinst.a(libiodbcinst.so.2)
# SimbaDM
#ODBCInstLib=libodbcinst.a(odbcinst.so)
# unixODBC
#ODBCInstLib=libodbcinst.a(libodbcinst.so.1)
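The comments in the [Driver] section explain how to choose DriverManagerEncoding on unixODBC: UTF-32 if unixODBC was compiled with -DSQL_WCHART_CONVERT, UTF-16 otherwise. As a small sketch of that rule in Python 3 (the function names are hypothetical, and the sample flags in the usage lines are made up for illustration):

```python
import subprocess


def driver_manager_encoding(cflags):
    """Pick DriverManagerEncoding for unixODBC from the output of `odbc_config --cflags`."""
    # Rule taken from the comments in cloudera.impalaodbc.ini:
    # UTF-32 when -DSQL_WCHART_CONVERT is present, UTF-16 otherwise.
    return "UTF-32" if "-DSQL_WCHART_CONVERT" in cflags.split() else "UTF-16"


def detect_encoding():
    """Ask the installed unixODBC for its compile flags and apply the rule above."""
    result = subprocess.run(
        ["odbc_config", "--cflags"], capture_output=True, text=True, check=True
    )
    return driver_manager_encoding(result.stdout)


# Illustrative flag strings (assumptions, not real odbc_config output).
print(driver_manager_encoding("-DHAVE_UNISTD_H -DSQL_WCHART_CONVERT"))
print(driver_manager_encoding("-DHAVE_UNISTD_H -DHAVE_PWD_H"))
```

This only covers the unixODBC case; iODBC and SimbaDM follow the other rules listed in the same comments.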
Alright, almost done. We just needed to set some environment variables and add them to our bashrc or zshrc, depending on the shell in use. We added the following lines:
~/.bashrc
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/odbc"
export ODBCINI="/etc/odbc.ini"
export ODBCSYSINI="/etc"
export CLOUDERAIMPALAINI="/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini"
export LD_PRELOAD="/usr/lib/libodbcinst.so"
Caution: on Debian, the LD_PRELOAD line is different:
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libodbcinst.so"
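A forgotten or mistyped variable is easy to miss, so here is a minimal Python 3 sketch that compares the current environment with the Arch values listed above. The function name is hypothetical, and LD_LIBRARY_PATH is left out because it is appended to rather than set to a fixed value:

```python
import os

# Expected values, copied from the ~/.bashrc lines above (Arch variant of LD_PRELOAD).
EXPECTED = {
    "ODBCINI": "/etc/odbc.ini",
    "ODBCSYSINI": "/etc",
    "CLOUDERAIMPALAINI": "/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini",
    "LD_PRELOAD": "/usr/lib/libodbcinst.so",
}


def env_problems(env, expected=EXPECTED):
    """Return one message per variable that is unset or differs from the expected value."""
    problems = []
    for name, value in expected.items():
        actual = env.get(name)
        if actual is None:
            problems.append("%s is not set" % name)
        elif actual != value:
            problems.append("%s is %r, expected %r" % (name, actual, value))
    return problems


# Report on the environment of the current shell session.
for problem in env_problems(os.environ):
    print(problem)
```

If the script prints nothing, the four variables match. Run it from a new shell so that the updated bashrc or zshrc has been loaded.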
Summary / TL;DR
I hope the process above was clear. Here are the steps in the right order:
- Install unixodbc
- Add the 3 configuration files and the environment variables above
- Download the Cloudera Impala ODBC Driver for Debian
- Convert it to an Arch Linux package with debtap
- Add the cyrus-sasl package to the dependencies of the Arch package
- Install it
- If you have an issue when trying to use the ODBC driver, it may be because libsasl2.so.2 is missing; you can fix it as I did above
Bonus:
If you wish to use Python, the following script made by @manuel_lemaire should make it easier for you to start:
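import pyodbc

# Open the "Impala" DSN defined in /etc/odbc.ini, with autocommit enabled
conn = pyodbc.connect('DSN=Impala;', autocommit=True)
cursor = conn.cursor()
# Replace "table" with the name of a real table
cursor.execute('SELECT * FROM table')
results = cursor.fetchall()
# print() with parentheses works on both Python 2.7 and Python 3
print(results)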
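If you plan to run several queries, the cursor handling can be wrapped in a small helper. This is only a sketch with hypothetical function names. It relies on nothing beyond the standard DB-API 2.0 cursor interface, so the standard sqlite3 module stands in for Impala here and lets you try it without the driver installed:

```python
import sqlite3


def fetch_rows(conn, query, params=()):
    """Run a query on any DB-API 2.0 connection and return all rows as a list."""
    cursor = conn.cursor()
    try:
        if params:
            # Both pyodbc and sqlite3 use "?" as the parameter placeholder
            cursor.execute(query, params)
        else:
            cursor.execute(query)
        return cursor.fetchall()
    finally:
        cursor.close()


def connect_impala(dsn="Impala"):
    """Open the DSN defined in /etc/odbc.ini (needs pyodbc and the driver)."""
    import pyodbc  # imported lazily so fetch_rows can be tried without pyodbc

    return pyodbc.connect("DSN=%s;" % dsn, autocommit=True)


# Stand-in demonstration with an in-memory SQLite database (not Impala).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo.execute("INSERT INTO t VALUES (1, 'a')")
print(fetch_rows(demo, "SELECT name FROM t WHERE id = ?", (1,)))
```

Against the real database, the call would be fetch_rows(connect_impala(), 'SELECT ...') with your own query.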
Note:
Don't forget to install the pyodbc module for Python. Arch ships the latest version of Python by default, which often causes problems, so I recommend creating a virtualenv with Python 2.7 and activating it before doing anything:
virtualenv -p /usr/bin/python2.7 _folder-name_
source _folder-name_/bin/activate
pip install pyodbc
Conclusion
I'm a complete newbie with this kind of software, but I'm quite happy that I managed to make it work correctly. Many thanks to @manuel_lemaire, who helped me set up the ODBC configuration. Why don't I submit the Cloudera Impala ODBC driver to the AUR to make it easier for you? First, because I don't know whether I can legally redistribute it; I need to investigate the license. Second, because the way I dealt with libsasl2.so.2 isn't very good: it doesn't work if you don't have libsasl2.so.3, for instance.
Don't hesitate to react if you find mistakes in this article!
A comment?
Found an error in this article? Have some advice? You can send a comment by email to "blog at killiankemps.fr" with "[Comment][en][Install Impala ODBC for Cloudera on Arch Linux]" as the subject. (The "@" has been replaced by "at" to keep bad bots from parsing the email address.)