by Gregory Vandenbrouck
Software Engineer, Mirosoft
This post is the second in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL, or Microsoft SQL Server) to an R client on Windows or Linux.
Last time we covered pulling data from SQL Azure to an R client on Windows. This time we’ll be pulling data from MariaDB (compatible with MySQL) hosted on a Linux VM in Azure to an R client on Windows.
Creating the database
The Azure Management site changes quite often, therefore these instructions are valid “at the time of this writing” :o)
- Log on the Azure Management site with your Azure subscription (see previous post),
- Create a VM: select “New”, “Compute”, “Virtual Machine”, “Quick Create” and then fill-in the information. In our case we chose “SUSE Linux Enterprise” for the image, but most images should work, both Windows and Linux.
- Once the creation is completed, click on “Virtual Machines”, your machine name, and then “endpoints”. You should already have SSH (port 22) listed if you chose a Linux VM (and if not, add it). Add MySQL (port 3306) to the list.
- Connect to your VM via ssh.
- Once logged on to the VM via ssh, install MySQL or MariaDB:
sudo zypper install mariadb*
.
Note: this command and subsequent ones may vary depending on the choice of Linux distribution. For example the equivalent of “zypper” may instead be called: “yum”, “apt-get”, “synaptic”, “aptitude”, “dpkg-install”, etc. - Start the server.
sudo rcmysql start
- Secure the server:
sudo mysql_secure_installation
-
Add a user that’s authorized to connect remotely. Below is the full transcript of a session where we add user MyUser with password MyPassword that can connect remotely from any location with all privileges (not recommended), and we also create the MyDatabase database:
sshuser@MyServer:~> mysql -s --user=root -p Enter password: MariaDB [(none)]> create user 'MyUser'@'%' identified by 'MyPassword'; MariaDB [(none)]> grant all privileges on *.* to 'MyUser'@'%' with grant option; MariaDB [(none)]> create database MyDatabase; MariaDB [(none)]> quit sshuser@MyServer:~>
Connecting to the database from outside of R
Optional step, but can be useful for troubleshooting. For example to solve firewall, port and credential issues.
- Install mysql.exe client or MariaDB client.
-
Below an example of a successful test session in a command prompt (search for “cmd” in the Start menu or screen to start a command prompt):
C:\>"c:\Program Files\MySQL\MySQL Workbench 6.2 CE\mysql.exe" -s -hMyServer.cloudapp.net -uMyUser -pMyPassword -DMyDatabase Warning: Using a password on the command line interface can be insecure. mysql> select 1+1 as Answer; Answer 2 mysql> quit C:\>
(replace MyServer, MyUser, MyPassword and MyDatabase with your values)
Note: there’s no space between the flags and values; e.g. it’s “-pMyPassword” and not “-p MyPassword”.
Connecting to the database from R on Windows
Using RODBC’s odbcDriverConnect
function
MySQL’s ODBC drivers need to be installed. To see if you have these installed and get the driver’s name, open the “ODBC Data Source” (see previous post for instructions), and look in the “Drivers” tab. Below an example with MySQL drivers installed:
If the drivers aren’t installed, you can get them by installing MySQL’s or MariaDB’s client. There are also ODBC-drivers-only installation options.
library(RODBC)
myServer <- "MyServer.cloudapp.net"
myUser <- "MyUser"
myPassword <- "MyPassword"
myDatabase <- "MyDatabase"
myDriver <- "MySQL ODBC 5.3 Unicode Driver"
connectionString <- paste0(
"Driver=", myDriver,
";Server=", myServer,
";Database=", myDatabase,
";Uid=", myUser,
";Pwd=", myPassword)
conn <- odbcDriverConnect(connectionString)
sqlQuery(conn, "SELECT 142857 * 3 AS Cyclic")
## Cyclic
## 1 428571
close(conn) # don't leak connections !
Using RODBC’s odbcConnect
function
Similar to the Microsoft SQL Server case, you can persist a Data Source Name (DSN). This time I created a “System DSN” (available to everyone):
Contrary to the SQL Server driver, MySQL allows for saving credentials. This is handy as credentials no longer need to be present in clear text in the R script. Unfortunately, the credentials are saved in clear text in Windows’ registry. Choose your poison!
For the drivers we used, the registry locations are HKEY_CURRENT_USER\Software\ODBC\ODBC.INI
for “User DSN” and HKEY_LOCAL_MACHINE\Software\ODBC\ODBC.INI
for “System DSN”.
When using a DSN with saved credentials, the R code is quite simplified.
library(RODBC)
# No need to specify uid and pwd: these are part of the DSN
conn <- odbcConnect("MyMariaDBAzure")
sqlQuery(conn, "SELECT 10001 - 73 * 137 AS Zero")
## Zero
## 1 0
close(conn)
Using RJDBC
To use RJDBC to connect to MySQL or MariaDB, you need to:
- Install:
- Know the following:
- driverClass: the class name. See the driver-specific documentation.
- classPath: the location of the jar file. If you don’t know where it is, try running
dir /s /b %systemdrive%\*sql*.jar
from a command prompt. - url connector prefix. Again, driver-specific.
In my specific setup:
library(RJDBC)
drv <- JDBC(
driverClass = "com.mysql.jdbc.Driver",
classPath = "C:/Program Files (x86)/MySQL/MySQL Connector J/mysql-connector-java-5.1.35-bin.jar")
conn <- dbConnect(drv, "jdbc:mysql://MyServer.cloudapp.net", "MyUser", "MyPassword")
dbGetQuery(conn, "SELECT 1+1")
dbDisconnect(conn)
Using RMySQL
It’s slighter easier to use than RJDBC:
library(RMySQL)
conn <- dbConnect(RMySQL::MySQL(), host = "MyServer.cloudapp.net", user="MyUser", password="MyPassword")
dbGetQuery(conn, "SELECT 1+1")
dbDisconnect(conn)
Summary
- We’ve tried RODBC, RJDBC and RMySQL to connect to a MariaDB database hosted on a Suse VM in Azure.
- In RODBC and RJDBC cases, we had to install something on the client machine (besides R packages): drivers and/or JDK.
- When it comes to performance, our experience was similar to Microsoft SQL Server’s case: RODBC was faster, both to connect and to get back the query results.
Editors Note: Changes were made to the RMySQL and Summary sections of this post on 6/4/15 to correct certain inaccuracies. See the discussion in the comments section.
Why RMySQL requires JDK? How does it impact it's performance? Can ODBC compete with RMySQL in terms of performance? For databases *pulling the data* is usually not enough, it is nice to *pull the data efficiently*. I believe RMySQL, without JDK overhead, would be the best way, hm?
Posted by: jangorecki | June 03, 2015 at 13:32
The RMySQL package does not require JDK or JRI. There never was or will be any Java involved with mysql or RMySQL. RMySQL works completely out of the box on Windows without any runtime dependencies. It is linked to the C driver and OpenSSL so it supports SSL servers encryption as well.
See this post: https://www.opencpu.org/posts/rmysql-release-0-10-2/
Posted by: Jeroen Ooms | June 04, 2015 at 02:30
Indeed a great article.So you can think so as bigdata can allow us accurate the nature of business that we are going to do data science
Posted by: vilzue | June 04, 2015 at 03:20
@Jeroen: oops, you're right, RMySQL doesn't require the JDK. Thanks for pointing it out!
Posted by: Grégory Vandenbrouck | June 04, 2015 at 10:58