by Andrie de Vries (@RevoAndrie)
I am frequently asked how to store login details and passwords safely for use by R, without exposing these details in your script. Yesterday Jennifer Bryan asked this question on Twitter, and a small storm of views and tweets erupted.
Do we have any sort of consensus whether user’s API keys or app id/secrets should be handled via .Rprofile or .Renviron? #rstats
— Jenny Bryan (@JennyBryan) November 23, 2015
A few minutes later she tweeted that there clearly is no consensus:
Answer: NO, apparently we have no consensus. #rstats pkgs that wrap APIs are unique snowflakes ❄️. https://t.co/IjV6FarApq
— Jenny Bryan (@JennyBryan) November 24, 2015
Different options
Reading the twitter conversation, it seems to me there are several approaches. You can store your keys:
- Directly inside your script.
- In a file in your project folder, that you don't share.
- In a .Rprofile file
- In a .Renviron file
- In a json or yaml file
- In a secure key store that you access from R
Let's look at the key idea and benefits (or disadvantages) of each approach:
1. Directly inside your script
The first approach is to simply store your keys directly in your script.
id <- "my login name"
pw <- "my password"
call_service(id, pw, ...)
Although simple, nobody seriously proposes this, for the obvious downside that it becomes impossible to share your code without also sharing your keys.
2. In a file in your project folder, that you don't share.
The second option is almost as easy to do. The idea is that you put your keys into an R script file in the same project folder, e.g. "keys.R". You then read the keys using, for example, source().
The idea is that you then exclude the "keys.R" file from any source control system. With git, for example, you can add "keys.R" to your .gitignore settings.
The downside is that it is easy to share this file by mistake if you're not careful.
# keys.R
id <- "my login name"
pw <- "my password"

# script.R
source("keys.R")
call_service(id, pw, ...)
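A variation on this idea, sketched here as an assumption rather than something proposed in the twitter discussion, is to source the keys file into its own environment, so that "id" and "pw" never become objects in your global environment. The load_keys() helper is hypothetical, and a temporary file stands in for "keys.R" so that the sketch runs anywhere.

```r
# Hypothetical helper: evaluate a keys file in a private environment
# instead of the global environment.
load_keys <- function(path) {
  if (!file.exists(path)) {
    stop("Keys file not found: ", path)
  }
  keys <- new.env(parent = baseenv())
  source(path, local = keys)
  keys
}

# Temporary stand-in for keys.R (an assumption, for illustration only)
keys_file <- tempfile(fileext = ".R")
writeLines(c('id <- "my login name"', 'pw <- "my password"'), keys_file)

keys <- load_keys(keys_file)
keys$id
# call_service(keys$id, keys$pw, ...)
```

In a real project you would call load_keys("keys.R") and still keep "keys.R" out of source control.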
3. In a .Rprofile file
The third option is to store the keys in one of your .Rprofile files (I wrote about this in a previous blog post "Best practices for handling packages in R projects").
This option was very popular in the twitter discussion, because:
- You can store the keys in your home folder, i.e. outside the project folder. This makes it less likely that you accidentally share your keys.
- You can write standard R code in the .Rprofile
# ~/.Rprofile
id <- "my login name"
pw <- "my password"

# script.R
# id and pw are defined in the script by virtue of .Rprofile
call_service(id, pw, ...)
One downside of defining the objects "id" and "pw" directly inside your .Rprofile is that these objects then live in your global environment, where your script can easily change or remove them. For example, using rm(list = ls()) to clear your global environment will make these objects disappear.
A slightly more robust variation on the theme is to still use .Rprofile, but to declare your keys as environment variables. You can use Sys.setenv() to set environment variables, and Sys.getenv() to read these:
# ~/.Rprofile
Sys.setenv(id = "my login name")
Sys.setenv(pw = "my password")

# script.R
# id and pw are defined in the script by virtue of .Rprofile
call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
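By default Sys.getenv() returns an empty string for a variable that is not set, so a missing key only shows up later as a failed login. A minimal sketch of a stricter lookup follows; the get_key() helper is hypothetical, and the Sys.setenv() call stands in for what your .Rprofile would normally do.

```r
# Hypothetical helper: read an environment variable and fail loudly
# if it is unset or empty.
get_key <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

Sys.setenv(id = "my login name")  # normally done in ~/.Rprofile
get_key("id")
# get_key("some_unset_name") stops with an error instead of returning ""
```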
4. In a .Renviron file
Actually, R also has a mechanism to define environment variables in an external file called .Renviron. The loading of .Renviron is analogous to .Rprofile. The big difference is that in .Renviron you can define the variables directly, without using Sys.setenv().
As Hadley Wickham points out, environment variables are language agnostic:
@JennyBryan I like env vars because it works across programming languages
— Hadley Wickham (@hadleywickham) November 23, 2015
You can find very detailed instructions and recommendations in one of the vignettes of the httr package. View the vignette Best practices for writing an API package and navigate to "Appendix: API key best practices".
# ~/.Renviron
id = "my login name"
pw = "my password"

# script.R
# id and pw are available as environment variables by virtue of .Renviron
call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
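R reads .Renviron at startup, so keys you add to the file are not visible in a session that is already running. Base R's readRenviron() re-reads such a file on demand. The sketch below uses a temporary file and the made-up variable names demo_id and demo_pw, purely so that it runs without touching your real ~/.Renviron.

```r
# Temporary stand-in for ~/.Renviron (an assumption, for illustration only)
env_file <- tempfile()
writeLines(
  c('demo_id="my login name"', 'demo_pw="my password"'),
  env_file
)

# Load the variables into the current session without restarting R
readRenviron(env_file)

Sys.getenv("demo_id")
Sys.getenv("demo_pw")
```

In a real session the equivalent call would be readRenviron("~/.Renviron").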
5. Store the keys in a json or yaml file
The json file format is increasingly the format of choice to communicate with web services. As a result, most modern languages can easily parse json files. The same idea goes for yaml files.
So, if you want to store your keys in a file format that can easily be consumed by other languages, e.g. Python, then json might be a good idea.
# keys.json
{
  "id": ["my login name"],
  "pw": ["my password"]
}

# script.R
library(jsonlite)
keys <- fromJSON("keys.json")  # read the file once
call_service(id = keys$id, pw = keys$pw, ...)
6. In a secure key store that you access from R
One big downside of all the previous approaches is that, in every case, you are storing your keys in an unencrypted format somewhere on your file system.
You probably already use a password storage tool, e.g. keychain or LastPass.
Unfortunately I am not aware of R interfaces to any of these key chains. If you know of a good solution, please leave a comment!
Interesting. We just had to think about the username and password problem in a team environment.
We looked at how encryption can be done using the PKI package. But as you noted in your post, one has to save the username and password somewhere to avoid re-entering them over and over again.
Our current solution is to type it when needed and never save it.
Posted by: D | November 25, 2015 at 19:12
As D commented, for the best security, you have to type your password each time.
password = readline("Enter your password > ")
works, but doesn't prevent access by people looking over your shoulder.
You can have obscured text if you type into a gWidgets GUI. Something like:
library(methods)
library(gWidgets2tcltk)
passwordEntryFactory <- setRefClass(
  "PasswordEntry",
  fields = list(
    win = "ANY",
    txt = "ANY",
    pw = "ANY"
  ),
  methods = list(
    initialize = function(...)
    {
      win <<- gwindow(
        "Enter your password",
        visible = FALSE,
        height = 25
      )
      txt <<- gedit(
        container = win,
        handler = function(h, ...)
        {
          pw <<- svalue(txt)
          dispose(win)
        }
      )
      visible(txt) <- "*"
      visible(win) <- TRUE
    }
  )
)
getPassword <- function()
{
  passwordEntry <- passwordEntryFactory$new()
  while(isExtant(passwordEntry$win))
  {
    Sys.sleep(0.1)
  }
  passwordEntry$pw
}
getPassword()
I sometimes store passwords in RData files too, then at least the password isn't stored in plain text, and is secure from non-R programmers.
Posted by: Richierocks | November 25, 2015 at 23:00
Hi Guys,
Thanks for posting about this.
I ran into this issue recently and here is the solution I found.
1. I use the digest package that allows aes encryption.
2. I use two functions: one that writes aes encrypted files, the other that reads and decrypts those files.
You can find those functions here: https://github.com/sdoyen/r_password_crypt
3. Finally, I use the digest package to generate the key required to encrypt and decrypt files.
4. Once all of this is in place I create a data frame that contains the login and the password.
5. I use the write.aes function to write the credentials locally to an encrypted file.
6. The read.aes function decrypts the credentials and imports them into R.
That way no credential appears in plain text or in the code. Additionally, one could decide to store the key elsewhere (remote server, usb drive, etc.).
Also, this solution does not require prompting for the password each time.
So,
source("crypt.R")
load("key.RData")
credentials = data.frame(login = "foo", password = "bar", stringsAsFactors = FALSE)
write.aes(df = credentials, filename = "credentials.txt", key = key)
rm(credentials)
credentials = read.aes(filename = "credentials.txt",key = key)
print(credentials)
I hope this helps,
S.
Posted by: Stephane Doyen | November 26, 2015 at 14:32
We use method #2, but store the files in a folder outside the project. We have secure storage available to all users on a SAN, so storage in text files isn't as much of a problem. The user's area on the SAN is accessible only to them, and is highly encrypted. You could use other storage somewhere else of course.
Then you can create a symlink in the project to the folder on the SAN. We're on Windows for development and we use Git for version control, so we use a simple .bat script to create the link and add the files to .gitignore. You could use bash or whatever depending on your environment.
mklink /J credentials "PATH_TO_SAN_STORAGE\credentials"
ECHO.>> .gitignore
ECHO credentials/* >> .gitignore
ECHO !credentials/credential.template >> .gitignore
ECHO !credentials/readme >> .gitignore
With every new project, we can copy the .bat file into the project folder, run it, and have easy access to our personal credentials.
Posted by: mstanley | November 27, 2015 at 02:04