Chapter 2 The QCA package
In the first chapter, we briefly covered some of the basics of using R. This chapter presents the structure of the QCA package and its associated functions, complementing their explanation with an overview of the associated graphical user interface.
Many functions will be thoroughly explained in dedicated chapters (for example, how to construct a truth table, or how to calculate parameters of fit), while this chapter covers only their general usage, valid for all functions. Since the user interface in this package is novel, most sections are recommended for all users, beginners and advanced alike.
Some of the sections in this chapter are highly specific and rather technical about the package itself, but have little to do with either R in general or QCA in particular. Readers need not dwell too much on those sections (for instance section 2.2 about the package’s structure, or section 2.4.3 about how to create an executable icon to start the graphical user interface); they are provided just in case they prove useful for advanced users.
2.1 Installing the QCA package
The easiest way to install the QCA package is to start R and follow the platform specific menus to install packages from CRAN.
- For MacOS, the menu is: Packages & Data / Package Installer. In the resulting window, there is a button called ‘Get List’ to fetch all the binaries from CRAN; select the QCA package and also tick ‘Install Dependencies’.
- For Windows, the relevant menu is: Packages / Install package(s)… A list of CRAN mirror servers will appear (choose the one closest to your location), after which the list of packages is presented.
- On Linux, probably one of the easiest ways is to install from the source package(s). Simply download them from CRAN, open a Terminal window and type:
R CMD INSTALL path/to/the/source/package.tar.gz
(this method works on all platforms, provided that all the necessary build tools are installed). For certain Linux distributions like Ubuntu, there are package binaries already produced and maintained on CRAN, and users can install them via the usual package management software.
Apart from the obvious point and click method using R’s graphical user interface, there is also a cross-platform method from within R, where users have to type:
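That is, the standard command to install the package from CRAN, together with the packages it depends on:

```r
# install the QCA package (and its dependencies) from CRAN
install.packages("QCA", dependencies = TRUE)
```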
There is also a package installer within the RStudio user interface (select the Packages tab, then click the ‘Install’ button), which is more convenient to use; but from my own experience it is always better to install packages from base R, and then use them via any other user interface.
There are development versions released in between the official CRAN versions, with patched code that fixes various things. To install the very latest such version, the following command is needed:
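The location of the development versions is announced by the package author; once such a source archive has been downloaded, it can be installed along these lines (the file name below is hypothetical, standing in for the downloaded archive):

```r
# install a downloaded source package; the file name is a placeholder
install.packages("QCA_3.8.tar.gz", repos = NULL, type = "source")
```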
Once installed, the package needs to be loaded before the functions inside can be used. For all chapters and sections to follow, this book assumes the user has already run this mandatory command:
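That is, loading the package namespace into the current session:

```r
library(QCA)
```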
2.2 Structure
The package QCA is organized in a series of directories and files, some of them containing the actual code (functions) while others contain the associated documentation and help files.
It is good practice to read the ChangeLog file upon each package install or update, because functions can be slightly changed or patched, or arguments can change their default values, and previous code that used to work in one way may give different results on the next run (detailed explanations of how functions work are given in section 2.3 Command line mode).
One possible way to access the ChangeLog file (the text file mentioning each modification and addition from one version to another) is to read it directly on the QCA webpage on CRAN, where a link to the ChangeLog file is automatically provided by the build system.
On the local computer, the ChangeLog file is located in the root of the package’s installation directory, which usually is:
- on Windows, in
Program Files/R/R-3.3.3/library/QCA
- on MacOS, in
/Users/your_username/Library/R/3.3/library/QCA/
- on Linux, in
/home/your_username/R/architecture/version
Each of these paths may change depending on the version of R (for example 3.3.3 at the time of this writing) or on the Linux distribution (via the architecture string), whether on 32 bit or 64 bit etc.
Users are allowed to define a custom library installation directory, which helps not only to circumvent the version-dependent path, but also to avoid reinstalling all packages upon the release of a new version of R. As a bonus, it also helps when working with multiple versions of R at the same time (sometimes important for developers who want to test functionality across versions).
Defining a custom installation directory is platform dependent (for example, Linux users have to export the paths), therefore readers are encouraged to find more specific details in the ‘R Installation and Administration’ manual or via the help(Startup) command in R. Windows users may also want to take a look at the more automatic batchfiles package written by Gabor Grothendieck.
Once the installation directory of the QCA package has been located, the actual content of the individual files (R code and its documentation) is not directly readable, because the package is installed as a binary. However, the content is available in the source package, downloadable from the CRAN server.
Apart from the ChangeLog file, users may also be interested in the alternative help system, which is located in the staticdocs subdirectory.
One thing that is likely important for all users is the so called “namespace” of the QCA package. R is community-contributed software, with thousands of developers who constantly upload their work to CRAN. It is very likely that two or more developers will write functions using the same name (sometimes for completely different purposes), therefore R needed a way to avoid function name conflicts. The solution was to contain the function names of each package in a namespace, which reserves the usage of those function names for a specific package.
At other times, competing packages use the very same name for exactly the same purpose, and conflicts cannot be avoided. Users should be aware that the last loaded namespace takes precedence; to avoid such situations, the best scenario is to refrain from using competing packages at the same time, or at least not in the same R environment. Starting R can be done in various ways, for example in the dedicated R user interface, but also in a Terminal window.
2.3 Command line mode
R is essentially a CLI (command line interface), which means it is accessed via commands typed in the console (either in R’s basic graphical user interface, or in a Terminal window). The first chapter explained the basics of R commands, and the purpose of this section is to take the user one step further. Once properly understood, things prove to be surprisingly simple; until then, they can seem complex, especially for beginners.
Both the command line functions and the user interface in the QCA package are developed with the intention of making things as easy as possible for any user. The design of the functions, with their formal arguments, has a twofold purpose: to perform the operations for Qualitative Comparative Analysis, and to require an absolutely minimal effort to use.
Care must be taken, however, to make sure the right input is fed to the right arguments, otherwise the functions will stop with descriptive error messages.
The structure of a written command in the QCA package closely follows the structure of any basic R command: it has arguments, default values, and relations between the values of associated arguments. What is different, on top of the basic level of functionality, is the utmost care to capture all possible error situations and produce clear and helpful error messages.
R messages can sometimes be cryptic for beginners. They are very clear to advanced users, but to those who are only beginning to learn R they may seem incomprehensible. Since the QCA package assumes only minimal R knowledge, it is extremely important to provide highly descriptive error messages that allow users of any level to carry on with their analysis. Achieving this level of information involved a considerable effort to capture all possible situations where users might have difficulties, and return the kind of message that helps users quickly understand the nature of the problem.
2.3.1 Getting help
The arguments are generally self explanatory, and the functions have associated help files providing more details where needed. To access such a help file, the easiest way is to type a question mark “?” in front of the function name. For example, to see the help file of the parameters of fit function pof(), the command is simply:
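In the R console, that is:

```r
?pof
```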
As previously indicated, this command assumes the package QCA has already been loaded, so its namespace and functions are available to the user level.
Another possibility to browse the help files is to access the Index of all functions and datasets in the package (at the bottom of every help page there is a link to that Index), and select the function for which help is needed. The very same set of help files is also available as a mini-series of static web pages built with the staticdocs package. Those pages are found in the staticdocs directory inside the root installation directory of the package, or (as will be shown in section 2.4) via a dedicated button in the graphical user interface.
The information in these alternative help files is identical, with the added bonus that the web pages also present the results of the commands (especially useful in the case of plots or Venn diagrams).
2.3.2 Function arguments
The structure of a function consists of one command (the name of the function) and a series of formal arguments that need to be specified by the user in order to obtain a result, or a specific type of result.
Let us examine one of the simplest functions in the package, findTh(), which finds calibration threshold(s) for a numerical causal condition in the absence of any theory about how that condition should be calibrated (more about calibration in chapter 4).
The structure of this function is presented below:
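Assuming the defaults have not changed since the version used here, the signature reported by args(findTh) looks like this:

```r
findTh(x, n = 1, hclustm = "complete", distm = "euclidean", ...)
```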
It has four formal arguments (x, n, hclustm and distm), plus the possibility of additional informal arguments via the “...” operator (mainly for historical reasons, to ensure backwards compatibility with code written for former versions of the package).
As can be seen, some of the arguments have predefined (“default”) values, as in the case of the argument n, the number of thresholds to find, which is preset to the value 1. On the other hand, the argument x, which contains the numerical input, doesn’t have any default value because it differs from case to case and from user to user.
The main difference between these two kinds of arguments is that x needs to be specified (without data, the function has no input to find thresholds for), while the argument n is optional (if the user doesn’t specify anything, the function uses its default value of 1).
To demonstrate, we first need some data, for example this vector containing hypothetical country GDP values:
Having created the object gdp, these two commands give identical results:

findTh(gdp)

[1] 8700

and

findTh(gdp, n = 1)

[1] 8700
This is fundamental information about how functions generally work in R, and it must always be remembered because, many times, these default arguments are “at work” even if users don’t specify them explicitly. Perhaps more importantly, arguments are sometimes “at work” without the user’s knowledge, a strong incentive to carefully read the help files and make sure that all arguments have their proper values or settings.
Modifying the number of thresholds (more details in chapter 4) will of course change the result:

findTh(gdp, n = 2)

[1] 8700 18000
Another important aspect regarding the written commands is the logical relationship between various arguments. Some of these arguments depend on the values of other arguments from the same command.
As an example, and as will be further explained in section 4.2.2, let us examine the calibrate() function, which has these formal arguments:
calibrate(x, type = "fuzzy", thresholds = NA, logistic = TRUE,
idm = 0.95, ecdf = FALSE, below = 1, above = 1, ...)
The argument logistic depends on the fuzzy type of calibration: it would not make any sense to ask the function for a logistic function with a crisp type of calibration. If the argument type is changed to "crisp", the argument logistic will be ignored irrespective of its value.
The same dependency relationship can be observed between the arguments logistic and idm, where the latter (which stands for the inclusion degree of membership) makes sense only when the logistic function is used for calibration. Since the logistic function in turn depends on the fuzzy type of calibration, the argument idm is logically associated with a chain of dependencies starting from the argument type. If either or both of those arguments are changed from their default values, the argument idm will be ignored.
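A minimal sketch of this chain of dependencies (the data and threshold values below are hypothetical, chosen only for illustration):

```r
library(QCA)
gdp <- c(450, 1200, 8700, 15000, 22000)  # hypothetical raw values

# fuzzy calibration: 'logistic' and 'idm' are taken into account
calibrate(gdp, type = "fuzzy", thresholds = c(500, 8700, 20000),
          logistic = TRUE, idm = 0.95)

# crisp calibration: 'logistic' and 'idm' are silently ignored
calibrate(gdp, type = "crisp", thresholds = 8700)
```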
This kind of behavior might be difficult to conceptualize from a written command, but it is absolutely evident in the graphical user interface for the calibration menu (details in chapter 4), where the ignored arguments not only have no effect, they actually disappear from the dialog when different options are selected upstream. When choosing the crisp type of calibration, the dialog presents only the options (arguments) for the crisp version; when choosing the fuzzy type, the crisp options disappear and those for the fuzzy type appear instead. The same thing happens in the command line, it is just more obvious in the graphical user interface.
Perhaps one final thing worth noting, in the command line description of the QCA package, is a feature that simplifies the users’ experience to a great extent. Many times, especially in QCA applications, users are required to specify which causal conditions are used for a truth table, or what kind of directional expectations are used to further enhance the solution space.
All of those situations, as well as many others, involve specifying a character vector, and the causal conditions are an easy example. The normal, accepted way to deal with character vectors in R is to use quotes for each value, something like: c("ONE", "TWO", "THREE").
But beginners often forget to add the c() function that constructs the vector, or they might miss a quote somewhere, especially when the conditions’ names are longer. This kind of error can only be caught by base R, with the associated, seemingly “cryptic” messages. To help prevent these situations and enhance the user experience, the QCA package has automatic ways to understand and interpret a single character string which, many times, can be used instead: "ONE, TWO, THREE".
Such a string is easy to type, and it doesn’t demand much attention from the user. The package automatically recognizes this string as a character vector (because it contains commas inside), and splits it into the several character substrings corresponding to the values in the “normal” specification.
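The idea behind this splitting can be sketched in base R (an illustration of the principle, not the package’s actual internal code):

```r
# split a comma-separated string and trim the surrounding spaces
conditions <- trimws(unlist(strsplit("ONE, TWO, THREE", ",")))
conditions
# a character vector: "ONE", "TWO", "THREE"
```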
This is useful for two main reasons. First, it is now possible to specify variously complex disjunctive expressions (understood as sums of products) to calculate parameters of fit, something like "natpride + GEOCON -> PROTEST".
But as a more convincing example, specifying directional expectations (see section 8.7) can sometimes prove tricky. These expectations are most of the time specified in the form c(1, 0, 0, 1, 1), but there are situations where users don’t have any expectation at all (a “don’t care”), and the usual way to specify this situation is the character "-".
An error will appear when typing c(1, 0, -, 1, 1); the correct way to specify this vector is c(1, 0, "-", 1, 1), and this is a very common stumbling block for first time users. To circumvent it, the QCA package can read the simpler string "1, 0, -, 1, 1", which works just as well for beginners and advanced users alike because it is entirely enclosed between quotes. As an added bonus, it is easier to write and decreases the likelihood of making a mistake, overall enhancing the users’ experience.
2.4 The GUI - graphical user interface
Working in a command line driven environment is both a blessing and a curse. With most other software offering intuitive menus and dialogs, users naturally expect a similar interface when first opening R. The experience can actually be discouraging, because a command line lies completely outside most users’ experience.
But there are good reasons why R is designed the way it is, and the programmers who have started to build graphical user interfaces know how difficult a task this is, highlighted by the fact that no universally accepted graphical user interface has been designed since the first release of R, over 20 years ago. There have been many attempts to design a GUI (in random order, for exemplification: Deducer, Rcmdr, RStudio, JGR, RKWard, Rattle, Tinn-R), most of them presented by Valero-Mora and Ledesma (2012).
What is common to all these efforts is the attempt to accommodate R specific features (workspace, packages, console, object types etc.) in a graphical user interface that is normally designed for a different kind of functionality. This is especially evident in the data editor, a spreadsheet-like feature that all user interfaces present: like any spreadsheet, it has a rectangular shape with cases on rows and variables on columns, and that is the shape users most expect when they want to view or edit their data.
The trouble is, R has many more types of data. There are scalars, vectors, matrices, arrays and lists, to name the most common, and implementing a viewer / editor for all of those objects, as simple as it might seem, is in fact close to impossible. Restricting the data editor to datasets only is even worse, fueling many users’ expectation that data means a dataset, when in fact data can have many other shapes.
And that is only a fraction of the entire complexity. R is constructed as package-centered software, where in addition to a core set of packages there are potentially thousands of other contributed packages from all users. There are also multiple environments, sessions and graphical devices; it would be practically impossible to accommodate everything into a single system of menus, adding to the difficulty of designing a multi-purpose user interface.
2.4.1 Description
The graphical user interface in the QCA package is no different from any other attempt. It is strictly dataset oriented, which is understandable since it is not designed as a general R graphical user interface but specifically for QCA purposes, where input data usually has a rectangular shape.
To avoid duplicating efforts and to make the users’ experience as smooth as possible, it was extremely important from the very beginning to create a single version for all operating systems, without any additional software needing to be installed apart from base R.
One highly appreciated user interface that accomplishes these goals is the R Commander (Fox 2005), a point and click interface that can be installed as a regular R package and opened from within R. No other software is necessary, and it can be opened (just like R) in any operating system because it was written in the Tcl/Tk language.
This is one of the best user interfaces for R, and the structure of the user interface in the QCA package is similar to that of R Commander (it has menus and separate dialogs); however, its internal functionality is closer to the other highly appreciated user interface, RStudio.
Much like RStudio, the QCA graphical user interface is constructed in a webpage using the powerful shiny package (RStudio, Inc 2013) with additional Javascript custom code on top. Perhaps the best description of the QCA user interface is the effort to achieve Rcmdr functionality using shiny and RStudio technology, combining the best features from both.
A webpage is likely to become the most widely used environment to exchange and visualize data, partial results, or even complete reports and presentations. Many of the “traditional” software products have siblings in cloud based environments (Google Docs, for example), and it is actually possible to fully use an entire software package in a webpage, through some sort of virtual machine.
The team at RStudio is actually leading the way in building all sorts of webpage based tools, for example Shiny dashboards and HTML widgets. There is also another collaboration tool with a high potential to produce a significant impact in the academic world, named R Notebooks: not only does it make code exchange possible, but thanks to a careful design it also exchanges the output of the code, all in a single HTML document that can literally be opened anywhere, irrespective of having R installed.
It seems that the cloud and webpages, with flavors of HTML and Javascript inside, are becoming a new kind of collaboration environment that R is happy to work with, either communicating directly via the package shiny or building shareable documents like R Notebooks.
These are precisely the reasons for which the QCA graphical user interface has been designed to be opened in a webpage: it is a modern approach, and with so many new tools appearing every day it has good prospects to still be valid many years from now.
The purpose of this chapter is less to offer an exhaustive presentation of the entire user interface; some menus and dialogs will be presented in specialized chapters. At this point, it is more important to introduce the most important features, especially how to open the interface and how to make the best use of it. Although it allows most of the base R features through the web based R console (see section 2.4.5), this is not a general purpose R graphical user interface, but a QCA specific one.
Once opened, it looks like any regular software installed on the local computer with menus, and each menu opens a dialog, where users click through various options and set various parameters then run the respective command.
The only difference, because the interface is opened in a webpage, is that dialogs are not separate windows but are created inside the webpage environment: if the webpage is closed, all open dialogs are closed with it.
Otherwise, there are multiple advantages to building the user interface in a webpage. To begin with, the user interface looks exactly the same irrespective of the operating system: since HTML and Javascript are standard and cross-platform, a webpage should look identical everywhere.
But there are even more advantages. Unlike traditional point and click software, where dialogs are static, building the dialogs into a webpage makes it possible to use all the HTML and Javascript features normally present in a web environment: reactivity, mouse-over events, click events etc. The dialogs themselves change as a function of the specific options being clicked.
This will be especially evident in chapter 4, where the calibration dialog contains a plot region in which users can actually see the distribution of raw against calibrated values, as a function of the specific calibration options. This is helpful because it eliminates the additional step of verifying the result after the calibration process: instead, it shows what the calibration will look like before the command is sent to R.
Some dialogs can be resized to accommodate complex information, and all content is automatically redrawn at the new width and height of the dialog. Such examples will be presented in section 11.6, where Venn diagrams for more than 4 or 5 sets become too complex for a smaller plot region, or in an XY plot dialog with very many points to plot.
2.4.2 Starting the graphical user interface
The QCA graphical user interface is essentially a shiny app. An “app” is a generic term for any application built with the package shiny, which makes it possible to build interactive user interfaces that help understand the structure of the data and clarify the nature of the problem to be solved. There are thousands of applications built with Shiny, either on local machines or personal servers, or even published and shared on the RStudio servers (https://www.shinyapps.io/).
While the RStudio family of applications opens up an entire world of possibilities, like the R Notebook interchange format discussed in the previous section, a Shiny app does even more: it allows the user to interact with the R system using an easy, intuitive interface. A Shiny app need not be limited to a specific example; it can be customized to accept various inputs which the interface feeds to R, making the user’s life as easy as possible.
Shiny can mediate between the user and R in this way because it has a communication engine that sends commands to R, whose answer is passed back to the HTML environment and displayed on the screen. Users don’t even have to know R to use these kinds of interfaces; they only need to click, drag and generally do whatever the web-based app offers. This communication is possible because R provides a native web server, initially designed to display the help pages of the various functions in the various packages. Shiny builds on this capability, and uses R’s internal web server as a communication channel between the webpage and R.
Any such Shiny app can be opened using a function in the shiny package called runApp(), which takes as its first argument the path to the directory where the app resides, and has various other parameters to set the host and the port the web server should listen on. Users don’t need to know all these details, because the function automatically assigns a local host and a random port, unless specifically stated otherwise.
This brings up the important question of what a web “server” is. Many users assume that a server is something which runs on the internet, in a public or a private mode. That is true, but few users are aware that web servers can also run on the local computer, and the most common address for a local web server is 127.0.0.1, which is the automatic default for the shiny package. The port number follows after a colon, making the web address of a Shiny app look similar to 127.0.0.1:6479.
The QCA package offers a convenient wrapper function for all these details, and opening up the graphical user interface can be simply achieved with:
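Assuming the package is installed, the interface opens with:

```r
library(QCA)
runGUI()
```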
2.4.3 Creating an executable icon
While the user interface can easily be started from the R console after loading the QCA package, this approach has a single drawback: the user’s experience is then restricted to the web user interface for the entire session.
From the very moment the web application has been initiated, R will do nothing else but listen for the commands generated by the point and click menus. It is therefore impossible to resume working at the R console while the user interface is active, because R accepts no other commands while in this communication state with the web server. It is a choice of either the web user interface or the normal R console, but not both at the same time.
Fortunately, this can be circumvented thanks to R’s internal operating procedures. As command line software, R is primarily programmed to be started from a console, usually from any kind of terminal window. In Unix environments (both Linux and MacOS), a terminal window is so natural that it needs no further explanation, while Windows users might be familiar with the DOS-style black window; something similar can be started by entering the command cmd in the Start/Run menu.
In any of these terminal windows, provided that R has been installed with administrator privileges and the path to the binary executables has been added to the system paths, R can be opened by simply typing the command R (or, if the executable paths were not set at install time, by providing the complete path to the R executable file). The built-in R user interface / console is nothing but a shortcut to such a procedure, plus access to more advanced graphical devices.
Users can have two R sessions opened, one in the normal R console, and the other started in the terminal. Both can load the same package QCA, and one of them can be used to open the web user interface. While one of the sessions will be busy maintaining the web server communication, the other session can be used for any other R related task.
This kind of dual R usage can be further simplified by creating an executable file (icon) that starts the terminal based R session and automatically opens the web user interface, through a simple double click. For Linux users this is rather straightforward, involving the creation of a shell script and modifying its executable properties via the command chmod. While this is natural in the Linux environment, the rest of this section provides detailed instructions for Windows and MacOS users.
In the Windows environment, an executable file can easily be obtained by creating a file with the extension “.bat”. This can be achieved by opening any text editor and saving an empty file with this extension. The content of the file should be the following:
CLS
TITLE QCA Qualitative Comparative Analysis
C:/PROGRA~1/R/R-4.2.1/bin/R.exe --slave --no-restore -e ^
"setwd('D:/'); QCA::runGUI()"
The first two lines are harmless, cleaning up everything in the terminal window (if already opened) and setting a title. The third, more complex command requires some explanation because it involves a combination of commands, all related to each other.
It starts by specifying the path to the R executable, in this example referring to R version 4.2.1, but users should modify this according to the installed version. The C:/PROGRA~1 part simply refers to the C: drive and the short name of the Program Files directory (where programs are usually installed), and the rest, up to /bin/R.exe, completes the path to the R executable file that starts R.
The --slave part suppresses the prompts, commands and R’s starting information, while --no-restore specifies that none of the objects created in this R session will be saved for later use.
The -e part tells R that a series of commands is to be executed immediately after R has started, and those commands are found between the double quotes. The ^ character signals that a long command continues on the next line.
While the sequence C:/PROGRA~1/R/R-4.2.1/bin/R.exe --slave --no-restore -e consists of terminal specific commands, the sequence setwd('D:/'); QCA::runGUI() is a series of two R commands, separated by a semicolon.
Specifically, setwd('D:/') sets the working directory to the D:/ drive (users might want to change that to a preferred directory of their own, see section 2.4.8), while QCA::runGUI() runs the runGUI() function from the package QCA (the :: operator is an R feature that allows users to run a function from a specific package without having to load that package). The runGUI() function is responsible for creating the web page containing the user interface.
And that should be enough: with a bit of care to correctly specify the path to the R executable, and setting the working directory to a place where the user has read and write permissions, saving this text file with a “.bat” extension will allow opening the web user interface with a simple double-click on that file.
For MacOS users, the procedure is just as simple, although not immediately obvious because, unlike the Windows environment where a “.bat” file is automatically recognized as an executable file, in the Unix environment a file becomes executable only after specifically changing its properties.
The MacOS executable icon is also a text file, having a slightly longer extension called “.command”, and it contains the following:
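A sketch of its content follows (illustrative only: the exact command depends on the local setup, and the home directory belongs to the hypothetical user jsmith discussed below):

```sh
# QCA.command - sketch of a MacOS launcher for the QCA user interface
# (paths are illustrative and should be adapted to the local machine)
R --slave --no-restore -e "setwd('/Users/jsmith/'); QCA::runGUI()"
```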
It is basically the same command as in the Windows environment, only it doesn’t need the complete path to the R executable because under MacOS the R installation process automatically sets these paths in the system.
The /Users/jsmith/ part is the home directory of a hypothetical user called jsmith, and this is also something users want to customise.
Having this text file created, and named for instance QCA.command (assuming it was created on the Desktop, which is a directory found in the user’s home directory), the only thing left to make it executable is to type this command to the Terminal window:
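Assuming the file was indeed saved as QCA.command on the Desktop, the command would be (a sketch; adjust the path to the actual location of the file):

```sh
# mark the launcher file as executable for the current user
chmod u+x /Users/jsmith/Desktop/QCA.command
```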
This activates the executable bit of that file, and a subsequent double click will trigger the above mentioned behavior, ending up with a web page containing a fresh instance of the user interface.
In Linux, a similar file can be created:
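A sketch of such a file, assuming R is on the system path and using a hypothetical home directory /home/jsmith/:

```sh
#!/bin/bash
# sketch of a Linux launcher script for the QCA user interface
# (paths are illustrative and should be adapted to the local machine)
R --slave --no-restore -e "setwd('/home/jsmith/'); QCA::runGUI()"
exit 0
```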
The exit 0 command ensures the terminal window is closed when the R process stops.
2.4.4 Command construction
Designing a graphical user interface for a command line driven software like R can be a daunting task. The variety of R commands is huge, and the same result can be obtained through many different ways. This is one of the reasons for which R is such a versatile and flexible data analysis environment.
A menu system, by contrast, is highly inflexible: computer programs do not choose the best or the most efficient set of commands for a given problem, they are designed with a single way to deal with every mouse interaction, making the user interface look like a prison. Users of traditional, point and click data analysis software (for example SPSS) often come to believe that the particular way such software presents the "solution" is a gold standard.
But that belief is very far from the truth, because any problem can be solved in hundreds of different ways, depending only on the user's imagination. Some ways are more efficient, some are less efficient but easier for peers to read, and some are both efficient and easy to understand. By contrast, a point and click menu system offers only one possibility out of a potentially infinite number of ways to solve the same problem. It is the developer's responsibility to design intuitive ways to interact with the user interface, if possible ones that are suitable for both advanced and novice users.
One purpose of this graphical user interface is to facilitate using the QCA package, even for those users who do not necessarily know how to use R. However, it should be very clear that a user interface, even an intuitive one, can never replace written commands.
To help users learn both QCA and R at the same time, one of the particularly nice features of this user interface is the way it automatically constructs R commands for any given dialog.
This kind of feature is not new, in fact it was employed for many years by John Fox in his R Commander user interface for R. What is different, and could be seen as an improvement over R Commander, is the way the command is constructed: interactively, upon each and every click on a specific dialog.
For novice users, it is extremely useful to see how a visual dialog to import a dataset (for example) is transformed into written commands, with various clicks specifying natural but otherwise unknown parameters: what kind of column separator the file has, what the decimal separator is, which column contains the case names etc.
The ultimate, long term goal is to avoid trapping users in a specific system of menus and dialogs, by demonstrating how to achieve the same thing with both clicks and written commands. There is a special “Command constructor” dialog which is always present (and cannot be closed), which displays how the command is constructed after every click on dialog options. Every such option (or combination of options) corresponds to a specific set of formal arguments from the QCA package functions, and hopefully users will understand how these commands are constructed.
This ultimate goal will have been achieved when users eventually stop using the user interface completely, in favor of the more flexible written syntax: the interface should be a transition phase towards written commands, although it can also be used as a standalone application.
The command constructor is not editable: it automatically builds the commands from various clicks in the menu dialogs, but this is not (yet) a fully editable syntax editor. Future plans include extending this functionality, but for the moment it is only meant for display purposes.
2.4.5 The web R console
The first versions of the user interface offered only the menu system, with all communication between the webpage and R being rather hidden. While this approach was simple enough to be understood by novice R users, it suffered from a serious drawback: at some point, it becomes a burden to repeat dialog clicks in order to obtain similar truth tables or minimized solutions.
In a regular R session, users can create many objects, including truth tables, and can also save the results of a Boolean minimization to a given object. This is not only to avoid repeating commands but, more importantly, because these kinds of objects usually contain a lot of additional, useful information. For example, the minimization process involves creating and solving a prime implicants chart, which is not normally printed on the screen but can be inspected at a later stage to completely understand how the minimization was performed.
Other users might be interested to inspect the easy and difficult counterfactuals for a specific minimization (see section 8.6), especially when directional expectations have been used (section 8.7). All this information is returned in a single list object, out of which the internal print function displays only the solutions, possibly accompanied by their inclusion and coverage scores. Saving that list to an object makes it possible to inspect it later, and this kind of interactivity is difficult to capture in a graphical user interface, because there are more types of objects than a user interface can usually display.
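As a sketch of this kind of workflow (using the crisp-set Lipset data LC shipped with the package; argument names follow the QCA package, but defaults may differ across versions):

```r
library(QCA)

# construct a truth table for the survival of democracy outcome
ttLC <- truthTable(LC, outcome = "SURV")

# save the minimization result to an object instead of only printing it
sol <- minimize(ttLC, details = TRUE)

# the returned list holds more than what is printed on the screen,
# for example the prime implicants chart can be inspected later
sol$PIchart
```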
Newer versions of the user interface allow the possibility to create such objects directly from the menus; however, their inspection can only be performed from the command line R console. Along with the "Command constructor" dialog, the user interface has another mandatory dialog called the "R console", which attempts to mimic a regular R console and display the results of the minimization commands, or even to display truth table objects.
In the first version(s), this web based R console was not interactive: it displayed what a normal R console would display, but users could not interact with those objects. A lot of effort was invested to make this interaction possible, in such a way that it would be as close as possible to a normal R window. Everything that would normally be displayed in the normal R console is captured by the interface and printed on the webpage.
The newest versions of the user interface feature this interactive web R console, which supports typing commands (exemplified in figure 2.2), so that users can employ menus and dialogs, as well as written commands, to produce and modify objects.
Although it looks like the user interacts with R in a very direct way, in fact the information from the user interface is collected (either from dialogs or from the written commands) and sent to R as plain text. The communication is mediated by package shiny, which brings back the response and specialized Javascript code displays it either in the web R console or in various other dialogs, including plots.
The experience is dual: dialogs can create or modify objects, while written commands can re-create or modify various options in the dialogs. If the user creates a new variable in the R console, the variable will be immediately reported to the dialogs in the user interface, that get refreshed to display the new set of variables. This kind of reactive interactivity is valid for almost anything in the graphical user interface, from entire columns in the data to individual values that are displayed in various plot dialogs.
The web console behaves almost exactly like a normal R console, just in a distinct and separate evaluation environment. Objects that are created in the web environment do not exist in the original R environment that started the web interface; they are specific and unique to the web one. But otherwise practically all normal R commands are allowed, including source(): it reads the code within the sourced file and interprets it just like the commands from the web R console, and the result is brought into the web environment.
2.4.6 Graphics
R is well known for the quality of its graphical devices, first developed at the Bell Laboratories (Murrell 2006). It has a rather sophisticated system of both low level and high level graphics, capable of drawing anything from simple statistical plots to complex images with pattern recognition.
As complex as it is, this system is fixed: once drawn, a graphic is like an image and to change something it needs to be recreated. It is possible to alter various components of the graphic, but it is still not an interactive system.
On the other hand, the web based graphics could not be more different. They might not be as sophisticated as R’s graphical system, but they more than compensate through a lot of interactivity. A simple bar chart can behave differently, if only because the bars can be programmed to react on mouse over, or at click events: they change color, or get thicker borders, or display various information about a specific category etc.
Since R commands are possible through the web R console, it made sense to also capture graphics, in addition to printed output, errors and messages. There are related packages like evaluate, knitr, rapport or pander, which parse source files to produce complex outputs (like reports, books or articles), and they can include plot results within the text. Inspired by the work in those packages, some effort was invested to bring the R graphics into the webpage environment. Without going into many technical details, it involves saving the plot from base R to an SVG device, which can be read and displayed in the web environment.
And since an SVG can be resized on the fly, the resulting plot dialog can be resized and the graphic will automatically follow. Figure 2.3 is a simple plot example obtained with the following commands, which can be typed in the web user interface:
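A minimal sketch of such a sequence (the data and exact commands behind figure 2.3 are illustrative assumptions):

```r
# draw a simple scatterplot within the unit square (illustrative data)
plot(c(0.2, 0.4, 0.7, 0.9), c(0.3, 0.5, 0.6, 0.8),
     xlim = c(0, 1), ylim = c(0, 1), xlab = "X", ylab = "Y")

# subsequent commands add to the already opened plot dialog
title("A simple XY plot")
abline(h = 0.5, v = 0.5)  # the cross lines
```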
Adding subsequent plot commands is possible (in this example adding a title and drawing the cross lines), because the code first verifies whether there is a plot dialog open. If the plot dialog were closed, either of the last two commands would trigger the R error that "plot.new has not been called yet".
There are many attempts to extend the base R graphics and make them interactive, using either Java add-ins or RGtk2 with the cairoDevice, and there is even an OpenGL implementation in package rgl for 3D interactive plots (heavily used by the Rcmdr package). For various reasons, in the past 5 or 10 years more has been achieved with graphics in a webpage than in traditional software. Perhaps because the unprecedented growth of the web attracts a high number of web developers, there are Javascript libraries that can already rival traditional software, and the trend shows no signs of slowing down.
The SVG (Scalable Vector Graphics) standard has been extended and can now accommodate animations, morphing, easing and all sorts of path transformations based on matrix calculations. It is not surprising that libraries like D3.js have appeared, which seem ideal for data driven graphic displays.
Improved results have recently been presented at the RStudio conference in 2017, with yet another interactive 3D network visualisation tool that uses a Javascript library called "threejs", packed in an htmlwidget which can be opened wherever a fairly recent web browser is installed. This library can easily draw (and animate) force directed graphs with hundreds of thousands of points.
It is far from the goal of this user interface to completely replace R’s graphical system. Rather, the purpose is to enhance the traditional graphics system with more interactive options, specific to a webpage.
Web graphics are different from the base graphics, and are drawn separately in the web environment, which is why the user interface presents three different types of plot dialogs:
- one for normal R plots (captured from base R),
- one for the XY plots, and
- one for the Venn diagrams (the last two started from dedicated menus).
The first has already been discussed, allowing any kind of static graphics, including ggplot2 type plots. In the XY plots, the points are allocated labels that are displayed at mouse-over events, and the labels can be displayed and rotated around the points. In the Venn diagrams, for every intersection in the truth table that has cases, the mouse-over event displays those cases.
The long term goal is to unify all these plot dialogs in a single environment, which would:
- automatically detect which type of plot is being used, and mediate different interaction possibilities for each
- offer various export options depending on each type of plot: for the interactive Javascript plots, it might be necessary to export from SVG, or to determine the command that produces the most similar static plot.
All of the web-based graphics are drawn using an excellent SVG based library called "Raphaël.js", entirely written in Javascript. In fact, all dialogs have been designed using this Javascript library, with custom code for all radio buttons, check boxes and every bit of interactivity.
The normal R plots can be saved to the hard drive in the six most common formats (base R has even more possibilities), using the menu:
Graphs / Save R plot window
2.4.7 The data editor
One highly appreciated feature of any graphical user interface seems to be a data editor. In principle, this kind of dialog is useful only for small datasets. For large or very large data (thousands of rows and hundreds of columns), it is obvious the data will not fit on the screen. Whatever users "see" in the data editor, no matter how large the dialog, is only a fraction of the entire dataset, and the usual procedure is to move the visible area up and down, or left and right.
Such a dialog is difficult to develop in a web environment. Usually, HTML tables are split into small chunks, and users click the "Next", "Previous", "First" or "Last" buttons, or any other chunk number, to have the respective part displayed on the screen. This solution is understandable: scrolling through the entire dataset is sometimes impossible, because the dataset would first have to be loaded in the webpage before it could be scrolled.
Normally installed software can deal with this "loading" on the fly, working with locally compiled code which is extremely fast, but a webpage relies on many other (slow) factors, for example the speed of the internet connection. The Javascript language can be extremely fast too, but it cannot beat locally compiled code and it is not designed as a replacement; its nature, doing things inside a webpage, serves very different purposes.
Loading time is especially important, for nobody wants to spend more than a couple of seconds before a webpage can be used. The traditional solution, therefore, is to split the data into smaller chunks that can be quickly delivered to the webpage. Faithful to this procedure, this is exactly the way RStudio displays data in most of their Shiny apps.
This inconvenience has been properly dealt with in the QCA user interface, which does not rely on classical HTML for display purposes. As mentioned in the previous section, it uses an SVG based Javascript library called Raphaël.js that acts as a workhorse for the entire user interface. Instead of a regular HTML table, the dialog constructs a Raphaël "paper" (a concept similar to an HTML5 canvas), where it is possible to achieve scroll behavior by defining the full length and width of the paper without actually populating the dialog with the entire data.
Only the visible part is populated, and the rest is empty but ready to be populated when the user scrolls to another specific location. Figure 2.5 shows a vertical scrollbar on the right side, and the Javascript code behind it detects a mouse-scroll event that is bound to the Raphaël paper. Whenever the paper is scrolled in any direction, the content of the cells is erased and the dialog starts a communication process with R to ask about the content from the new location. The behavior is very similar to a chunked HTML table (moving from chunk to chunk), only in this case it is moving from one scroll location to another.
This is one feature that brings the usage experience a step closer to normally installed software. Custom code has been developed to make it look like a normal spreadsheet instead of an HTML table. Over the entire content of the data editor dialog, there are click events that simulate the selection of a single cell, or a row name, or a column name (all of them can be considered "cells").
There is also a double-click event to simulate editing the content of any given cell, by creating a temporary text box on top of the designated cell (at exact coordinates): once the text has been edited, the webpage detects the "Enter" or "Escape" events to decide whether to commit the change or drop it. When committing a change, a new communication process is started with R to announce the change at that specific cell.
It might seem trivial, but this simple change has ripple effects throughout the entire user interface. If the change modifies the range of values (say, by changing a value from 100 to 1000), the new range is reported to any other editor which uses that variable (many such examples in the calibration chapter).
Finally, since the user interface has been upgraded to deal with multiple objects, it seemed natural to provide a quick way to select various data frames from the workspace. The top left corner of the dialog shows a button with the name "LR" written in red (the name of the dataset currently being displayed). A click on that button opens up a drop-down list containing all loaded datasets, and selecting any of them will refresh the dialog with the new data.
When multiple datasets are loaded in memory, they can be visualized in multiple ways: either by selecting from the drop-down list in the top left corner or, if a dataset is small, by simply printing it on the screen, typing its name in the R console. Normal R has yet another way to display a dataset, using the function View().
When typing a command such as View(LR) in the base R console, a separate window pops up where users are presented with a very basic data browser. Under Windows the data can be scrolled up-down and left-right, while in MacOS it cannot even be scrolled; it looks as though it is a still image constructed in the graphics window. And this window is designed for browsing data only: the data cannot be modified in any way.
This is another reason to improve this functionality in the web interface, and more custom code has been produced to capture the normal R output and divert it to the web interface. Using the same command View(LR) in the web R console, the data will be displayed in the interactive web data editor, despite the command being evaluated in the "base" R global environment.
2.4.8 Import, export and load data
To bring a data file into R there are many alternatives, depending on the file's type. There are many binary formats from other software, but one of the simplest ways to exchange files, especially between different operating systems, is to use a text based file. Usually, these files have a .csv extension (comma separated values):
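A typical command (the file name is illustrative):

```r
# read a comma separated file from the current working directory
datafile <- read.csv("datafile.csv")
```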
This command creates a new object called datafile in the workspace, combining the function read.csv() with the assignment operator "<-".
When looking at this command, a first time R user might be puzzled, because copied verbatim it would likely fail on other computers. Most of the commands in this book can simply be copied and run on the local computer, and they should run just fine. But a command such as this one is bound to fail on a different computer, for two reasons: first, the file on the hard drive has to be named exactly "datafile.csv" (otherwise the name in the command has to be changed to match the actual CSV file), and second, the local computer has zero knowledge about the location of the file.
As simple as it might seem, this scenario is potentially disruptive for the beginners. There are two solutions to this situation in base R, plus an additional one in the graphical user interface. The first basic solution is to set the working directory (a topic presented in section 1.1) to the place where the CSV file is located:
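For instance (the path is illustrative and should be adapted to the local machine):

```r
# point R to the directory containing the CSV file
setwd("/path/to/a/certain/directory")
```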
The string "/path/to/a/certain/directory" should be changed according to the local file system (the example here resembles Linux and MacOS paths). Whether the working directory was indeed changed to the desired location can be verified by typing the command getwd().
The second solution, and probably one which should be preferred, is to specify the complete path to a specific file, an approach which is independent of the working directory settings above (the example below is specific to a Windows environment):
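A sketch of such a command (the Windows style path is illustrative):

```r
# the complete path makes the command independent of the working directory
datafile <- read.csv("D:/data/datafile.csv")
```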
read.csv() has some other formal arguments, compatible with those from the more general function read.table(). The user can specify whether the file has a different separator (often tab separated values, for example) via the argument sep, or indicate whether the data file contains the variable names in the first row, via the argument header, and so on.
The extension ".csv" is not important; sometimes the text files have a ".dat" extension and rarely ".tsv", the latter meaning tab separated values. The separator is the character that differentiates values from two different columns.
More details about all their arguments can be read using this command:
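For example:

```r
# open the help page describing all arguments of read.table()
# (read.csv() is documented on the same page)
?read.table
```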
The third solution is the simplest, for users at any level. It requires no prior knowledge of R commands, and it uses interactive dialogs for navigating to a specific file, similar to any normal graphical user interface. The dialog is opened using the menu:
File / Import
Figure 2.6 shows a minimalistic, yet comprehensive import dialog where users can navigate to a specific location in multiple ways. A first way is to double click the lines on the right side of the dialog. Directories are signaled with a plus "+" sign, and the parent directory is always the line with the two dots ".." at the top of the navigation area.
There is no "Back" (or "Forward") button to select the previous directory; however, above the navigation area there is a structure of directories that compose the current path. By double clicking any of the directories on that path, the navigation area changes to the content of that directory. If the path is very long (as in the figure) and it does not fit on the screen, it can be dragged left and right using the mouse.
Finally, a fourth possible way to navigate to a specific directory is to manually enter the path in the Directory textbox. This is very useful when working with USB flash disks, which are mounted in very specific places (or, under Windows, assigned specific drive letters) that the web based user interface does not know anything about.
Once a specific CSV file has been selected, there is an area on the left (below the Preview column names) which is automatically filled with the names of the columns, if they exist in the header of the file. This is an extra measure to make sure the data will be correctly imported, before it is actually imported. Should all settings be correct according to the structure of the CSV file, the names of the columns are listed in that area.
Much like in the written command, there are various options to select one or another column separator. By default, this is a comma (hence the name of the CSV, Comma Separated Values, file), but there are various other possibilities, the most frequently encountered being a tab or a space. Any other separator (for instance a semicolon ";") should be manually entered in the "other, please specify:" textbox.
Another setting is the Decimal radio box, for files where the decimal values are separated with a dot (the default in English speaking countries) or with a comma (as encountered, for instance, in Eastern European and German speaking countries).
There is one final, optional setting to make, by specifying the No./name of the column containing row names textbox. If any of the columns from the datafile contains the names of the cases (or rows, as opposed to the names of the variables, or columns), this option will place these names on the rows of the imported data. It can be either a number (the position of the column containing the row names) or even the actual name of the column containing these row names.
Before actually importing the data, a name should be given in the textbox next to Assign checkbox, to save the data into an R object in the workspace. Finally, a click on the Import button brings the data into R. This procedure mimics the written command, with various arguments of the function being manipulated via the dedicated options in the visual dialog.
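The dialog settings roughly correspond to a written command like the following (a hedged sketch; the file name, separator, decimal sign and row names column are illustrative):

```r
# equivalent of the import dialog: separator, decimal sign, header row
# and the column holding the case names are all explicit arguments
datafile <- read.csv("D:/data/datafile.csv", sep = ",", dec = ".",
                     header = TRUE, row.names = "cases")
```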
All this procedure is needed to import custom data; otherwise, the examples from this book are exclusively based on the datasets that are already part of the QCA package. They can be loaded by simply typing the command data() with the name of the dataset, or by using another dialog from the graphical user interface:
File / Load
In order to demonstrate functionality, many packages contain datasets needed to replicate the commands from the functions' help files. This is also very convenient, saving users from having to import the data themselves. To access the datasets from a specific package, the package has to be loaded first, using the function library() with the name of the package.
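For example, to access the LR dataset from package QCA (assuming the package is installed):

```r
# attach the package, then load one of its datasets into the workspace
library(QCA)
data(LR)
```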
Once this command is given, the package becomes attached and the functions inside (as well as the contained datasets) become available. In figure 2.7, there are only three attached packages (datasets, QCA and venn), and package QCA was selected. There are multiple datasets contained in this package, among them the one selected in the dialog (LR).
Once a specific dataset is selected, the box immediately below offers some basic information about the data. Complete information about the dataset (for instance what the name LR means, what columns it contains, how they were calibrated, from which theoretical reference the dataset is taken etc.) can be obtained by clicking on the Help button, which opens up a different webpage.
Data is usually modified during the analysis, and there is no "Save" button to make sure the modifications are preserved after the current session is closed, neither in the command line nor in the graphical user interface. That is due to a particularity of R, which maintains all objects in memory (in the workspace, see section 1.2), so they need not be saved as long as R is still open. Only when closing R can objects be lost, if they are not written to the hard drive.
The usual command to save a data frame (or to export it to the hard drive) is write.csv(), which is a particular case of the general function write.table(). There are many arguments to these functions, especially ones specific to certain operating systems (for instance, the end of line code differs between Windows, Linux and MacOS). Also, there are various options to select a text encoding, because most languages have special characters (accents, diacritics etc.) which require a different encoding than the default one.
This book does not cover all of those options, and readers are encouraged to read about them in a general purpose book about R. Assuming that the default options are sufficient for the vast majority of users, only some of the arguments from write.csv() are going to be introduced.
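A minimal sketch of exporting a data frame (object and file names are illustrative):

```r
# write the data frame to a CSV file in the current working directory;
# row.names = FALSE omits the column of row names
write.csv(datafile, file = "datafile.csv", row.names = FALSE)
```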
The graphical user interface dialog specific to exporting a dataset is even simpler. It contains only the very basic options, which should be enough for most situations. There is a dedicated function in package QCA called export(), which is basically a wrapper for the base function write.table(), with only one difference.
The main reason for this function is the default convention of write.table() to add a blank column name for the row names. Despite this (standard) convention, not all spreadsheet software reads this column properly, and many times the name of the first column in the dataset is allocated to the column containing the row names.
The function export() makes sure that, if row names are present in the dataset (in the sense that they differ from the default row numbers), the exported file will contain a proper column name called cases. If the dataset already contains a column called cases, the row names will be ignored. Users are naturally free to choose their own name for the column containing the row names.
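Usage is similar to write.csv(); a sketch (the file name is illustrative):

```r
# export the LR dataset; row names, if any, are written
# to a properly named column called "cases"
export(LR, file = "LR.csv")
```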
The dialog to export a dataset is opened with the menu:
File / Export
Since there is a single dataset in the workspace, it is the only object that appears in the area below Dataset in the left part of the dialog. The rest of the options should be self explanatory, from selecting the type of column separator to choosing a directory where to save the dataset.
A name for the file should be provided in the New file textbox. If a file with the same name already exists in the selected directory, a checkbox will appear asking whether the file should be overwritten.