Chapter 3 Set theory
QCA is a methodology based on sound theoretical foundations and robust software implementations. In its first version, Ragin (1987) presented a binary system dubbed “crisp” sets (csQCA). Following various critiques in subsequent years, it was extended to fuzzy sets (fsQCA), with an intermediate stop at the “multi-value” variant (mvQCA).
Whatever the version, the term “sets” is used extensively throughout QCA terminology, and understanding it is key to using the entire methodology. Although the term is very common in the social sciences, it has its roots in mathematics, where sets come in many forms: the set of real numbers, the set of natural numbers, etc.
In the social sciences, a set can be understood as a synonym for a category. Research methodology makes use of many categorical variables such as Residence. Here the category “Urban” can be understood as the set of people living in urban areas, and the category “Rural” as the set of people living in rural areas.
Fueled by the development of Qualitative Comparative Analysis in recent years (Ragin 2000, 2008b), social research methodology is witnessing an interesting theoretical duel. A new kind of conceptual competition has emerged from the artificial opposition between qualitative and quantitative research strategies, which approach the social world in very different ways. This competition now captivates the community’s attention: it is the debate opposing variables and sets, with deep implications for measurement and interpretation.
3.1 The binary system and the Boolean algebra
Natural language can be transposed into a mathematical language by combining some fundamental laws of logic with a special type of language that has only two values: true and false.
The history of these values dates back to the Chinese concepts of Yin and Yang, which express a continuous duality in nature. A similar line of thought was advanced by the philosopher and mathematician Gottfried Leibniz (according to a biography written by Aiton 1985). His belief in the power of these symbols was so strong that it led him to invent binary mathematics.
Leibniz devoted his entire life to this system, which in his final years became almost religious, with 1 representing good and 0 representing bad. As in Chinese philosophy, Leibniz’s world was a continuous struggle between good and bad, and he truly believed the binary system of mathematics had a divine origin.
His work was neglected for almost 150 years, until the middle of the 19th century. Then another great mathematician, George Boole, refined the binary system to make it useful for logic as well as mathematics. Both Leibniz and Boole were well ahead of their times: the scientific community was not prepared for their work and failed to understand its use.
Boole’s system was also neglected by his peers until a few decades later, when the first real applications appeared at MIT (Massachusetts Institute of Technology) in the United States.
3.2 Types of sets
Towards the end of the 19th century, while studying some properties of infinite series, the German mathematician and philosopher Georg Cantor created a theory of abstract sets that fundamentally changed the foundations of mathematics (Dauben 1979). His initial version (also known as naive set theory) was later greatly extended, and modern set theory contains more axioms and types of sets than those described by Cantor. In summary, a set can be defined as a collection of objects that share a common property.
The objects inside a set are called elements, and each element is unique. Mathematics has many set-related properties and concepts: finite, infinite, intersecting, disjoint, equal, etc. Social science methodology, and especially comparative social science, uses a formal terminology with many terms borrowed from set theory. One example is the use of categories to segment human populations: rural / urban, males / females, lower / medium / upper levels of education, etc.
Formally, there are two main types of sets:
- Crisp
  - bivalent
  - multivalent
- Fuzzy
The first family, crisp sets, is Cantor’s original creation: an element is either inside the set or outside it. It contains two subcategories: bivalent sets, with only two values, and multivalent sets, which can contain more than two values. There is no limit on the number of values, but all of them are discrete.
QCA terminology distinguishes between crisp sets (csQCA) and multi-value sets (mvQCA), but both are in fact “crisp”. What has been coined “crisp sets” are bivalent crisp sets. As will be shown, these are a special case of multivalent crisp sets. Any bivalent crisp set can be specified as a multi-value set with only two values.
The fuzzy sets family is different because a fuzzy set can have an infinitely large number of possible values. Elements are not just in or out, but more or less included in a given set, ranging from 0 (completely out) to 1 (completely in).
3.2.1 Bivalent crisp sets
Bivalent crisp sets are collections of well-defined elements that either have or do not have a certain property. They therefore either belong or do not belong to the set defined by that property. In formal notation, any such set can be represented by enumerating all its distinct elements:
\[\mbox{A = } \{a_1\mbox{, } a_2\mbox{, } \dots\mbox{, } a_n\}\]
The set of all elements from all sets is called the Universe (U), and its elements are \(x_{1 \dots n} \in \mbox{U}\). All other sets are subsets of this Universe.
For each element in the Universe, the answer is “yes” (true) if the element belongs to a set A, and “no” (false) if it doesn’t. In classical set theory, membership has only two values: 0 (false) and 1 (true). In formal notation, a function attributes these two values:
\[\mu_{A}\mbox{(x)} = \begin{cases} 0 & \mbox{ if x } \notin \mbox{A}\\ 1 & \mbox{ if x } \in \mbox{A} \end{cases}\]
In the social sciences, crisp sets are also called mutually exclusive categories. A rose belongs to the set of flowers, while a horse does not. One element can belong to multiple sets: a person can belong to the set of women and to the set of mothers.
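The characteristic function above can be sketched in R for the flowers example. This is a minimal illustration only. The universe U and the set A are hypothetical values, and mu_A() is a helper defined here, not a function from any package.

```r
# Hypothetical universe and set A (the set of flowers)
U <- c("rose", "tulip", "horse", "oak")
A <- c("rose", "tulip")

# Characteristic function: 1 if x is an element of A, 0 otherwise
mu_A <- function(x, A) as.integer(x %in% A)

mu_A(U, A)   # 1 1 0 0
```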
For any two sets A and B from the universe U, A is a subset of B if and only if every element that belongs to set A is also an element of set B:
\[\mbox{A} \subset \mbox{B}: \{\mbox{x} \in \mbox{A} \Rightarrow \mbox{x} \in \mbox{B}\}\]
A set that does not contain any element is called the empty set (the notation is \(\varnothing\)), and this is a subset of any other set.
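The subset definition translates directly into R with the %in% operator: A is a subset of B when all elements of A are found in B. The sketch below uses hypothetical names, and is_subset() is a helper defined only for this illustration. It also shows why the empty set is a subset of any set. Checking zero elements yields an empty logical vector, and all() of an empty vector is TRUE.

```r
# Hypothetical sets: B = women, A = mothers (all of whom are women)
B <- c("Ana", "Maria", "Elena", "Ioana")
A <- c("Maria", "Ioana")

# A is a subset of B if every element of A is also an element of B
is_subset <- function(A, B) all(A %in% B)

is_subset(A, B)              # TRUE
is_subset(B, A)              # FALSE
is_subset(character(0), B)   # TRUE: the empty set is a subset of any set
```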
Since Aristotle, logic has been bivalent: any proposition has a single truth value, either true or false. This system has three principles (laws) of logic that underpin bivalent crisp sets:
- Principle of identity: an object is what it is (in other words, it is equal to itself). This principle distinguishes an object from all others, similar to the psychological concept of “self” as compared with “others”.
- Principle of non-contradiction: it is impossible for an object to exist and not exist at the same time, or for a phenomenon to both happen and not happen. It is either one, or the other, but not both.
- Principle of excluded middle: a proposition is either true or false, there is no third alternative.
3.2.2 Multivalent crisp sets
Traditional, scholastic logic is inherently bivalent, according to the principle of excluded middle. However, bivalence was questioned by Aristotle himself. He formulated a paradox using propositional logic and a combination of logical expressions, testing bivalence against statements about future events.
Truth values can be attributed to past events. Once a phenomenon has happened, we cannot say that it did not happen, because its truth value transcends time. After it has happened, it is true at any later point in time: immediately after, in the present and in the future.
Bivalent logic is compatible with the past (which we already know), but it is difficult to apply to the future. This situation lies at the root of Aristotle’s paradox. With respect to the future, two systems of thought can be applied:
- Determinism: if something must happen, it will happen no matter what we do. This is similar to fatalism, the belief that everything is predetermined and inevitable, and that we have no control over our fate.
- Free will: we decide what is going to happen. This assumes the ability to choose what happens (as well as what does not happen), in the absence of any external constraint.
Any event has successive prior causal conditions, which have their own causal conditions, and so on infinitely back to the beginning of time. Now project into the future as if an event had already happened. The same can be said about an event that has not yet happened but is about to: it will have had its own infinite chain of causes. In a deterministic world, an event is atemporal. Given all the causal conditions that happened in the past, are happening in the present and will happen in the future, the event is necessarily bound to happen.
Aristotle made the statement “Tomorrow, there will be a battle”. It is impossible to attribute a truth value to this statement today, because the battle has not happened yet. However, applying deterministic logic to the future (the causal chain inevitably leading to the event) negates free will, which we hold to be true today. This breaks the principle of non-contradiction, which states that something cannot be true and false at the same time.
Taking up this problem of determinism, the Polish philosopher Jan Łukasiewicz created a logical system at the beginning of the 20th century (Borkowski 1970). This system goes beyond traditional bivalent philosophy and offers a solution to Aristotle’s paradox. Denoted \(\mbox{Ł}_3\), it has not two but three truth values:
\[\mu_{A}\mbox{(x)} = \begin{cases} 0 & \mbox{false}\\ 1/2 & \mbox{undetermined (neither true, nor false), partially true}\\ 1 & \mbox{true} \end{cases}\]
Dismissed at first, Łukasiewicz’s philosophy was eventually accepted and led to the generalisation of his trivalent system to multivalent systems with n values. The truth values are obtained through a uniform division of the interval [0, 1] into n distinct values:
\[\left\{0 = \frac{0}{n-1},\ \frac{1}{n-1},\ \frac{2}{n-1},\ \dots,\ \frac{n-1}{n-1} = 1 \right\}\]
It can be seen that bivalent sets with only two values (0 and 1) are just a particular case of a multivalent crisp set, with \(n = 2\). The attribute “crisp” can be applied to any set whose elements are separated and distinct from one another.
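The uniform division of the interval [0, 1] can be computed directly, as a small sketch. The helper truth_values() is defined only for this illustration. With n = 2 it returns the bivalent values, and with n = 3 it returns the three values of Ł3.

```r
# Truth values of an n-valued Łukasiewicz system: k / (n - 1), k = 0, ..., n - 1
# (n must be at least 2)
truth_values <- function(n) (0:(n - 1)) / (n - 1)

truth_values(2)   # 0 1              (bivalent crisp)
truth_values(3)   # 0 0.5 1          (the three values of Ł3)
truth_values(5)   # 0 0.25 0.5 0.75 1
```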
3.2.3 Fuzzy sets
Crisp sets generally have a finite number of values. At least in the social sciences, qualitative variables don’t have an infinite number of categories. Łukasiewicz’s theory was generalised from 3 (\(\mbox{Ł}_3\)) to \(n\) truth values (\(\mbox{Ł}_n\)). After that, it was only a matter of time before it was fully extended towards an infinite number of possible values. In fact, the model \(\mbox{Ł}_n\) is very close to fuzzy sets, because \(n\) can be a very large number (approaching infinity).
Instead of dividing the space into \(n\) distinct values, another solution was proposed several decades later by the mathematician Lotfi Zadeh, who introduced the concept of “fuzzy sets”.
In his original definition (Zadeh 1965, 338), these are:
… a class of objects with a continuum of grades of membership. Such a set is characterised by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one.
Between a minimum (nothing, zero) and a maximum (completely, one) there is a continuum, an infinite number of degrees of membership. An element can belong more or less to a set, rather than simply being inside or outside it. A population is not simply rich or poor, but somewhere in between. Depending on the definition of welfare, a population belongs more to one set and less to the other. It is difficult to imagine that it strictly belongs to only one of them.
At this point, fuzzy set theory departs from established social science methodology, where “rich” and “poor” are just the two opposite ends of the same continuum. It is common practice to use bipolar scales (e.g. Likert-type response scales) to measure attitudes and opinions on various levels of agreement/disagreement.
From the perspective of fuzzy sets, on the other hand, rich and poor are two separate sets, not just two ends of the same continuum. Each set has its own continuum of degrees of inclusion, so that a person (or a country) can be both rich and poor, in various degrees.
This is somewhat counter-intuitive from a quantitative, correlational point of view, where social reality is defined symmetrically, from small to large. It makes perfect sense from the perspective of set theory, however. It all relates to simultaneous subset relations, which define social reality as asymmetric. This topic is covered in more depth in chapters 5 and 6.
Another difference, and a frequent source of confusion, is the apparent similarity between a fuzzy set membership and a probability. Although both take values in the interval [0, 1], they are in fact very different things. If a stove has a 1% probability of being very hot, there is still a chance of getting severely burned when touching it. If the same stove has a 1% membership in the set of very hot objects, however, we can touch it and hold it for as long as we want and it will remain harmless.
3.3 Set operations
In the algebra that bears his name, Boole (1854) formulated three basic operations that are still very much in use today, especially in computer programming. All three can be applied to both crisp and fuzzy sets, using different calculation methods.
3.3.1 Set negation
This is probably the simplest possible operation, and the calculation method is the same for binary crisp and fuzzy sets. Negation consists of finding the complement of a set A in the universe U. This complement is another set, written ~A, formed by all the elements of U that are not in A.
Negation is called a “unary” operation because it accepts only a single argument. In R, there are various ways to negate either a truth value or a binary numerical one.
[1] FALSE
The “!” sign is interpreted as “not”, and it negates any logical value. It works on scalars as well as on vectors.
[1] FALSE FALSE TRUE TRUE
In this example, all values from the object lvector have been inverted. The same kind of operation can be performed by subtracting the values from 1:
[1] 0 0 1 1
R automatically coerces the data type, in this case from logical to numeric. The true values become equal to 1 and the false values become equal to 0. The result is therefore a simple mathematical subtraction from 1, which works for binary crisp values as well as for fuzzy values:
[1] 0.7 0.6 0.5
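The commands that produced the outputs above are not shown in the text. Below is a minimal sketch that reproduces them. It assumes that lvector holds c(TRUE, TRUE, FALSE, FALSE) and that the fuzzy scores are 0.3, 0.4 and 0.5. These values are inferred from the printed results, not taken from the original code.

```r
!TRUE                    # FALSE

# Assumed values, inferred from the printed output
lvector <- c(TRUE, TRUE, FALSE, FALSE)
!lvector                 # FALSE FALSE  TRUE  TRUE
1 - lvector              # 0 0 1 1 (logical coerced to numeric)

# Fuzzy negation: subtract the membership scores from 1
1 - c(0.3, 0.4, 0.5)     # 0.7 0.6 0.5
```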
Negation is a bit more complex for multivalent sets, but the essence is the same. Take any such set with at least three values, say {0, 1, 2}. The complement of {0} is the set containing the remaining values {1, 2}. In the same fashion, the complement of {1} is the set {0, 2}.
Negation is usually applied to such numerical inputs, but it need not be restricted to numbers. In natural language it can be expressed just as easily using the word “not”. In terms of categories, an expression such as “not male” consists of all individuals who are not males. Other expressions can have logical implications. For example, “not mother” refers to all women who are not mothers, because the set of mothers is a subset of the set of women.
Neither the “!” operator nor the “1 -” operation can automatically be applied to a multivalent set. Such a negation needs additional information about the set of all possible values. Without complete information about all possible (other) values, negating a certain value of a multivalent set has an unknown result. At most, this information could be read from the input dataset, but there is no guarantee that the input is exhaustive.
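The point about needing the full set of possible values can be made concrete with a small sketch. The helper negate_mv() is hypothetical, defined only for this illustration, and the observed values in x are made up. The function cannot work without the values argument, which mirrors the missing-information problem described above.

```r
# Hypothetical helper: the complement of one value of a multivalent set
# can only be computed when all possible values are supplied explicitly
negate_mv <- function(value, values) setdiff(values, value)

negate_mv(0, values = 0:2)   # 1 2
negate_mv(1, values = 0:2)   # 0 2

# Crisp membership of each observed case in the set "not 1"
x <- c(0, 1, 2, 1)           # hypothetical observed values
as.integer(x %in% negate_mv(1, values = 0:2))   # 1 0 1 0
```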
3.3.2 Logical AND
This operation is also called a “logical conjunction” (or simply a conjunction), and it takes a true value only when all its elements are true. If any of the elements is false, the whole conjunction will be false.
In natural language, we may say that students who have high grades are intelligent and study hard. Students who are not intelligent do not have high grades, and neither do intelligent students who do not study hard. The grades are high only when both attributes are present: being intelligent and studying hard.
In logic, the result of the logical AND operation is true only when all conditions are true:
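\[\begin{array}{cc|c} A & B & A \cdot B\\ \hline \mbox{T} & \mbox{T} & \mbox{T}\\ \mbox{T} & \mbox{F} & \mbox{F}\\ \mbox{F} & \mbox{T} & \mbox{F}\\ \mbox{F} & \mbox{F} & \mbox{F} \end{array}\]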
There are various ways to exemplify this with vectors in R. One dedicated function is all():
[1] FALSE
The result of the function all() is false because at least one of the values of lvector is false. Subsetting the first two values:
[1] TRUE
In this case, the result is true because both of the first two values of lvector are true.
Another R operator that refers to conjunctions (intersections) is the ampersand sign “&”. The example can be extended further with combinations of logical vectors:
[1] TRUE FALSE FALSE FALSE
Here, the result of the “&” (logical AND) operation is a vector of length 4, obtained by comparing each pair of values from lvector and rvector. The resulting values are true only when both values from lvector and rvector are true, in this case only for the first pair.
The R specific recycling rule can also be applied:
[1] FALSE TRUE FALSE FALSE
In this example, the shorter rvector (of length 2) has been recycled to reach the length 4 of the longer lvector. The final result shows a true value in the second position because only the second values of both vectors are true. Recycling did not matter in this case, because the third and fourth values of lvector are both false.
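A minimal sketch that reproduces the conjunction examples above. The original commands are not shown. The values assumed here for lvector, c(TRUE, TRUE, FALSE, FALSE), and for rvector, c(TRUE, FALSE, TRUE, FALSE), are inferred from the printed outputs. The length-2 vector c(FALSE, TRUE) used for recycling is likewise an assumption consistent with the description.

```r
lvector <- c(TRUE, TRUE, FALSE, FALSE)
rvector <- c(TRUE, FALSE, TRUE, FALSE)   # assumed values

all(lvector)                # FALSE: at least one value is false
all(lvector[1:2])           # TRUE: the first two values are both true
lvector & rvector           # TRUE FALSE FALSE FALSE

# Recycling: the length-2 vector is reused to cover all four positions
lvector & c(FALSE, TRUE)    # FALSE TRUE FALSE FALSE
```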
The logical AND is also useful for making data subsets based on various criteria. Using the data frame dfobject from section 1, which has four rows, the following can be applied:
A B C D
C3 rural 14 7 3
C4 rural 15 8 4
In this example, only two cases (C3 and C4) conform to both conditions: A is rural and B is greater than 13. Each condition generates a logical vector. The two vectors are combined with the “&” (AND) sign, and the resulting logical vector is used to select rows from dfobject.
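Below is a sketch of the subsetting step, using a reconstruction of dfobject. The rows C2 to C4 follow the printed outputs in this chapter. Row C1 is not printed anywhere, so its values ("urban", 12, 5, 1) are an assumption.

```r
# Hypothetical reconstruction of dfobject: row C1 is an assumption
dfobject <- data.frame(
  A = c("urban", "urban", "rural", "rural"),
  B = 12:15,
  C = 5:8,
  D = 1:4,
  row.names = paste0("C", 1:4)
)

# Keep rows where A is "rural" AND B is greater than 13
and_subset <- dfobject[dfobject$A == "rural" & dfobject$B > 13, ]
and_subset   # rows C3 and C4
```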
As already mentioned, another interpretation of the logical AND is that of an intersection. When A is the set of intelligent students and B the set of students who study hard, then intelligent students who study hard are found at the intersection between A and B.
Base R has a function intersect() that intersects two sets:
[1] 5 6
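The input vectors for this intersection are not shown in the text. The numeric values below are assumptions chosen so that they reproduce the printed result. They are consistent with the union example later in the chapter as well.

```r
# Assumed numeric vectors (not shown in the text)
lvector <- c(1, 3, 5, 6)
rvector <- c(2, 4, 5, 6)

intersect(lvector, rvector)   # 5 6: the elements found in both sets
```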
The fuzzy version of the logical AND has a different calculation method, because set elements can have values anywhere in the interval [0, 1]. In other words, each element has a membership value in a specific set.
A person may have a 0.8 membership score in the set of intelligent students, and 0.3 in the set of students who study hard, and the logical operation is obtained by taking the minimum of those membership scores:
\[A \cap B = min(0.8, 0.3) = 0.3\]
The result is a membership score in the set of “A and B”, equal to the minimum between the two membership scores in the component sets. Zadeh (1965, 341) calls it: “the largest fuzzy set which is contained in both A and B”.
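In base R, the fuzzy intersection of two vectors of membership scores is the parallel minimum pmin(). In the sketch below, the first pair of scores (0.8 and 0.3) comes from the example above. The other values are hypothetical.

```r
A <- c(0.8, 0.6, 0.1)   # membership in "intelligent students"
B <- c(0.3, 0.9, 0.4)   # membership in "students who study hard"

# Fuzzy AND: element-wise minimum of the membership scores
pmin(A, B)              # 0.3 0.6 0.1
```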
The QCA package has a function called fuzzyand() that is similar to the built-in function pmin(), which performs parallel minima, but adds some improvements. It accepts vectors, and also data frames and matrices. For these it applies a min() function on the rows, thus simulating parallel minima on a rectangular object.
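The row-wise behaviour described above can be approximated in base R, as a sketch. This is not the QCA implementation of fuzzyand(), only an illustration of parallel minima over the columns of a data frame. The membership scores are hypothetical.

```r
# Hypothetical membership scores in three sets, one row per case
scores <- data.frame(
  A = c(0.8, 0.6, 0.1),
  B = c(0.3, 0.9, 0.4),
  C = c(0.5, 0.7, 0.2)
)

apply(scores, 1, min)    # minimum of each row: 0.3 0.6 0.1
do.call(pmin, scores)    # same result, as parallel minima over the columns
```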
In addition, the result of fuzzyand() is automatically given an attribute called "name", which contains the formal conjunctive notation of the result. To achieve this, the function also detects negated inputs automatically, when they are written in the usual “1 -” notation:
[1] 0.7 0.2 0.4
Detection of negations becomes visible when inspecting the name attribute. The function automatically detects a negation using subtraction from 1, and signals it using a tilde:
[1] "A*~B"
3.3.3 Logical OR
Logical OR is also called a “logical disjunction” (or simply a disjunction) and refers to any alternative way to achieve a result. The logical OR is stricter than the natural language word “or”, which has many other interpretations.
In natural language, “or” sometimes refers exclusively to one or the other, but not to both at the same time: “You can get to your workplace by taking a taxi or taking a bus”.
At other times it may refer to any of the options (inclusive), even both happening at the same time: “That person lost weight, so they must have dieted or done a lot of sport”. Here the options don’t exclude each other.
In logic, if any condition is true (even both at the same time), the result of the logical OR operation is true:
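\[\begin{array}{cc|c} A & B & A + B\\ \hline \mbox{T} & \mbox{T} & \mbox{T}\\ \mbox{T} & \mbox{F} & \mbox{T}\\ \mbox{F} & \mbox{T} & \mbox{T}\\ \mbox{F} & \mbox{F} & \mbox{F} \end{array}\]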
These kinds of relations may be exemplified using various combinations of logical vectors in R, having an immense applicability potential, especially for indexing and subsetting purposes.
[1] TRUE
In the example above, the object lvector is a logical vector with four values, the first two being true. The function any() returns true if any of those four values is true. In natural language, this reads: the first is true, or the second is true, or the third is true, or the fourth is true. The final result is TRUE, because at least one of those four values is true.
The example above can be further extended with combinations of logical vectors:
[1] TRUE TRUE TRUE FALSE
In this example, two logical vectors are combined with the “|” (logical OR) sign. The result is another logical vector of length four, where only the last value is FALSE because it is false in both input vectors.
The R specific recycling rule can also be applied:
[1] TRUE TRUE FALSE TRUE
Here, the shorter vector rvector has been recycled to the length of the longer vector lvector. This time the third value is FALSE: when rvector is recycled, its third value becomes false, and the third value in lvector is also false.
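A minimal sketch that reproduces the disjunction examples above. The vectors lvector and rvector are assumed to have the same values as in the logical AND section, inferred from the printed outputs. The recycled vector c(FALSE, TRUE) is also an assumption consistent with the text.

```r
lvector <- c(TRUE, TRUE, FALSE, FALSE)
rvector <- c(TRUE, FALSE, TRUE, FALSE)   # assumed values

any(lvector)                # TRUE: at least one value is true
lvector | rvector           # TRUE TRUE TRUE FALSE

# Recycling: c(FALSE, TRUE) is reused as c(FALSE, TRUE, FALSE, TRUE)
lvector | c(FALSE, TRUE)    # TRUE TRUE FALSE TRUE
```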
Like the logical AND, the logical OR can also be useful for making data subsets based on various criteria. Using the same data frame dfobject:
A B C D
C2 urban 13 6 2
C3 rural 14 7 3
C4 rural 15 8 4
Here, three rows conform to at least one of the two expressions. The row “C2” was preserved in the subset even though the expression B > 13 was not true for it, because the other expression, C > 5, was true.
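Below is the same reconstruction of dfobject used to sketch the OR subset. Row C1 ("urban", 12, 5, 1) is an assumption, since it never appears in the printed outputs. With those values, C1 fails both conditions and is the only row excluded.

```r
# Hypothetical reconstruction of dfobject: row C1 is an assumption
dfobject <- data.frame(
  A = c("urban", "urban", "rural", "rural"),
  B = 12:15,
  C = 5:8,
  D = 1:4,
  row.names = paste0("C", 1:4)
)

# Keep rows where B is greater than 13 OR C is greater than 5
or_subset <- dfobject[dfobject$B > 13 | dfobject$C > 5, ]
or_subset   # rows C2, C3 and C4
```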
Another interpretation of the logical OR is a union of two or more sets, the result being another set containing all unique elements from the other sets.
In R there is a function union() that does precisely that:
[1] 1 3 5 6 2 4
The resulting union has all the unique values from both lvector and rvector, in the order in which they first appear.
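The input vectors are not shown in the text. Assuming the same numeric vectors as in the intersection example (values inferred from both printed outputs), the union can be reproduced as follows.

```r
# Assumed numeric vectors (not shown in the text)
lvector <- c(1, 3, 5, 6)
rvector <- c(2, 4, 5, 6)

union(lvector, rvector)   # 1 3 5 6 2 4: unique values, in order of first appearance
```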
The fuzzy version of the logical OR also implies values in the interval [0, 1]. If a person has a 0.8 membership score in the set A of intelligent students, and 0.3 in the set B of students who study hard, the logical operation is obtained by taking the maximum of those membership scores:
\[A \cup B = max(0.8,\phantom{.} 0.3) = 0.8\]
The result is a membership score in the set of “either A or B”, equal to the maximum between the two membership scores in the component sets. Zadeh (1965, 341) calls it: “the smallest fuzzy set containing both A and B”.
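The base R counterpart of the fuzzy union is the parallel maximum pmax(). As before, only the first pair of scores (0.8 and 0.3) comes from the text, and the remaining values are hypothetical.

```r
A <- c(0.8, 0.6, 0.1)   # membership in "intelligent students"
B <- c(0.3, 0.9, 0.4)   # membership in "students who study hard"

# Fuzzy OR: element-wise maximum of the membership scores
pmax(A, B)              # 0.8 0.9 0.4
```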
The QCA package has a corresponding function called fuzzyor(), similar to the built-in function pmax() (parallel maxima). It adds a feature that automatically detects negated inputs written in the “1 -” notation:
[1] 0.3 0.8 0.6
As with its sibling function fuzzyand(), the result has a name attribute which contains the string expression corresponding to the input vectors:
[1] "~A + B"
Since the package QCA allows negating objects (not only SOP expressions) using a tilde, the same result can be obtained by directly negating the object:
[1] "~A + B"
3.4 Complex operations
The examples presented in the previous section are demonstrative only, referring either to single sets or at most two sets. Most commonly, however, expressions involve more sets, with various combinations of all three basic operations: set negation, logical union and logical intersection.
The functions fuzzyand() and fuzzyor() can handle complex examples, but they are mainly used for didactic purposes. The QCA package has yet another function which can replace both of them and perform even more complex operations. The final example in the previous section can be rewritten as:
[1] 0.3 0.8 0.6
In these kinds of string-based expressions, the tilde sign “\(\sim\)” can be used to negate a condition, and logical union is signalled with a plus sign “+”. Intersection is most often signalled with a star sign “*”. The exceptions are multi-value expressions, where the star sign is redundant, and expressions whose set names are taken from a dataset.
The function compute() is versatile enough to search for the input conditions A and B either among the objects created in the user’s workspace, or among the columns of a dataset specified with the argument data:
[1] 0.43 0.98 0.58 0.16 0.58 0.95 0.31 0.87 0.12 0.72 0.59 0.98 0.41
[14] 0.98 0.83 0.70 0.91 0.98
The same result could be obtained using a combination of two fuzzyand() calls and one fuzzyor() call. Feeding a sum-of-products (SOP) expression to the function compute() is much simpler, however.
Complex expressions can be simplified by applying a few simple Boolean rules:
\[\begin{aligned} A \cdot A &= A\\ A \cdot A \cdot B &= A \cdot B\\ A + A \cdot B &= A\\ A + {\sim}A &= 1\\ A \cdot \varnothing &= \varnothing \end{aligned}\]
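The rules above can be checked numerically by enumerating every combination of crisp values for A and B. In this sketch, intersection is computed as the minimum and union as the maximum, the same calculations the chapter uses for fuzzy sets. The empty set is represented by a membership of 0. The helpers AND() and OR() are defined only for this illustration.

```r
# Every combination of crisp (0/1) memberships for A and B
grid <- expand.grid(A = c(0, 1), B = c(0, 1))
A <- grid$A
B <- grid$B
empty <- 0                          # membership in the empty set is always 0

AND <- function(x, y) pmin(x, y)    # intersection
OR  <- function(x, y) pmax(x, y)    # union

rules <- c(
  "A.A = A"             = all(AND(A, A) == A),
  "A.A.B = A.B"         = all(AND(A, AND(A, B)) == AND(A, B)),
  "A + A.B = A"         = all(OR(A, AND(A, B)) == A),
  "A + ~A = 1"          = all(OR(A, 1 - A) == 1),
  "A.empty = empty"     = all(AND(A, empty) == empty)
)
rules   # all five rules are TRUE for crisp values
```

The fourth rule depends on the values being crisp. With a fuzzy score such as A = 0.3, the maximum of 0.3 and 0.7 is 0.7, not 1.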
Negations, in particular, are handled very effectively by applying De Morgan’s rules:
\[\begin{aligned} {\sim}(A + B) &= {\sim}A \cdot {\sim}B\\ {\sim}(A \cdot B) &= {\sim}A + {\sim}B \end{aligned}\]
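De Morgan’s rules also hold for fuzzy sets when negation is 1 - x, intersection is the minimum and union is the maximum. Below is a short numerical check in base R. The membership scores are hypothetical, except for the first pair (0.8 and 0.3) used earlier in the chapter.

```r
A <- c(0.8, 0.6, 0.1)
B <- c(0.3, 0.9, 0.4)

# ~(A + B) = ~A . ~B
all.equal(1 - pmax(A, B), pmin(1 - A, 1 - B))   # TRUE

# ~(A . B) = ~A + ~B
all.equal(1 - pmin(A, B), pmax(1 - A, 1 - B))   # TRUE
```

The same rule explains the earlier fuzzyor() and fuzzyand() outputs. The result for ~A + B (0.3 0.8 0.6) is exactly 1 minus the result for A*~B (0.7 0.2 0.4), since ~(A·~B) = ~A + B.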
Such simplifications are applied automatically by the function simplify(), for instance to some of Ragin’s examples from his original 1987 book. The first example is an intersection between the developmental perspective theory (L~G) and the resulting equation for the outcome E, ethnic political mobilization (page 146):
S1: LW~G
A more complex example shows the subnations that exhibit ethnic political mobilization (E) but were not predicted by any of the three theories (page 147):
S1: ~SLWG + SL~WG
The simpler expression can be used as input for the function compute(), with a result identical to the one for the complex expression.