Ralph Waldo Emerson famously observed:
A foolish consistency is the hobgoblin of little minds.
Given that, I think that Mr. Emerson would be happy to know that "consistency"—the word—is not consistent.
There are two separate technical meanings of "consistency," two meanings that technical people tend to conflate.
The older definition of the word comes from database researchers, especially including Jim Gray. He gave us locking, transactions, distributed transactions, consistency, and the ever-popular "ACID," all back when database was still spelled as "data base." I think we can regard him as authoritative.
In this definition, "consistency" is defined as a boolean function—a predicate—that takes the whole database as an input. When it returns "true", then the database is in a good state. To keep it that way, the server checks every transaction to see if it would make the predicate return false. If so, the server must reject that transaction.
We can regard this definition as having an "interior" perspective. That is, only the database server itself is able to apply the consistency predicate. Most databases do not have any way for an observer to get the complete value of the database at a point in time.
The idea of a consistency predicate also clarifies the relationship between consistency, atomicity, and isolation. The database server is temporarily permitted to break the consistency predicate while applying a transaction. (As an aside, I think that paper anticipates Datomic by 31 years.) However, it does not allow any outside observer to perceive the intermediate, unacceptable state. To external observers, the change appears atomic. Transactions do not commute, so the server also ensures that the effects of the transactions appear to obey some partial ordering. (Please note the very careful wording on that last sentence.) That's isolation.
Computationally, it is impractical to apply a general predicate over the entire database at the end of every transaction. Implementations use weaker forms of enforcement, typically syntactic constraints (i.e., CHAR(20) NOT NULL) and referential integrity. This moves from the notion of a consistency predicate to a lesser guarantee.
As an aside, using this weaker form of the predicate is one reason why we get data cleansing problems like negative ages and Little Bobby Tables. It's because we can't make a big table that has every possible human name, so we apply a weaker rule that says "any sequence of alphanumeric characters could be a name."
There is a completely separate definition of "consistency" that comes from the distributed systems world. This definition of consistency has more to do with applying the same transactions in the same order at every node in a system. This is the meaning of "consistency" which is used in the CAP Theorem. In that paper, consistency means, "there must exist a total order on all operations such that each operation looks as if it were completed at a single instant." This definition of consistency is about the history of operations and their causality.
That kind of consistency—strict linearizability—is a very high bar. There are many ways to weaken it and exploit "loopholes". CRDTs also avoid the need for linearizability by requiring all operations to be commutative. With CRDTs, as long as every node has a permutation of the ordering of operations, they will still report the same resulting value.
One form of the word "consistency" talks about the state of the system at a given point in time. The other talks about the history of operations that produced the current state. Are they fully substitutable? No.
For instance, suppose the global consistency predicate
says, "The atomic object V holds an integer." There are two
operations on V: inc and dec which increment and decrement its
value by one. The starting value is 0 and the representation can
grow arbitrarily. Given that definition, we can have "consistency
as predicate" without "consistency as history." That's because we
can reorder those operations at will and still have the same value
at the end. That means our predicate holds no matter what the
history says. Two nodes that see a different history of operations
are inconsistent by the second definition, but consistent by the
first.
So it turns out that "consistency (predicate)" and "consistency (history)" are two distinct ideas that happen to share a word. It is always an error to substitute the distributed systems definition of "consistency" for the C in ACID.
The C in CAP is not the C in ACID.