In this episode, we talk to Rich Hickey about Clojure.spec.
You can send feedback about the show to firstname.lastname@example.org, or leave a comment here on the blog. Thanks for listening!
CRAIG: Hello, and welcome to Episode 103 of The Cognicast, a podcast by Cognitect, Inc. about software and the people who create it. I'm your host, Craig Andera.
I want to make you aware of a couple things in the community happenings. The first one is the Minnesota Clojure Users Group. That's going to be happening Wednesday, June 8, 2016, at 7:00 p.m. at Software for Good. The topic will be Onyx. I'm sure you can find out lots more about that by searching for the Minnesota Clojure User Group.
The other one I want to mention is the Amsterdam Clojure Meet-Up. That's happening the same day, Wednesday, June 8, 2016, at 7:00 p.m. at Sytac. Again, I'm confident that if you search for Amsterdam Clojure Meet-Up, you will discover the details that you need to know.
That's really it in terms of events that I want to announce. The only other thing I want to mention today is: I was a little curious. We have implemented transcripts, as you might know. Every episode, a few days after it goes up, gets a full transcript of the entire spoken portion of the podcast. I'm just wondering if anybody has found any interesting uses for those. We haven't gotten any feedback one way or the other.
We're definitely keeping the feature. I think it's super useful for people to be able to discover what's in the show. I think there are a bunch of other interesting uses that are possible. But, I'm wondering if anybody out there has found a good use for the transcript.
We've been pretty psyched about this feature. We talked about doing it for a long time and finally implemented it. So, if you have any feedback about that feature, we'd love to hear it. You can send that to us through the usual channels. Either tweet @Cognicast or email email@example.com.
All right, so we will go ahead and go on to Episode 103 of The Cognicast.
[Music: "Thumbs Up (for Rock N' Roll)" by Kill the Noise and Feed Me]
CRAIG: Cool. Yeah. I think that's everything. I'm ready to begin if you are.
CRAIG: Great. Okay, well, welcome, everybody. Today is Wednesday, May 25, 2016, and this is The Cognicast. We're very, very pleased today to welcome back a guest we've had on several times, certainly one of our most popular guests. I'm definitely excited to talk to him, as always. I'm talking about Rich Hickey. Welcome to the show, Rich.
CRAIG: It's great to have you on again. We're pleased to have you back. We do have a question for you at the beginning and at the end, as I'm sure you're aware since I just hit you with this, as I did everyone, when we recorded for Episode 100. The question that we start off with these days is we ask our guests to relate some experience of art, open to whatever interpretation they choose to put on it. I think you're familiar with the question. I wonder if you've got anything to share with us today.
RICH: Yes. I was thinking back to a concert where I got to see the music of Harry Partch played on the instruments that he made himself. He was a microtonal composer who made his own instruments because he wanted different scales. He made some fantastic instruments, big xylophone-like things, and instruments made out of glass bowls and stuff like that. It was quite exciting, musically and to just see the amount of individuality to it. It was cool.
CRAIG: Microtonal music, you said that's alternative scales. It's presumably not the standard, one of the standard tunings. Is that the idea there?
RICH: That's right. It's an effort to find intervals that are more naturally harmonic. Equal temperament, which we use for Western music, divides the octave into 12 parts. It's very regular, but it's not very natural, and so composers in the microtonal space pick alternate tunings that allow intervals to be more perfect harmonically. You might divide an octave into thirty-something intervals instead of 12, which is what he did.
CRAIG: Obviously you've studied music. But, for someone like me who is used to listening to conventional tunings, does that wind up being, do you think, challenging to listen to? Does it really push your brain outside of its comfort zone in terms of music?
RICH: Yes. Initially, it will sound out of tune. But, that feeling goes away relatively quickly. Obviously Western music is just one kind of music, and there are other tuning systems used throughout the world that don't work that way. But, we're used to what we're used to, so it will sound out of tune, initially. Then it will sound right in a very strange way.
What was particularly cool about this performance was that it was an opera. Not only were these instruments playing all of these varied pitches, but the singers had to learn a completely new set of scales and had to effectively sing out of tune.
RICH: Which is very, very challenging whenever you see either singers or stringed instrument players play microtonal music. They're working very hard to find and be precise about those other intervals.
CRAIG: Yeah, I can imagine. That's an interesting and perhaps tortured segue into the thing that I think we probably want to talk about today. As is always the case on the show, we're happy to talk about whatever the guests think would be interesting. However, we work together at least a little bit, and I have a pretty good idea of where your head has been lately. And so, I strongly suspect that what you would like to talk about today is Clojure.spec. Am I right on that account?
CRAIG: Okay. I'm sure most of our listeners will have seen the blog post and whatever other materials will have surfaced between the time that we're recording this and the time that they hear it. But, maybe if you wouldn't mind, you could kind of take us through what Clojure.spec is. I know you've got a story to tell around this. I'll just hand it to you.
RICH: Yeah. Well, Clojure.spec is fundamentally a library for writing specifications about data and a set of supportive functions for using those specifications in a variety of different ways and some modifications to some of the functions of Clojure to let them tap into specifications or specs when they exist. It's something that's going into Clojure proper. It doesn't really modify the language. In fact, it doesn't modify the language at all, but it's going into Clojure so that it's a facility that everyone can presume is there, will be there by default, and can become sort of a lingua franca of how specification is done across systems.
CRAIG: Yeah. The obvious, perhaps first, question is why? This idea of being able to describe data, that it's available to everyone - great. But, why is it something that everyone should have available to them, in your opinion?
RICH: I would delay that specific question and get back to sort of why. The first thing about why is why you do anything. Hopefully you're doing it to try to solve some problems. The problems that spec is trying to solve are around the fact that, being a dynamic language, we rely pretty heavily on documentation to talk about how our functions work, what arguments they expect, what they return, and things like that.
That documentation has limits. It can be pretty decent in terms of human-to-human communication, but it's very difficult or impossible to have programs tap into it and get any kind of leverage out of that. So, trying to have a way to communicate about how our programs work that is stronger than English text is the first thing.
The second thing is that there's a set of data structure validation things that cross a bunch of things that we do. The most obvious one is my function takes a data structure, or my Web service gets handed data structures, and I need to validate them. That's a problem people have with data structures.
They end up doing a variety of things, including writing validation code by hand. Maybe not as obvious is the fact that, if you look at the Clojure Survey, there's a perennial complaint about error messages from macros and other things where, in fact, people have been doing the same thing, including myself. Macros are just functions from data to data, but they have an expectation about the shape of the data, and they have their own built-in validation, which is often handwritten and has a varying amount of quality in the kinds of error reporting it does. It's a matter of looking across all of those things and saying, "Are these the same problem?" I think they were.
Another problem I think spec chooses to solve is that we have some really good technology now for doing property-based generative testing. In the Clojure space, that's implemented in test.check, which is a derivative of QuickCheck, which originated in the Haskell world. That kind of property-based testing is definitely the strongest. But, it takes a fair amount of effort to learn how that works to write generators, to learn how to compose generators, to learn how to write properties and things like that. And, if you are going to write a specification, that implies a lot that could be or should be, in my opinion, leverageable by testing. So, if we do bother to spend time writing specifications for functions, maybe we should get generative testing for free out of that. The fundamental idea in terms of the problems it solves is to come up with a machine leverageable way to talk about how things work and then to get as much leverage out of it as we can.
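The "generative testing for free" idea can be sketched roughly as follows. The library was later renamed, so this sketch uses the clojure.spec.alpha namespaces; it is adapted from the official spec guide and assumes test.check is on the classpath.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(defn ranged-rand
  "Returns random int in range start <= rand < end."
  [start end]
  (+ start (long (rand (- end start)))))

;; A function spec: generators for :args come for free from the
;; predicates, and :fn states a property relating args to the return.
(s/fdef ranged-rand
  :args (s/and (s/cat :start int? :end int?)
               #(< (:start %) (:end %)))
  :ret int?
  :fn #(and (>= (:ret %) (-> % :args :start))
            (< (:ret %) (-> % :args :end))))

;; One call runs generative tests derived entirely from the spec:
(stest/check `ranged-rand)
```

No hand-written generators or properties are needed; check generates conforming :args values, calls the function, and verifies :ret and :fn.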
CRAIG: Mm-hmm. Yeah, I like your point about the macro error message ones. Until you said it, it hadn't occurred to me. But, of course it's obvious, in retrospect, that a macro is a function that takes data and produces another kind of data. Maybe I'm presuming, but it seems like that's exactly what spec is aimed at, so it makes perfect sense.
CRAIG: Yeah. Okay. Okay, so these are the problems that you wanted to solve. Actually, I happen to have the spec rationale up in front of me, and we're kind of talking about that part. Maybe I'm jumping ahead here. Stop me if I am. But, one of the things that comes up first here is that map specs should be of key sets only, which is a really interesting thing that I think is quite different from other similar things that I've seen before, at least.
CRAIG: Maybe we'll come to that later. But, if not, I'd love for you to comment on that point because it's one that I found a little surprising when I first encountered it.
RICH: Yeah. I do think it may be. The things that went into spec are long brewing for me. Then I just had enough ideas to say, "Ooh, I think this could be one answer to a bunch of concerns." But actually in a different space, in the space of Web services, I encounter the same kind of problems that this map specification thing is, which is that we combine the specification for an aggregate with the specification for its parts, which leaves us with a lot of rigidity in systems.
You can see this in the Web service API. It becomes this thing, and it's both a set of operations and the specification of the operations. Similarly, map specs traditionally have been: here are the keys, and here's what the keys mean in the shape of the values at the keys.
And, when we do this, we end up with a bunch of things that are not good. One is, our reuse is low because now our definitions of these parts are context dependent, and they're tied to the aggregate. It's as if you were going to define what a tire is only inside defining what a car was. Well, that makes it hard to reuse tires elsewhere as an idea.
That's what we're doing. We do it in software everywhere. We do it in this little case with maps and keys, but we do it in the large when we have source files and we add a comment to a file. To add a comment to a file, that means we change the file, which means we change the library, which is just crazy.
We have to stop doing this. And, in the Web services space and other work, I had been trying to fight this problem. And so, when I looked at this problem of specification, I saw the same thing. Really, to me, it's the same problem. We shouldn't be combining the specification of an aggregate, which should really be saying, "I have these parts."
If you want to describe a car, you say it has a chassis and tires. But, you leave the description of what tires are to an independent description. When you do that, you get more reuse.
This matters tremendously, as our systems become more dynamic, especially in the data space. But, even in the Web services space, you're combining subsets and making intersections of sets all the time. You'll take some data you got from here, and it had X, Y, and Z. You took some other data from there that had A, B, and C. Then you hand the next piece of the program: A and X. If the definitions of those parts are in the aggregates, then every single intersection and union and subset needs its own definition and will re-specify that same stuff again. I think that leads to rigidity in systems, and I think it actually doesn't work well at all in the fully dynamic case when I don't want to know necessarily what's flowing through this part of the system.
I just want to convey it to another part of the system. So, I don't want to take the requirements of somebody downstream from me and make them my requirements. I'd like that to be done sort of automatically. And, when you combine this separation of concerns–let's call it attribute definitions kept separate from map specifications–with namespaced keys, you get some really cool properties, like the ability to check keys even though they're not part of the map spec at this moment. In other words, I said I expect A, B, and C. If you happen to pass me X and there's a spec for X, that will get checked even though I don't care about X. I'm just flowing it through this part of the system.
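A minimal sketch of attribute-level specs, assuming the later clojure.spec.alpha namespace (the library first shipped as clojure.spec); the :acct and :geo key names are invented for illustration:

```clojure
(require '[clojure.spec.alpha :as s])

;; Attributes are specified independently, by namespaced name...
(s/def :acct/id pos-int?)
(s/def :acct/email string?)
(s/def :geo/lat double?)

;; ...and a map spec only names which keys it expects.
(s/def :acct/account (s/keys :req [:acct/id :acct/email]))

;; Maps are open: an extra key that has a registered spec is still
;; checked, even though :acct/account never mentions it.
(s/valid? :acct/account
          {:acct/id 1 :acct/email "a@b.com" :geo/lat 1.0})
;; => true
(s/valid? :acct/account
          {:acct/id 1 :acct/email "a@b.com" :geo/lat "oops"})
;; => false
```

The "tire" definition lives with :geo/lat and :acct/id themselves, so any other aggregate can reuse them without redefinition.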
CRAIG: Yeah. The thing it reminds me of, and I have to imagine this is no accident, is, in Datomic, the idea of schema being defined at the attribute level. Kind of an analogy there between entities having a set of attributes and maps having a set of keys, and the specifications being essentially at the attribute level and not at the overall aggregate level.
RICH: Right. I would definitely point to RDF as prior art in terms of thinking about properties, they called them, independent of aggregates. Property definitions have stood on their own. Now, of course, in RDF, without combining it with RDF schema or something else, the properties are just names. But, the idea was they could be freely applied to different things. If there were going to be semantics, they would be associated with the property itself and not in the context of some aggregate.
It's pretty easy to underestimate how much this is costing us in software development. I think it's basically a catastrophe the way we're approaching aggregates. It's something I would really like to fix.
CRAIG: This is vague, but I'll ask the question anyway. Is there any relationship between the ideas we're talking about and sort of the move that a Clojure programmer tends to make in their mind away from the rigidity of something like Java class with a set of named, very specifically type properties into the more open data types enabled by something like Clojure's maps? You know what I mean? I think of Java; now I think of Java objects as being these little locked up boxes that can only hold certain things, and that's it. That's really limiting compared to something like Clojure where I can say I have a map, and it's much more extensible, more fluid, more open. Is there any analogy to be drawn there?
RICH: Yeah. Yeah, totally. That's part of why spec is the way it is and why spec can be the way it is because, yeah, in Clojure, we tend to just use maps. Then, when you start using a specification system that wants you to define aggregates, well, you start defining things a lot more like classes again. You start having the same problems that you got rid of when you moved from classes to maps, which is that rigidity. When you use maps, you can just assoc new keys into a map. Right?
All maps are open. You can take two maps, and you can merge them. You can take subsets of maps, and you don't need to define a new named entity for every possible combination, every set of keys, I would say, in this language now, which is something you do in Java or languages similar to that. So, it seems important to me that we end up with the specification system that is compatible with this approach to data, which is a dynamic compositional approach, not a named entity approach.
CRAIG: Yeah, an important attribute of that, I think, that you mentioned earlier was the idea that we use namespaced keys, which I think is another thing that a lot of people are going to find to be a change, certainly from the way I write programs. I often write programs that use un-namespaced keys. Although, having worked with Datomic, the idea of using namespaces on my attributes is pretty familiar, but I haven't really broadly made the transition to doing that in my Clojure programs. But, it seems like the way spec is written points out that that is a good idea and doesn't force you down that road, but really encourages it. Does that seem accurate?
RICH: Yeah. To be clear, it does not force that. spec is designed to be compatible with what people are doing, which, as you said, is largely not namespaced keys. But, it doesn't support that by giving up on that fundamental idea, which is, therefore, we're going to put the definitions in the aggregate. We still don't do that. spec has a system whereby you can say, "I expect a non-namespaced key, but I'm going to tell you to use this namespaced name to find a spec for it." That gives you a bridge. It allows you to keep taking the data that you're taking today and connect it to specifications that are properly named.
You get a lot of the benefits. The only benefit you don't get is the thing I mentioned earlier where even if a key is not present in your current map spec, it could get validated. If the keys are not namespaced, they can't get validated because the word "name" or "ID" doesn't have a universal semantic. It's got to be qualified in some way.
RICH: I think it's a great bridge. But, yes, I'd like to see more namespaced keys. We're adding a little bit of syntax to Clojure to make that even easier, literally just easier so that people do more of it.
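A sketch of that bridge, again assuming clojure.spec.alpha with invented names: :req-un asks for the unqualified key in the map but resolves its spec by the qualified name.

```clojure
(require '[clojure.spec.alpha :as s])

;; The attribute is still registered under a qualified name...
(s/def :acct/id pos-int?)

;; ...but :req-un says: expect the plain key :id in the map,
;; checked against the spec registered as :acct/id.
(s/def :acct/account-un (s/keys :req-un [:acct/id]))

(s/valid? :acct/account-un {:id 42})    ;; => true
(s/valid? :acct/account-un {:id "no"})  ;; => false
```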
CRAIG: I don't know where that work is at right now. Is that something that you'd be ready to describe, even in rough terms, the work in Clojure to make that more syntactically–whatever the word is–easier–I guess is a good word?
RICH: Yeah. It's pretty straightforward. Essentially, you just have a prefix before a map that says this is the default namespace for keys in this map. The map looks like it used to. You just say :x :y :z, but you can say those are in my namespace. Effectively, you say my namespace once, and all your keys are qualified with that. That's both something that you can write and, therefore, the reader can read. It's also an enhancement made to printing. So, if all your keys have a consistent namespace, the printer will lift that to being a qualifier on the map. There is similar support in de-structuring.
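That syntax shipped in Clojure 1.9 as the namespaced map literal; a quick sketch (the :person keys are invented):

```clojure
;; #:ns{...} supplies a default namespace for the keys in the map.
#:person{:name "Rich" :instrument "guitar"}
;; reads the same as
{:person/name "Rich" :person/instrument "guitar"}

;; De-structuring has matching support:
(let [{:person/keys [name]} #:person{:name "Rich"}]
  name)
;; => "Rich"
```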
CRAIG: I'm reminded of XML default namespaces. Although, given people's sort of emotional reaction to XML, maybe that's not a comfortable analogy.
RICH: Well, you can think whatever you want about XML, but the fact that they thought about namespaces is definitely correct.
CRAIG: Yeah. I totally agree. Anyway, I won't go into – anyway, I was going to go on a rant, but I'll stop.
CRAIG: Sorry. Go ahead.
RICH: The closing tags were the truly terrible idea.
CRAIG: Yeah, my rant was going to be about JSON, but some other time, maybe.
Okay, so yeah. That makes sense to me. Again, maybe I'm jumping ahead, but, as I think about spec, I think about the things in there that, when I looked at them, I'm like, "Oh, that's interesting. I need to dig into that more to really understand it." One of the things that jumped out at me was this idea of regular expressions. Of course, we're not talking about a regex string describing Perl-like patterns of characters, but a regular expression grammar for pulling apart sequences. That's something that seems kind of new-ish, or at least unfamiliar to me. I wonder if you could comment on that.
RICH: Most of the systems that do this kind of thing have those kinds of operators. They'll have star, question mark, and whatnot. But, they won't necessarily be regular expressions. Therefore, they're somewhat ad hoc, and their properties are harder to pin down.
CRAIG: I want to interrupt you because I actually wasn't familiar with the formal definition of regular expression. So, if you could differentiate between that and what people might think of as regular expressions, that'd be helpful - if you see what I'm saying.
RICH: Well, that wasn't what I was saying. Regular expressions, you can go to Wikipedia. It has a very small vocabulary, right? There's concatenation: this thing should be next to that. There's star, which says a repetition of zero or more of these things. There's alternation, which says this or that, this pattern or that pattern. It's not just logical "or." It's alternatives in the pattern. There's empty in the math.
With these things, you can build the other stuff you're used to seeing in string regular expressions like plus and question mark, which is one or more and zero or one, but those two are not primitive. But, those very small sets of primitives can describe all regular grammars. But, they have important properties, and so one of the things that's appealing to me is that, when you take a system like this, and you say, "There are bottom predicates," which we don't expect to change, and then there are really only two kinds of things in the world.
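A sketch of those regex ops in spec, assuming clojure.spec.alpha; these examples mirror ones in the official spec guide. cat, *, +, ?, and alt compose into a single regular grammar over one sequence.

```clojure
(require '[clojure.spec.alpha :as s])

;; Concatenation: a quantity followed by a unit.
(s/def ::ingredient (s/cat :quantity number? :unit keyword?))

(s/conform ::ingredient [2 :teaspoon])
;; => {:quantity 2, :unit :teaspoon}

;; Repetition (s/*) and alternation (s/alt), nested in one grammar:
(s/def ::config (s/* (s/cat :prop string?
                            :val  (s/alt :s string? :b boolean?))))

(s/conform ::config ["-server" "foo" "-verbose" true])
;; => [{:prop "-server", :val [:s "foo"]}
;;     {:prop "-verbose", :val [:b true]}]
```

Note that the nested regex ops describe one flat sequence, not nested collections, which is what makes them true regular expressions over the sequence.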
The spec idea about this says there are associative things and there are sequences. Associative things work with that key specification we were talking about before, which has important supportive math behind it, right? The idea of set logic, intersection and union, things like that. We understand how to do that stuff.
Similarly, there are important properties of regular expressions. I think the first benefit you get from limiting a system like spec to things that are very, very simple, I would say, like these two things, is that the resulting compositionality is great. But, the other thing you get, which spec doesn't currently deliver but is part of the idea behind spec, is that we need to get better about talking about whether or not something has changed.
Right now, we have this giant ad hoc system where I don't think I broke you, so I'm going to call something 1.5. I did think I broke you, so now I'm going to call it 2.0. Did I break you or not? I don't know. You can run your code with my new code and tell me I did, and I think I didn't. We have no way to talk about this, mathematically or scientifically, in a robust way.
But, it's quite another thing to say, "Well, I required these keys as a set. Now I require fewer keys." Well, that's something that we know is compatible mathematically, right? Similarly, there are operations on regular expressions that allow you to say that this regular expression satisfies or conforms to all the cases of another one.
With these two properties, we're going to be able to talk about whether or not you're allowed to modify a spec or if a potential future version of a specification is compatible with a prior one in a very rigorous way. I think that's gigantic because what I'd like to do is move to a world wherein either your specs are compatible with the past or you pick a new name, and we stop using the same names for different things, which we do all the time in software and it's just created this crisis of versioning and dependencies.
It's all a human problem because we're not thinking precisely enough about what it means to change something, what the granularity is to depend on something. Like I said before: adding a comment to a file changes the library, et cetera, et cetera. But, it didn't. If I was using a function in that file, it didn't affect me. It didn't change what I was doing. So, we need to go to a finer granularity in how we think about dependencies, and we need to become more rigorous about saying, "I'm never going to change something under the same name in an incompatible way."
It's like, well, I know you're an old COM programmer.
RICH: And so, what were the rules with COM? The rules were: if you're going to change it, you add a 2 to the end of the name, right?
RICH: Those were good rules. We need to get back there. If we had fine-grained specifications and dependencies, we would not be thinking we were changing things when we weren't. You'd realize that things change a lot less frequently than you think and you'd have a lot better sense of: I have a new library. What do I have to worry about?
Usually it's nothing, especially in Clojure where there are libraries of functions that are substantially pure and independent from one another. But instead, we're living the awful life of Java programmers with the technology for talking about change that's based around things being just a free for all of mutation and spaghetti. As we started using more mathematical languages, we really should start doing versioning, dependencies, and change management in different ways. Something like spec that has these primitive operations and these fundamental mathematical ideas about sets and regular expressions gives us some foundation we can use to say this is different or it's compatible.
CRAIG: Yeah. It's funny. I've actually spent the last three days working at my client. I won't go into any details, but it's a story that's familiar to anybody that's worked in software where the version upgraded; everything broke. The reason why is not entirely clear at this point, but I think the other thing I'm reminded of is we did the 100th episode recently. One of the questions I asked everybody was what's one important thing you've learned. I think one thing that people said that they've learned at Cognitect is how to approach change. There's obviously immutability, but there's also what you're talking about, which I think is not, "let's just keep every version or everything," but let's understand what change really means and when it's okay to make that type of change or what the impact is and when it's nothing.
RICH: Yeah. I think we should do less changing. Actually, I think we should stop changing things in incompatible ways and keeping the names the same.
RICH: We should just stop doing that. That explains why regex.
CRAIG: Right. No, it's super useful. I found it fascinating, and I'm sure our listeners will as well.
Yeah, so one of the things, again, just sort of paging through in my head as I read the rationale, what were the things that jumped out at me. One of the ones that were interesting to me was this idea that de-structuring is kind of a part of the story. I wonder if you could talk about that a little bit.
RICH: Yeah. Let me slot that into context.
RICH: I talked about what problems does it set out to solve, but not very much about what it delivers.
RICH: When I talked earlier about these are the problems that are out there in the world, and it would be nice to sort of write specs once and get a bunch of things. What are those things? Clojure is often about leverage - a word I like to use. You apply a little bit of effort, and you get a lot of leverage from it. You make an abstraction and, if you're able to use it across a lot of data structures, you get a lot of leverage. It's something people appreciate about Clojure. They don't necessarily see it coming when they arrive, and then they're like, "Oh, look. Assoc works on all these things, and seq works, and all the seq functions work," and so there's leverage everywhere.
We're looking for the same thing here. We're going to write a spec. Then what do we get for doing that because it's effort? Especially, you have to learn about the operators of spec, and then there's a combination of these key set operators, some logical operators, and then the regular expression operators. You'll have to learn a little bit, sort it out, and learn how to read them. But, what do you get?
You get validation, first of all, which you expect. Right? I wrote a specification. I should be able to say, "Does this thing conform to this spec?" Typically, validation is sort of a yes or no. Yes, it does. No, it doesn't.
You should get error reporting. So, if it's not valid, it would be nice if the system automatically would be able to do a good job of telling you what's wrong. That's something that you want to use when you're trying to use spec to specify how your macro works, but it's also something you might want to use at the front door of your Web service when people are sending you data structures and they may be malformed. You need to talk to them about what they got wrong, so there's error reporting.
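Those first two deliverables, validation and error reporting, look roughly like this, assuming clojure.spec.alpha; the ::even-num spec is invented:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::even-num (s/and int? even?))

(s/valid? ::even-num 10)  ;; => true
(s/valid? ::even-num 11)  ;; => false

;; explain reports what failed and where, rather than just "no";
;; output varies by version, but it's something like:
(s/explain ::even-num 11)
;; 11 - failed: even? spec: :user/even-num
```

There's also explain-data, which returns the same report as plain data, suitable for a web service to hand back to a caller.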
De-structuring is interesting because what happens is if you're only given something that says yes or no, then you can have quite an elaborate spec. You say, "I expect you to give me a number in this range and then a list of several numbers followed by some other stuff, a map with these things, and then one of these three choices."
A validator comes and looks at somebody's data and says, "It's good. It's what you said." Then what's the first thing your code needs to do? Well, it needs to find all those parts, right? You talked about the parts in your specification, and maybe some of them were optional. Your code needs to know what options were present and which ones weren't or which branch did they take. You said that you could do this or that or that: X or Y or Z. Which was it: X or Y or Z? You said you were going to have this map with these keys. How can I talk about them?
Again, when you do this kind of work, you try to find the analogies. There are a lot of analogies to the error reporting. If I'm going to talk to you about what's wrong, I need to talk about the shape of your data and the shape of the specification, quite importantly, and then what you got wrong. But, if you're de-structuring, you really want the same kinds of things. In other words, validation went through and found all the parts and decided they were all okay. To support that, the specification system made you name every place where there was a branch or a choice.
For maps, they already have names, right? All the keys are names of branches. That's a way to think about it. But, spec makes you say, if you have an or, it makes you label all the possibilities. If you have an alt and a regex, it makes you label those possibilities. In doing so, it means that there's a way to talk about any part of the spec, any individual sub-branch. If you think about the spec as being this tree of possibilities, there's a way to talk about any part.
That's the path system of spec. It's something that's present that again will be unusual to people. Why do I have to label the parts of my or? Why do I have to label the various possible branching points in a regular expression?
But, what you get for that is, A, the ability to talk about parts of your spec, which is used during error reporting. This part, you didn't match this part. You didn't satisfy this sub-spec. But, it's also used by de-structuring because the same labels you use in your spec are used to label the data in what's called conforming.
In spec, there is valid?, a validator that's just Boolean. But, the more interesting function is conform, which takes the data and a spec, and gives you a de-structured, labeled version, if you will, of the data, where every branching point that was detected has its result labeled. If you had some arbitrarily complex predicate that you needed to satisfy and it was one of three–X or Y or Z–and it matched Y, there'll be a key there that says Y is what they supplied. Your code won't need to figure that out again because spec did that, de-structured it, and labeled it.
There's a path system in spec. It's part of the design. Everywhere there's a branch, there's a label. Therefore, spec can do de-structuring, and everything talks the same language. If you're going to get an error report, it's going to relate your problem to the path. If you're going to get de-structured data, it's going to label the data with names that correspond to the path.
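A minimal sketch of the labeling and conform behavior described here, using the clojure.spec.alpha namespace (the name the library eventually shipped under; at the time of this episode it was clojure.spec):

```clojure
(require '[clojure.spec.alpha :as s])

;; Every branch in an s/or must be labeled.
(s/def ::id (s/or :num int?
                  :str string?))

;; valid? is the Boolean validator...
(s/valid? ::id 42)       ;; => true

;; ...but conform returns the data tagged with the branch it matched,
;; so your code never has to re-discover which alternative it was.
(s/conform ::id 42)      ;; => [:num 42]
(s/conform ::id "abc")   ;; => [:str "abc"]
```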
We can do other cool things with those paths. For instance, as I get down the list of things that spec can do, one of the things it can do is generate data. Sometimes you want a certain kind of generator for testing, which generates nasty, random, everything data. But, other times you want a generator that generates pretty data that you can use as an example.
How do you talk about substituting a generator? Well, it's easy to talk about substituting the root generator for the whole object, but it's also very interesting to say, in this sub-tree, I'd like to generate the data differently in this context. Because we have paths, we can talk about that sub-tree in a spec. This is just a radical difference, I think, in spec. I think that this generator-override capability is just the tip of the iceberg in terms of being able to talk about parts of our data structures because, when you think about it, if you combine this with the fact that specs are registered under namespaced names, it means that any part of a spec has a global address: the namespace plus the path to that part.
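A sketch of overriding the generator for one sub-spec, assuming the clojure.spec.alpha API, where the overrides map goes from spec names (or paths) to no-arg generator-returning functions:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::name string?)
(s/def ::person (s/keys :req [::name]))

;; Default generator: arbitrary "nasty" strings for ::name.
(gen/sample (s/gen ::person) 3)

;; Override just the ::name sub-spec with a "pretty" generator,
;; leaving the rest of the tree alone.
(gen/sample (s/gen ::person {::name #(gen/return "Alice")}) 3)
```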
That is cool. That means that you could build a Web service that had additional explanations for the kinds of errors you might get in this, you know, trying to conform to this part of a spec. People could call that and know how to talk to it because there are ways to talk about the parts. That's how de-structuring works, and that's why it's there.
The other thing that you get from having defined a spec is what we call instrumentation. This is where being in a dynamic language has some real advantages. The things you can check for in specs are pretty arbitrary. It's not just, "Is this an integer?" or things like that, or structural things. You can have predicates that say this argument has this relationship to this other argument.
A simple case would be: I want to write a "generate a random number in a range" function, and it should take a start and an end for the range, and the end should be greater than the start. That's the kind of thing that's hard to do maybe traditionally with types because it's not a type of either part. It's a predicate across the two parts.
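This ranged-rand case is close to the example in the official spec guide; a sketch using clojure.spec.alpha:

```clojure
(require '[clojure.spec.alpha :as s])

(defn ranged-rand
  "Returns a random int in the range start <= x < end."
  [start end]
  (+ start (long (rand (- end start)))))

;; :args is a predicate *across* the arguments (end > start),
;; and :fn relates the return value back to the inputs.
(s/fdef ranged-rand
  :args (s/and (s/cat :start int? :end int?)
               #(< (:start %) (:end %)))
  :ret int?
  :fn (s/and #(>= (:ret %) (-> % :args :start))
             #(< (:ret %) (-> % :args :end))))
```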
Similarly, you can say, of a function overall, that the results should have this relationship to the input. This is the kind of thing that you would use in a generative property test. But, the cool thing is that if you've defined these specs, you can, while you're working in the REPL, turn them on as a run-time check around an individual function that's been specified or an entire library, or everything that has specs. Then proceed doing your work in the REPL and getting all these maybe expensive tests to run for everything you do.
This instrumentation is not something that you'd leave on in production, but you can easily work with instrumentation while you're doing interactive development. You can also instrument libraries so when you're doing bigger tests, so maybe you have system level tests that walk your big system through big workflows. You could turn on instrumentation during that testing and get those specifications validated while you're running your ordinary processing during testing. That's dynamic instrumentation.
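A sketch of turning instrumentation on and off at the REPL, with a hypothetical halve function for illustration, assuming the clojure.spec.test.alpha namespace:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; A hypothetical function and its spec, just for illustration.
(defn halve [n] (/ n 2))
(s/fdef halve :args (s/cat :n even?) :ret int?)

;; Wrap the var so every call checks its :args spec.
(stest/instrument `halve)

;; (halve 3) now throws: 3 fails the even? predicate.

;; You can also instrument everything that has a spec:
;; (stest/instrument (stest/instrumentable-syms))

;; Turn it back off when you're done; you wouldn't leave
;; this on in production.
(stest/unstrument `halve)
```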
The next thing you get from writing a spec is you get test data generation. If you said, "I expect to have inputs of this shape," you can just make a call and say, "Generate me something in that shape," and you'll get data that you can look at. One of the interesting things that spec has is something called "exercise," which, you give it a spec, and it will generate a bunch of instances of that spec, run them through conform, and give you pairs of: I generated this data, I conformed it, and here's what it looks like when it's conformed. You can use that for testing or just to validate your presumptions. It's very interactive.
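A sketch of exercise as described, assuming clojure.spec.alpha:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::tag (s/or :kw keyword? :num int?))

;; exercise generates n instances and conforms each one, returning
;; [generated conformed] pairs, e.g. ([:a [:kw :a]] [7 [:num 7]] ...)
(s/exercise ::tag 5)
```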
Then the final sort of big kahuna of spec is the fact that it will generate generative tests for you. If you've spec'd functions, you will get property-based generative testing of those functions. spec has "test my function" or "test my namespace," and those tests are test.check generative tests, so it will generate 100 different inputs that correspond to your specification, run your function, validate your returns against that specification, and then apply this fancy, final spec, which we call the function spec, which can compare or utilize the input and the output of a function and test any predicates you want across the two.
That's very powerful. The generative tests support higher order functions, so spec can generate functions for you. If you have a function that returns a function, spec will recursively go and generatively test the function you've returned to see that it matches its specs. Remember, these specs are not just, you know, takes an int, returns an int. They can be arbitrary predicates of the input, the output, different arguments to the input, and the relationship between the input and the output.
I think that's just going to be the way you want to do unit testing. You just will not write unit tests the way you have before because, as we know, test.check and property-based testing, derived from QuickCheck, is just great at finding bugs. It writes tests you never would. It tries data you would never try. It can do hundreds and hundreds of tests when you would be tired after writing three tests. And, if it detects a problem, it will shrink your input to be the smallest failing case, so this is the kind of testing that I like, the testing you don't have to write, that does the best job of detecting bugs.
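A sketch of the generative testing entry point, with a hypothetical greet function for illustration, assuming clojure.spec.test.alpha:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest]
         '[clojure.string :as str])

;; A hypothetical function and spec, just for illustration.
(defn greet [nm] (str "Hello, " nm))

(s/fdef greet
  :args (s/cat :nm string?)
  :ret string?
  :fn #(str/includes? (:ret %) (-> % :args :nm)))

;; Runs test.check under the hood: generates inputs from :args,
;; calls the function, validates :ret and the :fn relationship,
;; and shrinks any failing input to a minimal case.
(stest/check `greet)
```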
CRAIG: Yeah. That's really cool stuff. You actually dropped one little gem that Tim Ewald had mentioned to me the other day, and I had a little bit of a Keanu Reeves "Whoa!" moment when I heard. You just said it, which was you can spec a function that returns a function, and you'll get generative testing against the returned function if you spec out what it's supposed to do, which is just super meta and very, very cool, in my opinion. It's one thing among many, but I just found that to be, for some reason, it just kind of got me. It kind of gripped me as a neat thing.
RICH: Yeah. Well, the other thing is that, for that function, spec could make that function for you.
RICH: Obviously it's not going to be a very interesting function, but if you think about trying to test, generatively, functions that take functions, well, I mean you're trying to make it automatic. So, where are they supposed to come from? Typically, if we were going to test a function that takes a function, you would have to supply one and name it. The fact that spec can write functions matters because it means it can automatically test functions that take functions without you having to say, well, if you need a function that takes and returns an int, use inc or something. You don't need to say that.
CRAIG: Yeah. It's another type of data that you need to come up with somehow because we work with that form. Maybe data is the wrong word, but that is a thing that we give to our functions all the time.
RICH: Yes, exactly. Right.
CRAIG: Yeah. Yeah, it's so cool. Well, Rich, one of the reasons I always enjoy hearing you talk about this stuff is because you have a wonderful ability to kind of tell the whole story. I could keep sort of casting around in my mind trying to think, "Oh, yeah, what else did I see in the rationale?" but I know for a fact that you have a pretty good picture of what's in spec, so I'll just ask you straight up. Is there anything else in spec that we haven't at least touched on today?
RICH: Yeah. I guess the one other thing, and so we're into sort of gravy features.
RICH: Which is sort of fine.
CRAIG: We like gravy.
RICH: Yeah, we like gravy. Let me just see what we've got. Yeah, so we didn't talk much about the map checking. Maybe it's obvious, right? Map checking checks the keys for presence in the key set and then independently checks that the values of the keys match the independent specs for those attributes.
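A sketch of that map checking with s/keys, assuming clojure.spec.alpha:

```clojure
(require '[clojure.spec.alpha :as s])

;; Attribute specs are registered independently, per namespaced key...
(s/def ::first-name string?)
(s/def ::age nat-int?)

;; ...and s/keys checks key presence separately from checking that
;; each present key's value matches its registered attribute spec.
(s/def ::person (s/keys :req [::first-name]
                        :opt [::age]))

(s/valid? ::person {::first-name "Ada"})           ;; => true
(s/valid? ::person {::first-name "Ada" ::age -1})  ;; => false (opt keys are still value-checked)
(s/valid? ::person {::age 40})                     ;; => false (missing required key)
```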
I think one of the other cool things that spec can do, which is a recent addition, is a common problem people have is they have data whose types are self-describing. This is quite typical when you're getting stuff over wires. There's some field in the structure, which says my type is X. There's a type tag or a type attribute.
The challenge you have is that it's the value of that attribute that determines what spec should really be used. If the type tag says I'm an employee information record, well, that should get one kind of spec run against it. If the type tag says I'm a product descriptor, that should have a completely different kind of spec run against it.
The challenge is obviously spec has "or," but "or" is really not a great tool for open, large sets. You don't want to build something that says, "If it's an employee, do this," or, "If it's a product, do that," or, "If it's a truck, do that," any more than you want giant switch statements. This is why we invented polymorphism.
How do we get polymorphism in specifications? The answer for spec is something called multi-spec, which allows you to make a spec that dynamically determines the spec to use by calling a Clojure multi-method. This directly connects spec to an already existing, good, data-driven, polymorphic dispatch system that has pretty much always been in Clojure, which is multi-methods.
Now, instead of having a big "or" in your spec, which would be this mega spec that knew about everything, you have a multi-spec that says, "This multi-method will tell you what spec to use," and it will look at that type tag, tag field, or whatever and find the spec. This is open so, if on the first day there's only one method for this multi-method, then the spec will only check for one thing. But, if tomorrow there are two things, it will check for two things.
Similarly, it will also generate in an open way. So, if on the first day you've only defined two methods, it will generate two kinds of things. But, if as your system grows you now have 100, it will generate one of 100 kinds of things. I think this is a good dynamic system for solving the tagged data problem. I think that's another sort of cool feature.
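A sketch of multi-spec with hypothetical employee and product records, assuming clojure.spec.alpha:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::type keyword?)
(s/def ::name string?)
(s/def ::salary pos-int?)
(s/def ::price pos-int?)

;; Dispatch on the self-describing type tag...
(defmulti entity-type ::type)
(defmethod entity-type ::employee [_]
  (s/keys :req [::type ::name ::salary]))
(defmethod entity-type ::product [_]
  (s/keys :req [::type ::name ::price]))

;; ...and stay open: a new defmethod tomorrow extends both
;; validation and generation, with no mega-or to maintain.
(s/def ::entity (s/multi-spec entity-type ::type))

(s/valid? ::entity {::type ::employee ::name "Ada" ::salary 100000})
;; => true
```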
CRAIG: I agree. Actually, we've kind of been showing spec to our clients on the project I'm on, and that was one of the first things they asked was, "Well, how would you handle this case?" Multi-spec was exactly what they were asking about, really, and so clearly it's something that people are asking for in this space.
But, it does raise a point, which you address in the rationale, I think, which is that spec isn't really a data format. It's not always something you can put in an EDN file. Am I expressing that correctly?
RICH: I think that's somewhat controversial about what makes something a data driven thing. spec certainly is data driven. Some of its input data is code, which is data. It does treat that code as data, and it does remember that code as data. It can regurgitate that code as data.
There are a couple of functions in spec that allow specs to describe themselves, and they can describe themselves in a way that's sort of for a human to read in a documentation string, but they can also describe themselves in a very precise way with all of the namespacing and whatnot necessary so that you could reevaluate the data and get a working spec.
RICH: The question is: Is that data? Is a bunch of nested lists data? Of course it is. That's why lists work, and that's how read works. The fact that it's not maps doesn't make it not data.
One of the beautiful things about spec, and I think we will get there pretty soon so that this is clearer to people, is if you have specs for the "describe" output of spec, which basically looks like code but is data, if you have specs for those things, which we have some we haven't yet published, you're only a conform call away from seeing that same data in a mappy kind of way with keys and values. I think that people need to start thinking that way: that specifications give them the ability, without writing a parser or having to understand a lot of stuff, to do data representational transformations that would give them what they want without spec having to be different.
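A sketch of specs describing themselves, assuming clojure.spec.alpha, where describe gives the human-oriented form and form gives the fully namespaced, re-evaluable data:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id (s/or :num int? :str string?))

;; Abbreviated, documentation-friendly form...
(s/describe ::id)   ;; => (or :num int? :str string?)

;; ...versus the precise, fully namespaced form you could
;; re-evaluate to get a working spec back.
(s/form ::id)
;; => (clojure.spec.alpha/or :num clojure.core/int?
;;                           :str clojure.core/string?)
```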
I think the flipside is many people try to design DSLs in this space where it's all maps, but there are tons of semantics on the maps. I don't know if you've ever tried to look at or read some of the systems that encode functions in JSON, like AWS does sometimes.
RICH: Code is data, but not all data is code. Sometimes when you want something that really has the abilities and capabilities of code, and the write ability of code, and the readability of code, for people who are the primary authors of specifications, code, especially as represented in a highly readable Lisp like Clojure, is the best possible DSL format. That's the opinion of spec. Yes, we can get it into a shape you might prefer to program with, but it's no less data from the get go than any other format.
CRAIG: Gotcha. It makes sense. Thanks. Well, Rich, you, maybe even more than most of our guests, are permitted to talk here for as long as you would care to. I could pretty much guarantee our listeners are sitting – some of them are sitting in their driveways, even as we are virtually speaking this in their ears. That said, I don't want to keep you forever, but I do want to give you a chance to hit any other points, if any, about spec or, for that matter, about anything else you'd like to share with our listeners today.
RICH: I think we're pretty good. People should go and read the documentation and whatnot because I don't think we've fully described how you would use spec or some of the other things that it does. I guess the two points I would want to make sure were clear are in the other area of spec: how it integrates with Clojure. The two big areas are that if you write specs, they will appear in doc strings when you ask for documentation.
The other is that if you've written a spec for a macro, macroexpand will automatically use it to validate the user's input. Even if you've written your macro the old-fashioned way and you've got a handwritten checker with error reporting that's maybe not so great, if you write a spec for it, you do not need to touch your macro code at all. That spec will get run by macroexpand, and the error reporting will be that of spec if people had an error in the way they called your macro.
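A sketch of spec'ing a macro's arguments, with a hypothetical my-when-let macro for illustration, assuming clojure.spec.alpha:

```clojure
(require '[clojure.spec.alpha :as s])

;; A hypothetical macro with no checking of its own.
(defmacro my-when-let [bindings & body]
  `(let ~bindings (when ~(first bindings) ~@body)))

;; Spec the *unevaluated forms* the macro receives. Nothing in the
;; macro body changes; macroexpansion runs this check automatically.
(s/fdef my-when-let
  :args (s/cat :bindings (s/and vector? #(= 2 (count %)))
               :body (s/* any?)))

;; (macroexpand '(my-when-let [x 1 y 2] x))
;; should now report a spec error: :bindings fails the count predicate.
```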
RICH: That's an important feature of spec that the specs are independent of the code they may describe. They can be added later. They don't require you to change what you were doing before, but they still give you leverage.
CRAIG: Good stuff. This is all coming, as far as the Clojure integration, in 1.9, which you and the rest of the team are hard at work on even as we speak.
RICH: Yeah. The alpha is out already, and people are using spec a lot, quite frankly, in the first two days. I think it addresses a set of needs across the board, so people are doing all kinds of interesting things already.
CRAIG: Yeah. I know we were pretty excited. I know that there was a lot of excitement internally. I got to see you and the rest of the team kind of working on the weekend to get it out the door, and then Twitter.
I guess I'll ask you. How did the launch go, from your perspective? It seemed like it went really well from where I was sitting.
RICH: Yeah, it was great. It was great. Like I said, I do think people are hungry for this. It solves a lot of independent problems. I think everybody has their part of it that they're seeing and their needs that they're most interested in. There is a lot to it they'll discover over time. Yeah, I thought it's been great. Some of the feedback has been great, and it's already been integrated. We've knocked out some bugs, and we've added some features based upon inputs, so I'm looking forward to making it really great.
CRAIG: Cool. Well, that seems like a great place to wrap it up. But, of course, I do have one more question for you. The question we end every show with has to do with advice. We always ask our guest to share with our listeners a piece of advice, whether advice they've received or advice they like to give, just something in the way of advice. I wonder if you had a chance to think of anything to share with us today, Rich.
RICH: I'll keep it on topic. I would say my advice will be about spec, and it will be to make sure that you consider spec to be a suite of small, composable tools. Like I said before, I'm not sure it would be evident to anyone that getting some data that's in the shape of code into something that's more map like is just a matter of calling conform on it with the spec. You need to look at the pieces of spec as very small utilities you can bring to bear in a variety of different circumstances.
For instance, we're thinking about what's the test suite for spec going to look like, and how do we test these regexes or whatnot? You just want to pick up spec for jobs like that. You can make specs for spec itself. You could run a generator on that and get specs to be written. You could then run a generator on that to get data that conformed to the specs. Then you could run conform on that to make sure it was working and, therefore, check spec with spec with two generating steps in the middle: one that makes specs, which are data, obviously, because you can generate them; one that takes those specs and generates data. It's just one example of the kind of tool chain you might be able to put together.
I would say: Maybe you're thinking about testing. Maybe you're thinking about macros. Maybe you're thinking about your DSL. All those things are good. But, spec is fundamentally simple, and it has a bunch of small, reusable parts. My advice would be to encourage people to be creative and open-minded about when they might apply it because it has more applications than you might imagine.
CRAIG: Well, Rich, unsurprisingly, you have once again offered us amazing advice, although I can almost guarantee that I will benefit even more from it the second or third time that I listen to it because some of what you said is still sinking in, but I can already kind of see the edges of the utility. Anyway, thanks a ton for that and for taking the time to come on today and chat with us about spec.
I think the docs are great. I've read them all, and I think it's actually quite clear and easy to understand. I, for one, always kind of enjoy getting to chat back and forth about these things, and hopefully our listeners thought that was helpful too. Thanks a ton for coming on and talking to us today.
RICH: Yeah. Well, thanks for having me. I also want to give a shout out to Stu and Alex Miller, who helped with this and, in particular, the guide Alex Miller wrote. He gets a lot of credit for helping with the storytelling there.
RICH: Thanks for having me.
CRAIG: Oh, absolutely. All right, well, then we will say good-bye there. This has been The Cognicast.
[Music: "Thumbs Up (for Rock N' Roll)" by Kill the Noise and Feed Me]
CRAIG: You have been listening to The Cognicast. The Cognicast is a production of Cognitect, Inc. Cognitect are the makers of Datomic, and we provide consulting services around it, Clojure, and a host of other technologies to businesses ranging from the smallest startups to the Fortune 50. You can find us on the Web at cognitect.com and on Twitter, @Cognitect. You can subscribe to The Cognicast, listen to past episodes, and view cover art, show notes, and episode transcripts at our home on the Web, cognitect.com/cognicast. You can contact the show by tweeting @Cognicast or by emailing us at firstname.lastname@example.org.
Our guest today was Rich Hickey, on Twitter @RichHickey. Episode cover art is by Michael Parenteau, audio production by Russ Olsen and Daemian Mack. The Cognicast is produced by Kim Foster. Our theme music is Thumbs Up (for Rock N' Roll) by Kill the Noise with Feed Me. I'm your host, Craig Andera. Thanks for listening.