After I published my last post on HTTP API compatibility, a friend pointed out to me that the essay really didn't explain why backwards compatibility matters. Here's an attempt at making up for that.
Why do you need to worry about keeping your systems backwards compatible?
Let's say you've recently become a proud parent of a toddler. You go shopping every week, and before you had a toddler you could remember all the items you needed. But toddlers are very distracting, and between "Why is the sky blue?" and "Can I have a snack? pleeeease?" it's just impossible to remember little things like groceries. After the third time forgetting cheese sticks you build yourself a shopping list app. A friend sees you using it, and thinks it's really cool and wants to use it too! So you deploy it to the Cloud, call it Shopst, and start getting customers.
Shopst has a very simple architecture: just a single web application and a database. Since the application is a monolith, you update it by shutting down the app, uploading the new code, and starting it up again. And most of your users are in a single timezone, so they don't care if you take the app down for a while at night because they're asleep anyway.
Your customers are a friendly lot. They love shopping but hate forgetting things, so your app is a hit.You get a very interesting
request - it seems that a major creative social site, PartyPinner, wants to embed your app in their site, so that people can add party supplies they find exploring everyone else's feed to their shopping lists. You think that's great, so you implement an API for them! You give them an endpoint that returns a users shopping lists, and another endpoint that lets them add to a given list.
GET /lists
[
{
"name": "Party Food",
"items": ["Chips", "Dip", "Salsa"]
}
]
POST /lists/add
{
"listName": "Party Food"
"itemToAdd": "Soda"
}
After a week of slamming your head against Oauth2, you can support clients. PartyPinner is very excited and you start getting more users right away.
Life is still good! You have more reach, but everyone still pretty much goes to sleep when you do so you can stay up a few minutes later than they do on Friday, update the application, and sleep in on Saturday.
Now, you get an email from a customer. "Dear You," they say, since, you haven't given them your real name, "I love using Shopst for shopping lists. I am going on a cruise in a month, and I want a packing list app to plan my trip. Is there any way you could add packing lists, too?"
"That's easy!", you think. "I'll just add a new list type." So, you update your website, and since you're a conscientious coder, you update your API too:
GET /lists
{
"shoppingLists": [
{
"name": "Party Food",
"items": ["Chips", "Dip", "Salsa"]
}
],
"packingLists": [
{
"name": "Cruisin'!",
"items": ["Socks", "Swimsuit", "Snorkel"]
]
}
Another Friday deploy, and you look forward to sleeping in Saturday morning again.
At 7 AM your phone starts buzzing. It's PartyPinner emailing you frantically! They say, "All of our Partiers are trying to plan their Saturday barbecues at the last minute, and they get errors every time they try to find their 'Barbecue Supplies' lists! All they see is an error message that says 'Expected: List at index 0 but found: Object'! What gives?"
"Oh!", you say. "I changed that last night so that people could make packing lists too! You just need to change your code to expect an object from the GET, and populate your list...lists...from the 'shoppingList' property."
"Oh, ok, that seems like a great idea for someone...", they reply. "But our Partiers only care about Shopping! You broke all of them, and now we have to work on Saturday to update our code, and we don't care about trips at all."
So, you promise them that you'll never do that again. You've discovered A Thing here: Changes for new customers should not make existing customers break. You name this Thing API Compatibility.
Shopst continues to be a great success, and a year later people all over the world are using it! You start getting more complaints about your Friday deployments, because Friday night where you are is Saturday morning somewhere and people need to make their shopping lists! So, you decide to put up another server so you can upgrade one server at a time, without anyone having to stop making lists.
Now you have a high availability cluster. This is great, because you can tell the load balancer to direct all traffic to Node 2 while you deploy the code to Node 1, then reverse the traffic and deploy to Node 2. Sweet!
Sometime in the last year, you added a "Store" field to the shopping list object because you thought people might want to make different lists for different stores. Well, no one, and I mean absolutely no one, used it. They just put the store in the list name if they want it. "That was a good experiment", you think, "but I don't want to have a field no one uses. None of my API clients implemented it, so I'll just delete it. And since I have a High-Availability Cluster, I can deploy the change as soon as I finish it without having to stay up late on Friday!"
Having just put your child down for their nap, you decide to make the code change quickly while they're sleeping. It's easy enough - just remove the column from a query and from some model objects and delete the column from the database once you deploy. You also add in a little script that will delete the column once the app deploys just so you don't forget about it.
The child is blessedly still sleeping, so you take down Node 1 and deploy the app to it. You've just put traffic back to the node when, like they knew you were in the middle of something, your child awakens and is Ready. For. Snacks. Off you go to serve up some pretzels, and since this is really the first time you've deployed since you added the second node, you forget that you hadn't upgraded it - until your phone starts buzzing.
It's PartyPinner again - haven't heard from them since the Packing Lists update! "Something seems weird," they say. "About half the time Shopst seems fine...but the other half, we see an error that says 'Error: missing column "Store"'. Any idea what's up with that?"
You smack your forehead because you know exactly what's wrong. The second node with the older code you forgot to upgrade still expects the "Store" column! You quickly upgrade the second node and that fixes everything, and you plan not to do that again. And also to write a deployment script that updates both nodes so you don't forget that either. You name this concept Database Compatibility.
Systems often grow by adding pieces - components - to them. Those components need to be able to interact with each other, and to evolve as new capabilities get added. Understanding how to structure your interactions so that consumers don't break means that you can add features, fix bugs, and deploy all without breaking any clients or incurring any downtime. Which means, once you learn how, you can sleep in Saturdays. At least until your kid wakes up.