Microservices are less efficient, but are still more scalable.
Servers can only get so big. If your monolith needs more resources than a single server can provide, then you can chop it up into microservices and each microservice can get its own beefy server. Then you can put a load balancer in front of a microservice and run it on N beefy servers.
But this only matters at Facebook scale. I think most devs would be shocked at how much a single beefy server running efficient code can do.
You know, I don't really think microservices are fundamentally more scalable. Rather, they expose scaling issues more readily.
When you have a giant monolith with the "load the world" endpoint, it can be tricky to pinpoint that the "load the world" endpoint (or, as is often the case, endpoint*s*) is what's causing issues. Instead, everyone just tends to think of it as "the x app having problems."
When you bust the monolith into x/y/z, and x and z get the "load the world" endpoints, that starts the fires of "x is constantly killing things and it's only doing this one thing. How do we do that better?"
That allows you to better prioritize fixing those scaling problems.
It sounds like creating a problem, then spending time (= money) on fixing it, and calling it a win?
There is a point when it all starts to make sense. But that point is when you go into billions' worth of business, hundreds of devs, etc. And getting there has a large cost, especially for small/medium systems. And that cost is not one-off - it's a day-to-day cost of introducing changes. It's orders of magnitude cheaper and faster (head count wise) to make changes in, e.g., a single versioned monorepo where everything is deployed at once, as a single working, tested, migrated version, than doing progressive releases for each piece while keeping it all backward compatible at the micro level. Again - it does make sense at scale (hundreds-of-devs kind of scale), but saying your 5-dev team moves faster because they can work on 120 microservices independently is complete nonsense.
In other words, microservices make sense when you don't really have other options - you have to do it; it's not a good default to start with at all. Frankly, Sam Newman says as much in "Building Microservices", and so do people who know what they're talking about. For some reason juniors want to start there and look at anything non-microservice as legacy.
> It sounds like creating problem, then spending time=money on fixing it and calling it a win?
It sort of is.
It's not a perfect world. One issue with monoliths is that organizations like to take an "if it ain't broke, don't fix it" attitude towards them. Unfortunately, that leads to spotty service and sometimes expensive deployments. Those aren't always seen as "broken" but just as temporary problems that you can get through regularly if you pull enough all-nighters.
It takes a skilled dev to really sell a business on improving a monolith with rework/rewrites of old APIs. Even if it saves money, time, and all-nighters, it's simply hard for a manager to see those improvements as being worth it over the status quo. Especially if running the monolith on big iron masks the resource usage/outages caused by those APIs.
How so? If functionality A is critical to functionality B, how will wrapping it in an HTTP call (microservices) reduce the damage from breaking functionality A?
I can see an advantage regarding resource hogging, but the flip side is the extra point of failure of network calls in microservices.
Not saying which is better, but deployment is orthogonal to logical dependence and correctness.
Most features in a modern app are not critical functionality, though.
For instance, in a shopping site, why should a crash in the recommendations engine result in a non-functional webpage (rather than a working purchase page with no recommendations)?
Personally I think microservices start to make sense when you have several hundred developers (an environment I'm currently keen to never enter again - $work has 5 devs and might one day have 6).
That makes sense. I am still trying to understand why it differs between a monolith and microservices. The app in the monolith can make calls to non-critical functionality time-limited and fault-tolerant, just like a network call has a timeout and can return nothing (put simply, it can wrap that call with a timer and an exception handler).
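That wrapping idea can be sketched in a few lines. A minimal sketch in Python (names like `call_with_fallback` and `recommendations` are illustrative, not from any real codebase): an in-process call gets a deadline and a fallback value, just like a network call would.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_fallback(fn, *args, timeout_s=0.2, fallback=None):
    """Run a non-critical call with a deadline and a fallback value.

    If `fn` is slow or raises, the caller keeps working: the same
    graceful degradation a network timeout gives you, but in-process.
    (Caveat: a timed-out thread keeps running in the background, so
    this protects latency, not memory or CPU.)
    """
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return fallback

def recommendations(user_id):
    # Stand-in for a real in-process recommendations module.
    return ["item-1", "item-2"]

def slow_recommendations(user_id):
    time.sleep(1.0)  # simulates a hung module
    return ["item-1"]
```

So `call_with_fallback(recommendations, "u1")` returns the list, while `call_with_fallback(slow_recommendations, "u1", fallback=[])` returns `[]` after the deadline. The caveat in the docstring is exactly the resource-hogging concern raised elsewhere in this thread.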
I agree that microservices are suitable for large organization, where the organization practically has multiple products (which could be purchased from a vendor or sold to another company).
If your monolithic service OOMs, hits a large GC pause causing dependent requests to time out, locks a shared file descriptor, or a bunch of other things then the monolithic service as a whole can hit a fault or stall even if other threads/tasks are still executing. While classes of errors like OOMs go away when multiple processes are executing.
A monolith can also scale vertically, with mechanisms to redeploy on fatal errors. If it all starts failing, you may have a problem. But you can get the same problems with a microservice that is in the critical path.
Networks could have unexpected delays, routing errors and other glitches. At least with a monolith you can often find a stacktrace for debugging. I have seen startups that have limited traceability and logging when using micro services.
When a small startup has to manage "scalable" K8s infrastructure in the cloud, distributed tracing and monitoring is often not prioritized when you are a team of 5 developers trying to find a product market fit.
I am not against microservices (I work with them daily), but you just trade one type of stability problem for another.
Right I'm not advocating for one over the other, I was just explaining issues solved by microservices. Now instead of the OOM Killer taking your service down, you have a flaky NIC on another microservice box and now you need to figure out how to gracefully degrade.
I love working with microservices at the scale of $WORK, but we're Big Tech. I can't imagine why a 5 person startup would want k8s and microservices. You don't need that scale until you have more than 2 teams, and you're pushing at the very least 15 engineers at that point and usually the sales and marketing staff to make that investment worth it.
I don't think it was well expressed, but to reuse my last example: OOM-killer ending the recommendations process mid-request is less of a big deal if the main store server can keep running and serving traffic.
If the recommendations team write code that causes the OOM-killer to end their process, making them run it on separate infrastructure insulates your "main store team" from the bugs they write.
It was about the OOM killer as the sibling comment says, yeah. I'm surprised you're so incredulous. OOM Killer and GC stalls are some things I've run up against in my career frequently. I'm sorry my comment didn't live up to your expectations, it was hastily typed on mobile.
His point was that the comment was unclear if you'd also read it hastily :-)
I imagine his logic was something like: "How can OOMs happen less often if you run more processes (possibly on the same machine)?", while your comment actually wants to say: "if a specific service is affected by an OOM, with microservices only that specific microservice goes down, since it's probably running on its own hardware".
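That interpretation can be illustrated with plain OS processes. A hedged sketch (Python, Unix-only because of SIGKILL; the snippets are stand-ins, not real services): the parent survives a child that is killed mid-request, which is the isolation property being described.

```python
import subprocess
import sys

def run_isolated(snippet, timeout_s=10.0):
    """Run a code snippet in its own OS process and report its exit code.

    If the child is killed (OOM killer, segfault) or exits fatally, the
    parent only observes a nonzero exit code: it keeps serving traffic.
    """
    proc = subprocess.run([sys.executable, "-c", snippet], timeout=timeout_s)
    return proc.returncode

# A "recommendations module" that gets killed mid-request, standing in
# for an OOM kill (Unix-only):
fatal = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
healthy = "pass"
```

In a monolith, the equivalent of `fatal` takes every thread down with it; split into processes, the blast radius is one exit code.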
Resource hogging is a huge class of errors, though. Everything from a bad client update DDoSing a feature, file handles, memory leaks, log storage (a little outdated now perhaps), and so many more...
Sure depends on the architecture.
When the auth service is down, everything else should be down.
But when the "optional feature" service is down, a core component should be unaffected by that.
Split up services where it makes sense and don't over do it.
That's how I design my projects.
When the auth service is down, only new authentications should fail. Existing auth sessions should continue to function just fine. This exact failure mode occurred at Google in early 2021, which caused a fairly big outage, but not as big as it could have been because of this design choice.
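One common way to get that property is to have the auth service issue signed tokens that every other service can verify locally. A rough sketch (this is a generic HMAC scheme for illustration, not Google's actual design; a real system would use a vetted library and proper key management):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"shared-signing-key"  # illustrative; real systems rotate and protect keys

def sign_session(claims):
    """Issued by the auth service at login time: payload + HMAC signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_session(token):
    """Verified locally by every other service, with no call to the auth
    service. That is why existing sessions keep working while the auth
    service is down: only new logins (token issuance) fail."""
    try:
        body, sig = token.rsplit(".", 1)
        expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None  # tampered or wrongly signed
        claims = json.loads(base64.urlsafe_b64decode(body))
        if claims.get("exp", 0) < time.time():
            return None  # expired session
        return claims
    except Exception:
        return None
```

Verification needs only the key and the clock, so the failure mode of the issuer is decoupled from the failure mode of every consumer.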
> I don't really think microservices are fundamentally more scalable
It depends on what you are scaling. I think microservices are fundamentally more scalable for deployment, since changes can be rolled out only to the services that changed, rather than everywhere. Unless your language and runtime support hot-loading individual modules at runtime.
I disagree; in my opinion micro-services hinder scalability of deployment, and of development - at least the way I see most businesses use them. Typically they break their code out into disparate repositories, so now instead of one deployment you have to run 70 different CI/CD pipelines to get 70 microservices deployed, and repo A has no idea that repo B made breaking changes to its API. Or lib B pulled in lib D that now pollutes the classpath of lib A, which has a dependency on lib B. Often you need to mass-deploy all of your microservices to resolve a critical vulnerability (think log4shell).
The solution to this is to use the right tool: a build system that supports monorepos, like Bazel. Bazel solves this problem wonderfully. It only builds / tests / containerizes / deploys (rules_k8s, rules_docker) what needs to be rebuilt, retested, recontainerized, and redeployed. Builds are much faster, developers have god-like visibility into all of an organization's code, and can easily grep the entire code base and be assured their changes do not break other modules if bazel test //... passes. It is language agnostic, so you can implement each service in whatever language best suits it. It allows you to more easily manage transitive dependencies and manage versions globally across your org's codebase.
Of course Bazel has a steep learning curve so it will be years before it is adopted as widely as Maven, Gradle etc. But in the banks I've worked at it would've saved them tens of millions of dollars.
Also, git would need to catch up to large codebases. I think Meta released a new source control tool recently that is similar to git but can handle large monorepos.
Man, I wish my colleagues would read your comment (and at least question their beliefs for one brief moment)…
> I disagree, in my opinion micro-services hinder scalability of deployment
…and of anything related to testing:
- Want to fire up the application in your pipeline to run E2E tests? Congratulations, you must now spin up the entire landscape of microservices in k8s. First, however, you need to figure out which versions of all those microservices you want to test against in the first place, since every service is living in a separate repository and thus getting versioned separately.
- Want to provide test data to your application before running your tests? Well, you're looking at 100 stateful services – good luck with getting the state right everywhere.
Breaking API changes do not happen often, at least in my projects.
We mostly add new endpoints for new features (a non-breaking change).
We keep object definitions in a separate repo to avoid duplicating them, and generate a Maven dependency from that.
Pretty simple actually.
We mainly run on Java though.
It would be more complicated with multiple languages.
Protobuf or JSON Schema could come to the rescue if needed.
If you have many app servers and they all run copies of the same app you can roll out new versions to a few servers at a time. You just have to handle the db update first but you need to do that with microservices anyway (they might use smaller databases and therefore making it somewhat easier).
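The "a few servers at a time" rollout can be sketched as a small driver loop. The `deploy` and `healthy` callbacks here are hypothetical hooks standing in for whatever your real tooling does:

```python
def rolling_deploy(servers, new_version, deploy, healthy, batch_size=2):
    """Roll a new version out a few servers at a time.

    `deploy(server, version)` pushes code to one server and
    `healthy(server)` is a post-deploy health check, both supplied by
    the caller. If a batch fails its check, stop: the blast radius is
    capped at `batch_size` servers and the rest still run the old code.
    """
    updated = []
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        for server in batch:
            deploy(server, new_version)
            updated.append(server)
        if not all(healthy(server) for server in batch):
            return ("halted", updated)
    return ("done", updated)
```

As the comment notes, the database migration has to be compatible with both versions during the window where old and new code coexist - which is true for microservices too.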
I'm talking about the scalability of actually delivering new code to servers (or "serverless" runtimes). Feature flags don't help with that.
Admittedly, it isn't a scalability problem you will run into right away.
But when you need to roll out an emergency fix, there is a big difference between deploying to thousands of servers that all have everything, and ten servers running a single service.
That’s a really interesting point - something that could probably be addressed by module-level logging and metrics. That said, even as a pro-monolith advocate, I can see why it’s preferable to not allow any one module/service to consume all the resources for your service in the first place. The service boundaries in microservice architectures can help enforce resource limits that otherwise go unchecked in a larger application.
It's one I've run into a few times at my company (which has a large number of these types of endpoints).
The silly thing is that something like the JVM (which we use) really wants to host monoliths. It's really the most efficient way to use resources if you can swing it. The problem is that when you give the JVM 512GB of RAM, it hides the fact that you have a module needlessly loading up 64GB of RAM for a few seconds. Something that only comes to a head when that module is run concurrently.
> Servers can only get so big. If your monolith needs more resources than a single server can provide, then you can chop it up into microservices
200 threads, 12TB of RAM, with a pipe upwards of 200GB/s. This isn't even as big as you can go, this is a reasonable off the shelf item. If your service doesn't need more than this, maybe don't break it up. :)
I believe that this level of service can no longer accurately be described as "micro".
This cannot be emphasized enough. The top of the line configuration you can get today is a dual EPYC 9654. That nets you 384 logical cores, up to 12 TB of RAM, and enough PCIe lanes to install ~40 NVMe drives, which yields 320 TB of storage if we assume 8 TB per drive. All for a price that is comparable to a FAANG engineer's salary.
And you're missing one little thing - if you can do all of your processing on two of those, you save way more than 4 engineers' salaries in development/maintenance costs.
Let alone that AWS lets you get that machine for less than a junior engineer's salary ($7.50 per hour around the clock, roughly a $32 hourly wage, or about $66k per year).
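For what it's worth, the arithmetic behind those figures (using the commenter's $7.50/hour rate, which may not reflect current AWS pricing):

```python
hourly_machine_cost = 7.50     # $/hr, the commenter's AWS figure
hours_per_year = 24 * 365      # the machine runs around the clock
annual_machine_cost = hourly_machine_cost * hours_per_year  # 65,700 $/yr

work_hours_per_year = 40 * 52  # one full-time engineer (2,080 hours)
equivalent_hourly_wage = annual_machine_cost / work_hours_per_year  # ~31.6 $/hr
```

The machine bills for all 8,760 hours in a year while the engineer works about 2,080, which is why a $7.50/hour instance compares to a ~$32/hour salary.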
How about 2 homogeneous Erlang OTP nodes? I don't have a real-world use case at that load, but I often imagine keeping a 1:2 RAM ratio so I can scale vertically each time. For example, start with 1TB (A) : 2TB (B); if that's not enough, scale A to 4TB. When the load starts to exceed that, scale B to 8TB, and so on, alternately.
Helps to have a language that natively uses more CPU cores and/or training for the devs.
Ruby, Python, PHP and Node.js startups have to figure out how to use the cores while C++, Rust, Erlang and Go have no issues running a single process that maxes out all 64 CPU cores.
This is exactly what I do. When it comes to your regular backend business server, I write a stateful multithreaded monolith in C++ sitting on the same computer as the database, hosted on some multicore server with gobs of RAM (those are cheap now). Performance is insane and is enough to serve any normal business for years or decades to come.
So it does not work for FAANG companies but what do I care. I have the rest of the world to play with ;)
Even for non-FAANG, less-than-a-million-user business applications, there are two problems:
1. Your feature/function scope is not fully defined at the start and is not static until end of life. Software has to evolve with the business. In this case, it is easier to build a loosely coupled, shared-nothing architecture that can scale easily than a shared-everything, all-in-one-binary monolith.
2. Your customer base isn't one-size-fits-all. You usually have different growing businesses that need solutions at different scale points, but still with very high unit economics. This means you need an incremental scaling solution – this is where old-school big-chassis systems built blade-scalable server architectures. But because of custom/proprietary backplane designs they became unmanageably complex and buggy.
Instead, if you build an architecture that can scale the number of corporate users by adding cheap $2k pizza-box 1U servers as the company grows, that's much more attractive. Also, you can keep your systems design flexible enough to recompile and take advantage of advancements in hardware tech every 18 months – this gives you better operating margins as your own business starts to grow.
>So it does not work for FAANG companies but what do I care. I have the rest of the world to play with ;)
As long as hype chasers in middle management don't get in the way after convincing themselves they too must be like FAANG with a few orders of magnitude less of a consumer base.
The middle management - especially half-cooked engineers who drank the Kool-Aid and became managers - are hard to reason with.
They want to be both the architect and the manager, and anything you say gets overruled; since they are the boss, it's hard to ignore them.
This service is a monolith because it has 10K lines of code and it needs to be broken up. The product is at MVP and it's rock solid on Java Spring and it hardly crashes.
We are never going to lose data based on the design choices we made.
None of that matters.
We need zero-downtime upgrades, when we had zero customers.
I don't think you actually understand what microservices are. You don't put a load balancer up to balance between different services. A load balancer balances traffic between servers of the same service or monolith.
Microservices mean the servers of different services run different code. A load balancer only works together with servers running the same code.
>A load balancer only works together with servers running the same code.
Uh - what?
>A load balancer balances traffic between servers
Correct.
> of the same service or mononlith
Incorrect.
Load balancers used to work solely at Layer 4, in which case you’d be correct that any 80/443 traffic would be farmed across servers that would necessarily need to run the same code base.
But modern load balancers (NGINX et al.), especially in a service mesh / Kubernetes context, balance load at the endpoint level. A high-traffic endpoint might have dozens of pods running on dozens of K8s hosts, with traffic routed to them by their ingress controller. A low-traffic endpoint's pod might only be running on a few (potentially different) hosts, or just one.
The load balancer internally makes these decisions.
Each set of upstream hosts in nginx is a single instance of load balancing. You aren't load balancing across services, you're splitting traffic by service and then load balancing across instances of that service.
The split is inessential. You can just as easily have homogeneous backends & one big load balancing pool. Instances within that pool can even have affinity for or ownership of particular records! The ability to load balance across nodes is not, as you claimed, a particular advantage of microservices.
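The distinction the two comments are circling can be made concrete in a toy sketch (service and instance names are made up): the "microservice shape" routes by path prefix and then balances only within a pool, while the homogeneous alternative is one big pool with no routing step.

```python
import itertools

class Pool:
    """One load-balancing pool: round-robin over interchangeable instances."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

# Split-then-balance (the microservice shape): a router picks the pool
# by path prefix, and only *within* a pool is load actually balanced.
routes = {
    "/auth":  Pool(["auth-1", "auth-2"]),
    "/store": Pool(["store-1", "store-2", "store-3"]),
}

def handle(path):
    for prefix, pool in routes.items():
        if path.startswith(prefix):
            return pool.pick()
    raise KeyError(path)

# Homogeneous alternative: every backend runs all the code, so one big
# pool balances everything and the routing step disappears entirely.
monolith_pool = Pool(["app-1", "app-2", "app-3", "app-4"])
```

Both shapes load-balance; they differ only in whether a routing step partitions traffic first, which is the point being argued above.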
I don't really think of route based "load balancing" as load balancing. That's routing, or a reverse proxy. Not load balancing. Load balancing is a very specific type of reverse proxy.
The point is, if a client makes a request to a server, the response should always be the same, no matter where the load balancer sends the request to. Which means it should run the same code.
Nginx doesn't even mention route-based or endpoint-based load balancing in its docs. Maybe they don't consider it load balancing either.
Friend, you don’t know what you’re talking about and if linking NGINX documentation literally describing load balancing algorithms applied across Kubernetes pods hosting endpoints doesn’t clear things up for you, I don’t think anything will.
Yeah, it's bin packing, not straight efficiency. Also, people seem to exaggerate latency for RPC calls. I often get the feeling that people who make these latency criticisms have been burned by some nominal "microservices" architecture in which an API call is made in a hot loop or something.
Network latency is real, and some systems really do run on a tight latency budget, but most sane architectures will just do a couple of calls to a database (same performance as monolith) and maybe a transitive service call or something.
A couple of calls is normal. But if you make everything a microservice, and there are dependencies between them, then by design it's destined that some hotter loop will eventually contain an RPC.
> Microservices are less efficient, but are still more scalable.
Not at all. You can run your monolith on multiple nodes just as you can do with microservices. If anything, it scales much better as you reduce network interactions.
It's also way easier to design for. The web is an embarrassingly parallel problem once you remove the state. That's a big reason why you offload state to databases - they've done the hardest bits for you.
Little big concept, or mono-micro as people like to call it, is where it's at. Spending too much time making a complex environment benefits no one. Spending too much time making a complex application benefits no one.
Breaking down monoliths into purposed tasks and then creating pod groups of small containers is where it's at. Managing a fleet on Kubernetes is easier than managing one in some configuration management stack, be it Puppet or Salt - you have too many dependencies. Only your core infrastructure, Kubernetes itself, should be a product of the configuration management.
Running a single server often can’t meet availability expectations that users have. This is orthogonal to scalability. You almost always need multiple copies.
Yes but it’s literally a single point of failure. You probably want at least two servers in separate physical locations. Also how do you do deployments without interruption of service on a single server?
> That's a budget for a local crafts store website hosting, not "high availability" system
I'm not sure about that: you could still put something together within that budget, say, a few different VPSes across Hetzner and Contabo (or different regions within the same provider's offerings), with some circuit breaking and load balancing between those. Probably either a managed database offering, or a cluster of DBs running on similar VPSes.
Of course, this might mean that you have 1-3 instances of a service instead of 10-30, but as long as availability is the goal and not necessarily throughput, that can go pretty far.
> If you're a amateur something, that just does it for fun - sure
sed "s/amateur/comparatively poor, from a third world country, without VC money, or have cheap labor/g"
Not everyone can afford advanced tools or platforms, or even using something like AWS/Azure/GCP. Some of those can indeed be amateur use cases (e.g. side project or bootstrapped SaaS), others simply stretching your money for any number of considerations (e.g. non profit, limited budget etc.), but it's definitely possible. In some countries it probably makes more sense to just build your own solution, as long as you're not doing anything too advanced.
500 USD a month would get you approximately the following resources (taxes vary) on the aforementioned platforms:
Contabo
Nodes: 15 to 83 (depending on configuration)
CPU: 150 to 332 cores
RAM: 664 to 900 GB
SSD: 4150 to 6000 GB
Hetzner
Nodes: 7 to 110 (depending on configuration)
CPU: 86 to 192 cores
RAM: 192 to 384 GB
SSD: 2200 to 4800 GB
(this includes regular VPS packages, not storage optimized ones, or dedicated hosting etc.)
I'm not sure about you, but in my experience that could be enough for some pretty decent systems, albeit some storage heavy workloads would need the storage packages instead of the regular VPS ones. It's mostly a matter of picking a suitable topology and working towards your goals.
> Otherwise what you have is a budget that is lower than the possible implications of temporary downtime. That doesn't make sense in the real world.
This (depending on the circumstances) does sound like a good point! Maybe "the real world" isn't the best wording, though, and choosing "enterprise settings" or anything along those lines would be more suitable.
I could probably design and deploy an HA system for way less. Maybe less than $200/month. It wouldn't be the most performant, but would be HA in three regions.
But it leads me back to my original statement - extreme requirements for uptime don't come out of nothing.
If you're in a location where IT related labor is extremely cheap - you're just going to have people keep one server up.
I know I used to do exactly that, because the server was more than my annual income. But that didn't last long. After the first 20 minute downtime, the budget for HA solution was allocated. But before a certain point downtime wasn't expensive.
Non-profits would probably be the only reasonable exception, where HA and low budgets could coincide. Otherwise - nah...
Those are all fair points, perhaps even more so given the trend of compute and other resources generally becoming cheaper over time (things like Wirth's law and limited IPv6 support aside). Thanks for expanding on your arguments!
>> Otherwise what you have is a budget that is lower than the possible implications of temporary downtime. That doesn't make sense in the real world.
> Maybe "the real world" isn't the best wording, though, and choosing "enterprise settings" or anything along those lines would be more suitable.
This is the point - for-profit corporations, by definition, don't want to waste money, and if "high reliability" isn't required (or they don't know about it), they don't spend the money on it.
However, if "being cheap" would mean billions in lost income (and they know about it), they really do want reliable, redundant infrastructure and systems around.
well, to be fair, you could host a highly available local craft store website for under 500 dollars on $cloudprovider easily. You could also trivially get regional redundancy.
Go to DigitalOcean or a similar provider. Launch a managed HA DB (starts at ~$120/month). Launch an autoscaling HA K8s cluster (starts from ~$80), deploy 1-2 stateless pods. There you go.
> Servers can only get so big. If your monolith needs more resources than a single server can provide, then you can chop it up into microservices and each microservice can get its own beefy server. Then you can put a load balancer in front of a microservice and run it on N beefy servers.
I've almost never seen situations where a single request needs more resources than the entire server has (outside of large GPT models for text, though maybe that's because I couldn't afford beefy machines for that myself).
Instead, if your monolith needs X resources to run (overhead) and Y resources to serve your current load, then in case of increased load you can just setup another instance of your monolith in parallel with another set of X + Y resources (same configuration) and it will generally almost double your capacity.
Now, there can be some issues with this, such as needing to ensure either stateless APIs or sticky sessions, but both are generally regarded as solved problems (with a little bit of work). Monoliths themselves shouldn't be limited to running just a single instance and aren't that different from a scalability perspective than microservices.
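One common sticky-session trick is hash-based pinning, sketched here for illustration (real load balancers implement this via cookies, client-IP hashing, or consistent hashing rather than a helper like this):

```python
import hashlib

def sticky_instance(session_id, instances):
    """Deterministically pin a session to one instance by hashing its id.

    This lets a monolith that keeps some per-session state in memory
    still run N copies behind a load balancer: the same session id
    always lands on the same instance.
    """
    digest = hashlib.sha256(session_id.encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]
```

The trade-off is that when the instance list changes (scale-up, crash), sessions get remapped - which is why stateless APIs, or moving session state out to a shared store, remain the cleaner fix.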
Where microservices excel, however, is that you can observe individual services (e.g. systemd services or them running in containers) better and see when a particular service is misbehaving or scale them separately, as well as decrease that X overhead since each service has a smaller codebase when you have lots of instances running. This does come at the expense of increased operational complexity and possibly noisy network chatter, especially if you've drawn your service boundaries wrong.
However, at the same time I've seen actual monoliths that can never have more than one instance running due to problematic architecture, so I propose the following wording (that I've heard elsewhere):
SINGLETONS - a monolithic application that can only ever have a single instance running, for example, when business processes are stored in memory for a bit, or user sessions or something like that are stored locally as well; these will ONLY ever scale VERTICALLY, unless you re-architect them
MONOLITHS - applications that contain all of your project's logic in a single codebase, although multiple instances can be launched in parallel, depending on your needs; can be scaled BOTH VERTICALLY and HORIZONTALLY; they have more overhead though and observability can be a bit challenging
MICROSERVICES - applications that contain a part of the total project's logic, typically across multiple separate codebases, possibly with shared library code, pieces of your project can be scaled separately, BOTH VERTICALLY and HORIZONTALLY; they are operationally more complex, can involve more network chatter and while you can observe how services perform, now you need to deal with distributed tracing
Of course, there can be more nuance to it, like modular monoliths, that still have one codebase, but can have certain groups of functionality enabled or disabled. I actually wrote about that approach a while back, calling them "moduliths": https://blog.kronis.dev/articles/modulith-because-we-need-to...
I don't actually expect anyone to use these particular terms, but I dislike when someone claims that monoliths have the issues of these "singleton" applications when in fact that's just because they've primarily worked with bad architectures. Sometimes they wouldn't need to shoot themselves in the foot with microservices if they could just extract their session data into Redis and their task queues into RabbitMQ. Other times, microservices actually make sense.