But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding a designed-for-purpose tool to achieve the same result is what's actually wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
I'll push the metaphor a bit: I think the point is that if you have a fleet of vehicles you want to fuel, go ahead and get a fuel truck and bite off on that expense. However, if you only have 1 or 2, a couple of jerry cans you probably already have, plus a pickup truck, is sufficient.
Getting a 288-core machine might be easier than setting up Kafka; I'm guessing that it would be a couple of weeks of work to learn enough to install Kafka the first time. Installing Postgres is trivial.
"Lots of the team knows Postgres really well, nobody knows Kafka at all yet" is also an underrated factor in making choices. "Kafka was the ideal technical choice but we screwed up the implementation through well-intentioned inexperience" is an all-too-plausible outcome.
Indeed, I've seen this happen first hand where there was only one guy who really "knew" Kafka, and it was too big of a job for just him. In that case it was fine until he left the company, and then it became a massive albatross and a major pain point. In another case, the eng team didn't have anyone who really "knew" Kafka but used a managed service thinking it would be fine. It was until it wasn't, and switching away is not a light lift, nor is mass educating the dev team.
Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.
Postgres is the solution in question in the article because I simply assume the majority of companies will start with Postgres as their first piece of infra. And it is often the case. If not, MySQL, SQLite, whatever. Just optimize for the thing you know, and see if it can handle your use case (often you'll be surprised).
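To make that concrete, here's a minimal sketch of the "queue on the database you already have" pattern, using stdlib SQLite as a stand-in (table and column names are made up for illustration; on Postgres you'd claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` so multiple consumers don't contend):

```python
import sqlite3

# Stand-in for "use the database you already run" as a message queue.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE queue (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        done    INTEGER NOT NULL DEFAULT 0
    )
""")

def publish(payload):
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()

def consume():
    # Claim the oldest unprocessed message, then mark it done.
    row = conn.execute(
        "SELECT id, payload FROM queue WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE queue SET done = 1 WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]

publish("hello")
publish("world")
print(consume())  # -> hello
print(consume())  # -> world
print(consume())  # -> None
```

Not a replacement for Kafka's replication or consumer groups, but for modest volumes it's a long way toward "see if the thing you know can handle it."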
The only thing that might take "weeks" is procrastination. Presuming absolutely no background other than general data engineering, a decent beginner online course in Kafka (or Redpanda) will run about 1-2 hours.
Right, I was talking about installing Kafka, not installing Redpanda. Redpanda may be perfectly fine software, but bringing it up in that context is a bit apples-and-oranges since it's not open-source: https://news.ycombinator.com/item?id=45748426
Just use Strimzi if you're in a K8s world (disclosure: I used to work on Strimzi for RH, but I still think it's far better than Helm charts or fully self-managed, and far cheaper than fully managed).
Even though I'm a few years on from Red Hat, I still really recommend Strimzi. I think the best way to describe it is "a sorta managed Kafka". It'll make things that are hard in self-managed Kafka (like rolling upgrades) easy as.
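To give a flavor of the "sorta managed" experience: you declare a cluster as a `Kafka` custom resource and the operator handles the rest, including rolling restarts. A minimal manifest looks roughly like this (versions, sizes, and the cluster name are illustrative; check the Strimzi docs for current API versions):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Upgrades then become "edit the version in the spec and let the operator roll the brokers one by one" instead of a hand-run procedure.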
Is it about what Kafka could get you, or what you need right now?
Kafka is a full-on streaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
When you have C++ code, the number of external folks who want to, and who can effectively, actively contribute to the code drops considerably. Our "cousins in code," ScyllaDB, last year announced they were moving to source-available because of the lack of OSS contributors:
> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.
People still want to get free utility out of the source-available code. Less commonly, they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.
Note that prior to its license change ScyllaDB was using AGPL. This is a fully FLOSS license but may have been viewed nonetheless as somewhat unfriendly by potential outside contributors. The ScyllaDB license change was really more about not wanting to expend development effort on maintaining multiple versions of the code (AGPL licensed and fully proprietary), so they went for sort of a split-the-difference approach where the fully proprietary version was in turn made source-available.
(Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)
In case anyone here is looking for a fully-FLOSS contender that they may perhaps want to contribute to, there's the interesting project YugabyteDB https://github.com/yugabyte/yugabyte-db
I think AGPL/Proprietary license split and eventual move to proprietary is just a slightly less overt way of the same "freeloader" argument. The intention of the original license was to make the software unpalatable to enterprises unless you buy the proprietary license, and one "benefit" of the move (at least for the bean counters) is that it stops even AGPL-friendly enterprises from being able to use the software freely.
(Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)
I think the intention of the original license was to make the software unpalatable to SaaS vendors who want to keep their changes proprietary, not unpalatable to enterprises in general.
Rightly or wrongly, large companies are very averse to using AGPL software even if it would cause them very little additional burden to comply with the AGPL. Lots of projects use this cynically to help sell proprietary licenses (the proof of this is self-evident -- many such projects have CLAs and were happy to switch to a proprietary license that is even less favourable to enterprises than the AGPL as soon as it was available).
Again, I'm happy to use AGPL software, I just disagree that the intent here is all that different from any of the other projects that switched to the proprietary BSL.
You are obviously free to choose to use a proprietary license, that's fine -- but the primary purpose of free licenses has very little to do with contributing code back upstream.
As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.
Right, open source is generally of benefit to users, not to the author, and users do get some of that benefit from being able to see the source. I wouldn't want to look at it myself, though, for legal reasons.
You can be open source and not take contributions. This argument doesn't make sense to me. Just stop doing the expensive part and keep the license as is.
I think the argument is that, if they expected to receive high-quality contributions, then they'd be willing to take the risk of competitors using their software to compete with them, which an open-source license would allow. It usually doesn't work out that way; with a strong copyleft license, your competitors are just doing free R&D improving your own product, unless they can convince your customers that they know more about the product than the guys who wrote it in the first place. But that's usually the fear.
On the other hand, if they don't expect people outside their company to know C++ well enough to contribute usefully, they probably shouldn't expect people outside their company to be able to compete with them either.
Really, though, the reason to go open-source is because it benefits your customers, not because you get contributions, although you might. (This logic is unconvincing if you fear they'll stop being your customers, of course.)
I think it's reasonably common for accepting external contributions to an open-source project to be more trouble than it's worth, just because most programmers aren't very good.
I often use a different approach - assume by default that external contributors are smarter than our employees. This is needed to prevent arrogance and entitlement during code reviews. A reasonable pull request from an external contributor is more valuable than one from an employee.
Your name sounds familiar. I think you may be one of the people at RedPanda with whom I’ve corresponded. It’s been a few years though, so maybe not.
A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.
We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.
"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."
However, for all that said, Redpanda is still blazingly fast.
I'm highly skeptical of the method employed to simulate unsync'd writes in that example.
They used a non-clustered ZooKeeper and then just shut it down, breaking the Kafka controller and preventing any Kafka cluster state management (not just preventing partition leader election), while manually corrupting the log file. Oof. Is it really _that_ hard to lose ack'd data from a Kafka cluster that you had to go to such contrived and dubious lengths?
To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.
I've never looked at redpanda, but kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.
Definitely not in the case of Kafka. Even with an SSD, that would limit it to around 100k fsyncs/second. Batch commit allows Kafka (and Postgres) to amortize fsync overhead over many messages.
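A back-of-envelope sketch of that amortization (all numbers are illustrative assumptions, not benchmarks):

```python
# Illustrative assumption: a commodity SSD sustains about 1,000 fsyncs
# per second (i.e. roughly 1 ms per fsync).
FSYNCS_PER_SECOND = 1_000

# fsync after every message: throughput is capped by the fsync rate alone.
per_message_rate = FSYNCS_PER_SECOND * 1          # 1,000 msg/s

# Batch commit: one fsync covers a whole batch, amortizing its cost.
BATCH_SIZE = 500
batched_rate = FSYNCS_PER_SECOND * BATCH_SIZE     # 500,000 msg/s

print(f"fsync per message: {per_message_rate:,} msg/s")
print(f"fsync per {BATCH_SIZE}-message batch: {batched_rate:,} msg/s")
```

Same reason Postgres groups concurrent commits into one WAL flush: the fsync count, not the message count, is the bottleneck.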
To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?
Downdetector had 5,755 reports of AWS problems at 12:52 AM Pacific (3:52 AM Eastern).
That number had dropped to 1,190 by 4:22 AM Pacific (7:22 AM Eastern).
However, that number is back up with a vengeance: 9,230 reports as of 9:32 AM Pacific (12:32 PM Eastern).
Part of that could be explained by more people making reports as the U.S. west coast awoke. But I also have a feeling that they aren't yet on top of the problem.
Where do they source those reports from? Always wondered if it was just analysis of how many people are looking at the page, or if humans somewhere are actually submitting reports.
So if my browser auto-completes their domain name and I accept that (causing me to navigate directly to their site and then I click AWS) it's not a report; but if my browser doesn't or I don't accept it (because I appended "AWS" after their site name) causing me to perform a Google search and then follow the result to the AWS page on their site, it's a report? That seems too arbitrary... they should just count the fact that I went to their AWS page regardless of how I got to it.
I don't know the exact details, but I know that hits to their website do count as reports, even if you don't click "report". I assume they weight it differently based on how you got there (direct might actually be more heavily weighted, at least it would be if I was in charge).
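Something like this (the weights and event names here are purely hypothetical; Downdetector's real methodology isn't public):

```python
# Purely hypothetical weights for illustration only.
WEIGHTS = {
    "explicit_report": 1.0,   # user clicked "I have a problem"
    "direct_visit":    0.5,   # typed/auto-completed the URL, then opened the service page
    "search_visit":    0.25,  # landed on the service page via a search engine
}

def report_score(events):
    """Fold weighted signals into a single 'reports' figure."""
    return sum(WEIGHTS[kind] * count for kind, count in events.items())

print(report_score({"explicit_report": 100, "direct_visit": 400, "search_visit": 200}))
# -> 350.0
```

Under a scheme like this, *how* you arrived still matters, but every visit to the status page moves the number at least a little.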
The 16" doesn't offer the M5 (yet), only the M4 Pro and Max CPUs. Other differences are a higher number of performance cores vs. efficiency cores, and significantly higher memory bandwidth in the M4 Pro/Max lines (273 and 410 GB/s) versus the M5 (153 GB/s).
Correct. It is more of a provider-oriented proscription ("You can't say your chatbot is a therapist."). It is not a limitation on usage. You can still, for now, slavishly fall in love with your AI and treat it as your best friend and therapist.
There is a specific section that relates to how a licensed professional can use AI:
Section 15. Permitted use of artificial intelligence.
(a) As used in this Section, "permitted use of artificial intelligence" means the use of artificial intelligence tools or systems by a licensed professional to assist in providing administrative support or supplementary support in therapy or psychotherapy services where the licensed professional maintains full responsibility for all interactions, outputs, and data use associated with the system and satisfies the requirements of subsection (b).
(b) No licensed professional shall be permitted to use artificial intelligence to assist in providing supplementary support in therapy or psychotherapy where the client's therapeutic session is recorded or transcribed unless:
(1) the patient or the patient's legally authorized representative is informed in writing of the following:
(A) that artificial intelligence will be used; and
(B) the specific purpose of the artificial intelligence tool or system that will be used; and
(2) the patient or the patient's legally authorized representative provides consent to the use of artificial intelligence.
I went to the doctor and they used some kind of automatic transcription system. Doesn’t seem to be an issue as long as my personal data isn’t shared elsewhere, which I confirmed.
Whisper is good enough these days that it can be run on-device with reasonable accuracy so I don’t see an issue.
Yes, but HIPAA is notoriously vague with regards to what actual security measures have to be in place. It's more of an agreement between parties as to who is liable in case of a breach than it is a specific set of guidelines like SOC 2.
If your medical files are locked in the trunk of a car, that’s “HIPAA-compliant” until someone steals the car.
I think that's a good thing. I don't want a specific but largely useless checklist that absolves the party that ought to be held responsible. A hard guarantee of liability is much more effective at getting results.
It would be nice to extend the approximate equivalent of HIPAA to all personal data processing in all cases with absolutely zero exceptions. No more "oops we had a breach, pinky promise we're sorry, don't forget to reset all your passwords".
No disagreement. It's just something I point out when people are concerned about "HIPAA compliance."
My experience is that people tend to think it's some objective level of security. But it's really just the willingness to sign a BAA and then take responsibility for any breaches.
Yes, but also "An... entity may not provide... therapy... to the public unless the therapy... services are conducted by... a licensed professional".
It's not obvious to me as a non-lawyer whether a chat history could be decided to be "therapy" in a courtroom. If so, this could count as a violation. Probably lots of law around this stuff already for lawyers and doctors cornered into giving advice at parties that might apply (e.g., maybe a disclaimer is enough to work around the prohibition)?
Functionally, it probably amounts to two restrictions: a chatbot cannot formally diagnose & a chatbot cannot bill insurance companies for services rendered.
Most "therapy" services are not providing a diagnosis. Diagnosis comes from an evaluation before therapy starts, or sometimes not at all. (You can pay to talk to someone without a diagnosis.)
The prohibition is mainly on accepting any payment for advertised therapy service, if not following the rules of therapy (licensure, AI guidelines).
These things usually (not a lawyer tho) come down to the claims being actively made. For example "engineer" is often (typically?) a protected title but that doesn't mean you'll get in trouble for drafting up your own blueprints. Even for other people, for money. Just that you need to make it abundantly clear that you aren't a licensed engineer.
I imagine "Pay us to talk to our friendly chat bot about your problems. (This is not licensed therapy. Seek therapy instead if you feel you need it.)" would suffice.
For a long time, Mensa couldn't give people IQ scores from the tests they administered because somehow, legally, they would be acting medically. This didn't change until about 10 years ago.
Defining non-medical things as medicine and requiring approval by particular private institutions in order to do them is simply corruption. I want everybody to get therapy, but there's no difference in outcomes whether you get it from a licensed therapist using some whacked out paradigm that has no real backing, or from a priest. People need someone to talk to who doesn't have unclear motives, or any motives really, other than to help. When you hand money to a therapist, that's nearly what you get. A priest has dedicated his life to this.
The only problem with therapists in that respect is that there's an obvious economic motivation to string a patient along forever. Insurance helps that by cutting people off at a certain point, but that's pretty brutal and not motivated by concern for the patient.
If you think human therapists intentionally string patients along forever, wait to see what tech people can achieve with gamified therapists literally A/B tested to string people along. Oh, and we will then blame the people for "choosing" to engage with that.
Also, the proposition is dubious, because there are waitlists for therapists. Plus, a therapist can actually lose their license, while the chatbot can't, no matter how bad the chatbot gets.
[Disclosure: I work for Redpanda]