
I can probably shed some light on that. I've used Forth on 8 bit platforms (6502, 6809), 16 bit platforms (80286) and 32 bit platforms (68K), as well as assembly, and on the 16 and 32 bit platforms C. Putting these side-by-side, and assuming roughly equivalent programmer competence levels at the time, assembler would win out, C would get you to maybe half the speed of assembly on a good day, and Forth was about 10x slower than that.

Which was still incredibly fast for the day, given that Forth was compiled to an intermediate format with the Forth interpreter acting as a very primitive virtual machine. This interpretation step had considerable overhead; especially in inner loops with few instructions the overhead would be massive. For every one instruction doing actual work you'd have a whole slew of them assigned to bookkeeping and stack management. What in C would compile to a few machine instructions (which a competent assembly programmer of the time would be able to significantly improve upon) would result in endless calls to lower and lower levels.

There were later Forth implementations that improved on this by compiling to native code but I never had access to those when I was still doing this.

For a lark I wrote a Forth in C rather than bootstrapping it through assembly and it performed quite well. Forth is ridiculously easy to bring up; it is essentially a few afternoons' work to go from zero to highway speeds on a brand new board that you have a compiler for. Which is one of the reasons it is still a favorite for initial board bring-up.
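
To make the overhead concrete, here is a toy sketch of a threaded-code inner interpreter in C. It is not how a classic indirect-threaded Forth is laid out (those use code fields and a return stack for nesting colon definitions), and all of the names are made up for the example, but it shows the fetch-and-dispatch loop plus explicit stack traffic that straight compiled code doesn't pay for:

    /* Toy threaded-code inner interpreter; illustrative only. */
    #include <stdio.h>

    typedef struct cell cell;
    typedef void (*prim)(const cell *self);
    struct cell { prim fn; long val; };

    static long stack[64];
    static int sp;

    static void push(long v) { stack[sp++] = v; }
    static long pop(void)    { return stack[--sp]; }

    /* primitives: a literal push, addition, and "." (print top of stack) */
    static void lit(const cell *self) { push(self->val); }
    static void add(const cell *self) { (void)self; long b = pop(); push(pop() + b); }
    static void dot(const cell *self) { (void)self; printf("%ld\n", pop()); }

    /* The inner interpreter: fetch a cell, dispatch through a function
       pointer, repeat. This loop is the bookkeeping described above;
       a native compiler would boil "2 3 + ." down to a handful of
       machine instructions with no dispatch at all. */
    static void execute(const cell *ip) {
        for (; ip->fn != NULL; ip++)
            ip->fn(ip);
    }

    int main(void) {
        const cell two_three_plus_dot[] = {
            { lit, 2 }, { lit, 3 }, { add, 0 }, { dot, 0 }, { NULL, 0 }
        };
        execute(two_three_plus_dot);   /* prints 5 */
        return 0;
    }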

One area where Forth usually beat out C by a comfortable margin was code size: Forth code tends to be extremely compact (and devoid of any luxury). On even the smallest microcontrollers (the 8051 for instance, and later, Microchip parts and such) you could get real work done in Forth.


> How is process-based sandboxing stronger?

Because the guarantees themselves are stronger. Process isolation is something we have decades of experience with; it goes wrong every now and then, but those are rare instances, whereas what amounts to application-level isolation is much weaker in terms of the guarantees that it provides and the maturity level of the code. That suggests that if you base your isolation scheme on processes rather than 'just' sandboxing you will come out ahead, and even with all other things the same you'd have one more layer in your stack of Swiss cheese slices. A VM would offer another layer of protection on top of that, one with yet stronger guarantees.
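
A minimal POSIX sketch of what that process layer buys you, just to show the shape (real sandboxes add seccomp filters, namespaces, dropped privileges and so on; the function names here are made up):

    /* Run untrusted work in a forked child with tightened resource
       limits; a crash or exploit in that code is contained by the
       kernel's process boundary, not by in-process checks. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void risky_work(void) {
        /* stand-in for parsing untrusted input, running a plugin, ... */
        printf("child %d doing the dangerous part\n", (int)getpid());
    }

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        if (pid == 0) {                     /* child: the sandboxed side */
            struct rlimit cpu = { 2, 2 };   /* at most ~2s of CPU time   */
            struct rlimit fsz = { 0, 0 };   /* may not create/grow files */
            setrlimit(RLIMIT_CPU, &cpu);
            setrlimit(RLIMIT_FSIZE, &fsz);
            risky_work();
            _exit(0);
        }

        int status = 0;                     /* parent: a crash inside the */
        waitpid(pid, &status, 0);           /* child cannot touch us here */
        printf("child exited, status %d\n", status);
        return 0;
    }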


Any assumption made in order to ship a product on time will eventually be found to have been incorrect and will cause 10x the cost that it would have taken to properly design the thing in the first place. The problem is that if you do that proper design you never survive to the stage where you have that problem.

I think the solution to that is to continuously refactor, and to spell out very clearly what your assumptions are when you are writing the code (which is an excellent use for comments).


Continuous refactoring is much easier with well constrained data/type schemas. There are fewer edge cases to consider, which means any refactoring or data migration processes are simpler.

The trick is to make the schema represent what you need - right now - and no more. Which is the point of the “Make your invalid states unrepresentable” comment.
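
A rough C illustration of that (the job type is invented for the example): instead of one struct with a status flag plus fields that are only sometimes valid, a tagged union only carries the fields that exist in each state. C won't enforce exhaustiveness the way stricter type systems do, but the schema then describes exactly what you need right now and no more:

    /* Sketch: a "loose" struct could encode nonsense (a DONE job with no
       result, a PENDING job with a progress value); this tagged union
       only stores the fields that exist in each state. */
    #include <stdio.h>

    enum job_state { JOB_PENDING, JOB_RUNNING, JOB_DONE };

    struct job {
        enum job_state state;
        union {
            struct { int percent_complete; } running;  /* only if RUNNING */
            struct { const char *result_path; } done;  /* only if DONE    */
        } as;
    };

    static void print_job(const struct job *j) {
        switch (j->state) {
        case JOB_PENDING: printf("pending\n"); break;
        case JOB_RUNNING: printf("running: %d%%\n", j->as.running.percent_complete); break;
        case JOB_DONE:    printf("done: %s\n", j->as.done.result_path); break;
        }
    }

    int main(void) {
        struct job j = { .state = JOB_DONE, .as.done.result_path = "/tmp/out.bin" };
        print_job(&j);
        return 0;
    }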


That's almost but not quite how the airline industry is treated. The difference there is that the regulators are in bed with the companies they should be regulating.

There are four words that would make the output of any LLM instantly 1000x more useful and I haven't seen them yet: "I do not know."

"Dixie can't meaningfully grow as a person. All that he ever will be is burned onto that cart;"

"Do me a favor, boy. This scam of yours, when it's over, you erase this god-damned thing."


If you are really concerned you should do this and then report back. Otherwise it is just a mild form of concern trolling.

I checked the code, reported a bug, and Filip fixed it. Therefore, as I said, I was a little concerned.

Yes, but the fact that the author has a pretty good turnaround time for fixing bugs (I wish all open source projects were that fast) and listens to input belies the tone of your comment, which makes me come away with a negative view of the project when in fact the evidence points to the opposite.

It's a 'damning with faint praise' thing and I'm not sure to what degree you are aware of it but I don't think it is a fair way to treat the author and the project. HN has enough of a habit of pissing on other people's accomplishments already. Critics have it easy, playwrights put in the hours.


I understand your point, and I have the utmost respect for the author who initiated, implemented, and published this project. It's a fantastic piece of work (I reviewed some part of it) that will very likely play an important role in the future - it's simply too good not to.

At the same time, however, the author seems to be operating on the principle: "If I don't make big claims, no one will notice." The statements about the actual security benefits should be independently verified - this hasn't happened yet, but it probably will, as the project is gaining increasing attention.


> "If I don't make big claims, no one will notice."

I am making big claims because there are big claims to be made.

> The statements about the actual security benefits should be independently verified - this hasn't happened yet

I don't know what this means. Folks other than me have independently verified my claims, just not exhaustively. No memory safe language runtime has been exhaustively verified, save maybe SPARK. So you're either saying something that isn't true at all, or that could be said for any memory safe language runtime.


To clarify the position, my concern isn't that the project is bad - it's that security engineering is a two-front war. You have to add new protections (memory safety) without breaking existing contracts (like ld.so behavior).

When a project makes 'big claims' about safety, less technical users might interpret that as 'production ready'. My caution is caused by the fact that modifying the runtime is high-risk territory where regressions can introduce vulns that are distinct from the memory safety issues you are solving.

The goal is to prevent the regression in the first place. I'm looking forward to seeing how the verification matures and rooting for it.


> without breaking existing contracts (like ld.so behavior)

If you think that Fil-C regresses ld.so then get specific. Otherwise what you’re doing is spreading fear, uncertainty, and doubt for no good reason.

Fil-C has always honored the setuid behavior provided by ld.so. There was a bug - since fixed - in which the Fil-C runtime called getenv instead of secure_getenv.
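
For readers who haven't run into it: secure_getenv is the glibc variant that returns NULL when the process is in secure-execution mode (setuid/setgid and friends), so attacker-supplied environment variables can't steer a privileged runtime. A minimal, glibc-specific illustration with a made-up variable name:

    /* Prefer secure_getenv over getenv for anything that can influence a
       runtime's behavior: in a setuid/setgid (secure-execution) process
       the variable is simply ignored. glibc extension, needs _GNU_SOURCE. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* FIL_EXAMPLE_OPTION is a hypothetical name, for illustration only. */
        const char *opt = secure_getenv("FIL_EXAMPLE_OPTION");
        if (opt == NULL)
            opt = "default";   /* unset, or we are running secure-execution */
        printf("using option: %s\n", opt);
        return 0;
    }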

> When a project makes 'big claims' about safety, less technical users might interpret that as 'production ready'.

Fil-C is production ready and already has production users.


I would suggest you re-read your comment in a week or so to see if by then you are far enough away from writing it to see how others perceive it. If it wasn't your intention to be negative then maybe my non-native English capability is the cause of this, but even upon re-reading it that's how I perceive it.

- You start off by commenting that the author has a knack for self-promotion and invention. My impression is that he's putting in a status report for a project that is underway.

- You follow this up with something that you can't possibly know and use that to put the project down, whilst at the same time positioning yourself as a higher-grade authority because you are apparently able to see something that others do not, effectively doing that which you accuse the author of: self-promotion.

- You then double down on this by showing that it was you who pointed out to the author that there was a bug in the software, which in the normal course of open source development is not usually enough to place yourself morally or technically above the authors.

- You then in your more or less official capacity of established critic warn others to hold off putting this project to the test until 'adults' have reviewed it.

- And then finally you suggest they do it anyway, with your permission this time (and of course now amply warned) with the implicit assumption that problems will turn up (most likely this will be the case) and that you hope 'there won't be too many false positives', strongly suggesting that there might be.

And in your comment prior to this reply you do that once again, making statements that put words in the mouth of the author.


You're right, my tone was off.

That's absolutely amazing.

What a great job he did. It looks very professional, even though the numbers produced must be fairly low. I wonder how the shutter mechanism works; on most medium format cameras that's a work of art and a project in its own right.

It uses Mamiya Press lenses; the focusing helicoid and shutter are in the lens.

What is the quality like for these lenses? This says the system was discontinued in the 1970s.

https://en.wikipedia.org/wiki/Mamiya_Press

I've seen a lot of Mamiya 645 and 67 systems but those were probably from the 1980s and 90s.


Good for the time but dated. When your negative is that big it doesn't matter as much, but it's absolutely not as sharp as more modern constructions. It's also arguably a step down to use his printed body versus the original Mamiya Press in terms of functionality, other than the reduced weight. Still an impressive design though.

I can't speak for the Mamiya Press but the Zeiss Planar 80mm f/2.8 for the Graflex XL is rad.

Ahh, I totally missed that detail. That makes the whole thing so much more feasible.

See "Camera Specifics" on [1], for a casual but accurate explanation.

Photography is an amazing hobby, highly recommend diving into it ^^!

1: https://photothinking.com/2021-07-03-mamiya-press-super-23-f...


I did the same, then put in 14 3090's. It's a little bit power hungry but fairly impressive performance-wise. The hardest parts are power distribution and riser cards but I found good solutions for both.

I think 14 3090's are more than a little power hungry!

To the point that I had to pull an extra circuit... but it's three-phase so good to go even if I would like to go bigger.

I've limited power consumption to what I consider the optimum: each card will draw ~275 watts (you can very nicely configure this on a per-card basis). The server itself also uses some for the motherboard; the whole rig is powered from four 1600W supplies, the GPUs are divided 5/5/4 and the motherboard is connected to its own supply. It's a bit close to the edge for the supplies that have five 3090's on them but so far it has held up quite well, even with higher ambient temps.
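
For anyone wanting to do that per-card capping programmatically rather than with nvidia-smi -pl, NVML exposes it. A rough sketch (the 275 W figure just mirrors the setup above, setting limits normally needs root, and error handling is trimmed):

    /* Cap every visible GPU to ~275 W via NVML (what `nvidia-smi -i N -pl W`
       does under the hood). Build against the driver's NVML library, e.g.
       gcc cap.c -lnvidia-ml. */
    #include <stdio.h>
    #include <nvml.h>

    int main(void) {
        const unsigned int cap_mw = 275000;          /* milliwatts */

        if (nvmlInit() != NVML_SUCCESS) {
            fprintf(stderr, "nvmlInit failed\n");
            return 1;
        }

        unsigned int count = 0;
        nvmlDeviceGetCount(&count);
        for (unsigned int i = 0; i < count; i++) {
            nvmlDevice_t dev;
            if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
                continue;
            nvmlReturn_t r = nvmlDeviceSetPowerManagementLimit(dev, cap_mw);
            printf("GPU %u: set %u mW -> %s\n", i, cap_mw, nvmlErrorString(r));
        }

        nvmlShutdown();
        return 0;
    }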

Interesting tidbit: at 4 lanes/card throughput is barely impacted; 1 or 2 is definitely too low. 8 would be great but the CPUs don't have that many lanes.

I also have a Threadripper which should be able to handle that much RAM, but at current RAM prices that's not interesting (that server I could populate with RAM that I still had that fit that board, plus some more I bought from a refurbisher).


What PCIe version are you running? Normally I would not mention one of these, but you have already invested in all the cards, and it could free up some space if any of your lanes being used now are 3.0.

If you can afford the 16 (PCIe 3) lanes, you could get a PLX ("PCIe Gen3 PLX Packet switch X16 - x8x8x8x8" on ebay for like $300) and get 4 of your cards up to x8.


All are PCIe 3.0. I wasn't aware of those switches at all, in spite of buying my risers and cables from that source! Unfortunately all of the slots on the board are x8; there are no x16 slots at all.

So that switch would probably work but I wonder how big the benefit would be: you will probably see effectively an x4 -> (x4 / x8) -> (x8 / x8) -> (x8 / x8) -> (x8 / x4) -> x4 pipeline, and then on to the next set of four boards.

It might run faster on account of the three passes that are double the speed they are right now, as long as the CPU does not need to talk to those cards and all transfers are between layers on adjacent cards (very likely), and with even more luck (due to timing and lack of overlap) it might run the two x4 passes at approaching x8 speeds as well. And then of course you need to do this a couple of times because four cards isn't enough, so you'd need four of those switches.

I have not tried having a single card with fewer lanes in the pipeline but that should be an easy test to see what the effect on throughput of such a constriction would be.
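
A quick way to sanity-check what each card actually negotiated before running that test is to read the link attributes straight out of sysfs; a small Linux-only sketch, with a hypothetical bus address (use lspci to find your own):

    /* Read the negotiated PCIe link width/speed for one device from sysfs.
       The address 0000:01:00.0 is hypothetical; substitute your GPU's. */
    #include <stdio.h>

    static void show(const char *attr) {
        char path[256], buf[64];
        snprintf(path, sizeof path, "/sys/bus/pci/devices/0000:01:00.0/%s", attr);
        FILE *f = fopen(path, "r");
        if (f && fgets(buf, sizeof buf, f))
            printf("%-20s %s", attr, buf);
        if (f) fclose(f);
    }

    int main(void) {
        show("current_link_width");   /* lanes negotiated right now      */
        show("max_link_width");       /* what the slot/device could do   */
        show("current_link_speed");   /* e.g. "8.0 GT/s PCIe" for gen 3  */
        return 0;
    }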

But now you have me wondering to what extent I could bundle 2 x8 into an x16 slot and then use four of these cards inserted into a fifth! That would be an absolutely unholy assembly but it has the advantage that you would need far fewer risers, just one x16 to x8/x8 run in reverse (I have no idea if that's even possible, but I see no reason right away why it would not work unless there are more driver chips in between the slots and the CPUs, which may be the case for some of the farthest slots).

PCIe is quite amazing in terms of the topology tricks that you can pull off with it, and c-payne's stuff is extremely high quality.


If you end up trying it please share your findings!

I've basically been putting this kind of gear in my cart, and then deciding I don't want to manage more than the 2 3090s, 4090 and A5000 I have now, then I take the PLX out of my cart.

Seeing you have the cards already it could be a good fit!


Yes, it could be. Unfortunately I'm a bit distracted by both paid work and some more urgent stuff but eventually I will get back to it. By then this whole rig might be hopelessly outdated but we've done some fun experiments with it and have kept our confidential data in-house which was the thing that mattered to me.

Yes, the privacy is amazing, and there's no rate limiting so you can be as productive as you want. There are also tons of learnings in this exercise. I have just 2x 3090's and I've learnt so much about PCIe and hardware that it just makes the creative process that much more fun.

The next iteration of these tools will likely be more efficient so we should be able to run larger models at a lower cost. For now though, we'll run nvidia-smi and keep an eye on those power figures :)


You can tune that power down to what gives you the best token count per joule, which I think is a very important metric by which to optimize these systems and by which you can compare them as well.

I have a hard time understanding all of these companies that toss their NDAs and client confidentiality to the wind and feed newfangled AI companies their corporate secrets with abandon. You'd think there would be a more prudent approach to this.


You get occasional accounts of 3090 home-superscalers where they put up eight, ten, fourteen cards. I normally attribute this to obsessive-compulsive behaviour. What kind of motherboard did you end up using, and what's the bi-directional bandwidth you're seeing? Something tells me you're not using EPYC 9005's with up to 256x PCIe 5.0 lanes per socket or something... Also: I find it hard to believe the "performance" claims when your rig is pulling 3 kW from the wall (assuming undervolting at 200W per card?). The electricity costs alone would surely make this intractable, i.e. the same as running six washing machines all at once.

I love your skepticism of what I consider to be a fairly normal project; this is not to brag, simply to document.

And I'm way above 3 kW, more likely 5000 to 5500 watts with the GPUs running as high as I'll let them, or thereabouts, but I only have one power meter and it maxes out at 2500 watts or so. This is using two Xeons in a very high-end but slightly older motherboard. When it runs, the space that it is in becomes hot enough that even in the winter I have to use forced air from outside, otherwise it will die.

As for electricity costs, I have 50 solar panels and on a good day they more than offset the electricity use; at 2 pm (solar noon here) I'd still be pushing 8 kW extra back into the grid. This obviously does not work out so favorably in the winter.

Building a system like this isn't very hard, it is just a lot of money for a private individual, but I can afford it. I think this build is a bit under $10K, so a fraction of what you'd pay for a commercial solution but obviously far less polished and still less performant. But it is a lot of bang for the buck and I'd much rather have this rig at $10K than the first commercial solution available at a multiple of this.

I wrote a bit about power efficiency in the run-up to this build when I only had two GPUs to play with:

https://jacquesmattheij.com/llama-energy-efficiency/

My main issue with the system is that it is physically fragile, I can't transport it at all, you basically have to take it apart and then move the parts and re-assemble it on the other side. It's just too heavy and the power distribution is messy so you end up with a lot of loose wires and power supplies. I could make a complete enclosure for everything but this machine is not running permanently and when I need the space for other things I just take it apart, store the GPUs in their original boxes until the next home-run AI project. Putting it all together is about 2 hours of work. We call it Frankie, on account of how it looks.

edit: one more note, the noise it makes is absolutely incredible and I would not recommend running something like this in your house unless you (1) are crazy or (2) have a separate garage where you can install it.


Thanks for replying, and your power story does make more sense all things considered. I'm no stranger to homelabbing; in fact just now I'm running both an IBM POWER9 system (really power-hungry) as well as an AMD 8004, both watercooled now while trying to bring the noise down. The whole rack, along with 100G switches and NIC/FPGA's, is certainly keeping us warm in the winter! And it's only dissipating up to 1.6 kW (mostly thanks to the ridiculous efficiency of the 8434PN CPU, which is like 48 cores at 150W or sommat)

I cannot imagine dissipating 5 kW at home!


I stick the system in my garage when it is working... I very enthusiastically put it together on the first iteration (with only 8 GPUs) in the living room while the rest of the family was holidaying, but that very quickly turned out to be a mistake. It has a whole pile of high speed fans mounted in the front and the noise was roughly comparable to sitting in a jet about to take off.

One problem that move caused was that I didn't have a link to the home network in the garage and the files that go to and from that box are pretty large so in the end I strung a UTP cable through a crazy path of little holes everywhere until it reaches the switch in the hallway cupboard. The devil is always in the details...

Running a POWER9 in the house is worthy of a blog post :)

As for Frankie: I fear his days are numbered. I've already been eyeing more powerful solutions, and for the next batch of AI work (most likely large scale video processing and model training) we will probably put something better together, otherwise it will simply take too long.

I almost bought a second-hand, fully populated NVidia AI workstation but the seller was more than a little bit shady and kept changing the story about how they got it and what they wanted for it. In the end I abandoned that because I didn't feel like being used as a fence for what was looking more and more like stolen property. But buying something like that new is out of the ballpark for me; at 20 to 30% of list I might do it assuming the warranty transfers, and that's not a complete fantasy, there are enough research projects that have this kind of gear and sell it off when the project ends.

People joke I don't have a house but a series of connected workshops and that's not that far off the mark :)

