I've worked jobs where I was paid $60-80k and jobs where I was paid $100k+.
I can tell you for sure I cared much less about the job where I wasn't getting paid as much. I would never answer an after-work call or go out of my way to follow up on organizational or systemic issues. For me, working overtime is part of the package when I get paid loads more.
> For me, working overtime is part of the package when I get paid loads more.
I've tended to think this way, but it's all relative. Plenty of people who've never had a $50k job will treat the $100k job as the baseline/minimum, and wouldn't ever think of going the extra mile like you do. For you (and me?) - yeah, they're paying "a lot", so I'll make sure I do a bit extra when needed. But if you've never had less, that $100k is your floor, and you might only consider doing 'extra' for a $200k job.
Looking at how Nvidia's share price has developed over the last 12 months should be reason enough on its own to enter the AI accelerator business. And OpenAI does have software experts, and a big component of Nvidia's success in AI is its software advantage over competitors (the hardware advantage isn't that large). Furthermore, there is Google's "We have no moat, and neither does OpenAI" memo: OpenAI definitely needs to strengthen its moat.
But of course, doing research with PyTorch is not the same as developing driver code for some hardware bus or writing scheduling algorithms.
Per the article, they're apparently not interested in building the chips themselves, but rather in 'getting' TSMC or another manufacturer to build them with the funds.
Which raises more questions than it answers. Why does OpenAI even need to be involved -- is this a common situation?
I would think the manufacturers would anticipate the huge impending demand for chips and raise the necessary investments to expand themselves.
Without any special knowledge, I would assume their motivation is twofold: first, remove the middleman that is potentially adding a huge margin on top of the silicon; second, remove the extra fluff on the silicon that isn't necessary for their specific use case, making it cheaper.
It's really interesting to see these two posts together. I can now imagine AI tools actually inhibiting innovation in many domains, simply because they're optimized for things that are already entrenched and new entrants won't be in the training data. That further inhibits adoption compared to existing tools, and thus further inhibits the growth needed to make it into model updates.
It is a healthy mindset to see this phenomenon as "interesting". I can get there when I dial up my mindfulness, but my default mode here is rather judgy, as in: "please, ppl! pick the better tool as evaluated over a 4+ hour timeframe (after you've got some muscle memory for the API) instead of a 15-minute evaluation".
Forgive me for ranting here, but have people forgotten how to bootstrap their own knowledge about a new library? Taking notes isn't hard. Making a personal cheat-sheet isn't hard. I say all this AND I use LLMs very frequently to help with technical work. But I'm mindful about the tradeoffs. I will not let the tool steer me down a path that isn't suitable.
I'm actually hopeful: there's an unexpected competitive advantage for people who are willing to embrace a little discomfort and take advantage of their neuroplasticity.
> I can now imagine where AI tools actually inhibit innovation [...] new entrants won’t be in the training data
I still imagine the opposite impact... Welcome to no-moats-lang.io! So, you've created yet another new programming language over the holidays? You have a sandbox and LSP server up, and are wondering what to do next? Our open-source LLMs are easily tuned for your wonderful language! They will help you rapidly create excellent documentation, translators from related popular languages, do bulk translation of "batteries" so your soon-to-be-hordes of users can be quickly productive, and create both server and on-prem ChatOverflowPilotBots! Instant support for new language versions, and automatic code update! "LLMs are dynamite for barriers to entry!" - Some LLM Somewhere Probably.
Once upon a time, a tar file with a compiler was the MVP for a language. But with little hope of broad adoption. And year by year, user minimum expectations have grown dauntingly - towards extensive infrastructure, docs, code, community. Now even FAMG struggle to support "Help me do common-thing in current-version?". Looking ahead, not only do LLMs seemingly help drop the cost of meeting those current expectations to something a tiny team might manage, but they also help drop some of the historical barriers to rapid broad adoption - "Waiting for the datascience and webdev books? ... Week after next."
We might finally be escaping decades of language evolution ecosystem dysfunction... just as programming might be moving on from them? :/
Because it’s like willfully choosing the more painful and difficult tool that occasionally stabs you in the hand, because you’re now used to being stabbed in the hand.
Continuing to choose it in the face of - in their own words - a better option, is a bit mind-boggling to me.
I went this route and got taken over by hackers multiple times. It was very worth it. I got taken over because my SSH password was "mars". My little brother and I were sharing the box and wanted an easy password (yeah, we use SSH keys now).
Anyway, we both learnt a lot (htop, tmux, etc.). I'm always jealous that he got to learn everything earlier than me. But if he's not better than me, then I consider myself a failure wrt being an older brother.
The only drawback is that this doesn't work if you want to do AI stuff. For those use cases I rent a machine on Paperspace for a cheap hourly rate.
I'm a little sad I never was. I started with the Linode "hardening Linux" guide, so I had a firewall and disabled SSH passwords from day 1. I still have fun looking at the failed attempts on ports 22 and 443. My server gets so many weird requests, and they used to crash it. A few iterations later, that stopped happening.
Oh, another thing that's worth learning: how to acquire and refresh a Let's Encrypt TLS cert via the ACME protocol. Doing this requires an interesting confluence of skills and tools - you must carve out a vestigial HTTP route in your server, and also configure certbot and cron. And working out the bugs takes a few iterations. (You could install Caddy, but where's the fun in that?!)
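For the curious, here's a minimal sketch of that vestigial route, assuming certbot's webroot mode (the path and port are illustrative, not a recommendation):

    # Minimal ACME HTTP-01 helper: certbot's webroot mode drops a token file
    # under the webroot, and this route just serves it back to the CA.
    # CHALLENGE_DIR is a hypothetical path you'd also pass to certbot's -w flag.
    import http.server
    import os

    CHALLENGE_DIR = "/var/www/letsencrypt"  # hypothetical webroot

    class AcmeHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith("/.well-known/acme-challenge/"):
                self.send_error(404)
                return
            token = os.path.basename(self.path)  # basename foils path traversal
            try:
                path = os.path.join(CHALLENGE_DIR,
                                    ".well-known/acme-challenge", token)
                with open(path, "rb") as f:
                    body = f.read()
            except FileNotFoundError:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    # ACME's HTTP-01 challenge arrives on port 80, so this needs root
    # (or CAP_NET_BIND_SERVICE) to bind.
    http.server.HTTPServer(("", 80), AcmeHandler).serve_forever()

After that, a cron entry running "certbot renew" periodically takes care of refreshing the cert before it expires.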
Making it all work, from scratch, made me feel happy in the same way that rebuilding a carburetor or building a bookshelf from scratch seems to make people feel. It's not new, it's not innovative, but it's good. And it's always more interesting than you'd ever suspect.
I thought I had been once: I got a very scary email that came from my own domain, claiming to have gotten into my things, and I fully assumed it was the VPS that got hacked. After calming down and auditing the shit out of everything, I realized it was just plain old domain spoofing. Both disappointing and terrifying at the same time!
There is a lot of value in learning things the hard way vs. the easy way if there is no real significant harm caused, I think. In many cases you learn more, or gain a deeper understanding of and respect for the topic, which is worth something in its own right.
This. It would be suicidal to leave a manager a bad review. The only thing that works is anonymous feedback, which is how Meta/Facebook is so effective at getting rid of bad managers.
Very cool, but will the threads necessarily wake up deterministically? One may wake up before another but not get the CPU before it, correct? (Forgive me if I'm misunderstanding the code.)
Yup. If we're being pedantic, and I'm sure the author of this loves being pedantic, any number of factors could cause this algorithm to be incorrect. Run it on a slow enough CPU, for instance, and it will output the wrong answer.
And correctness is the single most important property of an algorithm. We can solve any problem in constant time if we don't mind the answers being wrong.
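To illustrate (a toy sketch of my own, assuming the code under discussion is the usual sleep-sort-style trick): each thread sleeps in proportion to its value, so the output is only sorted if the scheduler cooperates.

    # Sleep sort: each value gets a thread that sleeps proportionally to the
    # value, then appends to the shared result. "Correct" only if the threads
    # get the CPU in the same order they wake up - which nothing guarantees.
    import threading
    import time

    def sleep_sort(values):
        result = []
        lock = threading.Lock()

        def worker(v):
            time.sleep(v * 0.01)  # wake-up time proportional to the value
            with lock:
                result.append(v)

        threads = [threading.Thread(target=worker, args=(v,)) for v in values]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return result

    print(sleep_sort([3, 1, 2]))  # usually [1, 2, 3] - but only usually

time.sleep only promises to sleep *at least* that long, and a loaded machine (or a slow enough CPU) can reorder who actually runs first.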
I'm sure somebody will figure out a way to use multiple seemingly-legitimate parameters to get the same result. Why use ?click_id=aqNERjsdfyqe when you can use ?category=10612550&subcategory=5929127&page=4257344 and transfer the same data without arousing suspicion?
Websites can use a single lengthy encrypted parameter to encode everything (query params and tracking data). And then what? Will they break all website links by removing the parameter?
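To make that concrete, here's a toy sketch (mine - the parameter names are invented) of smuggling an opaque click id through innocent-looking numeric parameters:

    # Chop an opaque token into integers that pass for category/page ids,
    # then rebuild it server-side. Assumes len(token) is a multiple of chunk.
    def encode_token(token: bytes, chunk: int = 4):
        return [int.from_bytes(token[i:i + chunk], "big")
                for i in range(0, len(token), chunk)]

    def decode_token(parts, chunk: int = 4):
        return b"".join(p.to_bytes(chunk, "big") for p in parts)

    token = b"aqNERjsdfyqe"  # the click id from the comment above
    parts = encode_token(token)
    print("?" + "&".join(f"p{i}={n}" for i, n in enumerate(parts)))
    print(decode_token(parts))  # b'aqNERjsdfyqe'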
Not sure if it answers your question, but the paper notes something similar in its discussion of limitations:
> the linear attention of RWKV leads to significant efficiency gains but still, it may also limit the model’s performance on tasks that require recalling minutiae information over very long contexts. This is due to the funneling of information through a single vector representation over many time steps, compared with the full information maintained by the quadratic attention of standard Transformers. In other words, the model’s recurrent architecture inherently limits its ability to “look back” at previous tokens, as opposed to traditional self-attention mechanisms. While learned time decay helps prevent the loss of information, it is mechanistically limited compared to full self-attention.
> Another limitation of this work is the increased importance of prompt engineering in comparison to standard Transformer models. The linear attention mechanism used in RWKV limits the information from the prompt that will be carried over to the model’s continuation. As a result, carefully designed prompts may be even more crucial for the model to perform well on tasks.
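The "single vector representation" point is easy to see in code. Here's a toy sketch of generic linear attention (my own, in numpy - not RWKV's exact formulation, which adds learned time decay): no matter how long the context, a query can only read from two fixed-size accumulators.

    # Linear attention as an RNN: the whole history is compressed into a
    # running (d x d) sum of key-value outer products plus a running key sum.
    import numpy as np

    def linear_attention_step(state, k, v):
        s_kv, s_k = state
        return (s_kv + np.outer(k, v),  # accumulate key-value products
                s_k + k)                # accumulate keys for normalization

    def linear_attention_read(state, q, eps=1e-8):
        s_kv, s_k = state
        return (q @ s_kv) / (q @ s_k + eps)

    d = 8
    rng = np.random.default_rng(0)
    state = (np.zeros((d, d)), np.zeros(d))
    for _ in range(1000):  # a long context...
        state = linear_attention_step(state, rng.random(d), rng.random(d))
    # ...but the read only ever sees the two fixed-size sums in `state`:
    print(linear_attention_read(state, rng.random(d)))

Standard quadratic attention, by contrast, keeps all 1000 keys and values around and can attend to any of them exactly.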