100% agree. I see this kind of refactoring as a form of bike-shedding. It's so _easy_ to do this, anyone can do it. It's much harder to think about and design for long-term change and maintainability. Much easier to just deduplicate and declare victory.
> It would be a lie for me to say that I joined crypto without any financial motivation. As a reader, it may sound hypocritical to you that I decided to swear off the crypto industry, now that I have made enough money. Yes, maybe I am hypocritical. But maybe I also just feel sick about contributing to the cesspool of financialization and gamblification of the economy.
If they truly feel sick about it, they should donate the money they made (or the vast majority of it). Otherwise, hypocritical is right.
Luckily that's what's happening here, just at a company level. Plenty of companies are remote only or remote friendly. Hopefully people who prefer remote work can leave here and find work at one of those companies, and maybe people who prefer in person work will find their way here.
I put this in the same bucket as the horrifying "996" trend, or even consultancies that require 80-100% travel. If you want to broadcast that you have a toxic work culture, all I can do is applaud your honesty and look elsewhere for work.
Congrats on your work ethic. But consider that this may simply not be the case for every working adult on earth, and may not even be true for every working adult in your company.
Not everyone is like you. I am, but I know people (some of whom are former and current coworkers) who are much more easily distracted, and are meaningfully less able to complete their work in a timely manner when they work from home.
I'll probably be downvoted, but I just don't think most of these execs are engaging in some larger "authoritarian" play with these moves (maybe some are, but I think incompetence is more likely than malice in most cases). But maybe I'm naive.
As one point, consider the case of Tokyo's "Manuscript Cafe" [0] where patrons intentionally visit to have a cafe owner "force" them to complete a task they may have been procrastinating on. I read this as: being in a "work" location surrounded by other working people is conducive to productivity for some people.
I think it's only a small portion of WFH advocates who say that everyone should be forced to work remotely. Most want each person to have the ability to work the way that's best for them.
Measured over a year, my team overall shipped 60% as many issues WFH as when in office. WFH was nice for some colleagues but clearly not working for the team. We promptly changed back to in office when able.
The crux of this is that the way each person does their best work is different.
Work from office is the brute force solution - if it’s the hammer, flexible work is the scalpel.
Not every org has managers capable of wielding a scalpel instead of a hammer, or who have time to be surgical even if they have the ability. I accept this reality.
On literally every project I've seen that enforces a % threshold, that % threshold has directly led to bad quality tests. This is more true the higher the % is, with the worst being 100%.
You can argue that it _shouldn't_ be this way, and I would agree, but it is that way. Perhaps in part because developers are humans, and even humans with the best initial intentions will game metrics with a large enough sample size over a long enough time horizon.
In my opinion, the openings to pair are harder to find when remote.
In person, it's usually easy to see when a junior is struggling, and pull up a chair. That same junior, in a remote environment, might not proactively ask for help. And me sending a Slack message to them asking how they're doing might get an "all good" even when they are not. And sending a huddle request to them because I suspect they're struggling is just a VERY different thing than looking over at them to read their body language, or swinging by their desk to check in.
Maybe some would say that it's on that junior to know they need to ask for help. Sure, great. That does nothing to resolve the reality of this very real situation. Of course part of coaching this junior (in person or remote) would be encouraging them to be more proactive about asking for help, but if you have fewer opportunities to offer help and coaching in the first place, their growth is slower. Significantly slower in my opinion.
This is a hard difference to convey to someone who has never experienced a good in-person culture. But I know I would not be where I am if I had spent the first decade of my career working remotely.
I agree. I think, at least for me, it's because I know my colleagues know my overall capability, so admitting I don't know a specific thing isn't a big deal because I've already proven I'm capable of overcoming any specific gap. But to strangers, perhaps that may be all they know of me -- that I don't know this one thing. There's no preexisting relationship or past body of work (in other words: no trust) to balance that gap out or put it in perspective.
I think this is objectively mostly a silly and counter-productive worry. But I still feel it.
You've brought in "require" from nowhere. Neither the parent comment nor OP's article mentioned _requiring_ in-office attendance for everyone on earth. Many people make this leap, so I don't mean to single you out.
I think this is one reason this topic is so touchy -- it's hard to even express an opinion without someone assuming you mean to impose that opinion on everyone else (e.g. mandatory RTO), and then taking offense to that imagined imposition.
Perhaps in the long run we can self-organize into companies or groups within companies that universally prefer in-office or remote work.
I prefer at least some % (e.g. 50%) of my work to be in person. But I also don't like working with people who don't want to be there, or for whom being there is a huge burden. So I personally really hate RTO.
Instead, I'll choose a team or company that is open about requiring in-office time (and has been open about it for years), and is therefore staffed by people who also like that environment. It would be ludicrous to join a remote-first or remote-only company and then try to start imposing my in-person preference on others.
OP's article didn't explicitly state requiring, but it did make the claim that I'm objectively worse at my job by working remotely than I am in person:
> Remote work eliminates a lot of problems with office work: commutes, inefficient use of real estate, and land value distortion. But software development is better when you breathe the same air as the folks you work with. Even with a camera-on policy, video calls are a low-bandwidth medium. You lose ambient awareness of coworkers’ problems, and asking for help is a bigger burden. Pair programming is less fruitful. Attempts to represent ideas spatially get mutilated by online whiteboard and sticky note software. Even conflict gets worse: it’s easy to form an enemy image of somebody at the end of video call, but difficult to keep that image when you share a room with them and sense their pain.
None of that is phrased as personal opinion or their own subjective experience. I don't think it's hard to express an experience about personal preference, but it's hard to express an opinion about other people's experiences without them having something to say about it if they disagree.
But I guess this is a good example of default assumptions. When I read a personal blog, unless I see words like "objectively", or see explicit arguments stating that others should/must do X, I by default read it as the author expressing their personal opinion on X.
But I can see it the other way too. I think it could be solved by the author using an "I" framing as they did in every other section.
> I think it could be solved by the author using an "I" framing as they did in every other section
Yeah, this is probably what really confused me most about it. The tone felt remarkably different in that section than the others, and when it's also the topic that at least to me seems the most controversial, I can't help but wonder if it's representative in some way. It's not at all uncommon for people to dig in their heels more when presented with disagreement, so it's hard for me not to be concerned that the only reason this section is phrased differently is that this is the topic they've received the most pushback on, which would be exactly what I'd expect even before accounting for the actual phrasing of their opinion.
But my first thought looking at this is that the numbers are probably skewed due to distribution of user skill levels, and what types of users choose which tool.
My hypothesis is that Amp is chosen by people who are VERY highly skilled in agentic development. Meaning these are the people most likely to provide solid context, good prompts, etc. That means these same people would likely get the best results from ANY coding agent. This also tracks with Amp being so expensive -- users or companies are more likely to pay a premium if they can get the most from the tool.
Claude Code on the other hand is used by (I assume) a way larger population. So the percentage of low-skill users is likely to be much higher. Those users may still get value from the tool, but their success rate will be lower by some factor with ANY coding agent. And this issue (if my hypothesis is correct) is likely 10x as true for GitHub Copilot.
Therefore I don't know how much we should read into stats like the total PR merge success percentage, because it's hard to tell the degree of noise caused by this user skill distribution imbalance.
Still interesting to see the numbers though!