What I’m saying is that in *nix tooling, things are typically designed to do one thing well. So no, I don’t want my shell to also have to talk MySQL, Postgres, SQL Server, DB2, HTTP, FTP, SMTP…
> "So no, I don’t want my shell to also have to talk..."
What's the point of the shell, if not to manage your databases, your REST APIs, files, and mail? Is it something you use for playing games on, or just for fun?
> designed to do one thing well.
Eeexcept that this is not actually true in practice, because the abstraction was set at a level that's too low. Shoving everything into a character (or byte) stream turned out to be a mistake. It means every "one thing" command is actually one thing plus a parser and an encoder. It means that "ps" has a built-in sort command, as do most other UNIX standard utilities, but they all do it differently. This also means that you just "need to know" how to convince each and every command to output machine-readable formats that other tools on the pipeline can pick up safely.
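A contrived example of what that flattening costs (a made-up table, not real ps output): the moment a downstream tool re-splits on whitespace, any field that itself contains whitespace is silently mangled.

```shell
# A human-aligned table, the kind countless tools emit by default.
table='PID  COMMAND
42   nginx
43   my server'

# Re-parsing by whitespace, as pipelines usually do, silently truncates
# the command name that contains a space: "my server" becomes "my".
cols="$(printf '%s\n' "$table" | awk 'NR > 1 { print $2 }')"
printf '%s\n' "$cols"   # prints "nginx" then "my"
```

No tool in the pipeline is wrong, exactly; the information was destroyed the moment the table was rendered for humans.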
I'll tell you a real troubleshooting story, maybe that'll help paint a picture:
I got called out to assist with an issue with a load balancer appliance used in front of a bunch of Linux servers. It was mostly working according to the customer, but their reporting tool was showing that it was sending traffic to the "wrong" services on each server.
The monitoring tool used 'netstat' to track TCP connections, which had a bug in that version of RedHat where it would truncate the last decimal digit of the port number if the address:port combo had the maximum possible number of digits, e.g.: 123.123.123.123:54321 was shown as 123.123.123.123:5432 instead.
Their tool was just ingesting that pretty-printed table intended for humans with "aligned" columns, throwing away the whitespace, and putting that into a database!
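That failure mode is trivially reproducible. This isn't netstat's actual code, just the fixed-width truncation in miniature:

```shell
# A 20-character column silently drops the last digit of a maximal
# address:port combination, which is 21 characters long.
addr='123.123.123.123:54321'
shown="$(printf '%.20s' "$addr")"
echo "$shown"   # 123.123.123.123:5432
```

Port 54321 becomes port 5432, the table still lines up perfectly, and nothing downstream has any way to notice.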
This gives me the icks, but it's apparently Just The Way Things Are Done in the UNIX world.
In PowerShell, Get-NetTCPConnection outputs objects, so this kind of error is basically impossible. Downstream tools aren't parsing a text representation of a table or "splitting it into columns", they receive the data pre-parsed with native types and everything.
Please show me the equivalent using netstat. In case the above was not readable for you, it shows the top ten TCP ports by how many bound connections they have.
This kind of thing is a challenge with UNIX tools, and then it's fragile forever. Any change to the output format of netstat breaks scripts in fun and creative ways. Silently. In production.
For fun, I took a crack at your example and came up with this craziness (with the caveat it's late and I didn't spend much time on it), which is made a bit more awkward because grep doesn't do capturing groups:
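The exact pipeline didn't survive into this thread, so what follows is a plausible reconstruction from the details discussed (awk on field 4, three greps because grep lacks capturing groups, nine stages in all). Canned output stands in for `netstat -an` so the sketch is self-contained; in real use the first command would be netstat itself:

```shell
# Stand-in for `netstat -an` output: tcp, udp, and unix domain sockets
# all mixed together, as the default output really does include them.
sample='Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:443            198.51.100.7:53211      ESTABLISHED
tcp        0      0 10.0.0.5:443            198.51.100.8:40112      ESTABLISHED
tcp        0      0 10.0.0.5:22             203.0.113.9:55122       ESTABLISHED
udp        0      0 10.0.0.5:123            0.0.0.0:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     12345    /run/foo.sock'

top10="$(printf '%s\n' "$sample" \
  | awk '{print $4}'       \
  | grep -v '^$'           \
  | grep ':'               \
  | grep -oE '[0-9]+$'     \
  | sort                   \
  | uniq -c                \
  | sort -rn               \
  | head -n 10)"
echo "$top10"
```

Note the `grep -oE '[0-9]+$'` doing the job a capturing group would, and that nothing here filters by protocol, so the udp port sneaks into the counts.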
Changing the awk field to 5 instead of 4 should get you remote ports instead of local. But yeah, that will be fragile if netstat's output ever changes. That said, even if you're piping objects around, if the output of the thing putting out objects changes, your tool is always at risk of breaking. Yes objects breaking because field order changed is less likely, but what happens if `Get-NetTCPConnection` stops including a `State` field? I guess `Where-Object` might validate it found such a field, but I could also see it reasonably silently ignoring input that doesn't have the field. Depends on whether it defaults to strict or lenient parsing behaviors.
I know this sounds like nit-picking but bear with me. It's the point I'm trying to make:
1. Your script outputs an error when run, because 'bash' itself doesn't have netstat as a built-in. That's an external command. In my WSL2, I had to install it. You can't declaratively require this up-front; your script has to have an explicit check... or it'll just fail half-way through. Or do nothing. Or who knows!?
Not that that's needed, because Get-NetTCPConnection is a built-in command.
3. Your script is very bravely trying to parse output that includes many different protocols, including: tcp, tcp6, udp, udp6, and unix domain sockets. I'm seeing random junk like 'ACC' turn up after the first awk step.
4. Speaking of which, the task was to get tcp connections, not udp, but I'll let this one slide because it's an easy fix.
5. Now imagine putting your script side-by-side with the PowerShell script, and giving it to people to read.
What are the chances that some random person could figure out what each one does?
Would they be able to modify the functionality successfully?
Note that you had to use 'awk', which is a parser, and then three uses of 'grep' -- a regular expression language, which is also a kind of parsing.
The PowerShell version has no parsing at all. That's why it's just 4 pipeline expressions instead of 9 in your bash example.
Literally in every discussion about PowerShell there's some Linux person who's only ever used bash complaining that PS syntax is "weird" or "hard to read". What are they talking about!? It's half the complexity for the same functionality, reads like English, and doesn't need write-only hieroglyphics for parameters.
Because I didn't see the edited version when I was writing my original reply and it's too late now, I want to call out another problem that you graciously overlooked. We can call it #2, since it touches neatly on your #1 and #3 items and #2 is already missing. The extra junk you see in your WSL after the awk step is probably down to the other big *NIX problem with shell scripts: my `netstat` or `grep` or even `echo` might not be the same as yours. I originally wrote it on a Mac, and while I was checking the man page for netstat to see how old it was and how likely netstat output would change, it occurred to me that BSD netstat and Linux netstat are probably different, so I jumped over and re-wrote it on a Linux box. It's entirely possible your version is different from mine.
Heck, just checking `echo` between Bash, ZSH, and Fish on my local machine: Bash and ZSH's version is from 2003, provides a single `-n` option, and declares POSIX compliance, but explicitly calls out that `sh`'s version doesn't accept the `-n` argument. Fish provides their own implementation, from 2023, that accepts the arguments `[nsEe]`. Every day I consider it a miracle that most of the wider internet and the Linux/UNIX world that underlies so much of it works at all, let alone reliably enough to have multiple nines of uptime. "Worse is better" writ large, I guess.
I was worried that my toy problem wasn’t complex enough to reveal these issues!
I had an experience recently trying to deploy an agent on a dozen different Linux distros.
I had the lightbulb moment that the only sane way to run IT in an org is to use exactly one distro. Ideally one version, two at the most during transitions. Linux is a kernel, not an operating system. There are many Linux operating systems that are only superficially “the same”.
At least here, we can agree. If I ran a business and allowed employees to run Linux (which is reasonable, IMO), the last thing I want is someone's riced-out Gentoo with an unpatched security exploit connecting to the VPN.
Sure, I'm not arguing that having a set of well defined outputs and passing objects around wouldn't be better. You're talking to someone that often laments that Smalltalk was not more popular. But you'd need to get the entire OSS community to land on a single object representation and then get them to independently change all the tools to start outputting the object version. PowerShell and Microsoft have the advantage in this case of being able to dictate that outcome. In the linux world, dictating outcomes tends to get you systemd levels of controversy and anger.
Technically speaking though, there's no reason you couldn't do that all in bash. It's not the shell that's the problem here (at least, to an extent, passing via text I guess is partly a shell problem). There's no reason you couldn't have an application objNetstat that exported JSON objects and another app that filtered those objects, and another that could group them and another that could sort them. Realistically "Sort-Object Count -Descending -Top 10" could be a fancy alias for "sort | uniq -c | sort -r | head -n 10". And if we're not counting flags and arguments to a function as complexity, if we had our hypothetical objNetstat, I can do the whole thing in one step:
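A sketch of what that one step could look like, with canned JSON standing in for the hypothetical objNetstat (the field names here are invented for illustration), assuming jq is available:

```shell
# Canned output from the hypothetical objNetstat: a JSON array of
# connection objects. Field names are made up for this sketch.
conns='[{"proto":"tcp","state":"Established","localPort":443},
        {"proto":"tcp","state":"Established","localPort":443},
        {"proto":"tcp","state":"Established","localPort":22},
        {"proto":"udp","state":"","localPort":123}]'

# Filter, group, count, sort, and take the top ten entirely inside jq.
top="$(printf '%s' "$conns" | jq -r '
  map(select(.proto == "tcp" and .state == "Established") | .localPort)
  | group_by(.)
  | map({port: .[0], count: length})
  | sort_by(-.count)
  | .[:10][]
  | "\(.count)\t\(.port)"')"
echo "$top"
```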
One step, a single parse to read in the JSON. Obviously I'm being a little ridiculous here, and I'm not entirely sure jq's DSL is better than a long shell pipe. But the point is that Linux and Linux shells could do this if anyone cared enough to write it, and some shells like fish have taken baby steps toward making shells take advantage of modern compute. Why RedHat or one of the many BSDs hasn't is anyone's guess. My two big bets are the aversion to "monolithic" tools (see also systemd) and, ironically, not breaking old scripts / current systems. The fish shell is great, and I've built a couple of shell scripts in it for my own use, but I can't share them with my co-workers who are on Bash/ZSH because fish scripts aren't Bash-compatible. Likewise, I have to translate anyone's bash scripts into fish if I want to take advantage of any fish features. So even though fish might be better, I'm not going to convince all my co-workers to jump over at once, and without critical mass, we'll be stuck with bash pipelines and python scripts for anything complex.
Half way through your second paragraph I just knew you'd be reaching for 'jq'!
All joking aside, that's not a bad solution to the underlying problem. Fundamentally, unstructured data in shell pipelines is much of the issue, and JSON can be used to provide that structure. I'm seeing more and more tools emit or accept JSON. If one can pinch their nose and ignore the performance overhead of repeatedly generating and parsing JSON, it's a workable solution.
Years ago, a project idea I was really interested in for a while was to try to write a shell in Rust that works more like PowerShell.
Where I got stuck was the fundamentals: PowerShell heavily leans on the managed virtual machine and the shared memory space and typed objects that enables.
Languages like C, C++, and Rust don't really have direct equivalents of this and would have to emulate it, quite literally. At that point you have none of the benefits of Rust and all of the downsides. May as well just use pwsh and be done with it!
Since then I've noticed JSON filling this role of "object exchange" between distinct processes that may not even be written in the same programming language.
I feel like this is going to be a bit like UTF-8 in Linux. Back in the early 2000s, Windows had proper Unicode support with UTF-16, and Linux had only codepages on top of ASCII. Instead of catching up by changing over to UTF-16, Linux adopted UTF-8 which in some ways gave it better Unicode support than Windows. I suspect JSON in the shell will be the same. Eventually there will be a Linux shell where everything is always JSON and it will work just like PowerShell, except it'll support multiple processes in multiple languages and hence leapfrog Windows.
>Years ago, a project idea I was really interested in for a while was to try to write a shell in Rust that works more like PowerShell.
So this whole conversation, and a different one about Python and its behavior around `exit` vs `exit()`, sent me down a rabbit hole of seeing if I could make the Python interpreter have a "shell-like" DSL for piping around data. It turns out you sort of can. I don't think you can defeat the REPL and make a bare function call like `echo "foo" "bar" "baz"`, but you can make it do this:
>Years ago, a project idea I was really interested in for a while was to try to write a shell in Rust that works more like PowerShell.
>Where I got stuck was the fundamentals: PowerShell heavily leans on the managed virtual machine and the shared memory space and typed objects that enables.
Hmmm, if you need that sort of shared memory access throughout the shell, you probably need a language like Python (or, maybe better, Lisp) with a REPL and the ability/intent to self-modify while running. Of course, every time you have to farm out because you don't have a re-written replacement internal to the shell, you'd still be parsing strings, but at least you could write a huge part of the data processing in the shell language and keep it in house. Years ago I worked for a company that was using Microsoft's TFVC services (before it was Azure DevOps or whatever they call it now), and I wrote a frontend to their REST API in Python that we could call from various other scripts without parsing JSON everywhere. Where this is relevant to the discussion is that one of the things I built in (in part to help with debugging when things went sideways) was the ability to drop into the Python REPL mid-run to poke around at objects and modify them or the various REST calls at will. With well-defined functions and well-defined objects, the interactive mode was effectively a shell for TFVC and the things we were using it for.
Though all of that said, even if one did that, they would still either need to solve the "object model" problem for disparate Linux tools, or worse, commit to writing (or convincing other people to write and maintain) versions of all sorts of tools in the chosen language to replace the ones the shell isn't farming out to anymore. It's one thing to choose to write a shell; it's something else entirely to choose to re-write the GNU userland tools (and add tools, too).
> PowerShell has up-front required prerequisites that you can declare
Anyone who's written more than a few scripts for others will have learned to do something like this at the start:
declare -a reqs
reqs+=(foo bar baz)
missing=0
for r in "${reqs[@]}"; do
    if ! command -v "$r" &>/dev/null; then
        echo "${r} is required, please install it"
        missing=1
    fi
done
if [ "$missing" -gt 0 ]; then
    exit 1
fi
> Your script is very bravely trying to parse output that includes many different protocols, including: tcp, tcp6, udp, udp6, and unix domain sockets
They probably didn't know you could specify a type. Mine only displays TCP4.
> Now imagine putting your script side-by-side with the PowerShell script, and giving it to people to read.
I'm gonna gatekeep here. If you don't know what that script would do, you have no business administering Linux for pay. I'm not saying that in a "GTFO noob" way, but in a "maybe you should know how to use your job's tools before people depend on you to do so." None of that script is using exotic syntax.
> Note that you had to use 'awk', which is a parser, and then three uses of 'grep' -- a regular expression language, which is also a kind of parsing.
They _chose_ to. You _can_ do it all with awk (see my example in a separate post).
> Literally in every discussion about PowerShell there's some Linux person who's only ever used bash complaining that PS syntax is "weird" or "hard to read". What are they talking about!? It's half the complexity for the same functionality, reads like English, and doesn't need write-only hieroglyphics for parameters.
And yet somehow, bash and its kin continue to absolutely dominate usage.
There is a reason that tools like ripgrep [0] are beloved and readily accepted: they don't require much in the way of learning new syntax; they just do the same job, but faster. You can load your local machine up with all kinds of newer, friendlier tools like fd [1], fzf [2], etc. – I definitely love fzf to death. But you'd better know how to get along without them, because when you're ssh'd onto a server, or god forbid, exec'd into a container built with who-knows-what, you won't have them.
Actually, that last point sparked a memory: what do you do when you're trying to debug a container and it doesn't have things like `ps` available? You iterate through the `/proc` filesystem, because _everything is a file._ THAT is why the *nix way exists, is wonderful, and is unlikely to ever change. There is always a way to get the information you need, even if it's more painful.
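A minimal sketch of that /proc walk (real ps reads far more out of /proc than this, but it's enough to see what's running):

```shell
# Walk /proc directly: every numeric directory is a PID, and its comm
# file holds the process name -- no ps binary required.
listing="$(
    for piddir in /proc/[0-9]*; do
        pid="${piddir#/proc/}"
        # Processes can exit between the glob and the read; skip those.
        comm="$(cat "$piddir/comm" 2>/dev/null)" || continue
        printf '%s\t%s\n' "$pid" "$comm"
    done
)"
echo "$listing"
```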
> What's the point of the shell, if not to manage your databases, your REST APIs, files, and mail? Is it something you use for playing games on, or just for fun?
It's for communicating with the operating system, launching commands and viewing their output. And some scripting for repetitive workflows. If I'd want a full programming environment, I'd take a Lisp machine or Smalltalk (a programmable programming environment).
Any other systems that want to be interactive should have their own REPL.
> This kind of thing is a challenge with UNIX tools, and then it's fragile forever. Any change to the output format of netstat breaks scripts in fun and creative ways. Silently. In production.
The thing is, if you're using this kind of script in production and then not testing it after updating the system, that's on you. In your story, they'd have been better off writing a proper program. IMO, scripts are for automating workflows (human-guided), not for fire-and-forget processes. Bash and the others deal in text because that's all we can see and write. Objects are for programming languages.
> In your story, they'd have been better off writing a proper program.
Sure, on Linux, where your only common options are bash or "software".
On Windows, with PowerShell, I don't have to write a software program. I can write a script that reads like a hypothetical C# shell would, but oriented towards interactive use.
(Note that there is a CS-Script, but it's a different thing intended for different use-cases.)
I'm kind of with the OP that it would be nice if linux shells started expanding a bit. I think the addition of the `/dev/tcp` virtual networking files was an improvement, even if it now means my shell has to talk TCP and UDP instead of relying on nc to do that
> What's the point of the shell, if not to manage your databases, your REST APIs, files, and mail? Is it something you use for playing games on, or just for fun?
To call other programs to do those things. Why on earth would I want my shell to directly manage any of those things?
I think you're forgetting something: *nix tools are built by a community, PowerShell is built by a company. Much like Apple, Microsoft can insist on and guarantee that their internal API is consistent. *nix tooling cannot (nor would it ever try to) do the same.
> It means that "ps" has a built-in sort command, as do most other UNIX standard utilities, but they all do it differently.
I haven't done an exhaustive search, but I doubt that most *nix tooling has a built-in sort. Generally speaking, they're built on the assumption that you'll pipe output as necessary to other tools.
> This also means that you just "need to know" how to convince each and every command to output machine-readable formats that other tools on the pipeline can pick up safely.
No, you don't, because plaintext output is the lingua franca of *nix tooling. If you build a tool intended for public consumption and it _doesn't_ output in plaintext by default, you're doing it wrong.
Here's a one-liner with GNU awk; you can elide the first `printf` if you don't want headers. Similarly, you can change the output formatting however you want. Or, you could skip that altogether, and pipe the output to `column -t` to let it handle alignment.
netstat -nA inet | gawk -F':' 'NR > 2 { split($2, a, / /); pc[a[1]]++ } END { printf "%-5s %s\n", "PORT", "COUNT"; PROCINFO["sorted_in"]="@val_num_desc"; c=0; for(i in pc) if (c++ < 10) { printf "%-5s %-5s\n", i, pc[i] } }'
Obviously this is not as immediately straightforward for the specific task, though if you already know awk, it kind of is:
Set the field separator to `:`
Skip the first two lines (because they're informational headers)
Split the 2nd column on space to skip the foreign IP
Store that result in variable `a`
Create and increment array `pc` keyed on the port
When done, do the following
Print a header
Sort numerically, descending
Initialize a counter at 0
For every element in the pc array, until count hits 10, print the value and key
You can also chain together various `grep`, `sort`, and `uniq` calls as a sibling comment did. And if your distro doesn't include GNU awk, then you probably _would_ have to do this.
You may look at this and scoff, but really, what is the difference? With yours, I have to learn a bunch of commands, predicates, options, and syntax. With mine, I have to... learn a bunch of commands, predicates, options, and syntax (or just awk ;-)).
> This kind of thing is a challenge with UNIX tools
It's only a challenge if you don't know how to use the tools.
> Any change to the output format of netstat breaks scripts in fun and creative ways
The last release of `netstat` was in 2014. *nix tools aren't like JavaScript land; they tend to be extremely stable. Even if they _do_ get releases, if you're using a safe distro in prod (i.e. Debian, RedHat), you're not going to get a surprise update. Finally, the authors and maintainers of such tools are painfully aware that tons of scripts around the world depend on them being consistent, and as such, are highly unlikely to break that.
> Silently. In production.
If you aren't thoroughly testing and validating changes in prod, that's not the fault of the tooling.
You just said everything “is a file” and then dismissed out of hand a system that takes that abstraction even further!
PowerShell is more UNIX than UNIX!