It's a bandaid on a wider problem: the design of the Unix shell is bonkers and the whole thing should be deleted. Why? Because I haven't seen any other tool ever have so many pitfalls. Take n random languages and m random developers, tell them to loop over a string array and print its contents, and count how many correct programs you get on average per language. There will be easy languages, then difficult languages, then a huge gap, and then the Unix shell, because in your random sample you managed to get one guy who has a PhD in bash.
The main problem is using text as a common format between different applications.
First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can even spew UTF-32 with the proper locale configured; it's a mess.
Second: encoding and decoding of objects to text is not defined at all. Those problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.
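To make the second point concrete, a minimal sketch (assuming bash and GNU ls; other ls implementations may quote non-printable characters differently when writing to a pipe):

    touch $'bad\nname'      # a single file whose name contains a newline
    ls | while IFS= read -r f; do printf '<%s>\n' "$f"; done
    # prints <bad> and <name>: one file shows up as two "records"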
In my opinion two things should be done:
1. Standardise on UTF-8. No other encodings allowed.
2. Standardise on JSON. It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.
So any utility would have to read and write JSON objects when some standard env is set. And shells can be developed with better syntax to deal with JSON. This way you could write something like
`ps aux | while read row; do echo ${row.user} ${row.pid}; done`
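For what it's worth, something close is already possible with `jq` whenever a tool emits JSON; `ps-json` below is a hypothetical stand-in for a JSON-emitting `ps`:

    # hypothetical: ps-json prints one JSON object per process, e.g. {"user":"root","pid":1,...}
    ps-json | jq -r '"\(.user) \(.pid)"'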
> It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.
Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.
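You can watch that ambiguity play out from the shell; what this prints depends on your jq version (as far as I know, older releases coerce every number to an IEEE double, while newer ones try to preserve unmodified integer literals):

    echo '{"a": 1234678901234567890}' | jq .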
> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.
So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.
(and I believe which bytes count as a valid character depends on your `LC_CTYPE`).
But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.
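(You can check what your system uses with `getconf LINE_MAX`.)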
Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?
Yes, obviously. But the POSIX specification for a "text file" as above is that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the number of lines can only stay the same or grow from there.
The definition should read "one or more lines" instead or (probably better) specify that a text file contains "zero or more characters".
What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.
You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.
I think a lot of tools should support JSON as well as plain text, probably the latter by default and the former with a "-o json" or similar option. I'm fine with wc giving me `5`; I'd prefer that to `{ "characters": 5 }`.
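Some tools already do this; iproute2, for instance, grew a JSON output flag a while back (assuming your version is recent enough):

    # -j asks ip for JSON output; jq then picks out the interface names
    ip -j addr show | jq -r '.[].ifname'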
There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to do everything ad hoc; you don't gain much from it. You also control the shell itself - there's no reason you can't display object representations in a pretty way.
JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.
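A small sketch of the difference: with JSONL each record is a complete JSON value on its own line, so a consumer such as jq can handle records as they arrive instead of buffering a whole array:

    printf '%s\n' '{"pid":1,"user":"root"}' '{"pid":42,"user":"alice"}' \
        | jq -r '.user'
    # prints "root", then "alice", one record at a time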
But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?
> I haven't seen any other tool ever have so many pitfalls.
I haven't seen any other tool with so much general utility and availability.
> to loop over a string array and print its contents
Is incredibly easy in bash and bash-like shells. As highlighted, the issue is that tools like 'ls' don't create "a string array"; they create one giant string that has to be parsed. The rules in the shell are different from those in other languages, but it /will/ do most of the parsing for you, or all of it, if you do it carefully.
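For example, a minimal sketch of the careful version, letting the shell's globbing build a real array instead of parsing `ls` output:

    files=(*)                      # globbing yields one array element per file, whitespace and all
    for f in "${files[@]}"; do     # quoting keeps each element intact
        printf '%s\n' "$f"
    done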
This is a fine tradeoff, as evidenced by its wide usage and the lack of convincing replacements.
Someone needs to come up with an interactive shell first, one that is comparable in usability. Then we can think about replacing the Unix shell.
I tried both Python and Lua interactively, but they are a pain when it comes to handling files. You have to type much more to get the same things done.
The bigger issue is the sheer momentum of the Unix shell. Even if you come up with an alternative that is better by every objectively measurable metric, it's still going to be a monumental task to have it packaged with commonly used distros. Kinda like the "why can't the US switch to the metric system" problem.
I'm sure you might get more than 5 people on HN replying to you that they are using fish right now. Say something discrediting about fish and they show up.
Heh, reminds me of how to get help with Linux back in the day. If you directly asked for help, you'd be told to RTFM. If you stated confidently that Windows could do something and that Linux sucks because it can't, you'd get users tripping over themselves with details and instructions, just to prove you wrong.
There's a direct cost in money, time and lives that has come from the US's adherence to their US Customary Units (which are often different to the old imperial units). People have literally died because of the confusion caused by having multiple systems of units in common use with ambiguous names (degrees, gallons, etc). Each year industry worldwide spends an enormous amount of money indirectly precisely because of this problem and it's still incredibly unlikely to be fixed within my lifetime.
Bash-alternatives that are not completely compatible frankly just don't have a chance.
OK let them add an explicit check to standard tools, and/or to open(), mkdir(), etc. with O_PORTABLECHARS. And an environment option to disable this check.
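In the meantime, a rough userland approximation of that check can be done in the shell itself; `name` here stands for whatever you are about to pass to mkdir or open:

    # reject anything outside the POSIX portable filename character set
    case $name in
        (*[!A-Za-z0-9._-]*) echo "non-portable filename: $name" >&2; exit 1 ;;
    esac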
If it isn't distributed out of the box with every *nix-like OS, it inherently isn't "better by every objectively measurable metric" - distribution of a common, stable standard is a huge benefit in and of itself.
Python may often be installed by default, but it's definitely not an essential/required package "out of the box" on every install.
Also, in a thread where one topic is how the POSIX shell handles whitespace in filenames, it's hilarious (not in a good way) that someone suggests a language that handles whitespace the wrong way in its own code. Yes, significant whitespace is objectively wrong.
What OS/distro is Lua included on out of the box? That doesn't mean "available in a package". I mean literally included in every single install, such that it cannot reasonably be omitted.
Regardless of the availability, the parent comment says
> better by every objectively measurable metric
Neither Python nor Lua are "better" than shell, at the types of things shell is commonly used for - they're objectively worse.
Lua gets onto every other Linux distro as a dependency of some base system component. For example, rpm and pipewire depend on Lua. Ubuntu and Debian ship with pipewire by default.
That isn't even close to "installed on every system". Best I can tell from the reverse dependencies, it's required for some GNOME Remote Desktop tool, and as far as I can tell it doesn't rely on Lua anyway (at least on Debian).
> You should use the word "objectively" less.
I specifically used the word "objectively" because the original comment I replied to said this:
> Pipewire being the Pulseaudio replacement from Redhat.
Right, so it's a desktop package that ultimately will be installed on about 1% of all Linux machines because the vast majority are servers without a desktop environment.
Also worth pointing out: liblua, on Debian at least, is the shared library. It's not the binary that executes standalone Lua scripts.
Is this like a game where you come up with bullshit and I have to come up with the facts to rectify it? RHEL/CentOS have more than 1% market share alone.
Check your own installs and tell me if you find some that don't have liblua or libluajit.
For the library thing: I said "Python and lua are pretty close to that" earlier. I did not say that they have interpreters ready everywhere. But if the language core is already installed on a large fraction of machines, then adding the interpreter is not a big cost.
> already installed on a large fraction of machines
So far you've presented no evidence of this though, just that it's used by a new desktop-focused package.
All Linux desktops over the last 30 years are not even a "large fraction" of total Linux installs, much less the ones that have already migrated to this new audio system.
> adding the interpreter is not a big cost
It's nothing to do with cost. It's about "how do I know this will absolutely 100% run on any POSIX machine I throw it on without any extra steps".
Remember the argument here is about something that is claimed to be "objectively better" than shell. The ubiquitous nature of the POSIX shell is a huge barrier for any possible competitor, and saying "well you just need to install it" just defeats the purpose. You might as well write it in fucking Java and say "well you just need to install a JVM".
Edit to Add:
A good number of systems I manage do have liblua installed... because HAProxy requires it, and those systems have HAProxy installed. Not because it was installed as part of the base OS or even a default group of packages.
Incidentally, HAProxy and thus liblua were installed on those systems by infrastructure management that's implemented as shell script. So what kind of chicken and egg argument do we need to have here about how exactly I can run a Lua script to install Lua?
PowerShell's designers could learn from decades of programming language progress and especially shell usage. They could indeed improve many aspects. This doesn't mean that the original design is "bonkers", only that it's not perfect.
The way PowerShell works is largely based on what the computing world was doing with shells outside Bell Labs (at IBM, Xerox, and other places) at around the same time UNIX was happening.
Modern programming language designers have a bad relationship with verbosity. I don't know why they do this.
It's a language for an interactive shell; the amount you have to type translates directly into developer speed. I understand the desire for clarity, and maybe that's nice in large scripts, but the main goal is to be a shell, so optimize for that. Also, you probably shouldn't be using PowerShell for large scripts anyway.
The only recent lang I've seen that has a handle on this is Rust. You can tell they put a lot of thought into having keywords be as short as possible while still being descriptive.
Those aliases are, I believe, only defined on Windows PowerShell (the closed-source version 5; not PowerShell 7). I wish those default aliases you mentioned weren’t a thing. Especially `curl` (people should use `iwr` instead), which is an alias of `Invoke-WebRequest`, because it makes the `curl.exe` shipped with Windows nearly undiscoverable.
This should not be as downvoted as it is. In a way, the shell is broken. The brokenness is that it requires each command to serialize and deserialize again, with all the weird things that can happen with the "all is a string" kind of approach, instead of having a proper data interchange format or even sending objects to the next steps in the pipeline. This behavior is what necessitates even thinking about the changes listed in the post. We wouldn't even have that problem if the design of the shell had been better thought out. Now we are dealing with decades of legacy built on these shaky foundations. I hate to admit it, but it seems at least this aspect PowerShell got right, whatever one may think about the rest of it.
Dear anal_reactor, what is a "string array"? I have used Unix shells for nearly 30 years and never heard of them. And I consider myself a script-fu master!
There are two array-like constructions in the shell: a list of words (separated by spaces) and a list of lines (separated by newlines). Both are implemented as a single string, and the shell makes it trivial to iterate through the components.
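A minimal illustration of both constructions:

    # list of words: the shell splits the unquoted expansion on $IFS
    words="alpha beta gamma"
    for w in $words; do printf '%s\n' "$w"; done

    # list of lines: read consumes one line per iteration
    printf 'first line\nsecond line\n' | while IFS= read -r line; do
        printf '%s\n' "$line"
    done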
That is exactly the problem many people have with it. Encoding "arrays" this way is foreign to everyone who comes from "normal" programming languages. Both variants lead to problems because either separator character can occur inside elements; in the worst case an element contains both at the same time. I can see why this leads to confusion and bugs.
It’s like people saying they won’t learn French because it has a different grammatical structure. There’s no “normal” natural language. If you’re used to the C-like syntax, learning C-like language will be easy. But that’s not an argument to say Lisp is confusing.
That's why I put normal in quotes. There is however more to it than having a different grammatical structure: it works differently from many commonly used languages, which have actual arrays/lists where elements can contain anything the type allows. If you come from any of the common modern programming languages (let's say Java, Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and expect something similar (because many of them are very similar), you will be confused. Using spaces or newlines to encode elements in a single string is just not robust and leads to easy-to-make mistakes.
Most of these languages were created long after bash and the other shells. The fact is that shell scripting allows unquoted strings, and quoting is a specific operation, not syntax. Also, shell scripts were meant for automation, not for writing general programs. The basic units are commands, arguments, input, output, files, … so the design makes these easy to manipulate.
I’m not saying that we can’t improve, but I’m more in favor of making the tool more apt to solve a problem than making it easier to learn. Because the latter often wants to forego the requirement of understanding the problem space.
Yes, these are newer. I mainly wanted to make the point that it is confusing if you are new to bash and come from these newer languages with the wrong expectations. The concise nature and many subtle details make it very difficult for beginners and infrequent users.
Compare this to the newer programming languages, where you explicitly call methods with descriptive names like .Trim() and .EndsWith(), with support from the compiler and IDE.
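For comparison, here is one common way to express trimming and a suffix check in plain POSIX shell, which is exactly the kind of terseness at issue (a sketch; the trimming idiom relies on nested parameter expansion):

    var="  report.txt  "
    trimmed=${var#"${var%%[![:space:]]*}"}            # strip leading whitespace
    trimmed=${trimmed%"${trimmed##*[![:space:]]}"}    # strip trailing whitespace
    case $trimmed in (*.txt) echo "ends with .txt" ;; esac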
In my experience, automation and general programs often turn out to be the same thing once things get more complicated. Bash scripts usually grow rapidly and are a giant PITA to maintain or refactor. Throw in build systems and helper scripts and you quickly end up with a giant pile of spaghetti. Personally, I just switch to one of the mentioned programming languages once it goes beyond a simple sequence of operations.
Personally, I don't see how to improve it much without it becoming a full-blown programming language, at which point it would probably make more sense to just release a composable library for common automation tasks. Maybe I'm just not the right target audience.
The issue with your otherwise good reply is that someone is bringing expectations to an expert tool (programming languages, software, OS) and blindly assuming that everything will work the way they think it should. Familiarity helps with learning, but shouldn't replace it. Someone new to bash should probably start with a book.
And for bigger automation projects, there are lots of projects and programming languages that can help.
I agree it is an issue but it is how many people work and think. Most of the time they are not even wrong. "Hey, I have variables and loops, I know that!".
I would even make the case for expert tools being as unsurprising and familiar as possible unless there is a very good reason for them not to. Also they should be robust against misuse and guide the user towards good practices. There are always beginners, people that rarely need to use it, people that do programming as "just a job" and people that make mistakes because they are distracted, tired or just human. Something like "rm -r /" is a good reminder of that for many people.
Plus there are already a lot of tools required. Reading a book about every tool I have to use would be impractical for most projects. Maybe more expert tools should just be tools. The same way I can now just use Ubuntu and get a working desktop system including drivers for most common hardware. If I compare that to the past, where I installed a Linux distribution and then found out I lacked a driver for my network card and needed to download it from the internet... I still can modify my system if I need to, but it's nice that I don't have to. I think we can do similar things with many parts of development and free some capacity for other tasks.