> The whole point of names is that they aren't descriptive

I actually agree with this, but that's exactly the opposite of what TFA is arguing.


One of my previous side projects took this idea to the extreme: it's a two-player online word game (Scrabble with some twists), but all the state is stored in the URL, so it doesn't need a backend.

https://scrobburl.com/
https://github.com/Jcparkyn/scrobburl
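
The core trick, roughly (a minimal sketch of the idea, not the game's actual code; it assumes plain JSON plus base64 in the URL hash):

```typescript
// Hypothetical shape of the game state; the real game's fields will differ.
type GameState = { board: string[][]; scores: [number, number]; turn: 0 | 1 };

// Serialize the whole state into a shareable link: each move is just a new
// URL sent to the opponent, so no server needs to remember anything.
function stateToUrl(state: GameState): string {
  const encoded = btoa(JSON.stringify(state)); // assumption: JSON + base64
  return `${location.origin}/#${encodeURIComponent(encoded)}`;
}

// Restore the state on page load from the URL hash (null if there isn't one).
function stateFromUrl(): GameState | null {
  const hash = location.hash.slice(1);
  return hash ? JSON.parse(atob(decodeURIComponent(hash))) : null;
}
```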


Not to mention tracepoints (logging breakpoints), which are functionally the same as printf but don't require recompiling or restarting.


> I think if you [...] and have some luck you will succeed no matter what

Ignoring the politics of it, this wording is pretty funny to me.


To be fair, this is the first thing that comes up when I Google "DuPont connector" (after some sponsored shopping links).


Also on DDG.


Genuine question: Is it illegal to advertise a product that doesn't exist, can't be bought, and that you have no intention of taking money for?


Depending on which jurisdiction you are in, absolutely.

It could constitute bait advertising: the practice of using fake advertisements to get customers into your store for the purpose of trying to sell them something else.


It's a somewhat well-known startup "hack": https://learningloop.io/plays/spoof-landing-pages


Possibly, but per [1] it would come down to: "(1) whether the practice injures consumers; (2) whether it violates established public policy; (3) whether it is unethical or unscrupulous."

e.g. it might be deemed "unfair" if you're promoting vaporware to try to slow down sales of a competitor's actual product, and someone could prove consumers suffered "substantial" injury by holding off on purchasing something useful while waiting indefinitely for something else that you had no intention to actually sell. (Proving this would be hard, but I'm sure it's possible.) On the other hand, advertising a joke product—where a reasonable consumer is unharmed because they should have known it's a joke—is almost certainly legal.

See, generally: https://www.ftc.gov/business-guidance/resources/advertising-...

[1] https://www.ftc.gov/legal-library/browse/ftc-policy-statemen...


> Conditionals are common enough that they can justify the indulgence

I think there's another, much more important factor that distinguishes conditionals from most other ternary operations (clamp, mix, FMA, etc.): the "conditional evaluation" part, which is what makes them hard to replicate with a regular function call.
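
To illustrate (the names here are illustrative, not from any particular language's standard library): with a ternary, only the taken branch runs, whereas a plain function receives both arguments already evaluated.

```typescript
// A function can't replicate conditional evaluation: both arguments are
// evaluated before the call, regardless of the condition.
function select<T>(cond: boolean, a: T, b: T): T {
  return cond ? a : b;
}

const cheap = () => 1;
const expensive = (): number => {
  throw new Error("evaluated anyway!");
};

const cond = true;
const ok = cond ? cheap() : expensive(); // fine: expensive() never runs
const boom = select(cond, cheap(), expensive()); // throws: both args evaluated
```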


Agreed, and then there's the time-of-check/time-of-use (TOCTOU) issue with creating a user. Probably not a vulnerability if userService is designed well, but still a bit dubious.


You’re right, that’s potentially a correctness issue as well. Ideally we’d have a creation interface that would also perform the pre-existence check atomically, so there would be no need for the separate check in advance and the potential race condition would not exist. This does depend on the user service providing a convenient interface like that, though, and alas we aren’t always that lucky.
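
A sketch of the two shapes being contrasted (`userService` and all of its methods here are hypothetical, not from the article):

```typescript
interface UserService {
  exists(name: string): Promise<boolean>;
  create(name: string): Promise<void>;
  // Atomic check-and-create, e.g. backed by a unique database constraint.
  createIfAbsent(name: string): Promise<void>;
}
declare const userService: UserService; // hypothetical service

// Racy: another request can create the same user between the two awaits.
async function createUserRacy(name: string): Promise<void> {
  if (await userService.exists(name)) throw new Error("name taken");
  await userService.create(name);
}

// Atomic: the existence check happens inside the creation call itself,
// so there is no window for a concurrent request to slip through.
async function createUserAtomic(name: string): Promise<void> {
  await userService.createIfAbsent(name);
}
```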


> squash an output into a range

This isn't the primary purpose of the activation function, and in fact it's not even necessary. For example see ReLU (probably the most common activation function), leaky ReLU, or for a sillier example: https://youtu.be/Ae9EKCyI1xU?si=KgjhMrOsFEVo2yCe
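
To make the point concrete (a trivial sketch; the constant `a = 0.01` is just a common default): both functions are non-linear, yet neither bounds its output to any range.

```typescript
const relu = (x: number): number => Math.max(0, x);
const leakyRelu = (x: number, a = 0.01): number => (x > 0 ? x : a * x);

console.log(relu(1e6));       // 1000000: unbounded above
console.log(leakyRelu(-1e6)); // -10000: unbounded below, just rescaled
```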


You can change the subject by bringing up as many different NN architectures, activation functions, etc. as you want. I'm telling you that the basic NN Perceptron design (what everyone means when they refer to Perceptrons in general) has something like a `tanh`, and not only is its PRIMARY function to squash a number, that's its ONLY function.


You need a non-linear activation function for the universal approximation theorem to hold. Otherwise, as others have said, the model just collapses to a single layer.

Technically the output is still what a statistician would call “linear in the parameters”, but due to the universal approximation theorem it can approximate any non-linear function.

https://stats.stackexchange.com/questions/275358/why-is-incr...
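
For concreteness, the collapse is one line of algebra (a standard derivation, not taken from the linked answer): stacking two layers with no non-linearity in between yields just another affine map.

```latex
\begin{aligned}
h &= W_1 x + b_1 \\
y &= W_2 h + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W x + b
\end{aligned}
```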


As you can see in what I just posted about an inch below this, my point is that the process of training a NN does not involve adjusting any parameter of any non-linear function. What goes into an activation function is a pure sum of linear multiplications and an add, but there's no "tunable" parameter (i.e. one adjusted during training) that's fed into the activation function.


Learnable parameters on activations do exist; look up parametric activation functions.
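
A minimal sketch of the idea using PReLU (parametric activation functions are real; this toy class and its initial value are illustrative):

```typescript
class PRelu {
  a = 0.25; // learnable negative-side slope, updated during training

  forward(x: number): number {
    return x > 0 ? x : this.a * x;
  }

  // Gradient of the output with respect to `a`, used to update it:
  // zero for positive inputs, x itself for non-positive inputs.
  gradA(x: number): number {
    return x > 0 ? 0 : x;
  }
}
```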


Of course they exist. A parameterized activation function is the most obvious thing to try in NN design, and has certainly been invented/studied by thousands of researchers.


How was that person derailing the convo? Nothing says an activation function has to "squash" a number into some range. Leaky ReLUs, for instance, do `f(x) = x if x > 0 else ax` (for some coefficient `a != 0`), which doesn't squash `x` into any range (unless you want to be very particular about your precise definition of what it means to squash a number). The function takes a real in `(-inf, inf)` and produces a number in `(-inf, inf)`.

> Sure there's a squashing function on the output to keep it in a range from 0 to 1 but that's done BECAUSE we're just adding up stuff.

It's not because you're "adding up stuff"; there is a specific mathematical or statistical reason why it is used. For neural networks, it's there to stop your multi-layer network collapsing into a single-layer one (i.e. a linear algebra reason). You can choose whatever function you want; for hidden layers, tanh generally isn't used anymore, it's usually some variant of a ReLU. In fact, leaky ReLUs are very commonly used, so OP isn't changing the subject.

If you define a "perceptron" (`g(Wx+b)`, where `W` is a `Px1` matrix) and train it as a logistic regression model, then you want `g` to be the sigmoid. Its purpose is to ensure that the output can be interpreted as a probability (given that you use the correct statistical loss), which means squashing the number. The converse isn't true: if I take random numbers from the internet and squash them into `[0,1]`, I don't get to call them probabilities.
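
For reference, the sigmoid in question (standard definition):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1) \quad \text{for all } z \in \mathbb{R}
```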

> and not only is its PRIMARY function to squash a number, that's its ONLY function.

Squashing the number isn't the reason; it's a side effect. And even then, I just said that not all activation functions squash numbers.

> All the training does is adjust linear weights tho, like I said.

Not sure what your point is. What is a "linear weight"?

We call layers of the form `g(Wx+b)` "linear" layers, but that's an abused term: if `g()` is non-linear, then the output is not linear. Who cares if the inner term `Wx + b` is linear? With enough of these layers you can approximate fairly complicated functions. If you're arguing about whether there is a better fundamental building block, then that is another discussion.
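
A quick numeric sketch of that collapse, in one dimension so the "matrices" are plain numbers (the values are arbitrary):

```typescript
const layer = (w: number, b: number) => (x: number) => w * x + b;

const l1 = layer(2, 3);
const l2 = layer(4, 5);
// Composing the two is itself a linear layer: w = 4*2, b = 4*3 + 5.
const collapsed = layer(8, 17);

console.log(l2(l1(7)));    // 73
console.log(collapsed(7)); // 73: identical for every input

// Put a non-linearity between them and the collapse no longer holds:
const relu = (x: number) => Math.max(0, x);
console.log(l2(relu(l1(-7)))); // 5, whereas collapsed(-7) is -39
```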


> What is a "linear weight"?

In the context of discussing linearity vs. non-linearity, adding the word "linear" in front of "weight" is clearer, which is what my top-level post on this thread was all about too.

It's astounding to me (and everyone else who's being honest) that LLMs can accomplish what they do when linear "factors" (i.e. weights) are all that's adjusted during training to achieve genuine reasoning. During training we're not [normally] adjusting any parameters or weights of any non-linear functions. I include the caveat "normally" because I'm speaking of the basic Perceptron NN using a squashing-type activation function.


> It's astounding to me (and everyone else who's being honest) that LLMs can accomplish what they do when linear "factors" (i.e. weights) are all that's adjusted during training to achieve genuine reasoning.

When such basic perceptrons are scaled enormously, it becomes less surprising that they can achieve some level of 'genuine reasoning' (e.g., accurate next-word prediction), since the goal with such networks is, at the end of the day, just function approximation. What is more surprising to me is how we found ways to train such models, i.e., advances in hardware accelerators combined with massive data, which are factors just as significant in my opinion.


Yeah, no one is surprised that LLMs do what they're trained to do: predict tokens. The surprise comes from the fact that merely training to predict tokens ends up with model weights that generate emergent reasoning.

If you want to say reasoning and token prediction are just the same thing at scale, you can say that, but I don't fall into that camp. I think there's MUCH more to learn, and indeed a new field of math or even physics that we haven't discovered yet: a step change in mathematical understanding analogous to the invention of calculus.


> sorts and prints the latest results at a regular interval

Slightly more complicated than that, because you can only sort and print elements once you have _all_ the elements that came before them. Once you add that layer, you've got quite a lot more code (and potential for mistakes) than the promises version.
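
Roughly the extra machinery needed (an illustrative sketch, not the article's code): the manual version has to buffer out-of-order arrivals and flush only the contiguous prefix, while the promises version is a single loop.

```typescript
// Promises version: awaiting in submission order prints results in order,
// each as soon as it and everything before it has resolved.
async function printInOrder(promises: Promise<string>[]): Promise<void> {
  for (const p of promises) console.log(await p);
}

// Manual version: buffer results that arrive early, and only print once the
// run of elements before them is complete.
function makeCollector(): (index: number, value: string) => void {
  const buf = new Map<number, string>();
  let next = 0;
  return (index, value) => {
    buf.set(index, value);
    while (buf.has(next)) { // flush the contiguous prefix only
      console.log(buf.get(next));
      buf.delete(next++);
    }
  };
}
```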

