
I might be misunderstanding something, but it seems the issue isn't really about whether GET can technically carry a body. The deeper concern is that HTTP methods have specific meanings, and mixing those signals can cause confusion, so it's nice to have this semantic separation.


If you look at the summary table, the only difference between a GET and a QUERY is that the query can have a body. Other than that, they have the exact same characteristics and purpose, so there isn’t really a need to semantically separate them.


> If you look at the summary table, the only difference between a GET and a QUERY is that the query can have a body. Other than that, they have the exact same characteristics and purpose, so there isn’t really a need to semantically separate them.

This is outright false. RFC9110, which clarifies semantics of things like GET requests, is clear on how GET requests should not have request bodies because it both poses security issues and breaks how the web works.

Just because your homemade HTTP API expects a GET request to carry a request body, that does not mean any of the servers sitting between your client and your server will. Think about proxies, API gateways, load balancers, firewalls, etc. Some cloud providers outright strip request bodies from requests.

The internet should not break just because someone didn't bother to learn how HTTP works. The wise course of action is to create a new method with specific semantics that are clear and actionable without breaking the world.


I think you are confusing what I am saying. I am saying the rest of the semantics are the same, other than the body. If we are creating a new HTTP spec, we could make GET and QUERY the same because they serve the same semantic purpose.

Obviously you can’t just start putting bodies in GET requests, because it breaks the current spec.


> I think you are confusing what I am saying. I am saying the rest of the semantics are the same, other than the body.

This is fundamentally wrong. The semantics are not the same, regardless of whether there's even a request body or not. The semantics are completely different. One is safe and idempotent, whereas the other is neither safe nor idempotent. The presence of a body doesn't even register as a factor.

> If we are creating a new HTTP spec

Also wrong. It's all about specifying another HTTP verb. HTTP is already specified to support other verbs.

> we could make GET and QUERY the same because they serve the same semantic purpose.

QUERY is a GET that is designed to pass query parameters in request bodies, which GET explicitly does not support. They are separate operations which just so happen to be both safe and idempotent.


The problem is that they are not enforced. You can already have GET requests that modify state even though they are not supposed to.

What you are actually doing when making a specific kind of request is assuming the actual properties match the documented properties and acting accordingly.

A QUERY seems to be no more than a POST that documents it is idempotent. Furthermore, you should only QUERY a resource that has advertised it is idempotent via the “Accept-Query” header. You might as well name that the “Idempotent-Post” header and then you just issue a POST; exactly the same information and properties were expressed and you do not need a new request type to support it.


HTTP semantics aren’t hard enforced but that only means something if you always control the client, server, and all the middle layers like proxies or CDNs that your traffic flows over.

Your GET request can modify state. But if your request exceeds a browser’s timeout threshold, the browser will retry it. And then you get to spend a few days debugging why a certain notification is always getting sent three times (ask me how I know this)

Similarly, you can put a body on your GET request in curl. But a browser can’t. And if you need to move your server behind cloudflare one day, that body is gonna get dropped.
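
The retry hazard described above can be sketched as a tiny simulation. This is only an illustration under stated assumptions: the handler, the retry loop, and the timeout model are all hypothetical and do not reproduce any real browser's retry policy.

```python
# Hypothetical simulation: a GET handler with a side effect, and a client
# that retries on timeout. None of these names come from a real library.

sent_notifications = []

def handle_get_report(user):
    """A GET handler that (wrongly) mutates state by sending a notification."""
    sent_notifications.append(f"report viewed by {user}")
    return {"status": 200, "body": "report"}

def client_get_with_retry(handler, user, timeouts_before_success, max_attempts=3):
    """Retry like a client that assumes GET is safe to repeat."""
    response = None
    for attempt in range(max_attempts):
        response = handler(user)  # the server does its work on every attempt
        if attempt >= timeouts_before_success:
            return response       # this response arrived in time
        # otherwise the response was too slow; the client gives up and retries
    return response

client_get_with_retry(handle_get_report, "alice", timeouts_before_success=2)
print(len(sent_notifications))  # one notification per attempt, not per logical request
```

The client sees a single successful request, while the server has run the side effect once per attempt.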


> A QUERY seems to be no more than a POST that documents it is idempotent.

This is false.

By design QUERY is both safe and idempotent. In the context of HTTP, safe means "read-only", whereas idempotent means that a method does not introduce changes on the server, and thus many requests have the same effect as posting a single request.

The fact that an operation is deemed safe has far-reaching implications for the design of any participant in an HTTP request, including firewalls, load balancers, and proxies.

> You might as well name that the “Idempotent-Post” header and then you just issue a POST;

This is outright wrong, and completely ignores the semantics of a POST request. POST requests by design are neither safe nor idempotent. You do not change that with random request headers.


> POST requests by design are neither safe nor idempotent.

Outright wrong. You are allowed to handle POST requests in a safe and idempotent way. In fact, the existing usage of POST to query, the literal impetus for this proposal, has exactly that behavior.

What you are not allowed to do is assume that any arbitrary POST request is safe and idempotent.

Only an endpoint that is documented to support POST and that is documented to be idempotent and safe should be sent a POST with request body and expect a response body and be idempotent and safe.

In comparison, only an endpoint that is documented to support QUERY (which implicitly is documented to be idempotent and safe) should be sent a QUERY with request body and expect a response body and be idempotent and safe.

Do you not see how similar those two are?

In fact, you could trivially make every endpoint that handles QUERY just do the same thing if it gets a POST. So why should the client have to care what request type it sends? Why make a new request type?

Of course we should want to define a standardized QUERY server endpoint behavior and document whether a server endpoint has QUERY behavior on POST; that is valuable, but that should be left distinct from the client protocol and constraints.

The only benefit I can see for the client to distinguish QUERY from POST is that it allows intermediate layers to know the request is expected to be idempotent and safe on the incoming edge. The outgoing edge is not benefitted because the server can easily tag things appropriately.

I guess a cache can use that information to only begin allocating a cache entry if the client attempts a QUERY thus saving it from tentatively recording all requests on the chance that the server will say that the actual request that occurred is safe? And that is assuming that the cache does not coordinate with the server to just pre-know what endpoints are safe and idempotent. In that case all the cache would need to do is parse the location to verify if it matches a safe endpoint. So your benefit of adding a new request type is you get to “unga-bunga is QUERY” instead of doing some parsing and matching overhead.

Seems like a weak benefit relative to the benefits of a simpler and more uniform client protocol.
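
As a rough sketch of the two options an intermediary has here, compare deciding from the method alone with deciding from an out-of-band list of safe POST endpoints. Everything below is hypothetical: the function names and the `/search` and `/orders` paths are made up, and QUERY is treated as safe only on the assumption that the proposal is adopted as written.

```python
# Hypothetical intermediary logic, not taken from any real proxy or cache.
# QUERY is listed as safe only on the assumption that the proposal is adopted.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "QUERY"}

def safe_by_method(method):
    """Decide from the request line alone."""
    return method in SAFE_METHODS

def safe_by_allowlist(method, path, safe_post_paths):
    """Decide using out-of-band knowledge of which POST endpoints are safe."""
    if method in SAFE_METHODS:
        return True
    return method == "POST" and path in safe_post_paths

# "/search" and "/orders" are made-up endpoints for illustration.
known_safe_posts = {"/search"}
print(safe_by_method("QUERY"))
print(safe_by_allowlist("POST", "/search", known_safe_posts))
print(safe_by_allowlist("POST", "/orders", known_safe_posts))
```

The first function needs no coordination with the origin server; the second works with plain POST but only for intermediaries that have been given the allowlist.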


> Outright wrong. You are allowed to handle POST requests in a safe and idempotent way.

You clearly have no idea about what you are discussing. The definition of HTTP verb semantics clearly defines the concept of safe and idempotent methods. You should seriously learn about the concept before commenting on discussions on semantics. If that's too much work, RFC 9110 clearly and unambiguously states that only the GET, HEAD, OPTIONS, and TRACE methods are defined to be safe. Not POST.

It's completely irrelevant if you somehow implemented a POST method that neither changes the state of the server nor returns a different response whether you invoke it once or many times. All participants in an HTTP request adhere to HTTP's semantics, and thus POST requests are treated as unsafe and non-idempotent by all participants. That affects things like caching, firewalls, even security scanners. The HTTP specs determine how all participants handle requests, not just the weekend warrior stuff someone chose to write in a weekend.


You clearly do not understand protocol/interface design.

The behavior and constraints of a generic interface are a lower bound on the behavior and upper bound on the assumable constraints by all direct participants when handled in general. This is POST in general. This is the weakest usage.

A specialized interface may enhance behavior or tighten constraints further. This is actually using POST to achieve some website-specific end. You can not actually use a website/reason about usage in anything other than a read-only fashion without knowing what the endpoints actually do and when to issue requests at which endpoint.

Indirect or intermediary participants should maintain behavior and constraints at the lower of the level of specialization they are aware of and the participants they are between. If you are designing a generic intermediary, then you do not have access to or knowledge about the specialized interfaces, so are less able to leverage specialized behavior or constraints. In HTTP, various headers serve the purpose of communicating various classes of common interface specializations so someone can write more powerful generic intermediaries. However, you could always write a specific intermediary or handler if you control portions of the chain, which people do all the time.

This applies across all classes of software interfaces and protocols not just websites using HTTP. But I guess HTTP is just such a special snowflake that it is beyond learning good practice from other fields.


I'm confused - wouldn't idempotent POST be PUT? Isn't the proposed QUERY for fetching semantics?


I think the idea is that POST creates a record (and in theory fails if that record already exists). I guess the commenter above is saying that if you inverted that (fail when the record doesn't exist, return the record if it does) it would be similar to QUERY? Not sure if I agree with that, but PUT's return semantics are a bit vague: it often returns partial or combined records, or just a 200 OK (with or without a response body), or 204 No Content for unchanged records (with or without a response body).

It's clear what POST returns, so... perhaps QUERY is more similar to it in that sense?


Whatever the original intent was, POST definitely does not return a new record consistently in most actual APIs. It's frequently used for actions that don't conceptually create anything at all.


No, I was referencing the example in the article in literally the very first section showing and explaining how POST endpoints are used for fetching data when GET endpoints are too limited. This is literally their motivating impetus for the QUERY request type.

When considered abstractly, POST is just a request body and a response body. This is obviously powerful enough to define any behavior you want; it is just a channel flowing a nearly arbitrary amount of opaque data between the client and server.

However, this kind of sucks because it does not define or constrain the behavior of the client or the server. QUERY says that the server is intended to interpret the request body as fetch parameters and return a response body as the fetched data, and further guarantee that the fetch is safe/idempotent. This is very useful.

My disagreement is that there is no good reason for the client request format to care. You should “POST” to a “QUERY” endpoint. The fact that this endpoint guarantees QUERY behavior is just part of the documented server interface in the same way that certain endpoints may not support PUT. This is a server constraint, not a client constraint so should not change the client transport format.

Requiring a new Client request type to agree with a new Server endpoint type is just unnecessary and mixes up server versus client responsibility and interface design.


I'm not following how this is different from not even using HTTP verbs. We didn't define them because it's the only possible way to declare client intent. They're cognitively useful for setting expectations, organization, announcing abilities, separation of concerns, etc. The fact that POST is today sometimes used in practice as a safe+idempotent query (i.e. a GET with a body) seems like the black sheep violating those useful qualities.


Client intent is distinct from server interpretation. Distinguishing these is important when defining protocols/interfaces.

A client can blindly PUT to an endpoint that does not support PUT, only GET. In the cold connection case, where does this fail? The client succeeds at serializing the message, the client succeeds at sending it, the message gets successfully received, and only upon examining the message itself does the server know it should not follow the client intent. This is the same behavior as if the client sent garbled nonsense.

The key here is that it is the server demanding certain message structures. The client is free to send whatever, it just only succeeds if it sends something the server will accept. The server distinguishes, but to the client it is no different from sending any other opaque blob of data. This is thus just a server-constraint, the endpoint is GET-only. To use this website/HTTP interface, the client needs to know that.

It will need to know to format the message correctly, but that derives from knowledge of the documented server interface. If you had just a direct connection with no intermediary participants, then you could trivially swap out HTTP for any other transport protocol without a loss of functionality or behavior. That points to there being no reason to syntactically distinguish on the client to server leg of the transport protocol. However, you still have GET endpoint behavior since that is actually preserved across transport changes, so that is the important class of behavior to define.

The opposing point here is that if you do have intermediaries then they may care about the message format and you may need to syntactically distinguish them. I suspect this is less beneficial than a simpler and more flexible, yet precise transport format.

Basically, we should prefer simple, flexible transport formats and push interpretation (and thus behavior) out to the endpoints which actually have enough context to know the true properties of what they are doing.
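
To make the PUT-to-a-GET-only-endpoint case above concrete, here is a minimal sketch of a server that only rejects the request after it has been fully received and examined. The routing table and the `handle` function are hypothetical; the only real HTTP detail relied on is that a 405 response lists the supported methods in an `Allow` header.

```python
# Hypothetical server-side dispatch; not modeled on any specific framework.

def handle(method, path, routes):
    """routes maps a path to the set of methods that endpoint accepts."""
    allowed = routes.get(path)
    if allowed is None:
        return 404, {}
    if method not in allowed:
        # The request was well-formed and delivered; only the server's own
        # interface definition makes it unacceptable.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}

routes = {"/items": {"GET"}}  # made-up GET-only endpoint
print(handle("GET", "/items", routes))
print(handle("PUT", "/items", routes))
```

Nothing on the client side fails in the second call; the constraint lives entirely in the server's routing table.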


> The fact that this endpoint guarantees QUERY behavior is just part of the documented server interface

And how do you communicate this behavior to the client (and any other infrastructure in-between) in a machine-readable way?


PUT is the idempotent one. POST typically performs an action; PUT just creates-or-updates.


The existing mechanism to get QUERY semantics is a POST that encodes the “fetch parameters” in the body and the response contains the fetched values. You then out-of-band document that this specific use of a fetching POST is idempotent.

This is literally expressed in the document in section 1: Introduction. They just want to take that POST request and replace the word POST with QUERY which also means the server is intended to assure the request is idempotent instead of needing that documented out-of-band.


For some reason the RFC focuses on idempotency, but then says it's explicitly intended for enabling caching semantics. Caching a query that mutates visible state doesn't really make sense, and like you point out if you just want idempotent modifications PUT already has the relevant semantics. I guess we haven't learned our lesson from making the original HTTP semantics super squishy.


> For some reason the RFC focuses on idempotency,

It focuses a bit more on safety, which is why every mention of the proposed method having the "idempotent" property is immediately preceded (in most cases in the same sentence) by a description of it having the "safe" property.


Essentially correct, QUERY is safe, like GET, not merely idempotent, like PUT. Safety implies idempotence, but not vice versa.
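
For reference, the relationship can be checked against the method properties as RFC 9110 defines them. This is a small sketch: the dictionary layout is my own, and the QUERY row reflects what the proposal intends rather than a published standard.

```python
# (safe, idempotent) per method, following the definitions in RFC 9110.
# The QUERY entry is an assumption based on the proposal's stated intent.
METHOD_PROPERTIES = {
    "GET":     (True,  True),
    "HEAD":    (True,  True),
    "OPTIONS": (True,  True),
    "TRACE":   (True,  True),
    "PUT":     (False, True),
    "DELETE":  (False, True),
    "POST":    (False, False),
    "CONNECT": (False, False),
    "QUERY":   (True,  True),   # proposed, not part of RFC 9110
}

# Safety implies idempotence: every safe method is also idempotent.
safe_implies_idempotent = all(
    idempotent for safe, idempotent in METHOD_PROPERTIES.values() if safe
)

# The converse fails: these methods are idempotent without being safe.
idempotent_not_safe = sorted(
    name for name, (safe, idempotent) in METHOD_PROPERTIES.items()
    if idempotent and not safe
)

print(safe_implies_idempotent)
print(idempotent_not_safe)
```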


Does “safe” here mean just “non-mutating”?


No, it doesn't just mean that (it does mean non-mutating from the point of view of the client and in regard to the target resource, but the essential meaning involves more than that and it is more subtle than simply “non-mutating”.)

The specific definition is in the HTTP spec, and I don't think I can describe it more concisely without losing important information necessary for really understanding it.

https://www.rfc-editor.org/rfc/rfc9110#section-9.2.1


Yes.


It would be pretty much impossible to actually ‘enforce’ that GETs don’t modify state. I am not sure if I would call the lack of enforcement a problem when it is more a simple fact about distributed systems; no specification can enforce what a service does outside of what is returned in a response.


That is exactly my point. There is no reason to syntactically distinguish what is semantically non-distinguishable.

The interpretation of a request is up to the server. There is no reason for the client to syntactically distinguish that the request body is for a POST vs QUERY; the request parameters and response have the same shape with the same serialization format.

However, on the other side, a server does control interpretation, so it is responsible for documenting and enforcing how it will interpret. QUERY semantics vs generic POST semantics is a receive/server-side decision and thus should not be a syntactic element of client requests, merely a server description of endpoint semantics (“QUERY endpoint” meaning shorthand for POST endpoint with query semantics).

edit: Thinking about it some more, there is one possible semantic difference which is that a transparent caching layer could use a syntactically different POST (i.e. QUERY) to know it should be allowed to cache the request-response. I do not know enough about caching layers to know how exactly they make fill/eviction choices to know if that is important.


> There is no reason to syntactically distinguish what is semantically non-distinguishable.

The point is to have a standard, so you can more easily learn what an API is doing instead of having to start from scratch every time.



