The text indicates that NL can be a multi-character sequence, but the BNF doesn't permit it.
The BNF has been adjusted to allow CR, LF, or CR LF as NL.
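To illustrate the adjusted rule, here is a minimal Python sketch of a splitter that accepts all three forms of NL. The name `split_nl` is hypothetical and is not taken from the spec.

```python
import re

# NL = CR LF | CR | LF. CR LF is listed first so that the pair is
# consumed as a single line terminator rather than as two.
_NL = re.compile(r"\r\n|\r|\n")

def split_nl(text):
    """Split text into lines on CR LF, CR, or LF."""
    return _NL.split(text)
```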
If protocol names form the first word of associated protocol-specific metavariable names, an implication is that collisions may occur (e.g., consider a protocol named 'REQUEST' with a field such as 'METHOD').
"... if the scheme used was HTTP" should either lowercase the scheme name, or change 'scheme' to 'protocol.'
The scheme has been downcased.
Section 6.1.7 says that PATH_TRANSLATED should not be defined if the request includes no path-info, but section 6.1 says we don't distinguish between undefined metavariables and empty ones.
Change "a RFC 1413" to "an RFC 1413".
'... the protocol remains "http".' needs to be changed; protocol names are upcased.
The second paragraph of section 6.2 mentions '... on the standard input stream' even though the first paragraph suggests that 'stdin' may not be the way data are communicated to the script. Ross suggests that we get rid of the phrase, ending the sentence with '... bytes to scripts.'
The third paragraph of section 7 specifically refers to 'script process'; the word 'process' should be elided as it's an implementation-specific detail and may not apply in all cases.
Do we really need to say that HTTP/1.1 requests can be given HTTP/1.1 response status codes? Don't both HTTP/1.0 and HTTP/1.1 explicitly allow unknown values and say they should be treated as though they were the known x00 ones?
No changes made; existing text seems best.
The first paragraph of the section on Script-URI ends with 'This mechanism is as follows' -- but the next text is about header fields, not URLs. Perhaps some text got lost in some edit?
The 'V' in 'MetaVariables' should be downcased.
References to RFC 2068 should be updated to refer instead to RFC 2616.
Ross Patterson suggests that we change all occurrences of 'define' to 'provide,' as regards metavariables. He says:
"I *REALLY* like the use of "define" in the fifth paragraph of section 8.2, where we use it to mean "document for understanding by users and programmers", rather than "create". I think we might be better off if we replaced all the other uses of "define" when talking about a metavariable's existence with "provide" (as we do in section 6.1.1)."
RFC 2396 replaces all three of the older URL RFCs. Places that should be updated include sections 7.2.1.2, 8.2 (item 3; should be "[4], section 5"), and 3.2 (suggested replacement for the second paragraph: "The Script-URI has the syntax of the "generic URI" as defined in section 3 of RFC 2396 [4], with the exception that the <authority> component is replaced with the more specific <host><port> pair:").
Note that we can omit the original "with the exception... permitted" phrase, since RFC 2396 leaves parameters and fragments out of "generic URI".
There are a few references to HTTP which should either be marked as such or made more generic:
As I understand URL-encoding as defined in RFC 2396 section 2, any context it's used in must define the set of characters that must be encoded, or more directly, the set of "unreserved" chars that don't need to be encoded (in 2396, 2.2 last paragraph, and 2.4). A comment stating that may help, for clarification. And, we need to define that set of unreserved chars wherever we use URL-encoding here.
One such place is in 3.2, second to last paragraph, for 'enc-script' and 'enc-path-info'. If scripts ever need to generate the Script-URI from the metavariables, we need to define the set of unreserved characters for each.
The spec (in 3.2, 6.1.14, and 9.2 fifth paragraph) almost prevents a script from generating a URL that calls itself (because the Script-URI could be server-invented), but it's very useful (even critical) to do so, and in fact BCP is that servers provide a way for scripts to do this, usually with SCRIPT_NAME.
To fix this, we could strengthen the last paragraph in 8.2 to read "Servers SHOULD provide a well-documented way for scripts to generate URLs which will locate the same script again. Often, this is done by making the SCRIPT_NAME metavariable equal to the leading path component of the Request-URI that refers to the script." Is there a compelling reason not to do so?
I understand there may be special-purpose servers that invent their own Script-URI schemes, but they're not precluded by adding this SHOULD, and frankly I think they're vastly outweighed by situations that need a self-referencing script.
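As a sketch of what a self-referencing script might do under the proposed SHOULD, the following Python function builds a URL from the metavariables. It assumes that SCRIPT_NAME equals the leading path component of the Request-URI, which the current text does not guarantee. The function name `self_url`, the `scheme` parameter, and the choice of characters left unencoded (the default of `urllib.parse.quote`) are all assumptions made for illustration.

```python
from urllib.parse import quote

def self_url(env, scheme="http"):
    """Build a URL that should locate the same script again.

    Assumes SCRIPT_NAME is the leading path component of the
    Request-URI; a server that invents its own Script-URI scheme
    would break this assumption.
    """
    host = env["SERVER_NAME"]
    port = env.get("SERVER_PORT", "")
    # Leave the port out when it is the default for the scheme.
    default_port = {"http": "80", "https": "443"}.get(scheme)
    netloc = host if port in ("", default_port) else f"{host}:{port}"
    return f"{scheme}://{netloc}{quote(env.get('SCRIPT_NAME', ''))}"
```

A script would typically call `self_url(os.environ)` and append its own path-info or query string to the result.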
In 3.2, the '"?" QUERY_STRING' should arguably be omitted from the BNF when QUERY_STRING is empty.
In 6.1.5, the last paragraph doesn't clearly state that servers SHOULD provide metavariables for all HTTP request header fields, with certain exceptions. This is clearly stated in 8.3. If we intend it as in 8.3, add the SHOULD sentence to 6.1.5.
In 6.1.6, second to last paragraph-- Is there a compelling reason for allowing restrictions to PATH_INFO? If nothing else, could we specifically limit what MAY be restricted, i.e. somehow let URLs (either static or script-generated) contain a PATH_INFO that is guaranteed to pass to the script unchanged? Without that, use of PATH_INFO in portable scripts or even HTML pages is severely limited. The way it's worded, ANY character in PATH_INFO could be altered by the server. Could we at least protect alphanumeric and "/", and other useful punctuation like "%"?
Ultimately, I'd like to disallow restrictions on PATH_INFO; it's much less useful when subject to arbitrary changes by the server. PATH_INFO's not just for paths anymore.
Also in 6.1.6, somewhat related to the last point, note that the entire BNF for PATH_INFO simplifies to the single line
PATH_INFO = "" | ( "/" *CHAR )
except that other places in the spec (8.2, 9.2) refer to "segments" of PATH_INFO.
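The simplified rule and the notion of "segments" can be sketched together in Python. This is only an illustration: CHAR is assumed here to mean any US-ASCII character, which may not match the draft's definition exactly, and the names `is_path_info` and `segments` are hypothetical.

```python
import re

# PATH_INFO = "" | ( "/" *CHAR )
# CHAR is assumed to be any US-ASCII character (0x00-0x7F).
_PATH_INFO = re.compile(r"(?:/[\x00-\x7f]*)?")

def is_path_info(value):
    """Return True if value matches the one-line PATH_INFO rule."""
    return _PATH_INFO.fullmatch(value) is not None

def segments(value):
    """Split a valid PATH_INFO into its "/"-delimited segments."""
    return value.split("/")[1:] if value else []
```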
In 6.1.7, PATH_TRANSLATED is described as a SHOULD metavariable, while 8.3 lists it as a MAY metavariable. I prefer MAY, myself -- PATH_TRANSLATED seems special-purpose, messy, non-portable, and weighs unnecessarily on the PATH_INFO spec. It seems like more of a burden than a benefit. We could do more powerful things with less effort in 1.2 (e.g., add a DOCUMENT_ROOT metavariable).
In 6.1.12, REMOTE_USER is described as a SHOULD metavariable, while 8.3 lists it as a MAY metavariable. I prefer SHOULD; it helps with security checking and other things.
In 6.1.17 (SERVER_PROTOCOL), I believe the "protocol" BNF should begin with an "alpha". This would match the "scheme" definition in RFC 2396, section 3.1.
In 6.1.18, I believe the SERVER_SOFTWARE BNF shouldn't be just "1*product", but "product *( 1*SP product )", i.e. include spaces between products.
Note that in SERVER_SOFTWARE, Apache includes the token "(Unix)", but "()" are tspecials and shouldn't be in tokens.
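The proposed rule can be checked with a short Python sketch. It assumes that "product" is token ["/" token] and that "token" excludes CTLs and tspecials, as in the HTTP specification; the name `is_server_software` is hypothetical.

```python
import re

# token: one or more characters that are neither CTLs nor tspecials.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# product = token [ "/" product-version ], with product-version = token
_PRODUCT = rf"{_TOKEN}(?:/{_TOKEN})?"
# SERVER_SOFTWARE = product *( 1*SP product )
_SERVER_SOFTWARE = re.compile(rf"{_PRODUCT}(?: +{_PRODUCT})*")

def is_server_software(value):
    """Return True if value matches the proposed SERVER_SOFTWARE rule."""
    return _SERVER_SOFTWARE.fullmatch(value) is not None
```

Under these assumptions a value such as "Apache/1.3.6 (Unix)" is rejected, because "(" and ")" are tspecials.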
In 6.2, the third paragraph (about NPH scripts) seems to contradict 8.1.2, which allows the server to remove transfer encoding from the content-body. One should take priority, and state it. Maybe we could end the third paragraph with "... unaltered by the server, except as directed in section 8.1.2."
Section 7 says "Servers... MAY support NPH output." I'd prefer that was a SHOULD, at least. NPH is very valuable, and BCP is that servers normally have an NPH mechanism.
In 7.1, the first paragraph should read "... the SERVER_PROTOCOL *meta*variable...."
Also in 7.1, the final SHOULD statement is contingent on the new 8.1.4, which requires the server to enforce protocol compliance. Maybe end the sentence with "... and no transport-visible buffering, subject to enforcing protocol compliance as described in section 8.1.4."
Note that NPH scripts will have trouble with SERVER_PROTOCOL and the HTTP spec -- SERVER_PROTOCOL is the protocol level of the request and response, but the HTTP response should identify the HTTP version the server is capable of. For example, if a client makes an HTTP/1.0 request to an HTTP/1.1 server, the server should send a response in HTTP/1.0 format, with the exception that the status line starts with "HTTP/1.1" instead of "HTTP/1.0". Our SERVER_PROTOCOL identifies the request protocol, but not the server's capabilities; an NPH script needs both to generate an HTTP-compliant response.
Or we can just let section 8.1.4 (protocol compliance) take care of it, and require the server to insert the correct version in the response.
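If the server is made responsible for the version, the correction could be as small as the following Python sketch. The function name `fix_status_line` and the `server_version` parameter are hypothetical, and how a real server would hook this into NPH output is not specified here.

```python
def fix_status_line(status_line, server_version="HTTP/1.1"):
    """Replace the HTTP version in an NPH status line with the server's own."""
    version, sep, rest = status_line.partition(" ")
    if not version.startswith("HTTP/") or not sep:
        # Not a recognisable status line; leave it untouched.
        return status_line
    return f"{server_version} {rest}"
```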
In 7.2, we might add a comment like in the last paragraph of 9.2, stating that "scripts SHOULD try to send the CGI header fields as soon as possible, and SHOULD send them before any HTTP header fields." I don't think it's relevant to the BNF, since it's not a MUST.
In 7.2.1.2, I'm not sure about this BNF. Where do "scheme", "qchar", "safe", and "extra" come from? Is "scheme" from SERVER_PROTOCOL's "protocol"? Are the other three from the QUERY_STRING definition of previous drafts?
We should maybe use the BNFs for "absoluteURI" and "fragment" from RFC 2396, sections 3 and 4.1.
Also in 7.2.1.2, the handling of a rel-URL-abs-path Location (end of first big paragraph) implies that the actual location is never communicated to the client, so the client never knows the redirected URL. This seems counterintuitive.
It does in fact work this way in Apache. However, rel-URL-rel-path (unsupported in this spec) causes a 302 response to be sent.
Now, maybe it's the server's job to send an appropriate "Content-Location:" header to identify the actual location of the resource in the response. But this would violate 7.2.1.2, because it's a slightly different response than would be generated by a request for the redirected URL. So maybe we should allow certain appropriate differences between the actual response, and "the response that [the server] would have produced in response to a request containing [the redirected URL]."
In 8.1.1, first paragraph, I'd clarify the phrase to "... in order to achieve particular settings of the meta-variables *relating to the Script-URI*."
Also in 8.1.1, the entire section after the first paragraph seems to have been lost between drafts 00 and 01. Where did it go? Without it, the first paragraph makes no sense.
In 8.2, I'd start the first paragraph with "On systems that support a command line for CGI scripts, servers SHOULD provide..."
Changed as suggested.
Also in 8.2, I don't think the fourth paragraph (encoded "/" handling) is needed. I would think it's up to whoever's creating the PATH_INFO string to make sure any non-delimiting "/" is correctly escaped, and THEN URL-encode the whole thing to put into the complete URL. Upon receiving the request, the server URL-decodes PATH_INFO before passing it to the script; the script then splits on "/", and then unescapes any non-delimiting "/" in the segments.
Or does this paragraph describe actual behaviour of current servers?
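The sequence described above (escape, URL-encode, server decode, script split and unescape) might look like this in Python. The comment does not say how a non-delimiting "/" is escaped; this sketch assumes percent-encoding is used for the inner escape as well, and all three function names are hypothetical.

```python
from urllib.parse import quote, unquote

def build_path_info(segments):
    """Creator side: escape each segment, join, then URL-encode the result."""
    # Inner escape: "/" and "%" inside a segment become %2F and %25.
    inner = "/" + "/".join(quote(s, safe="") for s in segments)
    # Outer URL-encoding for placement in the complete URL.
    return quote(inner, safe="/")

def server_decode(encoded):
    """Server side: URL-decode once before passing PATH_INFO to the script."""
    return unquote(encoded)

def script_segments(path_info):
    """Script side: split on "/", then undo the inner escape per segment."""
    return [unquote(s) for s in path_info.split("/")[1:]]
```

With this scheme the server never sees a bare "/" that is not a delimiter, so it needs no special handling for encoded slashes.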
This caution would only apply if PATH_INFO is being used for hierarchical data, which is not always the case in current practice.
In 9.2, now that the second paragraph is a MUST, it should be moved up to 9.1, or at least that part of the paragraph that is a MUST.
Also in 9.2, fifth paragraph -- Related to item 16 above, I think it's both useful and BCP for scripts to use relative URLs in their output. Usually, the script knows exactly where it is, from SCRIPT_NAME or something else. Maybe start this paragraph with "If it is impossible..." rather than "As it is impossible...".
Also, "relative URL" could be an absolute path or a relative path, but I think there is only a problem with a relative path, since the script does know what server it's running on.
In 10.1, AmigaDOS, current working directory: Change "is" to "SHOULD be".
In 11.3, second paragraph, the SHOULD is pretty important and perhaps should even be MUST -- "precautions MUST be taken to protect the core memory of the server, or to ensure that untrusted code cannot be executed." Some precautions, at least. The alternative is "no precautions", as I read it. Although arguably, it's not a protocol compliance issue, since it wouldn't cause incompatibilities....
In 11.4 and 11.5 -- Should "should" be "SHOULD"? May "may" be "MAY"? In several places.
All of section 11 is advisory and doesn't specify compliance criteria, so the 'should's and 'may's here belong in lower-case.