Add science purpose #1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,7 @@ | |
| \editor{Markus Demleitner} | ||
|
|
||
| % \previousversion[????URL????]{????Concise Document Label????} | ||
| \previousversion[https://www.ivoa.net/documents/Notes/softid/]{Version 1.0} | ||
| \previousversion{This is the first public release} | ||
|
|
||
| \newcommand{\headername}[1]{{\tt #1}} | ||
|
|
@@ -64,7 +65,7 @@ \section*{Conformance-related definitions} | |
| \section{Introduction} | ||
|
|
||
| Very early on in the construction of client-server architectures it was | ||
| found that it is useful to have mechanisms for | ||
| found that it is useful to have mechanisms for | ||
| discovering which software runs | ||
| at the other side of a connection, rather typically to aid in debugging. | ||
| In particular, HTTP, which is the basis of many of the VO's protocols | ||
|
|
@@ -140,6 +141,15 @@ \subsubsection{Notifications} | |
| request for a software update), server developers might want to contact | ||
| deployers of vulnerable or otherwise broken software. | ||
|
|
||
| \subsubsection{Mitigation of Reckless Crawling} | ||
|
|
||
| Around 2024, many crawlers doing indiscriminate bulk downloads at high | ||
| rates appeared on the internet; filtering their requests is a particular | ||
| challenge in the VO, where machine clients potentially running a large | ||
| number of legitimate queries are the norm. Services may want to more | ||
| strictly limit requests from user agents that do not comply with VO | ||
| rules on their identification. | ||
|
|
||
| \subsection{Security and Privacy Considerations} | ||
|
|
||
| Several guidelines on IT security discourage giving details on the | ||
|
|
@@ -148,8 +158,8 @@ \subsection{Security and Privacy Considerations} | |
|
|
||
| Following the practices proposed here will, indeed, weaken the | ||
| ``security by obscurity'' put forward in these treatments; on the other | ||
| hand, when, as is the case in the VO, attackers only have | ||
| to scan perhaps several hundred URLs, | ||
| hand, when, as is the case in the VO, attackers only have | ||
| to scan perhaps several hundred sites, | ||
| relying on security by obscurity does not seem a promising policy. | ||
|
|
||
| On the other | ||
|
|
@@ -163,7 +173,8 @@ \subsection{Security and Privacy Considerations} | |
| seems unlikely that rogue services could be aided by information on the | ||
| client version when they target clients. | ||
|
|
||
| Software identification does play a role in user privacy; user agents | ||
| Software identification does play a role in user privacy; | ||
| user agent identifications | ||
| are regularly employed in user tracking on the WWW. While, presumably, | ||
| the generally non-profit operators in the VO will not use such data to | ||
| significantly violate their users' privacy, client authors may want to | ||
|
|
@@ -215,10 +226,9 @@ \subsection{User-Agent Header IVOA Recommendations} | |
| The Operations IG endorses and encourages use of these standard | ||
| rules concerning the \headername{User-Agent} header, | ||
| and adds a further convention, which does not | ||
| conflict with the above rules: clients whose primary purpose | ||
| is \emph{operational}, as opposed to \emph{scientific}, | ||
| should indicate that purpose by including a | ||
| comment token of the form | ||
| conflict with the above rules. User agents written to interact with VO | ||
| services should indicate their purpose by including a | ||
| comment token of the form | ||
| $$\hbox{\verb|(IVOA-<op-purpose> <optional-extra-text>)|.}$$ | ||
|
|
||
| Suggested {\tt op-purpose} values are currently: | ||
|
|
@@ -229,14 +239,23 @@ \subsection{User-Agent Header IVOA Recommendations} | |
| performance (monitoring) or standards-compliance (validation); | ||
| at this point, | ||
| no good reason to separate the different cases was identified. | ||
| \item[copy] | ||
|
|
||
| \item[copy] | ||
| The purpose of the access is to replicate (parts of) the content | ||
| published through | ||
| the service, be it for aggregation (harvesting) or re-publication | ||
| (mirroring). | ||
|
|
||
| \item[science] | ||
| The access was done to directly support a science case. This explicitly | ||
| includes education and training, in particular because we do not want to | ||
| suggest that software used in such settings -- which plausibly is going | ||
| to be the same as software used in pure research -- should be | ||
| reconfigured for them. | ||
|
|
||
| \end{description} | ||
|
|
||
| This list may evolve in the future; extensions should be proposed on | ||
| This list may evolve in the future; extensions should be proposed on | ||
| the ops@ivoa.net mailing list. Custom {\tt op-purpose} values are permitted. | ||
| Case is significant both in {\tt op-purpose} values and in the ``{\tt IVOA-}'' prefix. | ||
|
|
||
|
|
@@ -251,39 +270,40 @@ \subsection{User-Agent Header IVOA Recommendations} | |
| Formally: | ||
|
|
||
| \begin{verbatim} | ||
| ivoa-comment = "(IVOA-" op-purpose *( | ||
| ivoa-comment = "(IVOA-" op-purpose *( | ||
| ctext | quoted-pair | comment ) ")" | ||
| op-purpose = "test" | "copy" | token | ||
| op-purpose = "test" | "copy" | "science" | token | ||
| \end{verbatim} | ||
|
|
||
| Tokens of the form \verb|ivoa-comment| should not appear in the | ||
| \headername{User-Agent} field | ||
| if the request is a ``normal'' user science query. There | ||
| are obviously grey areas between operational and science requests; this | ||
| convention does not attempt to provide a rigid definition of these | ||
| categories. | ||
|
|
||
| This arrangement allows service operators to test in their logs for | ||
| This arrangement allows service operators to filter their logs against | ||
| \headername{User-Agent} values | ||
| whose content matches the sequence ``\verb|(IVOA-|'', or | ||
| perhaps ``\verb|(IVOA-test|'', and adjust their usage statistics | ||
| whose content matches the sequence ``\verb|(IVOA-test|'' (or, if so | ||
| desired, | ||
| ``\verb|(IVOA-copy|'' as well) and adjust their usage statistics | ||
| appropriately. Note, however, that it is not feasible to force operational | ||
| clients to follow this convention, so service operators will still need | ||
| to be careful in analysing their usage statistics. | ||
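As a rough illustration of the log filtering described above, the following Python sketch extracts the op-purpose from a User-Agent value. The names `ivoa_purpose` and `is_operational` are hypothetical, and the regular expression only approximates the grammar: it assumes the purpose ends at whitespace or a parenthesis and does not handle nested comments or quoted pairs.

```python
import re

# Approximates: "(IVOA-" op-purpose *( ctext | quoted-pair | comment ) ")"
# Assumption: the purpose token ends at whitespace or a parenthesis.
# The match is case-sensitive, since case is significant in the convention.
_IVOA_COMMENT = re.compile(r"\(IVOA-([^\s()]+)")


def ivoa_purpose(user_agent):
    """Return the op-purpose from a User-Agent value, or None if absent."""
    match = _IVOA_COMMENT.search(user_agent)
    return match.group(1) if match else None


def is_operational(user_agent, purposes=("test", "copy")):
    """True if the request declares one of the given operational purposes."""
    return ivoa_purpose(user_agent) in purposes
```

An operator could apply `is_operational` to each logged User-Agent value and leave matching requests out of usage statistics. Clients that do not follow the convention would still slip through.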
|
|
||
| User agents intended for researchers should set their IVOA comment to | ||
| IVOA-science. The purpose of this rule is to help operators to throttle | ||
| indiscriminate downloads by ``stupid'' crawlers (like the harvesters | ||
| employed to gather training material for AI models around 2025) without | ||
| impacting common clients; for instance, rate limits could be tight | ||
| without a conforming user agent header. | ||
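One way a service might act on this is sketched below in Python, assuming a simple two-tier policy. The function name and both budget figures are hypothetical and are not part of the proposed text. Since any client can send the comment, this only filters crawlers that do not bother to declare a purpose.

```python
import re

# Hypothetical per-minute request budgets; tune to the service at hand.
LIMIT_DECLARED = 600    # user agent carries some (IVOA-<purpose>) comment
LIMIT_UNDECLARED = 30   # no IVOA comment: treat as a possible bulk crawler

# Case-sensitive on purpose; the "IVOA-" prefix is case-significant.
_HAS_IVOA_COMMENT = re.compile(r"\(IVOA-[^\s()]+")


def requests_per_minute(user_agent):
    """Pick a rate limit from the User-Agent header value (may be None)."""
    if user_agent and _HAS_IVOA_COMMENT.search(user_agent):
        return LIMIT_DECLARED
    return LIMIT_UNDECLARED
```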
|
Comment on lines +288 to +292 (Member):
I wouldn't say simply "the purpose of this rule is [throttling]", since there are other use cases, for instance managing usage statistics. Possible alternative wording:
|
||
|
|
||
| \subsection{Examples} | ||
|
|
||
| A science query from the STILTS tapquery TAP client might contain the | ||
| HTTP header | ||
| \begin{verbatim} | ||
| User-Agent: STILTS/3.1-4 Java/1.8.0_181 | ||
| User-Agent: STILTS/3.1-4 (IVOA-science) Java/1.8.0_181 | ||
| \end{verbatim} | ||
| while a query from the STILTS taplint TAP service validator might | ||
| contain the header | ||
| \begin{verbatim} | ||
| User-Agent: STILTS/3.1-4 (IVOA-test) Java/1.8.0_181 | ||
| \end{verbatim} | ||
| or maybe | ||
| or maybe | ||
| \iftth | ||
| \begin{verbatim} | ||
| User-Agent: STILTS/3.1-4 (IVOA-test http://validators.org/results) Java/1.8.0_181 | ||
|
|
@@ -356,12 +376,16 @@ \subsection{Notes} | |
| will serve many different resources), the use cases for global server | ||
| identification can probably be satisfied by running one request each | ||
| against these servers, access URLs for which can readily be discovered | ||
| in the Registry as it is. | ||
| in the Registry as it is. | ||
|
|
||
| \appendix | ||
| \section{Changes from Previous Versions} | ||
|
|
||
| No previous versions yet. | ||
| \subsection{Changes from Version 1.0} | ||
|
|
||
| Now recommending the use of an {\tt IVOA-science} ivoa-comment as a | ||
| potential mitigation strategy against AI crawlers. | ||
|
|
||
| % these would be subsections "Changes from v. WD-..." | ||
| % Use itemize environments. | ||
|
|
||
|
|
||
I'm not sure about "to directly support a science case"; I'd suggest something a bit more woolly like "in support of science usage" or "in the context of science usage". I think the main target here is to differentiate clients that understand the VO/astronomy services they are engaging with from those that are just hitting anything they can find. From a practical point of view, at least for clients like topcat and stilts, it's not likely to be feasible to get them to present different user-agent headers on the basis of the user intention for particular requests, only on the basis of the tools in use.
Given that, I'm wondering if there's a different term than "science" that should be used here, but I don't have great suggestions. IVOA-voclient or just IVOA-client maybe?