From 21c361d9cdab1c8b0045670f0087830cb11a7950 Mon Sep 17 00:00:00 2001 From: Markus Demleitner Date: Fri, 19 Dec 2025 19:03:34 +0100 Subject: [PATCH 1/2] Adding science as a purpose in the IVOA comment. --- Makefile | 4 +-- ivoatex | 2 +- softid.tex | 76 +++++++++++++++++++++++++++++++++++------------------- 3 files changed, 53 insertions(+), 29 deletions(-) diff --git a/Makefile b/Makefile index c568a42..9900cb5 100644 --- a/Makefile +++ b/Makefile @@ -4,10 +4,10 @@ DOCNAME = softid # count up; you probably do not want to bother with versions <1.0 -DOCVERSION = 1.0 +DOCVERSION = 1.1 # Publication date, ISO format; update manually for "releases" -DOCDATE = 2021-05-28 +DOCDATE = 2025-12-18 # What is it you're writing: NOTE, WD, PR, REC, PEN, or EN DOCTYPE = NOTE diff --git a/ivoatex b/ivoatex index 61c5949..e317384 160000 --- a/ivoatex +++ b/ivoatex @@ -1 +1 @@ -Subproject commit 61c59493c774be35c86f868d776a28468eca5b0f +Subproject commit e317384fc38d87058966bf9f205a245865e28980 diff --git a/softid.tex b/softid.tex index b13d823..903d45d 100644 --- a/softid.tex +++ b/softid.tex @@ -18,6 +18,7 @@ \editor{Markus Demleitner} % \previousversion[????URL????]{????Concise Document Label????} +\previousversion[https://www.ivoa.net/documents/Notes/softid/]{Version 1.0} \previousversion{This is the first public release} \newcommand{\headername}[1]{{\tt #1}} @@ -64,7 +65,7 @@ \section*{Conformance-related definitions} \section{Introduction} Very early on in the construction of client-server architectures it was -found that it is useful to have mechanisms for +found that it is useful to have mechanisms for discovering which software runs at the other side of a connection, rather typically to aid in debugging. In particular, HTTP, which is the basis of many of the VO's protocols @@ -140,6 +141,15 @@ \subsubsection{Notifications} request for a software update), server developers might want to contact deployers of vulnerable or otherwise broken software. 
+\subsubsection{Mitigation of Reckless Crawling} + +Around 2024, many crawlers doing indiscriminate bulk downloads at high +rates appeared on the internet; filtering their requests is a particular +challenge in the VO, where machine clients potentially running a large +number of legitimate queries are the norm. Services may want to more +strictly limit requests from user agents that do not comply with VO +rules on their identification. + \subsection{Security and Privacy Considerations} Several guidelines on IT security discourage giving details on the @@ -148,8 +158,8 @@ \subsection{Security and Privacy Considerations} Following the practices proposed here will, indeed, weaken the ``security by obscurity'' put forward in these treatments; on the other -hand, when, as is the case in the VO, attackers only have -to scan perhaps several hundred URLs, +hand, when, as is the case in the VO, attackers only have +to scan perhaps several hundred sites, relying on security by obscurity does not seem a promising policy. On the other @@ -163,7 +173,8 @@ \subsection{Security and Privacy Considerations} seems unlikely that rogue services could be aided by information on the client version when they target clients. -Software identification does play a role in user privacy; user agents +Software identification does play a role in user privacy; +user agent identifications are regularly employed in user tracking on the WWW. 
While, presumably, the generally non-profit operators in the VO will not use such data to significantly violate their users' privacy, client authors may want to @@ -215,10 +226,9 @@ \subsection{User-Agent Header IVOA Recommendations} The Operations IG endorses and encourages use of these standard rules concerning the \headername{User-Agent} header, and adds a further convention, which does not -conflict with the above rules: clients whose primary purpose -is \emph{operational}, as opposed to \emph{scientific}, -should indicate that purpose by including a -comment token of the form +conflict with the above rules. User agents written to interact with VO +services should indicate their purpose by including a +comment token of the form $$\hbox{\verb|(IVOA- )|.}$$ Suggested {\tt op-purpose} values are currently: @@ -229,14 +239,23 @@ \subsection{User-Agent Header IVOA Recommendations} performance (monitoring) or standards-compliance (validation); at this point, no good reason to separate the different cases was identified. -\item[copy] + +\item[copy] The purpose of the access is to replicate (parts of) the content published through the service, be it for aggregation (harvesting) or re-publication (mirroring). + +\item[science] +The access was done to directly support a science case. This explicitly +includes education and training, in particular because we do not want to +suggest that software used in such settings -- which plausibly is going +to be the same as software used in pure research -- should be +reconfigured for them. + \end{description} -This list may evolve in the future; extensions should be proposed on +This list may evolve in the future; extensions should be proposed on the ops@ivoa.net mailing list. Custom {\tt op-purpose} values are permitted. Case is significant in {\tt op-purpose} values and its ``{\tt IVOA-}'' prefix. 
@@ -251,39 +270,40 @@ \subsection{User-Agent Header IVOA Recommendations} Formally: \begin{verbatim} -ivoa-comment = "(IVOA-" op-purpose *( +ivoa-comment = "(IVOA-" op-purpose *( ctext | quoted-pair | comment ) ")" -op-purpose = "test" | "copy" | token +op-purpose = "test" | "copy" | "science" | token \end{verbatim} -Tokens of the form \verb|ivoa-comment| should not appear in the -\headername{User-Agent} field -if the request is a ``normal'' user science query. There -are obviously grey areas between operational and science requests; this -convention does not attempt to provide a rigid definition of these -categories. - -This arrangement allows service operators to test in their logs for +This arrangement allows service operators to filter their logs against \headername{User-Agent} values -whose content matches the sequence ``\verb|(IVOA-|'', or -perhaps ``\verb|(IVOA-test|'', and adjust their usage statistics +whose content matches the sequence ``\verb|(IVOA-test|'' (or, if so +desired, +``\verb|(IVOA-copy|'' as well) and adjust their usage statistics appropriately. Note, however, that it is not feasible to force operational clients to follow this convention, so service operators will still need to be careful in analysing their usage statistics. +User agents intended for researchers should set their IVOA comment to +\verb|IVOA-science|. The purpose of this rule is to help operators throttle +indiscriminate downloads by ``stupid'' crawlers (like the harvesters +employed to gather training material for AI models around 2025) without +impacting common clients; for instance, rate limits could be tighter +for requests without a conforming \headername{User-Agent} header.
+ \subsection{Examples} A science query from the STILTS tapquery TAP client might contain the HTTP header \begin{verbatim} -User-Agent: STILTS/3.1-4 Java/1.8.0_181 +User-Agent: STILTS/3.1-4 (IVOA-science) Java/1.8.0_181 \end{verbatim} while a query from the STILTS taplint TAP service validator might contain the header \begin{verbatim} User-Agent: STILTS/3.1-4 (IVOA-test) Java/1.8.0_181 \end{verbatim} -or maybe +or maybe \iftth \begin{verbatim} User-Agent: STILTS/3.1-4 (IVOA-test http://validators.org/results) Java/1.8.0_181 @@ -356,12 +376,16 @@ \subsection{Notes} will serve many different resources), the use cases for global server identification can probably be satisfied by running one request each against these servers, access URLs for which can readily be discovered -in the Registry as it is. +in the Registry as it is. \appendix \section{Changes from Previous Versions} -No previous versions yet. +\subsection{Changes from Version 1.0} + +Now recommending the use of an \verb|IVOA-science| ivoa-comment as +a potential mitigation strategy against AI crawlers. + % these would be subsections "Changes from v. WD-..." % Use itemize environments. From a17640c2cac558b7855dca5f679a6da667797175 Mon Sep 17 00:00:00 2001 From: Markus Demleitner Date: Tue, 23 Dec 2025 10:20:13 +0100 Subject: [PATCH 2/2] Administrative Makefile update --- Makefile | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 9900cb5..7ffdd85 100644 --- a/Makefile +++ b/Makefile @@ -22,13 +22,21 @@ SOURCES = $(DOCNAME).tex # List of image files to be included in submitted package (anything that # can be rendered directly by common web browsers) -FIGURES = +FIGURES = # List of PDF figures (figures that must be converted to pixel images to # work in web browsers). -VECTORFIGURES = +VECTORFIGURES = # Additional files to distribute (e.g., CSS, schema files, examples...)
AUX_FILES = tapstats.py include ivoatex/Makefile + +ivoatex/Makefile: + @echo "*** ivoatex submodule not found. Initialising submodules." + @echo + git submodule update --init + +test: + @echo "*** No tests defined"
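Illustration only, not part of either patch: a minimal Python sketch of how an operator might read the ivoa-comment introduced in PATCH 1/2 when analysing access logs or deciding on rate limits. The function names ivoa_purpose and is_operational are hypothetical, and the regular expression only approximates the HTTP token rule used in the grammar. The User-Agent strings in the usage note below are the examples from softid.tex.

```python
import re
from typing import Optional

# Matches the start of an ivoa-comment: "(IVOA-" followed by an op-purpose.
# The character class is a simplified stand-in for the HTTP "token"
# production; that simplification is an assumption of this sketch.
_IVOA_COMMENT = re.compile(r"\(IVOA-([A-Za-z0-9!#$%&'*+.^_`|~-]+)")


def ivoa_purpose(user_agent: str) -> Optional[str]:
    """Return the op-purpose of the first ivoa-comment, or None if absent.

    Matching is case-sensitive, since case is significant in both the
    "IVOA-" prefix and the op-purpose value.
    """
    match = _IVOA_COMMENT.search(user_agent)
    return match.group(1) if match else None


def is_operational(user_agent: str) -> bool:
    """True for requests an operator may want to leave out of usage statistics."""
    return ivoa_purpose(user_agent) in {"test", "copy"}
```

With these definitions, ivoa_purpose("STILTS/3.1-4 (IVOA-science) Java/1.8.0_181") gives "science", while a header without any ivoa-comment gives None. An operator could, for instance, treat None as the trigger for tighter rate limits. Clients cannot be forced to follow the convention, so this remains a heuristic.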