From 7f74d0c6c3529e5ea3807da2f6f91f3e185e41f2 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Thu, 21 Mar 2024 16:05:04 +0800
Subject: [PATCH] Typo fix
---
index.html | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/index.html b/index.html
index b0ed702..e7acc00 100644
--- a/index.html
+++ b/index.html
@@ -370,7 +370,7 @@ Comparison with search engines
-- The implicit contract that content creators expect from search engines crawlers –i.e., that they will bring exposure to their content– does not have a systematic equivalent for content integrated into [=AI systems=]; while some such systems are gaining the ability to point back to the source of their training data used in a given [=inference=], this is hardly a widespread feature of these systems, nor is it obvious it could be applied systematically (e.g., would linking back to sources for a generated image even make sense?); even if it could, fewer sources would likely be exposed than in a typical search engine results page, and the incentives for the user to follow the links would likely be sustantially lower.
+- The implicit contract that content creators expect from search engines crawlers –i.e., that they will bring exposure to their content– does not have a systematic equivalent for content integrated into [=AI systems=]; while some such systems are gaining the ability to point back to the source of their training data used in a given [=inference=], this is hardly a widespread feature of these systems, nor is it obvious it could be applied systematically (e.g., would linking back to sources for a generated image even make sense?); even if it could, fewer sources would likely be exposed than in a typical search engine results page, and the incentives for the user to follow the links would likely be substantially lower.
robots.txt
directives allow specific rules to be given to specific crawlers based on their user agent; while this has been practically manageable when dealing with (for better or for worse) few well-known search engine crawlers, expecting content creators to maintain potential allow- and block-lists of the rapidly expanding number of crawlers deployed to retrieve training data seems unlikely to achieve sustainable results.
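As an illustration of the per-user-agent mechanism mentioned in the context lines above (not part of the patch itself): a minimal sketch, using Python's standard urllib.robotparser, of how robots.txt rules resolve differently per crawler. The crawler names and URL are hypothetical placeholders.

from urllib import robotparser

# A hypothetical robots.txt that blocks one training-data crawler by its
# user agent while leaving the site open to every other crawler.
rules = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())
parser.modified()  # record that rules were loaded; can_fetch() requires a timestamp

# Per-user-agent resolution: the named crawler is refused, any other
# crawler falls through to the wildcard group and is allowed.
print(parser.can_fetch("ExampleTrainingBot", "https://example.com/page"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/page"))        # True

This is exactly why maintaining such lists scales poorly: every new training-data crawler must be known by name before it can be addressed by a rule, whereas the wildcard group applies to everything else by default.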