-
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
c1dcd93
commit 8710802
Showing
16 changed files
with
920 additions
and
58 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,31 @@ | ||
# Carbon.txt validator | ||
|
||
This validator reads carbon.txt files, and validates them against a spec defined on http://carbontxt.org. | ||
|
||
|
||
|
||
|
||
# Usage | ||
|
||
## With the CLI | ||
|
||
Run a validation against a given domain, and see if a) there is a CSRD report, and b) that the datapoints for a greenweb verification exist | ||
Run a validation against a given domain, or file, say if the file is valid TOML, and it confirms to the carbon.txt spec | ||
|
||
``` | ||
carbontxt validate --greenweb-csrd domain.com | ||
``` | ||
```shell | ||
# parse the carbon.txt file on default paths on some-domain.com | ||
carbontxt validate domain some-domain.com | ||
|
||
Run a validation against a given domain, and only say if the file is valid TOML, and it confirms to the carbon.txt spec | ||
# parse a remote file available at https://somedomain.com/path-to-carbon.txt | ||
carbontxt validate file https://somedomain.com/path-to-carbon.txt | ||
|
||
``` | ||
carbontxt validate --syntax-only domain | ||
carbontxt validate --syntax-only -f ./path-to-file.com # look at the file only | ||
``` | ||
# parse a local file ./path-to-file.com | ||
carbontxt validate file ./path-to-file.com | ||
|
||
Run a validation against a given domain, and only say if the file is valid TOML, and it confirms to the spec, AND if the links are still live | ||
# pipe the contents of a file into the file validation command as part of a pipeline | ||
cat ./path-to-file.com | carbontxt validate file | ||
|
||
``` | ||
carbontxt validate --greenweb-csrd --syntax-only --check-links domain | ||
``` | ||
|
||
Run a validation against a given domain, and download the evidence. Take a screenshot of the page if a webpage downloading it and the HTML, (maybe WARCing it), otherwise download the file directly otherwise. | ||
## With the HTTP API | ||
|
||
``` | ||
carbontxt fetch --greenweb-csrd --fetch | ||
``` | ||
(Coming up next) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# How carbon.txt is designed. | ||
|
||
The carbon.txt validator is split into a series of components, with clear divisions of responsibility | ||
|
||
- **Finders**: Finders are responsible for accepting a domain or URI, and resolving it to the final URI where a carbon.txt file is accessible, for fetching and reading. | ||
- **Parsers**: Parsers are responsible for parsing carbon.txt files, then making sure they valid and conform to the required data schema. | ||
- **Processors**(s): Processors are responsible for parsing specific kinds of linked documents, and data, and returning a valid data structure. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# pytest.ini | ||
[pytest] | ||
pythonpath = src |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +0,0 @@ | ||
def hello() -> str: | ||
return "Hello from carbon-txt-validator!" | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
import httpx | ||
import dns.resolver | ||
import tomllib as toml | ||
from pathlib import Path | ||
from urllib.parse import urlparse, ParseResult | ||
from typing import Optional | ||
import logging | ||
import rich | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
logger.setLevel(logging.DEBUG) | ||
|
||
|
||
class FileFinder: | ||
""" | ||
Responsible for figuring out which URI to fetch | ||
a carbon.txt file from. | ||
""" | ||
|
||
def _parse_uri(self, uri: str) -> Optional[ParseResult]: | ||
""" | ||
Return a parsed URI object if the URI is valid, otherwise return None | ||
""" | ||
parsed_uri = urlparse(uri) | ||
if parsed_uri.scheme in ("http", "https"): | ||
return parsed_uri | ||
return None | ||
|
||
def _lookup_dns(self, domain): | ||
"""Try a DNS TXT record lookup for the given domain""" | ||
|
||
# look for a TXT record on the domain first | ||
# if have it, return that | ||
try: | ||
dns_record = dns.resolver.resolve(domain, "TXT") | ||
if dns_record: | ||
url = dns_record[0].strings[0].decode() | ||
rich.print(url) | ||
return url | ||
|
||
except dns.resolver.NoAnswer: | ||
logger.info("No result from TXT lookup") | ||
return None | ||
|
||
def resolve_domain(self, domain) -> str: | ||
""" | ||
Accepts a domain, and returns a URI to fetch a carbon.txt file from. | ||
""" | ||
|
||
uri_from_domain = self._lookup_dns(domain) | ||
|
||
if uri_from_domain: | ||
return self._parse_uri(uri_from_domain).geturl() | ||
|
||
# otherwise we look for a carbon.txt file at the root of the domain, then fallback to | ||
# one at the .well-known path following the well-known convention | ||
default_paths = ["/carbon.txt", "/.well-known/carbon.txt"] | ||
|
||
for url_path in default_paths: | ||
uri = urlparse(f"https://{domain}{url_path}") | ||
response = httpx.head(uri.geturl()) | ||
if response.status_code == 200: | ||
return uri.geturl() | ||
|
||
# if a path has the 'via' header, it suggests a managed service or proxy | ||
# follow that path to fetch the active carbon.txt file | ||
|
||
# TODO allow file on the domain to override the one specificed in a 'via' | ||
|
||
def resolve_uri(self, uri: str) -> str: | ||
""" | ||
Accept a URI pointing to a carbon.txt file, and return the final | ||
resolved URI, following any 'via' referrers or similar. | ||
""" | ||
|
||
# check if the uri looks like one we might reach over HTTP / HTTPS | ||
parsed_uri = self._parse_uri(uri) | ||
|
||
# if there is no http or https scheme, we assume a local file | ||
if not parsed_uri: | ||
path_to_file = Path(uri) | ||
return str(path_to_file.resolve()) | ||
|
||
response = httpx.head(parsed_uri.geturl()) | ||
if response.status_code == 200: | ||
return parsed_uri.geturl() | ||
|
||
# fallback to doing retry or notifying us if the domain is unreachable | ||
# TODO: | ||
|
||
# check if there is a via header | ||
# TODO: do we follow multiple 'via' headers? If so, how many hops do we follow before we timeout? |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
import tomllib as toml | ||
from . import schemas | ||
import httpx | ||
import pathlib | ||
|
||
|
||
class CarbonTxtParser: | ||
""" | ||
Responsible for parsing carbon.txt files, checking | ||
they are valid TOML, and that parsed data structures | ||
have the expected top level keys and values. | ||
""" | ||
|
||
def get_carbon_txt_file(self, str) -> str: | ||
""" | ||
Accept a URI and either fetch the file over HTTP(S), or read the local file. | ||
Return a string of contents of the remote carbon.txt file, or the local file. | ||
""" | ||
if str.startswith("http"): | ||
result = httpx.get(str).text | ||
return result | ||
|
||
if pathlib.Path(str).exists(): | ||
return pathlib.Path(str).read_text() | ||
|
||
def parse_toml(self, str) -> dict: | ||
""" | ||
Accept a string of TOML and return a CarbonTxtFile | ||
object | ||
""" | ||
parsed = toml.loads(str) | ||
return parsed | ||
|
||
def validate_as_carbon_txt(self, parsed) -> schemas.CarbonTxtFile: | ||
""" | ||
Accept a parsed TOML object and return a CarbonTxtFile, validating that | ||
necessary keys are present and values are of the correct type. | ||
""" | ||
|
||
carb_txt_obj = schemas.CarbonTxtFile(**parsed) | ||
|
||
return carb_txt_obj |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
from typing import Literal, Optional, List | ||
|
||
from pydantic import BaseModel, Field | ||
|
||
|
||
class Provider(BaseModel): | ||
""" | ||
Providers in this context are upstream providers of hosted services. | ||
The domain is used as key for looking up a corresponding provider in the | ||
Green Web Platform | ||
""" | ||
|
||
domain: str | ||
name: Optional[str] = None | ||
services: Optional[List[str]] = None | ||
|
||
|
||
class Credential(BaseModel): | ||
""" | ||
Credentiials are essentially supporting documentation for a claim to be running on | ||
green energy. | ||
""" | ||
|
||
domain: Optional[str] = None | ||
doctype: Literal[ | ||
"web-page", "annual-report", "sustainability-page", "certificate", "other" | ||
] | ||
url: str | ||
|
||
|
||
class Organisation(BaseModel): | ||
""" | ||
An Organisation is the entity making the claim to running its infrastructure | ||
on green energy. In the very least it should have some credentials point to, even | ||
if it is exclusively relying on upstream providers for its green claims. | ||
""" | ||
|
||
credentials: List[Credential] = Field(..., min_length=1) | ||
|
||
|
||
class Upstream(BaseModel): | ||
""" | ||
Upstream refers to one or more providers of hosted services that the Organisation | ||
is relying on to operate a digital service, like running a website, or application. | ||
""" | ||
|
||
# organisations that don't use third party providers could plausibly have an | ||
# empty upstream list. We also either accept providers as a single string representing | ||
# a domain, or a dictionary containing the fields defined in the Provider model | ||
providers: Optional[List[Provider | str]] = None | ||
|
||
|
||
class CarbonTxtFile(BaseModel): | ||
""" | ||
A carbon.txt file is the data structure that acts as an index for supporting evidence | ||
for green claims made by a specific organisation. It is intended to links to | ||
machine readable data or supporting documentation in the public domain. | ||
""" | ||
|
||
upstream: Optional[Upstream] = None | ||
org: Organisation |
Oops, something went wrong.