This repository provides a Model Context Protocol (MCP) server that retrieves the raw HTML content from a given URL to provide context to Large Language Models (LLMs). It acts as a tool for LLMs to fetch real-time web page content beyond their training data.
The URL Content MCP server serves as a bridge between LLMs and the web. Through a standardized MCP interface, an LLM can request the content of a specific URL and receive the HTML content of that page. This allows AI assistants to access up-to-date web content on demand.
- Fetch Web Page Content: Retrieve the HTML content of a web page given its URL.
- Real-Time Data: Access current information directly from web pages in real time.
- Optional Caching: Optionally cache fetched content in memory to avoid repeated network calls for the same URL during the server's runtime.
- STDIO and SSE Support: Run the server in `stdio` mode for integration as a subprocess, or in `sse` (HTTP Server-Sent Events) mode to serve requests over HTTP.
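The optional cache can be pictured as a small dictionary keyed by URL. A minimal illustrative sketch (not the server's actual code) of how per-run caching avoids repeated network calls:

```python
class CachingFetcher:
    """Illustrative sketch of per-run, in-memory caching (not the server's actual code)."""

    def __init__(self, fetch):
        self._fetch = fetch   # function that performs the real network call
        self._cache = {}      # url -> content; lives only as long as the process

    def get(self, url: str) -> str:
        if url not in self._cache:           # first request for this URL: hit the network
            self._cache[url] = self._fetch(url)
        return self._cache[url]              # later requests: served from memory
```

With `--enable-cache` unset, every request goes to the network; with it set, only the first request for each URL does.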
- Python 3.8+ – The server is written in Python and requires version 3.8 or higher.
- Internet Access – The server needs network access to fetch web pages from the internet.
Note: This server fetches raw HTML content. Ensure the target URL is accessible and returns text/HTML content. Some websites may block automated requests or require specific user-agent headers.
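If a site rejects the default Python client, one common workaround is to send a browser-like User-Agent header. A minimal sketch using only the standard library (the header value and function name here are illustrative, not part of this server):

```python
import urllib.request

def make_request(url: str,
                 user_agent: str = "Mozilla/5.0 (compatible; url-content-mcp)") -> urllib.request.Request:
    """Build a request with an explicit User-Agent header (value is only an example)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

if __name__ == "__main__":
    # Fetch raw HTML with the custom header; requires network access.
    with urllib.request.urlopen(make_request("http://example.com"), timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
        print(html[:80])
```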
Clone this repository and install the package along with its dependencies:
```bash
git clone https://github.com/artryazanov/url-content-mcp.git
cd url-content-mcp
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

This will install the necessary Python packages listed in requirements.txt. You can also install the package in editable mode (e.g., `pip install -e .`) if you plan to modify the code.
After installation, you can run the MCP server using the provided console script url-content-mcp or by executing the module. The server supports two modes of operation: STDIO (for direct integration with an MCP-compatible client) and SSE (for running as an HTTP server).
By default, the server runs in stdio mode. In this mode, the server reads MCP requests from standard input and writes responses to standard output. This mode is suitable for applications that manage the server as a subprocess and communicate via the MCP protocol (such as certain AI assistant platforms).
Example (running in stdio mode):
```bash
url-content-mcp
```

When running in stdio mode, the server will start and wait for incoming MCP requests via stdin (typically from an AI client). Each request (formatted according to the MCP protocol) will be processed, and the server will output the result to stdout as JSON.
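For illustration, a tool call in MCP's JSON-RPC framing might look like the following (the exact framing depends on the client and protocol version; treat this as a sketch, not a verified trace of this server):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch_url",
    "arguments": { "url": "http://example.com" }
  }
}
```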
To run the server as an HTTP service, use the --transport sse option. In SSE mode, the server will start an HTTP server and provide a RESTful endpoint for fetching URL content.
Example (running in SSE mode on port 8080):
```bash
url-content-mcp --transport sse --host 0.0.0.0 --port 8080 --enable-cache
```

This starts the server in SSE mode, listening on all interfaces (0.0.0.0) at port 8080, with caching enabled. In this mode, you can send HTTP GET requests to the server's /fetch/{url} endpoint to retrieve content. Note: The {url} in the path should be URL-encoded.
For example, to fetch the content of http://example.com, encode the URL and request:
http://localhost:8080/fetch/http%3A%2F%2Fexample.com
This will return a JSON response containing the URL and the HTML content of the page. The response structure looks like:
```json
{
  "url": "http://example.com",
  "content": "<!DOCTYPE html>...</html>"
}
```

If an error occurs during fetching (for example, a network error or a non-200 HTTP status), the response will include an "error" field with a message, and "content" may be an empty string.
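A client for the SSE-mode endpoint only needs to percent-encode the target URL and check the response for an `error` field. A sketch using the standard library (the function names here are illustrative, not part of this project):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def build_fetch_url(base: str, target: str) -> str:
    """Percent-encode the target URL and append it to the /fetch endpoint."""
    return f"{base}/fetch/{quote(target, safe='')}"

def fetch_content(base: str, target: str) -> dict:
    """Call the SSE-mode server and return the parsed JSON response."""
    with urlopen(build_fetch_url(base, target)) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    if "error" in data:
        raise RuntimeError(f"fetch failed for {data['url']}: {data['error']}")
    return data

if __name__ == "__main__":
    # Requires a server running locally in SSE mode on port 8080.
    result = fetch_content("http://localhost:8080", "http://example.com")
    print(result["content"][:80])
```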
Note: When running in Docker or other container environments, use --host 0.0.0.0 to bind to all interfaces, and ensure the container's port is published (e.g., -p 8080:8080).
- `--transport`, `-t` (string): Transport protocol for the server. Either `stdio` (default) or `sse` (to run an HTTP server for SSE).
- `--host` (string): Host address to bind the HTTP server in SSE mode (default: `127.0.0.1`).
- `--port` (int): Port number for SSE mode (default: `8080`).
- `--enable-cache` (flag): Enable in-memory caching of fetched content. If set, the server caches the content of each URL after the first fetch for the duration of its runtime.
Run url-content-mcp --help to see the usage information.
This server provides one MCP tool that the LLM can use:
- Tool: `fetch_url`
- Description: Fetches the content of the web page at the given URL and returns its HTML content.
- Parameters:
  - `url` (string, required) – The web page URL to fetch.
- Returns: A JSON object with the following structure:
  - `url`: The URL that was fetched.
  - `content`: The HTML content of the page as a string. (This will contain the raw HTML, including tags.)
  - `error` (optional): An error message string, if an error occurred during fetching. This field is only present if there was an error; on success it is omitted.
The fetch_url tool is registered with the MCP server, so an LLM client can call this function to retrieve web page content. In stdio mode, the function is invoked via MCP tool calls in the protocol. In sse (HTTP) mode, the server exposes a GET endpoint /fetch/{url} (with the URL percent-encoded) that returns the same data.
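The tool's contract can be sketched as a plain function with the same input and output shape. This is an illustrative stand-in, not the server's actual implementation:

```python
import urllib.error
import urllib.request

def fetch_url(url: str) -> dict:
    """Return {"url", "content"} on success; add an "error" field only on failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return {"url": url,
                    "content": resp.read().decode("utf-8", errors="replace")}
    except (urllib.error.URLError, ValueError, TimeoutError) as exc:
        # Network failures and malformed URLs produce the error shape
        # described above: empty content plus an "error" message.
        return {"url": url, "content": "", "error": str(exc)}
```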
This project includes a test suite to ensure the server works correctly.
- Install test dependencies (includes pytest):

  ```bash
  pip install -r requirements-dev.txt
  ```

- Run all tests:

  ```bash
  pytest
  ```
A Dockerfile is provided to containerize the MCP server. To build the Docker image:
```bash
docker build -t url-content-mcp .
```

To run the server via Docker (exposing port 8080 for SSE mode):
```bash
docker run --rm -it -p 8080:8080 url-content-mcp --transport sse --host 0.0.0.0
```

This will start the MCP server inside a container. You can then interact with it via HTTP requests to http://localhost:8080 (in SSE mode) or attach it to an MCP-compatible client in stdio mode.
This project is licensed under the Unlicense. See the LICENSE file for details.