feat: mdcms build writes mdcms.json; register accepts URLs

mdcms build now calls generate_site_manifest() at the end of every build,
writing mdcms.json to the site root. This file lists all deployable files
and empty directories, and is deployed alongside the site so any mdcms
user can register a copy of the site from its URL.
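
As a rough illustration of how a consumer might use the deployed manifest, the sketch below parses an mdcms.json-shaped document and derives one download URL per listed file. The helper name manifest_file_urls, the sample manifest contents, and the site URL are assumptions made for this example and are not part of the commit; only the key names (mdcms, files, dirs) follow the format described here.

```python
import json
from typing import List

# Sample manifest in the shape written by `mdcms build` (contents are made up).
SAMPLE_MANIFEST = """
{
  "mdcms": "0.4",
  "files": ["index.html", "pages/home.md"],
  "dirs": ["assets/fonts"]
}
"""


def manifest_file_urls(manifest: dict, base_url: str) -> List[str]:
    """Hypothetical helper: map each manifest file entry to its download URL."""
    base = base_url.rstrip("/")
    return [f"{base}/{rel}" for rel in manifest.get("files", [])]


manifest = json.loads(SAMPLE_MANIFEST)
urls = manifest_file_urls(manifest, "https://example.com/mysite/")
print(urls)
```

Entries under dirs carry no content, so a client would only need to create those directories locally.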

mdcms register now accepts a GitHub repo URL or plain HTTPS URL as PATH
or via --from. GitHub URLs try mdcms.json (raw content) first and fall
back to the Contents API tree-walk. Plain URLs require mdcms.json to be
present and fail with a clear error if it is not found.
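
The GitHub path can be sketched as follows: the URL is matched against the same pattern as GITHUB_URL_RE in this commit and turned into a raw-content base URL, from which mdcms.json is requested. The helper name github_raw_base is hypothetical; the branch default of "main" mirrors _parse_github_url(). One consequence of the pattern is that a branch name containing "/" would be split into branch and subpath.

```python
import re
from typing import Optional

# Same pattern as GITHUB_URL_RE in this commit.
GITHUB_URL_RE = re.compile(
    r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?"
    r"(?:/tree/([^/]+?)(?:/(.+?))?)?/?$"
)


def github_raw_base(url: str) -> Optional[str]:
    """Hypothetical helper: raw-content base URL for a GitHub repo URL, else None."""
    m = GITHUB_URL_RE.match(url.strip())
    if not m:
        return None
    owner, repo = m.group(1), m.group(2)
    branch = m.group(3) or "main"  # no /tree/<branch> part: assume "main"
    subpath = (m.group(4) or "").strip("/")
    base = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
    return f"{base}/{subpath}" if subpath else base


print(github_raw_base("https://github.com/owner/repo"))
print(github_raw_base("https://github.com/owner/repo/tree/main/subdir"))
print(github_raw_base("https://example.com/mysite"))  # not a GitHub URL
```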

- generate_site_manifest() added; called at end of run_build
- download_template(dest, source=None) dispatches on source type
- _parse_github_url() extracts owner/repo/branch/subpath from GitHub URLs
- _fetch_manifest() / _apply_manifest() handle the manifest protocol
- _download_tree_api() retained as GitHub Contents API fallback
- _http_get_github() carries Accept header for Contents API responses
- MANIFEST_FILENAME = "mdcms.json"; GITHUB_URL_RE added
- app/template-manifest.json replaced by app/mdcms.json
- register command: PATH accepts URL; --from option added
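
_apply_manifest() joins each manifest entry onto the destination directory. Because a manifest fetched from an arbitrary URL is untrusted input, one possible hardening, which is not part of this commit, is to reject entries that would land outside the destination. The helper name safe_dest and its error messages are hypothetical.

```python
import tempfile
from pathlib import Path, PurePosixPath


def safe_dest(dest: Path, rel: str) -> Path:
    """Hypothetical helper: resolve a manifest entry under dest or raise ValueError."""
    parts = PurePosixPath(rel).parts
    # Reject empty entries, absolute paths, backslashes, and any ".." component.
    if not parts or rel.startswith("/") or "\\" in rel or ".." in parts:
        raise ValueError(f"unsafe manifest path: {rel!r}")
    root = dest.resolve()
    target = root.joinpath(*parts).resolve()
    # Re-check after resolution so existing symlinks cannot redirect outside dest.
    if root not in target.parents:
        raise ValueError(f"manifest path escapes destination: {rel!r}")
    return target


root = Path(tempfile.mkdtemp())
print(safe_dest(root, "pages/home.md"))
```

A caller could run each files and dirs entry through such a check before writing anything to disk.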

https://claude.ai/code/session_01Ai8xRvmrzdhuTKiRQ2fnn9
Claude 2026-06-07 18:00:17 +00:00
parent be698a2bdd
commit 8e7f5d3ae9
3 changed files with 201 additions and 36 deletions


@@ -2,11 +2,6 @@
   "mdcms": "0.4",
   "files": [
     "404.html",
-    "config.yml",
-    "index.html",
-    "nav.yml",
-    "template-manifest.json",
-    "theme.yml",
     "assets/icons/add.svg",
     "assets/icons/arrow_drop_down.svg",
     "assets/icons/arrow_right.svg",
@@ -33,10 +28,15 @@
     "assets/icons/text_compare.svg",
     "assets/icons/warning.svg",
     "assets/images/favicon.png",
+    "config.yml",
+    "index.html",
+    "nav.yml",
     "pages/about.md",
     "pages/docs.md",
     "pages/home.md",
-    "pages/tabs-accordions.md"
+    "pages/tabs-accordions.md",
+    "search.json",
+    "theme.yml"
   ],
   "dirs": [
     "assets/fonts",


@@ -197,17 +197,15 @@ When a site uses category-suffixed page files (e.g. `page.current.md`) and is ho
--- ---
## Manifest-driven template download (`mdcms.py`, `app/template-manifest.json`) ## Manifest-driven download and URL-based register (`mdcms.py`, `app/mdcms.json`)
`mdcms register` no longer uses the GitHub Contents API to discover and download the starter template. Instead it fetches `app/template-manifest.json` — a single JSON file that lists every file and empty directory in the template — then downloads each file directly as a raw URL. `mdcms build` now writes `mdcms.json` to the site root on every build. `mdcms register` can accept a GitHub repo URL or a plain HTTPS URL as the source to download from.
### Why this matters ### `mdcms build` writes `mdcms.json`
The old approach walked the GitHub tree API recursively (one authenticated API call per directory). This hit rate limits, required GitHub-specific logic, and made it impossible to host the template anywhere other than the GitHub API endpoint. At the end of each build, `generate_site_manifest()` walks the site directory, lists every non-hidden file (excluding `mdcms.json` itself), records any empty directories, and writes `mdcms.json`. This file is deployed alongside the rest of the site — it is the machine-readable index of what the site contains.
The new approach fetches one manifest then one raw file per entry. Raw downloads bypass API rate limits entirely and work from any HTTP source: a CDN, a self-hosted mirror, or a local server. `download_template()` accepts an optional `base_url` argument for this purpose. Format:
### `app/template-manifest.json` format
```json ```json
{ {
@@ -217,14 +215,32 @@ The new approach fetches one manifest then one raw file per entry. Raw downloads
} }
``` ```
`files` — paths relative to the app root that are fetched and written verbatim. `files` — all deployable files, paths relative to the site root.
`dirs` — empty directories to create (no file is needed to keep them). `dirs` — empty directories to create on download (no file needed to keep them alive).
Generated files (`manifest.json`, `service-worker.js`, `search.json`) are intentionally absent; they are produced by `mdcms build` and should not be pre-populated in a fresh site. ### `mdcms register` accepts URLs
### `_http_get` replaces `_github_get` `PATH` can now be a GitHub repo URL or a plain HTTPS URL pointing to a deployed mdcms site. A `--from URL` option is also available as an explicit override.
The old `_github_get` sent GitHub API headers (`Accept: application/vnd.github.v3+json`) and returned raw bytes. It is replaced by a generic `_http_get(url)` that works with any HTTP source. This function is also referenced by `fetch-deps`. ```
mdcms register mysite # existing behaviour
mdcms register mysite ./mydir # local path
mdcms register mysite https://github.com/owner/repo # GitHub repo
mdcms register mysite https://github.com/owner/repo/tree/main/subdir
mdcms register mysite --from https://example.com/mysite # deployed site
```
**GitHub URL** — tries `mdcms.json` from the raw content URL first; falls back to the GitHub Contents API tree-walk if no manifest is found.
**Plain HTTPS URL** — fetches `{url}/mdcms.json`; if not found, reports an error with guidance.
### `app/mdcms.json`
The starter template now ships with its own `mdcms.json`. This means `mdcms register mysite https://github.com/kbenestad/mdcms/tree/main/app` works via the manifest path with no API calls.
### `_http_get` / `_http_get_github`
`_http_get(url)` — generic SSL-verified GET, no vendor headers. Used for raw file downloads and manifest fetches.
`_http_get_github(url)` — adds `Accept: application/vnd.github.v3+json` for Contents API responses (only needed in the fallback tree-walk path).
--- ---

mdcms.py (175 changes)

@@ -41,7 +41,12 @@ CATEGORY_CODE_RE = re.compile(r"^[a-zA-Z0-9\-]+$")
REGISTRY_FILE = Path.home() / ".config" / "mdcms" / "sites.json" REGISTRY_FILE = Path.home() / ".config" / "mdcms" / "sites.json"
TEMPLATE_BASE_URL = "https://raw.githubusercontent.com/kbenestad/mdcms/main/app" TEMPLATE_BASE_URL = "https://raw.githubusercontent.com/kbenestad/mdcms/main/app"
TEMPLATE_MANIFEST = "template-manifest.json" MANIFEST_FILENAME = "mdcms.json"
GITHUB_URL_RE = re.compile(
r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?"
r"(?:/tree/([^/]+?)(?:/(.+?))?)?/?$"
)
# ─── Version helpers ────────────────────────────────────────── # ─── Version helpers ──────────────────────────────────────────
@@ -543,6 +548,8 @@ def run_build(site_path: Path):
fg="cyan", fg="cyan",
)) ))
generate_site_manifest(site_path)
# ─── PWA generation ─────────────────────────────────────────── # ─── PWA generation ───────────────────────────────────────────
@@ -651,7 +658,7 @@ self.addEventListener('fetch', event => {{
(site_path / "service-worker.js").write_text(sw, encoding="utf-8") (site_path / "service-worker.js").write_text(sw, encoding="utf-8")
click.echo(f" Wrote service-worker.js (cache: {cache_name})") click.echo(f" Wrote service-worker.js (cache: {cache_name})")
# ─── HTTP helper ───────────────────────────────────────────── # ─── HTTP helpers ─────────────────────────────────────────────
def _http_get(url: str) -> bytes: def _http_get(url: str) -> bytes:
req = urllib.request.Request(url, headers={"User-Agent": f"mdcms/{CLI_VERSION}"}) req = urllib.request.Request(url, headers={"User-Agent": f"mdcms/{CLI_VERSION}"})
@@ -660,15 +667,84 @@ def _http_get(url: str) -> bytes:
return resp.read() return resp.read()
def _http_get_github(url: str) -> bytes:
"""HTTP GET with GitHub API Accept header (for Contents API responses)."""
req = urllib.request.Request(
url,
headers={
"User-Agent": f"mdcms/{CLI_VERSION}",
"Accept": "application/vnd.github.v3+json",
},
)
ctx = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen(req, timeout=15, context=ctx) as resp:
return resp.read()
# ─── Site manifest generation ─────────────────────────────────
def generate_site_manifest(site_path: Path):
"""Write mdcms.json to site_path listing all deployable files and empty dirs."""
files = []
empty_dirs = []
for entry in sorted(site_path.rglob("*")):
rel = entry.relative_to(site_path)
# Skip anything inside a hidden directory or with a hidden name
if any(p.startswith(".") for p in rel.parts):
continue
if entry.is_file():
rel_str = str(rel).replace("\\", "/")
if rel_str != MANIFEST_FILENAME:
files.append(rel_str)
elif entry.is_dir():
# Only list dirs that have no non-hidden children
visible = [c for c in entry.iterdir() if not c.name.startswith(".")]
if not visible:
empty_dirs.append(str(rel).replace("\\", "/"))
manifest: dict = {
"mdcms": read_site_version(site_path) or "0.4",
"files": files,
}
if empty_dirs:
manifest["dirs"] = empty_dirs
(site_path / MANIFEST_FILENAME).write_text(
json.dumps(manifest, indent=2, ensure_ascii=False), encoding="utf-8"
)
click.echo(f" Wrote {MANIFEST_FILENAME} ({len(files)} files)")
# ─── Template download ──────────────────────────────────────── # ─── Template download ────────────────────────────────────────
def download_template(dest: Path, base_url: str = TEMPLATE_BASE_URL): def _parse_github_url(url: str) -> "tuple | None":
"""Download the mdcms starter template using template-manifest.json.""" """Return (owner, repo, branch, subpath) for a GitHub URL, else None."""
base = base_url.rstrip("/") m = GITHUB_URL_RE.match(url.strip())
click.echo(f"Downloading site template into {dest} ...") if not m:
return None
owner = m.group(1)
repo = m.group(2)
branch = m.group(3) or "main"
subpath = (m.group(4) or "").strip("/")
return owner, repo, branch, subpath
def _fetch_manifest(base_url: str) -> "dict | None":
"""Fetch mdcms.json from base_url. Returns parsed dict or None if not found."""
url = base_url.rstrip("/") + "/" + MANIFEST_FILENAME
try: try:
manifest_url = f"{base}/{TEMPLATE_MANIFEST}" data = _http_get(url)
manifest = json.loads(_http_get(manifest_url).decode("utf-8")) manifest = json.loads(data.decode("utf-8"))
if isinstance(manifest.get("files"), list):
return manifest
except Exception:
pass
return None
def _apply_manifest(manifest: dict, base_url: str, dest: Path):
"""Download all files in manifest from base_url into dest."""
base = base_url.rstrip("/")
for rel in manifest.get("files", []): for rel in manifest.get("files", []):
file_dest = dest / rel file_dest = dest / rel
file_dest.parent.mkdir(parents=True, exist_ok=True) file_dest.parent.mkdir(parents=True, exist_ok=True)
@@ -676,6 +752,63 @@ def download_template(dest: Path, base_url: str = TEMPLATE_BASE_URL):
file_dest.write_bytes(_http_get(f"{base}/{rel}")) file_dest.write_bytes(_http_get(f"{base}/{rel}"))
for rel in manifest.get("dirs", []): for rel in manifest.get("dirs", []):
(dest / rel).mkdir(parents=True, exist_ok=True) (dest / rel).mkdir(parents=True, exist_ok=True)
def _download_tree_api(api_url: str, dest: Path, depth: int = 0):
"""Recursively download from the GitHub Contents API (fallback when no manifest)."""
items = json.loads(_http_get_github(api_url).decode("utf-8"))
for item in items:
item_dest = dest / item["name"]
if item["type"] == "dir":
item_dest.mkdir(parents=True, exist_ok=True)
_download_tree_api(item["url"], item_dest, depth + 1)
elif item["type"] == "file":
click.echo(f" {' ' * depth}{item['name']}")
item_dest.parent.mkdir(parents=True, exist_ok=True)
item_dest.write_bytes(_http_get(item["download_url"]))
def download_template(dest: Path, source: str = None):
"""Download a site template from a URL or GitHub address.
source may be:
- A GitHub repo URL (https://github.com/owner/repo or .../tree/branch/path)
- Any HTTPS URL pointing to a deployed mdcms site that has mdcms.json
- None: uses the built-in mdcms starter template
"""
effective = (source or TEMPLATE_BASE_URL).rstrip("/")
click.echo(f"Downloading site template into {dest} ...")
try:
github = _parse_github_url(effective)
if github:
owner, repo, branch, subpath = github
raw_base = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
if subpath:
raw_base = f"{raw_base}/{subpath}"
manifest = _fetch_manifest(raw_base)
if manifest is not None:
_apply_manifest(manifest, raw_base, dest)
else:
# No manifest — fall back to GitHub Contents API tree walk
api_url = f"https://api.github.com/repos/{owner}/{repo}/contents"
if subpath:
api_url = f"{api_url}/{subpath}"
if branch not in ("main", "master"):
api_url += f"?ref={branch}"
_download_tree_api(api_url, dest)
else:
manifest = _fetch_manifest(effective)
if manifest is None:
if source:
raise click.ClickException(
f"No {MANIFEST_FILENAME} found at {effective}.\n"
"The URL must point to a deployed mdcms site with a manifest, "
"or to a GitHub repository."
)
raise click.ClickException(
f"Could not fetch template manifest from {effective}"
)
_apply_manifest(manifest, effective, dest)
click.echo(click.style("Template downloaded successfully.", fg="green")) click.echo(click.style("Template downloaded successfully.", fg="green"))
except urllib.error.URLError as e: except urllib.error.URLError as e:
raise click.ClickException(f"Download failed: {e}") raise click.ClickException(f"Download failed: {e}")
@@ -715,12 +848,22 @@ def cli():
@cli.command() @cli.command()
@click.argument("name") @click.argument("name")
@click.argument("path", required=False, default=None, type=click.Path()) @click.argument("path", required=False, default=None)
def register(name, path): @click.option("--from", "source", default=None, metavar="URL",
help="Download template from a GitHub repo or deployed site URL.")
def register(name, path, source):
"""Register a site by NAME at PATH (default: current directory). """Register a site by NAME at PATH (default: current directory).
If no mdcms site is found at the target path, the starter template is PATH may be a local directory or a URL to download from. If no mdcms
downloaded from GitHub automatically. site is found at the local path, the template is downloaded from --from
(or PATH if it is a URL, or the built-in mdcms starter by default).
\b
Examples:
mdcms register mysite
mdcms register mysite ./mydir
mdcms register mysite https://github.com/owner/repo
mdcms register mysite --from https://example.com/deployed-site
""" """
reg = load_registry() reg = load_registry()
@@ -729,6 +872,12 @@ def register(name, path):
f"'{name}' is already registered. Use 'mdcms delete {name}' to remove it first." f"'{name}' is already registered. Use 'mdcms delete {name}' to remove it first."
) )
# If PATH looks like a URL, treat it as the download source rather than a local path.
if path and path.startswith(("http://", "https://", "git://")):
if source is None:
source = path
path = None
site_path = Path(path).resolve() if path else Path.cwd() site_path = Path(path).resolve() if path else Path.cwd()
if not site_path.is_dir(): if not site_path.is_dir():
@@ -746,7 +895,7 @@ def register(name, path):
if site_version is None: if site_version is None:
click.echo(f"No mdcms site found at {site_path}.") click.echo(f"No mdcms site found at {site_path}.")
download_template(site_path) download_template(site_path, source)
site_version = read_site_version(site_path) site_version = read_site_version(site_path)
if site_version is None: if site_version is None:
raise click.ClickException( raise click.ClickException(