Help: Metadata schema

From WikiLeague, the free baseball governance encyclopedia.


METADATA_SCHEMA

Every document in the archive is paired with a .md file containing YAML frontmatter (the structured metadata) followed by an optional prose body (a factual summary, key excerpts, related-document notes).
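The frontmatter/body split described above can be sketched in a few lines. This is a minimal illustration, not the archive's actual parser: the function name `split_frontmatter` is hypothetical, and it assumes the frontmatter is delimited by lines consisting solely of `---`, as in the template in §1. The returned frontmatter string would still need to be passed to a YAML parser.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter_yaml, body) for the text of a metadata file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("metadata file must start with a '---' line")
    # Look for the closing delimiter after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()  # body is optional, may be ""
            return frontmatter, body
    raise ValueError("frontmatter is missing its closing '---' line")


sample = (
    "---\n"
    'title: "Flood v. Kuhn"\n'
    'status: "placeholder"\n'
    "---\n"
    "\n"
    "# Flood v. Kuhn\n"
    "\n"
    "Summary text.\n"
)
frontmatter, body = split_frontmatter(sample)
```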

The metadata file lives in the same folder as the document it describes, sharing the document's base filename. So documents/antitrust-and-courts/1972-06-19_caselaw_flood-v-kuhn.pdf is paired with documents/antitrust-and-courts/1972-06-19_caselaw_flood-v-kuhn.md.
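The pairing rule is mechanical, so it can be expressed as a small helper. The sketch below is illustrative only; `metadata_path_for` and `slug_for` are hypothetical names, and it assumes the document filename has a single extension.

```python
from pathlib import PurePosixPath


def metadata_path_for(document_path: str) -> str:
    """Return the path of the .md metadata file paired with a document."""
    doc = PurePosixPath(document_path)
    if doc.suffix in ("", ".md"):
        raise ValueError(f"not a document file: {document_path}")
    # Same folder, same base filename, .md extension.
    return str(doc.with_suffix(".md"))


def slug_for(document_path: str) -> str:
    """Return the base filename without extension."""
    return PurePosixPath(document_path).stem


doc = "documents/antitrust-and-courts/1972-06-19_caselaw_flood-v-kuhn.pdf"
paired = metadata_path_for(doc)
slug = slug_for(doc)
```

The base filename returned by `slug_for` is the same value the relationship fields use as a slug (see §4).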

This file is the canonical schema. The downstream front-end (built later) will parse against this schema. Field names, types, and controlled values are stable; changes require a schema version bump.

Current schema version: 1.2.0 (bumped 2026-05-19, Phase 3 Item 3: added the optional cited_documents field for the citation graph; see the Relationships block in §1 below and CHANGELOG.md. Existing 1.0.0 and 1.1.0 metadata files remain valid; the bump is purely additive and non-breaking.)
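A reader of metadata files might check `schema_version` along the following lines. This is a sketch under a stated assumption: it treats the version string as MAJOR.MINOR.PATCH and accepts any file whose major version matches the current one and whose version is not newer than it. The schema itself only states that 1.0.0 and 1.1.0 files remain valid under 1.2.0; the general rule and the names `parse_version` and `is_supported` are illustrative.

```python
CURRENT_SCHEMA_VERSION = "1.2.0"


def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_supported(file_version: str, current: str = CURRENT_SCHEMA_VERSION) -> bool:
    """ASSUMPTION: same major version and not newer than the current schema."""
    file_v = parse_version(file_version)
    current_v = parse_version(current)
    return file_v[0] == current_v[0] and file_v <= current_v


results = {v: is_supported(v) for v in ("1.0.0", "1.1.0", "1.2.0", "1.3.0", "2.0.0")}
```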

1. Schema (annotated)

---
# ─── Schema control ──────────────────────────────────────────
schema_version: "1.2.0"          # Required. Always present for migration safety.

# ─── Identity ────────────────────────────────────────────────
title: ""                        # Required. Full official title as published.
short_title: ""                  # Optional. Common citation form ("Flood v. Kuhn").
also_known_as: []                # Optional. Array of alternate titles or popular names.

# ─── Date ────────────────────────────────────────────────────
date: ""                         # Required. ISO 8601, partial dates allowed (YYYY, YYYY-MM, YYYY-MM-DD).
date_precision: ""               # Required. One of: "day" | "month" | "year".
date_type: ""                    # Required. What `date` refers to: "issued" | "signed" |
                                 #   "ratified" | "decided" | "effective" | "published" | "introduced".
effective_period:                # Optional. For docs with a defined term (CBAs, leases).
  start: ""                      # ISO 8601.
  end: ""                        # ISO 8601, or "open" if indefinite.

# ─── Classification ──────────────────────────────────────────
doc_type: ""                     # Required. Controlled value from TAXONOMY.md.
category: ""                     # Required. Matches a folder under documents/.
subcategories: []                # Optional. Tags drawn from TAXONOMY.md.
tags: []                         # Optional. Free-form, lowercase, underscore-separated.

# ─── Parties / actors ────────────────────────────────────────
parties:                         # Optional but expected for case law, agreements, contracts.
  - role: ""                     # e.g., "petitioner", "respondent", "signatory", "issuer".
    name: ""                     # Full name as it appears in the document.
    affiliation: ""              # Optional. e.g., "MLBPA", "Office of the Commissioner".
jurisdiction: ""                 # Optional. e.g., "U.S. Supreme Court", "S.D.N.Y.",
                                 #   "9th Cir.", "U.S. Congress", "Office of the Commissioner".

# ─── Citation ────────────────────────────────────────────────
citation:
  bluebook: ""                   # Required for case law if known. Otherwise "unknown" or omit.
  parallel: []                   # Optional. Parallel reporters.
  public_law: ""                 # For federal statutes.
  statutes_at_large: ""          # For federal statutes.
  hearing_id: ""                 # For Congressional hearings (e.g., "S. Hrg. 107-427").
  case_number: ""                # Docket number for filings/cases without a reporter citation.
  court: ""                      # Court of filing/decision when not captured in `jurisdiction`.
  other: ""                      # Catch-all for non-standard citations.

# ─── Source / provenance ─────────────────────────────────────
source:
  retrieved_date: ""             # Required. ISO 8601. When this file was acquired.
  retrieved_by: ""               # Required. "claude/cowork-{session-id}" or "alex" or similar.
  primary_url: ""                # Where this copy was downloaded. Strongly preferred for needs_review+.
  lead_url: ""                   # Optional. For placeholders: where we expect to find the file.
  archive_url: ""                # Wayback Machine snapshot. Required for `verified` web-sourced.
  original_publisher: ""         # The body that originally published this document.
  confirmation_sources:          # Required for `verified`. ≥2 entries.
    - url: ""
      publisher: ""
      retrieved_date: ""
      notes: ""                  # e.g., "compared text verbatim, identical"
  physical_provenance: ""        # Optional. For docs from physical archives:
                                 #   "Bowie Kuhn Papers, box 4, folder 12, Cleveland Public Library"
  snapshot_notes: ""             # Optional. e.g., "Wayback snapshot blocked by robots.txt;
                                 #   verified against print copy in Bobby's library".

# ─── File ────────────────────────────────────────────────────
file:
  filename: ""                   # Required for `needs_review`+. The companion file's name.
  format: ""                     # Required for `needs_review`+. "pdf" | "txt" | "html" | "docx" | "epub" | "other".
  pages: 0                       # Optional. For paginated docs.
  size_bytes: 0                  # Optional but encouraged.
  sha256: ""                     # Required for `needs_review`+.
  previous_hashes:               # Optional. For files that have been replaced.
    - hash: ""
      replaced_on: ""
      reason: ""
  previous_filenames: []         # Optional. Preserved when renamed.
  additional_files:              # Optional. For multi-part doc sets covered by one metadata file.
    - filename: ""
      format: ""
      sha256: ""
      description: ""            # e.g., "exhibit volume 1", "appendix A"
  processing_notes: ""           # Optional. "OCR'd with ocrmypdf v15.4.0".

# ─── Status ──────────────────────────────────────────────────
status: ""                       # Required. "placeholder" | "needs_review" | "verified".
status_history:                  # Append-only log of status changes.
  - status: ""
    date: ""
    by: ""
    reason: ""

# ─── Relationships ───────────────────────────────────────────
supersedes: []                   # Slugs of documents this one replaces.
superseded_by: []                # Slugs of documents that replace this one.
amends: []                       # Documents this one formally amends.
amended_by: []
related_documents: []            # Slugs of tangentially-related documents
                                 #   (broader than citation: legal context, parallel cases).
cited_documents: []              # Slugs of documents this one formally CITES in its
                                 #   body — case-law citations, statute references, etc.
                                 #   Reverse-resolved at build time into a "Cited by"
                                 #   block on each referenced doc, producing the
                                 #   archive's citation graph. Distinct from
                                 #   related_documents (which is a broader
                                 #   tangential-relevance set). Added in schema 1.2.0.
exhibits: []                     # Slugs of attached exhibits (for filings, hearings).

# ─── Substance ───────────────────────────────────────────────
abstract: ""                     # Optional. 1-3 sentence neutral description.
key_provisions: []               # Array of strings. Required for `verified`. Each provision
                                 #   should be factual and attributable to a specific section
                                 #   of the document.
quoted_excerpts:                 # Optional but valuable. Direct quotes.
  - text: ""
    citation: ""                 # e.g., "Section 11(a)(2)" or "Slip op. at 14".
  - text: ""
    citation: ""

# ─── Cross-references (optional, neutral) ────────────────────
notes: ""                        # Free-form research notes. Acceptable place for
                                 #   "see also" pointers, ambiguities, things to follow up.

# ─── Audit ───────────────────────────────────────────────────
last_modified: ""                # ISO 8601. Updated on every metadata change.
last_modified_by: ""             # Session ID or human name.
---

# {Title}

Optional prose body. Use this for:
- A neutral, factual summary of the document (longer than the `abstract` field).
- Reproduced excerpts beyond what fits cleanly in `quoted_excerpts`.
- Cross-references to related documents in the archive (link to their `.md` files).
- A "research notes" section for context discovered during retrieval.

The prose body should read like a law librarian's catalog note. It does not interpret, advocate, or editorialize. Show analysis lives elsewhere.
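The Relationships block above says that `cited_documents` is reverse-resolved at build time into a "Cited by" block. One way that inversion could work is sketched below. The function name `build_cited_by` and the input shape (a mapping from each document's slug to its `cited_documents` list) are assumptions for illustration; the actual build step may differ.

```python
from collections import defaultdict


def build_cited_by(cited_documents_by_slug: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert slug -> cited slugs into cited slug -> sorted list of citing slugs."""
    cited_by = defaultdict(list)
    for citing_slug, cited_slugs in cited_documents_by_slug.items():
        for cited_slug in cited_slugs:
            if citing_slug not in cited_by[cited_slug]:  # ignore duplicate entries
                cited_by[cited_slug].append(citing_slug)
    return {slug: sorted(citers) for slug, citers in cited_by.items()}


# Illustrative input only; these cited_documents values are not taken
# from real metadata files in the archive.
graph = build_cited_by({
    "1972-06-19_caselaw_flood-v-kuhn": [
        "1922_caselaw_federal-baseball-club-v-national-league",
        "1953-11-09_caselaw_toolson-v-new-york-yankees",
    ],
    "1953-11-09_caselaw_toolson-v-new-york-yankees": [
        "1922_caselaw_federal-baseball-club-v-national-league",
    ],
})
```

Documents that nothing cites simply have no entry in the result, so a front-end would omit the "Cited by" block for them.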

2. Required fields by status

Field placeholder needs_review verified
schema_version
title
date, date_precision, date_type
doc_type, category
source.retrieved_date, source.retrieved_by
status, status_history
source.primary_url or physical_provenance
file.filename, file.format, file.sha256
source.archive_url (for web-sourced)
source.confirmation_sources (≥2)
citation (if applicable)
key_provisions (≥1)
parties (where applicable)
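A status check along these lines could be automated. The sketch below encodes only the per-status rules spelled out in the §1 annotations (`file.filename`, `file.format`, `file.sha256` for needs_review and above; `source.confirmation_sources`, `key_provisions`, and `source.archive_url` for verified). It is not a complete validator. The function name `missing_for_status` is hypothetical, and the test for "web-sourced" is an assumption, flagged in the code.

```python
def missing_for_status(meta: dict) -> list[str]:
    """List fields that the §1 annotations require at meta['status'] but that are empty."""
    status = meta.get("status", "")
    file_info = meta.get("file") or {}
    source = meta.get("source") or {}
    missing = []

    if status in ("needs_review", "verified"):
        for key in ("filename", "format", "sha256"):
            if not file_info.get(key):
                missing.append(f"file.{key}")

    if status == "verified":
        if len(source.get("confirmation_sources") or []) < 2:
            missing.append("source.confirmation_sources")
        if not meta.get("key_provisions"):
            missing.append("key_provisions")
        # ASSUMPTION: a non-empty primary_url means the document is web-sourced.
        if source.get("primary_url") and not source.get("archive_url"):
            missing.append("source.archive_url")

    return missing


draft = {
    "status": "needs_review",
    "file": {"filename": "1972-06-19_caselaw_flood-v-kuhn.pdf", "format": "pdf", "sha256": ""},
}
problems = missing_for_status(draft)
```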

3. Worked example — Flood v. Kuhn

This is a complete, illustrative metadata file. Use it as the template.

---
schema_version: "1.0.0"

title: "Curt Flood, Petitioner, v. Bowie K. Kuhn, Commissioner of Baseball, et al."
short_title: "Flood v. Kuhn"
also_known_as:
  - "Flood v. Kuhn (1972)"

date: "1972-06-19"
date_precision: "day"
date_type: "decided"

doc_type: "case_law"
category: "antitrust-and-courts"
subcategories:
  - "reserve_clause"
  - "antitrust_exemption"
tags:
  - "scotus"
  - "antitrust"
  - "reserve_clause"
  - "curt_flood"
  - "marvin_miller"
  - "labor"

parties:
  - role: "petitioner"
    name: "Curt Flood"
    affiliation: ""
  - role: "respondent"
    name: "Bowie K. Kuhn, Commissioner of Baseball, et al."
    affiliation: "Office of the Commissioner of Baseball"
jurisdiction: "U.S. Supreme Court"

citation:
  bluebook: "Flood v. Kuhn, 407 U.S. 258 (1972)"
  parallel:
    - "92 S. Ct. 2099"
    - "32 L. Ed. 2d 728"
  other: ""

source:
  retrieved_date: "2026-05-17"
  retrieved_by: "claude/cowork-9167cb28"
  primary_url: "https://tile.loc.gov/storage-services/service/ll/usrep/usrep407/usrep407258/usrep407258.pdf"
  archive_url: "https://web.archive.org/web/2026*/tile.loc.gov/.../usrep407258.pdf"
  original_publisher: "U.S. Government Publishing Office (U.S. Reports)"
  confirmation_sources:
    - url: "https://supreme.justia.com/cases/federal/us/407/258/"
      publisher: "Justia"
      retrieved_date: "2026-05-17"
      notes: "Compared verbatim against LoC PDF; identical text."
    - url: "https://www.law.cornell.edu/supremecourt/text/407/258"
      publisher: "Cornell Legal Information Institute"
      retrieved_date: "2026-05-17"
      notes: "Compared verbatim against LoC PDF; identical text."

file:
  filename: "1972-06-19_caselaw_flood-v-kuhn.pdf"
  format: "pdf"
  pages: 47
  size_bytes: 0                  # populate after download
  sha256: ""                      # populate after download
  processing_notes: ""

status: "placeholder"             # demo file — promote when actually acquired
status_history:
  - status: "placeholder"
    date: "2026-05-17"
    by: "claude/cowork-9167cb28"
    reason: "Initial entry as schema example."

supersedes: []
superseded_by: []
related_documents:
  - "1922_caselaw_federal-baseball-club-v-national-league"
  - "1953-11-09_caselaw_toolson-v-new-york-yankees"
  - "1998-10-27_legislation_curt-flood-act-of-1998"

abstract: "5–3 Supreme Court decision affirming baseball's antitrust exemption originally established in Federal Baseball Club v. National League (1922) and reaffirmed in Toolson v. New York Yankees (1953). Justice Blackmun's majority opinion characterized the exemption as 'an anomaly' and 'an aberration' but declined to overturn it on stare decisis grounds, deferring to Congress."

key_provisions:
  - "Held that baseball's exemption from federal antitrust law established in Federal Baseball Club (1922) remains in force."
  - "Recognized the exemption as 'an anomaly' and 'an aberration' that has 'remained anomalous and aberrational from the beginning,' but declined to overturn it on stare decisis grounds."
  - "Identified Congressional action, not judicial intervention, as the appropriate remedy."
  - "Justice Blackmun's opening section recited a list of memorable baseball figures, an unusual stylistic choice that drew commentary."

quoted_excerpts:
  - text: "Professional baseball is a business and it is engaged in interstate commerce."
    citation: "407 U.S. 258, 282"
  - text: "If there is any inconsistency or illogic in all this, it is an inconsistency and illogic of long standing that is to be remedied by the Congress and not by this Court."
    citation: "407 U.S. 258, 284"

notes: "Curt Flood's challenge originated in his 1969 trade from the St. Louis Cardinals to the Philadelphia Phillies. Related labor context: this case predates the 1975 Messersmith-McNally arbitration ruling that effectively dismantled the reserve clause through CBA mechanics rather than judicial ruling."

last_modified: "2026-05-17"
last_modified_by: "claude/cowork-9167cb28"
---

# Flood v. Kuhn

The Supreme Court's third and most recent direct ruling on baseball's antitrust exemption. The Court reaffirmed the exemption established by Federal Baseball Club v. National League (1922) and continued by Toolson v. New York Yankees (1953), while explicitly acknowledging the exemption is anomalous and that no other professional sport enjoys it.

Justice Harry Blackmun authored the majority opinion. His introductory section recited the names of 88 notable baseball players, an unusual stylistic departure that has been cited in legal commentary on judicial opinion writing for decades.

The 5–3 decision (Justice Powell did not participate) declined to overturn the exemption on stare decisis grounds. Blackmun reasoned that Congress had been aware of the exemption and had not acted to remove it, and that retroactive disruption of the industry's structural reliance on the exemption would be inappropriate for the Court to impose. The opinion explicitly invited Congressional action.

Congressional response came twenty-six years later in the Curt Flood Act of 1998, which removed the antitrust exemption only as it applied to major-league players' labor relations, leaving the broader exemption (including its application to franchise relocation, minor league players, and the amateur draft) intact.

## Related documents in the archive

- `1922_caselaw_federal-baseball-club-v-national-league.md` — the original case establishing the exemption.
- `1953-11-09_caselaw_toolson-v-new-york-yankees.md` — the first reaffirmance.
- `1998-10-27_legislation_curt-flood-act-of-1998.md` — the partial Congressional response invited by this opinion.
- `1975_arbitration_messersmith-mcnally.md` — the parallel labor-side dismantling of the reserve clause via grievance arbitration.

4. Notes for schema users

  • Empty strings vs unknown: Use empty string ("") when the field is not applicable. Use the literal string "unknown" when the field is applicable but the value is not yet known. This distinction matters for the front-end.
  • Slugs in relationship fields: Use the document's base filename without extension (e.g., 1972-06-19_caselaw_flood-v-kuhn). This makes cross-linking unambiguous.
  • Arrays: Always use list form for fields typed as arrays, even if there's only one entry, and even if empty. Don't switch between scalar and array.
  • YAML style: Use block-style YAML throughout (key: then newline then indented - for list items, or key: then newline then indented key: for dict members). Do NOT use flow-style (inline [{...}] or {key: value, ...} on a single line). Flow-style is brittle for the build's parser and inconsistent with the rest of the archive. See _templates/metadata-template.md for the canonical format.
  • Status history is append-only: never edit prior entries; add new ones for every transition.
  • Updating metadata: every change must bump last_modified and update last_modified_by.
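The last two notes (append-only status history, and bumping the audit fields on every change) can be combined in one helper. This is a sketch, assuming the metadata has already been parsed into a dict; `record_status_change` is a hypothetical name, and the date and reason in the usage lines are made up for illustration.

```python
import copy

ALLOWED_STATUSES = ("placeholder", "needs_review", "verified")


def record_status_change(meta: dict, new_status: str, by: str, reason: str, on: str) -> dict:
    """Return a copy of meta with a status transition appended and audit fields bumped."""
    if new_status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {new_status!r}")
    updated = copy.deepcopy(meta)  # leave the caller's dict untouched
    # Append only: prior history entries are never edited or removed.
    updated.setdefault("status_history", []).append(
        {"status": new_status, "date": on, "by": by, "reason": reason}
    )
    updated["status"] = new_status
    updated["last_modified"] = on
    updated["last_modified_by"] = by
    return updated


before = {
    "status": "placeholder",
    "status_history": [
        {"status": "placeholder", "date": "2026-05-17",
         "by": "claude/cowork-9167cb28", "reason": "Initial entry as schema example."},
    ],
    "last_modified": "2026-05-17",
    "last_modified_by": "claude/cowork-9167cb28",
}
# Hypothetical transition, for illustration only.
after = record_status_change(before, "needs_review", by="alex",
                             reason="File acquired and hashed.", on="2026-05-20")
```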

5. Collections (multi-document research entries)

Some research entries are document sets rather than single files. The archive stores them as collection folders, with one folder per collection under the appropriate category. See NAMING.md §8 for the full filename convention; here is the metadata layout:

5.1 Collection structure

documents/<category>/<date>_<doctype>_<slug>/
    README.md                            ← collection-level metadata (this schema)
    MANIFEST.md                          ← (optional) free-form per-file inventory
    {date}_{doctype}_{sub-slug}.md       ← (optional) individually-citable
    {date}_{doctype}_{sub-slug}.pdf         sub-documents, each with its own metadata
    *.pdf, *.txt                         ← (optional) non-citable supporting files,
                                           tracked under README.md's
                                           file.additional_files

5.2 Collection-level README.md

The collection's README.md uses this same metadata schema in full. Its frontmatter declares:

  • title: the collection's title (e.g., "1915 NARA Federal League Case Files Collection").
  • date, date_precision, date_type: a representative date for the collection (often year-precision for spanning collections).
  • doc_type: the primary doctype of the constituent files (e.g., filing for a court case-files collection).
  • status: the collection-level status. Set to verified only if all sub-documents and supporting files have been verified per STANDARDS.md.
  • file.additional_files: list non-citable supporting files. Each entry has filename, format, sha256, description.
  • notes: free-form descriptive intro to the collection.
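Because `date` may be partial, a consistency check between `date` and `date_precision` is useful for both single documents and year-precision collections. The following is a minimal sketch, assuming the three forms the schema allows (YYYY, YYYY-MM, YYYY-MM-DD); `date_matches_precision` is a hypothetical name.

```python
import re
from datetime import date

# One pattern per allowed date_precision value.
_PATTERNS = {
    "year": r"\d{4}",
    "month": r"\d{4}-\d{2}",
    "day": r"\d{4}-\d{2}-\d{2}",
}


def date_matches_precision(value: str, precision: str) -> bool:
    """True if value has the shape implied by precision and is a real calendar date."""
    pattern = _PATTERNS.get(precision)
    if pattern is None or not re.fullmatch(pattern, value):
        return False
    parts = [int(piece) for piece in value.split("-")]
    parts += [1] * (3 - len(parts))  # pad a missing month or day with 1
    try:
        date(*parts)
    except ValueError:  # e.g. month 13 or day 31 in a 30-day month
        return False
    return True


checks = [
    date_matches_precision("1972-06-19", "day"),
    date_matches_precision("1915", "year"),
    date_matches_precision("1972-06", "day"),
    date_matches_precision("1972-13", "month"),
]
```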

5.3 MANIFEST.md (optional)

A free-form per-file inventory: list of files in the collection, per-file SHA256, page counts where relevant, descriptive notes. MANIFEST.md does NOT need its own YAML frontmatter — it is a sibling document of README.md, not its own indexable entry.
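The per-file SHA256 values listed in MANIFEST.md (and stored in `file.sha256`) can be computed with the standard library. This is a sketch; the helper name `sha256_of_file` and the chunk size are arbitrary choices, not archive conventions.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the lowercase hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in binary mode matters here: text mode could alter line endings on some platforms and produce a different hash.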

5.4 Sub-document metadata files

When a sub-document is separately citable in its own right (has its own holding, its own citation, its own provenance), it gets its own .md paired metadata file following the normal {date}_{doctype}_{sub-slug} convention. Each appears as a separate INDEX.md row using the slug <collection-slug>/<sub-slug>.

Sub-document metadata files reference the collection in notes (free-form) but do not introduce new non-schema fields.

5.5 Examples in the archive

  • documents/antitrust-and-courts/1915_filing_federal-league-case-files-nara/ (now 1915_filing_federal-league-case-files-nara/ after the Pass B rename) — NARA box, 121 case files. README.md is the collection metadata; MANIFEST.md lists per-file SHAs; no per-file .md because individual case files aren't separately citable.
  • documents/governance/1921_agreement_major-league-agreement/ — README.md is the collection metadata for the 1921 Major League Agreement, which has multiple file formats (HTML, plain text) of the same single document.
  • documents/arbitration-and-grievances/1975_arbitration_messersmith-mcnally/ — README.md is the collection metadata for the Messersmith-McNally arbitration. Two sub-documents (1976-02-03_caselaw_kansas-city-royals-v-mlbpa-wd-mo and 1976-03-09_caselaw_kansas-city-royals-v-mlbpa-8th-cir) have their own metadata because they are separately-citable federal-court rulings that built on the arbitration.
  • documents/legislation-and-hearings/2005_hearings_steroid-batch/ — README.md is the collection metadata for the 2005 House Government Reform Committee steroid hearings; MANIFEST.md inventories the 19 individual witness testimony files. None of the sub-files has its own .md because they are sub-components of the larger hearing record, not separately citable.