The Dataset

This analysis is built on tuition, fee, and living expense data scraped directly from the official websites of 1,861 U.S. universities offering graduate and professional programs during the 2025–2026 academic year. The source institution list was drawn from IPEDS (the Integrated Postsecondary Education Data System maintained by the National Center for Education Statistics), filtered to schools reporting graduate-level completions.

The final dataset contains 7,333 program-institution combinations across 28 canonical program types spanning the major graduate and professional degrees: MD, JD, MBA, DNP, PA, DPT, MFA, MSW, and twenty others.

Data Collection Pipeline

An 11-step pipeline identified, fetched, and classified institutional web pages, then extracted cost data from them:

  1. Target list construction from the IPEDS institution list
  2. Domain resolution, including subdomain expansion for multi-campus systems
  3. Page fetching with JavaScript rendering, PDF extraction, and OCR
  4. Page classification (tuition table, narrative, fee schedule, etc.)
  5. LLM-assisted data extraction with structured schemas and fee decomposition (each mandatory charge captured as a named, categorized, waivability-tagged component)
  6. Entity resolution normalizing raw program type strings into canonical labels
  7. Final assembly with derived fields (annual cost of attendance, or COA; program duration; total cost)
  8. Automated review queue generation from validation flags
  9. Manual review of flagged items
  10. Incremental re-extraction for error rows
  11. Systematic data cleanup and deduplication
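
Step 6 (entity resolution) can be illustrated with a minimal sketch. The alias table and the function name resolve_program_type below are hypothetical, since the actual mapping behind the 28 canonical labels is not published here. The sketch assumes a simple exact match after normalization, with unmatched strings returned as None so they can be routed to review.

```python
import re
from typing import Optional

# Hypothetical alias table: the real mapping used by the pipeline is not
# published, so these entries are illustrative only.
CANONICAL_ALIASES = {
    "MD": ["Doctor of Medicine", "M.D."],
    "JD": ["Juris Doctor", "J.D."],
    "MBA": ["Master of Business Administration", "M.B.A."],
    "MSW": ["Master of Social Work", "M.S.W."],
}


def _normalize(raw: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


# Map every normalized alias to its canonical label.
LOOKUP = {
    _normalize(alias): label
    for label, aliases in CANONICAL_ALIASES.items()
    for alias in aliases
}


def resolve_program_type(raw: str) -> Optional[str]:
    """Return the canonical label, or None if the string is unrecognized."""
    return LOOKUP.get(_normalize(raw))
```

Under these assumptions, "  M.B.A. " and "Master of Business Administration" both resolve to MBA, while an unrecognized string returns None rather than a guess.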

Data Cleaning

Starting from 12,746 raw extracted rows, the cleanup pipeline removed duplicates, corrected rate inversions, imputed missing living expenses using university-level medians, and consolidated near-duplicate records. The final output contains 7,333 rows across 1,861 universities.
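
Two of the cleanup operations described above, duplicate removal and median imputation, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the field names (unitid, program, residency, tuition, living_expenses) are hypothetical, a duplicate is assumed to mean an exact match on the first four fields, and "university-level median" is assumed to mean the median of the living expense values already present in the same university's other rows.

```python
from statistics import median


def drop_exact_duplicates(rows):
    """Keep the first row for each (unitid, program, residency, tuition) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["unitid"], row["program"], row["residency"], row["tuition"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_living_expenses(rows):
    """Fill missing living_expenses with the same university's median value."""
    known = {}
    for row in rows:
        if row["living_expenses"] is not None:
            known.setdefault(row["unitid"], []).append(row["living_expenses"])
    medians = {unitid: median(values) for unitid, values in known.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row["living_expenses"] is None:
            # Stays None when the university has no known values at all.
            new_row["living_expenses"] = medians.get(new_row["unitid"])
        filled.append(new_row)
    return filled
```

Running deduplication before imputation keeps repeated rows from weighting the median. A university with no known living expense values is left unfilled in this sketch; how the actual pipeline handles that case is not stated.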

Funding Gap Calculation

For each unique combination of program, institution, and residency status (resident, non-resident, or single-rate):

Annual Funding Gap = max(0, Annual COA - Loan Cap)

Here, Annual COA is the annual cost of attendance: Yearly Tuition + Mandatory Fees + Estimated Annual Living Expenses. The loan cap is $50,000 for the 11 federally designated professional degrees under 34 CFR § 668.2, and $20,500 for all other graduate programs.

Total Funding Gap = Annual Funding Gap × Years to Completion

The annual gap measures the per-year shortfall; the total gap measures the cumulative shortfall over the life of the degree. Both figures appear throughout this report, clearly labeled.
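
The two formulas can be expressed as a short sketch. The function and constant names are illustrative rather than taken from the report's code, and the dollar figures in the note that follows are hypothetical inputs, not values from the dataset.

```python
# Annual loan caps as stated in this section.
PROFESSIONAL_LOAN_CAP = 50_000  # federally designated professional degrees
GRADUATE_LOAN_CAP = 20_500      # all other graduate programs


def annual_coa(tuition, mandatory_fees, living_expenses):
    """Annual cost of attendance: tuition plus fees plus living expenses."""
    return tuition + mandatory_fees + living_expenses


def annual_funding_gap(coa, is_professional):
    """Per-year shortfall above the loan cap, floored at zero."""
    cap = PROFESSIONAL_LOAN_CAP if is_professional else GRADUATE_LOAN_CAP
    return max(0, coa - cap)


def total_funding_gap(annual_gap, years_to_completion):
    """Cumulative shortfall over the life of the degree."""
    return annual_gap * years_to_completion
```

For a hypothetical four-year professional program with $45,000 tuition, $2,000 in mandatory fees, and $25,000 in living expenses, the annual COA is $72,000, the annual gap is $22,000, and the total gap is $88,000. A program whose COA falls below its cap has a gap of zero rather than a negative value.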

Data Quality

The dataset combines automated extraction with manual spot-checks at 336 universities (3,153 program rows, or 43.0% of the dataset). Spot-checked figures are traceable to primary institutional sources: bursar pages, COA PDFs, and registrar fee schedules. Extraction confidence across all rows: 75.9% HIGH, 15.1% MEDIUM, 9.0% LOW. No row is excluded from analysis regardless of confidence level.
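
The coverage shares above follow directly from the reported counts, as the short check below shows. The variable and function names are illustrative. The university-level share is derived here from the counts and is not a figure quoted elsewhere in the report.

```python
TOTAL_ROWS = 7_333
TOTAL_UNIVERSITIES = 1_861
SPOT_CHECKED_ROWS = 3_153
SPOT_CHECKED_UNIVERSITIES = 336

# Reported extraction-confidence shares, in percent of all rows.
CONFIDENCE_SHARES = {"HIGH": 75.9, "MEDIUM": 15.1, "LOW": 9.0}


def share(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"


row_coverage = share(SPOT_CHECKED_ROWS, TOTAL_ROWS)
university_coverage = share(SPOT_CHECKED_UNIVERSITIES, TOTAL_UNIVERSITIES)

# The three confidence tiers should account for every row.
confidence_total = round(sum(CONFIDENCE_SHARES.values()), 1)
```

Spot-checked rows make up 43.0% of the dataset while the spot-checked universities make up about 18.1% of institutions, so the checked universities contribute more rows on average than the rest.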