Building a defensible menopause provider graph

Pause-Health.ai constructs its own menopause provider graph from CMS NPPES, state board data, and clinic-site service detection — fully public-source, ToS-clean, and compounding with every referral we run.

Why this exists

The Menopause Society's "Find a Menopause Practitioner" directory is the field's best public quality signal, but it is not licensable today: its terms of use explicitly prohibit scraping, republishing, and embedding. Even if we eventually negotiate access (see the Menopause Society strategy), Menopause Society Certified Practitioners (MSCPs) are a small subset of the total menopause-relevant provider pool.

Pause needs a complete, defensible provider graph anyway. We build it from public-domain primary sources, score it with our own model, and let it compound through closed-loop outcomes data. This is a long-term moat.

Data sources

CMS NPPES (NPI Registry)

Public domain · bulk download + REST API

All US healthcare providers with an NPI. Includes taxonomy codes, primary practice address, license state, and authoritative provider identity. ~6M records, refreshed weekly.

  • Purpose: Authoritative provider identity. The NPI is the join key that everything else hangs off.
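Because every other source joins on the NPI, it is worth rejecting malformed identifiers before they enter the graph. The sketch below applies the published NPI check-digit rule (the Luhn algorithm over the NPI with the "80840" prefix prepended). The function name is our own illustration, not part of any CMS tooling.

```python
def is_valid_npi(npi: str) -> bool:
    """Return True if `npi` is a 10-digit string that passes the NPI check-digit test."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # The check digit is computed with the Luhn algorithm over the NPI
    # with the "80840" prefix prepended.
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A check like this only catches typos and truncation. It does not confirm that the NPI is actually assigned, which still requires a lookup against NPPES itself.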

State medical board licensure

Public records · API where available (CA, TX, NY, FL), bulk for the rest

Active license status, license history, disciplinary actions. Variable schema per state; we normalize into a single internal model.

  • Purpose: Filter to currently-licensed providers. Surface disciplinary actions as a downweight in our trust score.
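One way to handle the per-state schema variation is a small registry of normalizer functions that all emit the same internal record. This is a minimal sketch only: the raw field names (`license_no`, `LicNbr`, and so on) and status codes are invented for illustration and do not reflect the actual California or Texas board formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical internal model shared by all states."""
    state: str
    license_number: str
    active: bool
    has_disciplinary_action: bool


def normalize_ca(row: Mapping[str, str]) -> LicenseRecord:
    # Field names here are assumptions, not the real CA schema.
    return LicenseRecord(
        state="CA",
        license_number=row["license_no"].strip(),
        active=row["status"].strip().lower() == "active",
        has_disciplinary_action=row["discipline_flag"].strip().upper() == "Y",
    )


def normalize_tx(row: Mapping[str, str]) -> LicenseRecord:
    # Field names and the "AC" status code are assumptions, not the real TX schema.
    return LicenseRecord(
        state="TX",
        license_number=row["LicNbr"].strip(),
        active=row["LicStatusCode"].strip() == "AC",
        has_disciplinary_action=int(row["BoardOrders"]) > 0,
    )


NORMALIZERS: Dict[str, Callable[[Mapping[str, str]], LicenseRecord]] = {
    "CA": normalize_ca,
    "TX": normalize_tx,
}


def normalize(state: str, row: Mapping[str, str]) -> LicenseRecord:
    """Dispatch a raw board row to its state-specific normalizer."""
    if state not in NORMALIZERS:
        raise ValueError(f"no normalizer registered for state {state!r}")
    return NORMALIZERS[state](row)
```

Adding a state then means writing one function and registering it. Downstream scoring only ever sees `LicenseRecord`.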

NPPES taxonomy filter

Derived

Narrow ~6M providers to ~80K candidates by filtering for taxonomies relevant to menopause care: OB/GYN, Family Medicine, Internal Medicine, Endocrinology, Nurse Practitioner (women's health), Certified Nurse Midwife, Physician Assistant (women's health).

  • Purpose: Cuts the candidate set by ~75× before we spend any compute on clinic-site analysis.
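The filter itself can be a single streaming pass over the bulk file. The sketch below assumes the bulk CSV exposes taxonomy codes in columns named `Healthcare Provider Taxonomy Code_1` through `_15`. The taxonomy codes listed are illustrative and should be checked against the current NUCC code set before use (in particular, the physician assistant entry is the general code, used here as a placeholder).

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Illustrative codes only; verify against the current NUCC taxonomy.
MENOPAUSE_TAXONOMIES = {
    "207V00000X",  # Obstetrics & Gynecology
    "207Q00000X",  # Family Medicine
    "207R00000X",  # Internal Medicine
    "207RE0101X",  # Endocrinology, Diabetes & Metabolism
    "363LW0102X",  # Nurse Practitioner, Women's Health
    "367A00000X",  # Advanced Practice Midwife
    "363A00000X",  # Physician Assistant (general code; placeholder)
}

# Assumed column naming for the bulk file.
TAXONOMY_COLUMNS = [f"Healthcare Provider Taxonomy Code_{i}" for i in range(1, 16)]


def filter_candidates(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only rows with at least one menopause-relevant taxonomy code."""
    for row in rows:
        codes = {row.get(column) for column in TAXONOMY_COLUMNS}
        if codes & MENOPAUSE_TAXONOMIES:
            yield row


# Tiny synthetic sample standing in for the real bulk dump.
SAMPLE = """NPI,Healthcare Provider Taxonomy Code_1,Healthcare Provider Taxonomy Code_2
1111111111,207V00000X,
2222222222,208600000X,
3333333333,207R00000X,207RE0101X
"""

kept_npis = [row["NPI"] for row in filter_candidates(csv.DictReader(io.StringIO(SAMPLE)))]

# The reduction quoted above: ~6M records down to ~80K candidates.
reduction_factor = 6_000_000 / 80_000
```

On the synthetic sample, the surgeon row is dropped and the other two are kept. The reduction factor works out to 75, matching the figure above.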

Clinic-site service detection

Derived · Pause-built

For each candidate clinic, fetch the public clinic website and run structured-data extraction for explicit mentions of menopause, HRT, perimenopause, hormone replacement, vasomotor, and related services. The fetcher caches responses, rate-limits its requests, and respects robots.txt.

  • Purpose: Distinguishes general OB/GYNs from clinicians actually marketing menopause services.
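A minimal sketch of the two core checks, assuming page text has already been fetched and stripped of markup: a robots.txt gate built on the standard library's `urllib.robotparser`, and a keyword pass for the service terms listed above. The pattern set, the `PauseBot` user agent, and the function names are illustrative assumptions. A production detector would also need caching, rate limiting, and handling for negated or incidental mentions.

```python
import re
from typing import Set
from urllib.robotparser import RobotFileParser

# Illustrative patterns for the service terms named in this section.
SERVICE_PATTERNS = {
    "menopause": re.compile(r"\bmenopaus(?:e|al)\b", re.IGNORECASE),
    "perimenopause": re.compile(r"\bperi-?menopaus(?:e|al)\b", re.IGNORECASE),
    "hrt": re.compile(r"\bHRT\b|\bhormone replacement\b", re.IGNORECASE),
    "vasomotor": re.compile(r"\bvasomotor\b", re.IGNORECASE),
}


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "PauseBot") -> bool:
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def detect_services(page_text: str) -> Set[str]:
    """Return the set of service labels explicitly mentioned in the page text."""
    return {
        label
        for label, pattern in SERVICE_PATTERNS.items()
        if pattern.search(page_text)
    }
```

Usage would be to call `allowed_by_robots` before every fetch and skip the page when it returns False, then store the `detect_services` result alongside the URL and fetch date so the signal carries its own provenance.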

Trusted third-party verification

Public-facing third-party directories

Cross-check against certifiedmenopause.com and similar verified-provider sites for additional credibility signal. We never republish; we only use as a sanity check against our own scoring.

  • Purpose: Reduce false positives in our scoring. Catch credential-holders we might have missed.
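The cross-check reduces to set arithmetic once both sides are keyed by NPI. The sketch below assumes the third-party listings have already been resolved to NPIs, which is itself a matching problem not shown here. The function and bucket names are hypothetical.

```python
from typing import Dict, Set


def cross_check(our_top_npis: Set[str], directory_npis: Set[str]) -> Dict[str, Set[str]]:
    """Compare our high scorers with a third-party directory, for internal review only."""
    return {
        # Both sources agree: extra credibility signal.
        "confirmed": our_top_npis & directory_npis,
        # We rank them highly but the directory does not list them:
        # candidates for false-positive review.
        "unconfirmed": our_top_npis - directory_npis,
        # The directory lists them but we did not surface them:
        # possible missed credential-holders.
        "missed": directory_npis - our_top_npis,
    }
```

Only the bucket membership feeds back into our review queue. Nothing from the directory is stored for display or republished.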

Outcomes signal (closed loop, Phase 3)

Pause-internal

Once we have referrals flowing through Pause, the patient and provider outcomes from those referrals become the strongest possible scoring signal — and one no one else has.

  • Purpose: The actual moat. Every successful referral makes the graph better; every poor one downweights the destination.
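One simple way to turn referral outcomes into a per-provider signal is a smoothed success rate, so that a single good or bad referral nudges the score without swinging it. This is a sketch of one possible approach, not a committed design. The prior counts (2 and 2) are placeholder assumptions, and what counts as a "successful" referral is left undefined here.

```python
def outcome_score(
    successes: int,
    failures: int,
    prior_successes: float = 2.0,
    prior_failures: float = 2.0,
) -> float:
    """Smoothed success rate in [0, 1]; the prior keeps small samples from dominating."""
    numerator = successes + prior_successes
    denominator = successes + failures + prior_successes + prior_failures
    return numerator / denominator
```

With these placeholder priors a provider with no referrals sits at 0.5, one successful referral moves them to 0.6, and one poor referral moves them to 0.4.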

Scoring model

Factor | Weight | What it captures
Credential signal | Highest | MSCP / NCMP / ABMS board certification in OB/GYN, IM, Endo, or FM. Self-attested in pilot; verified against primary sources before any pilot signs.
Service-mention signal | High | Clinic-site explicitly lists menopause / HRT / perimenopause services. Catches the clinicians who self-identify as menopause-serious.
License standing | Gating | Active license, no current disciplinary action. Anything below this is a hard exclude, not a downweight.
Geographic coverage | Medium | Distance to patient, accepting-new-patients flag (where available), insurance match. Practical referability.
Outcomes feedback | Compounding | Pause's own referral outcomes data. Starts at zero, grows monotonically with usage. This is what eventually outranks every other signal.
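The factors above can be combined in a function that treats license standing as a hard gate and lets the outcomes weight grow with referral volume. The sketch below is illustrative only: the numeric weights, the 0-to-1 signal scales, and the `OUTCOME_HALF_WEIGHT_AT` constant are assumptions chosen to respect the ordering described above (credential > service mention > geography), not the model's actual parameters.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder weights that respect Highest > High > Medium.
BASE_WEIGHTS = {"credential": 0.5, "service_mention": 0.3, "geographic": 0.2}
# Assumed referral count at which outcomes carry half of the final score.
OUTCOME_HALF_WEIGHT_AT = 50


@dataclass
class ProviderSignals:
    license_active: bool
    has_current_discipline: bool
    credential: float        # 0..1
    service_mention: float   # 0..1
    geographic: float        # 0..1
    outcome: float           # 0..1, from Pause referral data
    referral_count: int


def score(provider: ProviderSignals) -> Optional[float]:
    """Return a score in [0, 1], or None if the provider is excluded by the license gate."""
    # Gating: a hard exclude, not a downweight.
    if not provider.license_active or provider.has_current_discipline:
        return None
    base = sum(weight * getattr(provider, name) for name, weight in BASE_WEIGHTS.items())
    # Outcomes weight starts at zero and rises monotonically with referral volume.
    outcome_weight = provider.referral_count / (provider.referral_count + OUTCOME_HALF_WEIGHT_AT)
    return (1 - outcome_weight) * base + outcome_weight * provider.outcome
```

Returning `None` rather than a very low number keeps excluded providers out of any ranking by construction. The saturating outcome weight is one way to let outcomes eventually dominate the other signals as volume grows.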

Strategic considerations

Why NPPES is the right substrate

It is public domain, refreshed weekly, and widely used across healthcare data products. There is no licensing complication and no terms-of-use trap.

Why we don't just buy a vendor graph

The commercial provider graphs (Definitive Healthcare, IQVIA OneKey, etc.) are excellent for general healthcare. None of them score for menopause specifically. We would still need to build the menopause overlay — so we just build the whole thing.

Why this is a moat

Once Pause is producing referrals at scale, the outcome data we capture from each referral is uncopyable. The graph improves with every patient we serve. New entrants have to start at zero.

Compliance posture

Everything we ingest is public information. We respect robots.txt and rate limits. We carry provenance for every field we surface. We expose a provider opt-out mechanism.
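Provenance and opt-out are easiest to enforce when they are part of the data model rather than a downstream check. A minimal sketch, with hypothetical type and field names: every surfaced value carries its source and retrieval date, and opted-out providers are removed before anything is displayed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SourcedField:
    """A single surfaced value together with where and when we obtained it."""
    value: str
    source: str          # e.g. "nppes", "state_board:CA", "clinic_site"
    retrieved_on: date


@dataclass
class ProviderProfile:
    npi: str
    fields: Dict[str, SourcedField]


def surfaceable(profiles: Iterable[ProviderProfile], opted_out_npis: Set[str]) -> List[ProviderProfile]:
    """Drop any provider who has opted out before results are shown."""
    return [profile for profile in profiles if profile.npi not in opted_out_npis]
```

Because a field cannot be constructed without a `source`, a value with unknown origin cannot reach the surface by accident.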

Phased plan

Phase 0 — Decide and design

Now

Decision documented (this page). Data model for the provider graph defined. Pause-internal review of compliance posture.

Phase 1 — NPPES + taxonomy filter

2 weeks

Ingest the NPPES bulk dump, normalize, filter to menopause-relevant taxonomies. Output: ~80K candidate provider rows in our internal store.

Phase 2 — State license + service detection

4–6 weeks

Wire the top-volume state board sources. Run the clinic-site service detector against the candidate set. Score and rank. Output: a ranked menopause provider list with provenance.

Phase 3 — Closed-loop scoring

After first 1,000 referrals

Pull patient and provider outcomes from Pause's own data. Re-weight the scoring model. From here, the graph self-improves.
