---
name: pjp-newsletter-sweep
display_name: PJP Newsletter Sweep
description: Each morning, scan the Axios Pro Rata newsletter against Pickle Jar Partners' CRM, draft personalized outreach emails for company matches, and score non-match mentions for fit against PJP's investment thesis.
type: custom
owner: Pickle Jar Partners
inputs:
  - newsletter_text          # raw text of the day's Axios Pro Rata
  - crm_sheet                # Google Sheet (or xlsx) with CRM rows
  - pjp_thesis               # pjp_thesis.md, the full investment thesis (optional; minimal fallback exists)
outputs:
  - match_drafts             # list of Gmail drafts, one per CRM match
  - morning_brief            # unified partner email: drafts ready, watchlist of high-fit non-matches, negative-news watch
  - run_log                  # what was extracted, what matched, what didn't, why
---

# PJP Newsletter Sweep

A daily skill that turns Pickle Jar Partners' 2-3 hour manual newsletter triage into a 5-minute review. Reads the Axios Pro Rata newsletter, cross-references against PJP's CRM, drafts personalized congratulations / check-in emails for known contacts, and surfaces unknown companies that fit PJP's thesis.

This skill is deterministic where it can be (string normalization, domain matching) and uses an LLM only where judgment is needed (entity extraction, email tone, fit scoring). Every run produces an audit log so partners can trust the output.

---

## When to use this skill

Trigger this skill once per business day, in the morning (08:30 local), after the Axios Pro Rata email lands. It also runs on demand if a partner forwards a back-issue.

Do **not** run it for unrelated newsletters — the entity extraction prompts are tuned to Pro Rata's structure (sectioned by deal type, dollar amounts inline, executive moves in bullets).

---

## Inputs

| Input | Source | Notes |
|---|---|---|
| `newsletter_text` | Gmail message body, `from:axios.com subject:"Axios Pro Rata"` | Strip HTML to text. Keep the section headers — they encode the news type. |
| `crm_sheet` | `Pickle Jar Partners CRM.xlsx` (sheet `Companies`) | Columns: `Company Name`, `Company Domain`, `Company Description`, `Contact Name`, `Contact Email`, `Relationship Type`, `Last Contact Date`, `Notes`. |
| `pjp_thesis` | `pjp_thesis.md` (in the agent's attached files) | Full PJP investment thesis: stage range, sector priorities, anti-thesis, geography, scoring rubric, watchlist threshold. The partner-of-record updates this quarterly; the agent reads it at every run. |

If `pjp_thesis.md` is missing, the skill falls back to a one-line default: *growth-stage (Series B–E), US/Europe HQ, B2B SaaS / fintech / AI infra. Pass on consumer-only and pre-seed.* This default is intentionally minimal — the real thesis lives in the file, not in the agent's prompt.
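The read-fresh-with-fallback behavior can be sketched as follows. This is illustrative, not the actual pipeline code; the `DEFAULT_THESIS` string paraphrases the one-line default above:

```python
from pathlib import Path

# Minimal fallback from the skill spec; the real thesis lives in pjp_thesis.md.
DEFAULT_THESIS = (
    "Growth-stage (Series B-E), US/Europe HQ, B2B SaaS / fintech / AI infra. "
    "Pass on consumer-only and pre-seed."
)

def load_thesis(path: str = "pjp_thesis.md") -> str:
    """Read the thesis file fresh on every run; fall back to the minimal default."""
    p = Path(path)
    if p.is_file():
        text = p.read_text(encoding="utf-8").strip()
        if text:
            return text
    return DEFAULT_THESIS
```

Because the file is re-read on every run, partner edits take effect the next morning with no redeploy.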

---

## Outputs

### 1. Match drafts (Gmail)
For every CRM company mentioned in the newsletter, create a Gmail draft addressed to the CRM contact, with subject and body filled in. Drafts are NOT auto-sent — partners review and click send.

### 2. Morning brief (single email to all partners)
ONE unified email that lands in the partners' inbox at 8:30 — the entry point for the morning. Combines:
  - **Header:** counts at a glance — drafts ready, watchlist size, negative-news watch.
  - **Drafts ready to send**, grouped by Relationship Type (Portfolio / Pipeline / Network), one bullet per draft with contact name, company, subject, and news_type. Partners scan, decide which to open in Drafts, click send.
  - **Watchlist**: companies in the newsletter that aren't in the CRM but score ≥ 7/10 against the thesis. Score, news headline, why-it-matches, suggested next step.
  - **Negative-news watch**: CRM contacts whose company had bad news (layoffs / lawsuit / departure). No congrats draft created — surfaced for the partner to decide on personal-touch outreach.
  - **Footer:** run summary with mention/match/draft counts and threshold reference.

The brief is the *one thing* a partner reads to start their morning. Drafts still live in the Gmail Drafts folder — the brief just tells them what's there. This replaces the older "separate watchlist email" pattern.

### 3. Run log (always)
A timestamped record of: how many companies were extracted, how many matched, why each non-match was excluded, draft IDs generated. Stored in the assignment's job log for audit.

---

## Procedure

### Step 1 — Extract company mentions

Run the LLM extraction prompt below against `newsletter_text`. The prompt returns a JSON array of mentions, one per company.

```
You are extracting company mentions from the Axios Pro Rata newsletter.

For each company mentioned in the text below, return a JSON object with:
- company_name: the exact name as written
- normalized_name: lowercase, no legal suffix ("Inc", "Ltd", "Corp"), no
  punctuation, common words like "the" stripped (e.g. "Stripe, Inc." → "stripe")
- domain_guess: best guess at primary domain, lowercase ("stripe.com")
- news_type: one of {funding_round, acquisition, hire, departure,
  product_launch, ipo, partnership, lawsuit, layoffs, other}
- headline_sentence: the single sentence in the newsletter that describes
  the news, verbatim
- amount_or_detail: dollar amount, valuation, role title, or other key
  detail (string, may be empty)

Return ONLY the JSON array. Skip companies that are mentioned only
incidentally (e.g. "ex-Google engineer" doesn't count as a Google mention).

Newsletter text:
<<<NEWSLETTER>>>
```

Validate the JSON. If parsing fails, retry once with the prompt prefix *"Your last output was not valid JSON. Return ONLY a JSON array."* If it still fails, log the error and continue with whatever did parse — partial output is better than none.
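The validate-retry-continue logic above can be sketched like this. `call_llm` is a stand-in for whatever LLM client the pipeline uses, not a real API:

```python
import json
from typing import Callable

# Retry prefix taken verbatim from the procedure above.
RETRY_PREFIX = "Your last output was not valid JSON. Return ONLY a JSON array.\n\n"

def extract_with_retry(call_llm: Callable[[str], str], prompt: str) -> list:
    """Parse the extraction output; retry once on malformed JSON.

    Returns [] after a second failure so the caller can log extraction_error
    and continue the run rather than abort.
    """
    raw = call_llm(prompt)
    for attempt in range(2):
        try:
            data = json.loads(raw)
            if isinstance(data, list):
                return data
        except json.JSONDecodeError:
            pass
        if attempt == 0:
            raw = call_llm(RETRY_PREFIX + prompt)  # single corrective retry
    return []
```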

### Step 2 — Match against the CRM

For each mention, find a CRM row using this priority order. Stop at the first hit.

#### 2a. Normalize first
Before any comparison, normalize both sides:
- Lowercase.
- Strip legal suffixes: `inc`, `incorporated`, `llc`, `ltd`, `limited`, `corp`, `corporation`, `co`, `sa`, `gmbh`, `plc`, `ag`, `bv`, `nv`, `pbc`, `pty`. (Public-benefit corporations and South African Pty Ltd entities show up enough to matter.)
- Strip punctuation.
- Drop leading "the ".

So `"Stripe, Inc."`, `"Anthropic, PBC"`, and `"Rippling Inc"` all collapse to their bare-name form before matching.
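A minimal sketch of the normalization (function name is illustrative; the real implementation lives in `pjp_pipeline.py`):

```python
import re

# Legal suffixes listed in 2a; stripped only from the end of the name.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "llc", "ltd", "limited", "corp", "corporation",
    "co", "sa", "gmbh", "plc", "ag", "bv", "nv", "pbc", "pty",
}

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, drop trailing legal suffixes and leading 'the'."""
    s = re.sub(r"[^\w\s]", " ", name.lower())   # punctuation to spaces
    words = s.split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()                              # handles stacked suffixes: "(Pty) Ltd"
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)
```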

#### 2b. Match priority

1. **Domain match (highest confidence).** `mention.domain_guess == crm.Company Domain` (case-insensitive). Trusted unconditionally.

2. **Exact normalized name match.** `normalized_mention_name == normalized_crm_name`. For the ambiguous-name list (see 2c), also requires a context-word hit OR a concrete `news_type` (funding/acquisition/IPO/hire/layoffs/launch).

3. **Substring match — TWO directions, each with different guards.** Both names must be ≥ 4 characters and not already equal.

    - **Direction A: mention shorter than CRM** (e.g. mention `"Scale"` vs CRM `"Scale AI"`). This is the legitimate "less-specific reference" case. Apply the ambiguous-name guard from 2c.

    - **Direction B: mention longer than CRM** (e.g. mention `"Figma Software"` vs CRM `"Figma"`). This is the **risky** case — usually a different company that happens to share a name root, occasionally a real subsidiary. Reject unless **either** of:
      - Domain hit (`mention.domain_guess == crm.Company Domain`), or
      - At least one context word from `CONTEXT_WORDS[crm_name]` appears in the headline or amount/detail text.

      Without one of those, log `substring_long_skipped_no_corroboration` and skip.

4. **Fuzzy match (last resort).** Levenshtein ratio ≥ 0.88 on normalized names. Same ambiguous-name guard. Every fuzzy hit is logged so a human can spot-check.
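For the last-resort tier, the spec calls for Levenshtein ratio; a stdlib sketch can use `difflib.SequenceMatcher.ratio()` as a close stand-in (it is not identical to Levenshtein ratio, so treat this as an approximation):

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.88  # from tier 4 above

def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity on already-normalized names; stdlib approximation of Levenshtein ratio."""
    return SequenceMatcher(None, a, b).ratio()

def fuzzy_match(mention: str, crm_name: str) -> bool:
    # Every fuzzy hit should also be logged for human spot-checking.
    return fuzzy_ratio(mention, crm_name) >= FUZZY_THRESHOLD
```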

#### 2c. Ambiguous-name guard

These one-word names need stronger evidence than a name match alone, because they read as ordinary words in newsletter prose: `Plaid`, `Ramp`, `Brex`, `Anduril`, `Deel`, `Scale`, `Notion`, `Linear`, `Mercury`.

For an ambiguous name, the match is rejected unless the headline contains a context word from a per-company list. Examples:
- `Notion` requires one of: `workspace`, `productivity`, `notion.so`, `notes`, `docs`.
- `Plaid` requires one of: `plaid.com`, `fintech`, `banking`, `open banking`, `financial data`.
- `Scale AI` requires one of: `data labeling`, `ai infrastructure`, `defense`, `scale.com`.

The full table lives in the `CONTEXT_WORDS` dict in `pjp_pipeline.py`.

Note: context words are also defined for **non-ambiguous CRM companies** (Stripe, Figma, Airtable, Databricks, Canva, Rippling, OpenAI, Anthropic), but only used in Direction B substring matching above. They're never required for a clean exact or domain match.
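The guard check itself is a simple containment test. The dict below is an illustrative subset of the real `CONTEXT_WORDS` table in `pjp_pipeline.py`:

```python
# Illustrative subset; the full table lives in pjp_pipeline.py.
CONTEXT_WORDS = {
    "notion": ["workspace", "productivity", "notion.so", "notes", "docs"],
    "plaid": ["plaid.com", "fintech", "banking", "open banking", "financial data"],
    "scale ai": ["data labeling", "ai infrastructure", "defense", "scale.com"],
}

AMBIGUOUS_NAMES = {"plaid", "ramp", "brex", "anduril", "deel",
                   "scale", "notion", "linear", "mercury"}

def passes_ambiguous_guard(crm_key: str, headline: str) -> bool:
    """True when at least one context word for this CRM company appears in the headline."""
    text = headline.lower()
    return any(w in text for w in CONTEXT_WORDS.get(crm_key, []))
```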

#### 2d. Tie-breaking

If two CRM rows match the same mention, pick the one with the most recent `Last Contact Date`. Log the collision.

### Step 3 — Draft emails for matches

The drafting goal is "a friend who happens to be on your cap table," not "a VC analyst fishing for intel." This is the difference between a draft a partner sends after a 5-second skim and a draft they delete and rewrite.

The drafting system has three layers: a structured prompt with hard rules, per-news-type opener guidance, and a critique-and-regenerate loop that catches what the first pass missed.

#### 3a. Hard rules — every draft must satisfy these

Drafts are rejected and regenerated if any of these fail:

1. **Greeting.** Body opens with `Hi <FirstName>,` on its own line.
2. **Real congratulations.** Body contains a real congrats word: `congrats`, `congratulations`, `well done`, `kudos`, `big move`, `huge congrats`. (Skipped for negative news_types.)
3. **NO em-dash.** Absolutely no `—` character (U+2014, the long horizontal line). It is the single strongest AI tell. Use a regular hyphen `-`, comma, period, semicolon, or parentheses instead. This applies to BOTH the body and the subject line. **This rule is non-negotiable.**
4. **Real CTA required.** Every draft must contain a specific call-to-action that gives the recipient something to act on. CTAs are OFFERS or PROPOSALS, never extractive asks. Valid CTA patterns:
   * Time-bound proposal: "Coffee in the next two weeks?", "Free for a quick call this Thursday/Friday?"
   * Specific offer: "Happy to make an intro to [SPECIFIC PERSON / SEGMENT] if it'd help with [SPECIFIC THING]"
   * Sourcing offer: "Let me know if I can help source [hire / customer / EU operator]"
   * Event reference: "Looking forward to seeing you at [SPECIFIC EVENT]"
   * Light open offer (Network only): "If you ever want a sounding board on [TOPIC], drop me a note"

   Detection: at least one of these markers must appear in the body: `?`, `happy to `, `let me know if`, `looking forward`, `if useful`, `if helpful`, `open to `, `drop me a`, `ping me`, `around for`, `if you ever want`, `would love to`, `let's grab`, `let's catch up`, `free for`.

   "Cheering you on", "No need to reply", "Just wanted to send a note" are SIGN-OFFS, not CTAs. They cannot stand alone — they must be paired with a real CTA above.
5. **No source attribution.** Never mention "Pro Rata", "Axios", "the newsletter", "this morning", "front and center". The contact knows where their news came from.
6. **No extractive asks.** Banned: "I'd love to hear your read", "compare notes", "from your seat", "from your vantage point", "lands internally", "internal dynamics", "positioning on both sides".
7. **No AI smell.** No "I hope this email finds you well", "I wanted to reach out", "I came across", "Just wanted to drop a quick note", "hope you're doing well".
8. **Sign-off.** End with `Best,\n<PartnerFirstName>` on its own line.
9. **Length** scales with relationship type:
   * Portfolio  → 45–110 words
   * Pipeline   → 65–130 words
   * Network    → 25–80 words
10. **Subject line.** Specific to the news event, < 50 characters. Forbidden tokens: `Saw`, `Axios`, `Pro Rata`, `Quick note`, `Touching base`, `Following up`, `Checking in`. NO em-dash.

The full banned-phrase list and forbidden-subject-token list live in `pjp_pipeline.py` (`BANNED_PHRASES`, `SUBJECT_BANNED`).
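A sketch of the critique check covering a subset of the rules above (greeting, congrats, em-dash, CTA markers, sign-off, subject length). Marker lists are copied from the rules; the function name and return shape are assumptions, not the actual `pjp_pipeline.py` API:

```python
import re

CTA_MARKERS = ["?", "happy to ", "let me know if", "looking forward", "if useful",
               "if helpful", "open to ", "drop me a", "ping me", "around for",
               "if you ever want", "would love to", "let's grab", "let's catch up",
               "free for"]
CONGRATS_WORDS = ["congrats", "congratulations", "well done", "kudos",
                  "big move", "huge congrats"]

def critique(body: str, subject: str, negative_news: bool = False) -> list:
    """Return the list of failed rule names; an empty list means the draft passes."""
    failures = []
    low = body.lower()
    if not re.match(r"Hi \w+,\s*\n", body):
        failures.append("greeting")
    if not negative_news and not any(w in low for w in CONGRATS_WORDS):
        failures.append("congrats")
    if "\u2014" in body or "\u2014" in subject:   # U+2014 em-dash, rule 3
        failures.append("em_dash")
    if not any(m in low for m in CTA_MARKERS):
        failures.append("cta")
    if not re.search(r"Best,\n\w+\s*$", body):
        failures.append("sign_off")
    if len(subject) >= 50:
        failures.append("subject_length")
    return failures
```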

#### 3b. Per-news-type opener guidance

The opener is the most failure-prone part — it sets the tone for the whole email. Each news_type has a directional anchor:

| news_type | Opener guidance |
|---|---|
| `funding_round` | Explicitly congratulate on the round in the first sentence ("Saw the round news - congrats"). |
| `acquisition` (acquirer) | Praise the strategic fit ("Big move on the [TARGET] deal"). |
| `acquisition` (acquiree) | Congratulate on the deal landing. |
| `ipo` | Comment on the milestone with energy ("S-1 day", "Big day"). |
| `hire` (contact IS the hire) | Direct congrats to them ("Huge congrats on the new role"). |
| `hire` (contact NOT the hire) | Comment on the strategic value of the hire ("Strong addition"). |
| `product_launch` | Reference the product by name and say something concrete ("Saw [PRODUCT] ship - fun launch"). |
| `partnership` | Comment on why the partnership makes strategic sense. |

The opener is **directional, not a template** — the drafting model picks appropriate phrasing within the anchor, varying by relationship type.

#### 3c. Contact-is-or-isn't-the-news heuristic

Before drafting, check whether the CRM contact's last name appears in the headline. If yes (e.g., "Scale AI promoted David Kim to President" — David Kim IS the CRM contact), the draft is direct congratulations *to them* about *their* news. If no, the draft is about the company / a colleague / a deal.

Without this check, you get awkward emails congratulating someone about themselves in third person.
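The check is a word-boundary search for the contact's last name, sketched below (the function name is an assumption):

```python
import re

def contact_is_the_news(contact_name: str, headline: str) -> bool:
    """True when the CRM contact's last name appears in the headline,
    i.e. the news is about the contact personally."""
    last = contact_name.strip().split()[-1]
    # \b word boundaries avoid "Kim" matching inside "Kimball".
    return re.search(rf"\b{re.escape(last)}\b", headline, re.IGNORECASE) is not None
```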

#### 3d. Tone scaling per relationship type

| Relationship | Tone | Length | CTA pattern |
|---|---|---|---|
| Portfolio | Warm, may reference shared history (board seat, prior conferences, last partner call) | 45-110 words | Time-bound proposal — "Coffee in the next two weeks?", "Free for a quick call this Thursday/Friday?", paired with a sourcing offer ("Happy to also help source [X]") |
| Pipeline | Engaged, professional, references prior thread from `Notes` | 65-130 words | Specific topical proposal — "Open to a 30-min call on [SPECIFIC TOPIC] in the next two weeks?", paired with a signal-share offer ("Happy to share signal from our end on adjacent customer accounts") |
| Network | Brief, respectful | 25-80 words | Light open offer — "If you ever want a sounding board on [TOPIC], drop me a note", "Looking forward to seeing you at [EVENT]" |

Every draft has a CTA, even Network. Network CTAs are just lighter and more open-ended (no time-bound proposal, no specific ask).

#### 3e. Notes-field handling

If `Notes` mentions a specific topic, weave the **substance** in naturally without quoting verbatim:

| Notes contain | Body may reference |
|---|---|
| "Series C conversation. Following up in Q1." | "the Series C thread we were on" |
| "Board observer. Quarterly check-in scheduled." | "shared board context" |
| "Met at SaaStr 2024." | "since SaaStr" |
| "Monthly partner call. Strong Q4 numbers." | "since our last partner call" |

If `Last Contact Date` > 90 days ago, include a one-line soft acknowledgment of the gap ("It's been a while").

#### 3f. Critique-and-regenerate loop

After generating a draft, run it through the critique check (3a). If any rule fails, regenerate with a corrective prompt that includes the specific failure reasons. Cap at 2 retries.

The critique is intentionally strict — it's the second LLM pass that catches what the first pass missed (forgetting the greeting, slipping a banned phrase, missing the congrats word). For a typical run, ~80% of drafts pass first time and ~20% need one regeneration; very few need two.

#### 3g. Saving — exactly one draft per contact

**The critique-and-regenerate loop is internal. Intermediate drafts are NOT saved.**

The loop may produce 1, 2, or 3 candidate drafts per contact before settling on a final version. Only the FINAL draft is persisted — either:
- The first one that passes critique, OR
- The last one produced after max retries (saved with the remaining issues noted in the run log so a partner can review).

**Never call the Gmail `createDraft` action inside the regenerate loop.** The save happens exactly once, after the loop exits. If the agent calls Gmail's draft-create action multiple times for the same contact, the contact's drafts folder accumulates duplicates — which is a critical bug. One match → one Gmail draft.

The same rule applies to the local audit trail: only the final draft is written to `email_drafts.md`. The run log can note how many regenerations were needed (audit signal) without storing every intermediate body.
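The one-match-one-draft loop can be sketched as follows. `generate`, `critique`, and `save_draft` are stand-ins for the pipeline's LLM draft call, the 3a rule check, and Gmail's draft-create action:

```python
def finalize_draft(generate, critique, save_draft, contact_id, max_retries=2):
    """Critique and regenerate internally; call save_draft exactly once.

    Returns (final_draft, remaining_issues, retries_used) so the run log can
    record how many regenerations were needed.
    """
    draft = generate(issues=None)
    issues = critique(draft)
    retries = 0
    while issues and retries < max_retries:
        draft = generate(issues=issues)    # corrective prompt with failure reasons
        issues = critique(draft)
        retries += 1
    save_draft(contact_id, draft)          # the ONLY save, after the loop exits
    return draft, issues, retries
```

Keeping the save outside the loop is what prevents duplicate drafts piling up in the contact's Drafts folder.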

### Step 4 — Score non-match mentions for fit

For mentions that did not match the CRM, score each against PJP's thesis on a 1-10 scale. This replaces the partners' "feelings-based" decision in step 4 of their original process.

The thesis is read from `pjp_thesis.md` (attached to the agent build) on every run. Partners edit that file directly when the thesis evolves; the agent picks up changes automatically on the next morning's run. No prompt edits, no redeploy.

Scoring rubric (sum of component scores, then clamp 1-10):

| Component | Range | How to score |
|---|---|---|
| Stage fit | 0-3 | +3 if news mentions Series B/C/D/E, growth round, or a strategic-led round. +1 if seed/A. 0 if M&A target, IPO, or stage unclear. |
| Sector fit | 0-3 | +3 if matches PJP's sector list (B2B SaaS, fintech, AI infra). +1 if adjacent (devtools, vertical SaaS). 0 if outside (consumer, biotech, energy). |
| Geography | 0-2 | +2 if HQ in US or Western Europe (infer from newsletter mentions like "SF-based", "London-headquartered"). +1 if elsewhere but English-speaking. 0 otherwise. |
| Signal quality | 0-2 | +2 if news indicates traction (revenue, growth, named investor). +1 if neutral. 0 if news is negative (lawsuit, layoffs, departure). |

Companies scoring ≥ 7 land in the watchlist brief. Below 7 are logged but not surfaced — partners can review the full log on demand.

If `pjp_thesis.md` is updated, the new scoring applies only to future runs. Do not retroactively re-score past briefs.
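The sum-then-clamp arithmetic is trivial but worth pinning down (maximum raw sum is 3+3+2+2 = 10, so the clamp mostly matters at the bottom end). Function names are illustrative:

```python
WATCHLIST_THRESHOLD = 7  # from the paragraph above

def thesis_score(stage: int, sector: int, geography: int, signal: int) -> int:
    """Sum the four rubric components, then clamp to the 1-10 scale."""
    return max(1, min(10, stage + sector + geography + signal))

def on_watchlist(score: int) -> bool:
    return score >= WATCHLIST_THRESHOLD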

### Step 5 — Assemble the unified morning brief

The morning brief is one email — sent to all partners, structured to be readable in 60 seconds. Sections in order:

1. **One-line header** with the day's date and at-a-glance counts.
2. **Drafts ready to send**, grouped by Relationship Type (Portfolio first, then Pipeline, then Network). Each row: contact name, company, draft subject, news_type. Partners scan to decide which drafts to review in Gmail.
3. **Watchlist**: non-CRM companies scoring ≥ 7/10. Each entry shows the news headline, the why-it-matches rationale, and the suggested next step.
4. **Negative-news watch**: CRM contacts where the news was layoffs / lawsuit / departure / down-round. No congrats was drafted; surfaced here so a partner can decide on a personal-touch outreach.
5. **Footer**: run summary with counts and threshold.

If no companies score ≥ 7, the watchlist section reads: *"No new companies passed the watchlist threshold today. <N> mentions reviewed and logged."*

### Step 6 — Deliver

- **Match drafts** → Gmail drafts folder of the partner who owns the CRM contact (lookup by `Contact Email` domain or by static `partner_owner_map` if configured). NEVER auto-sent.
- **Morning brief** → single email to all four partners + office manager. Subject format:
  `PJP Morning Sweep - <YYYY-MM-DD> - <N> drafts ready, <M> on watchlist`
  Body in HTML for inbox rendering, with markdown alternative for plaintext clients.
- **Run log** → assignment job log, retained 90 days. Also written to a designated Drive folder if Drive is connected, for long-term audit.

---

## Edge cases and how to handle them

- **Newsletter not received** (no message in inbox by 09:00). Skip silently — do not page partners. The next morning's run handles the backlog if they forward the missed issue.
- **Newsletter received but empty** (e.g. holiday "no edition today"). Detect by `len(extracted_mentions) == 0`. Log and exit.
- **CRM row has no `Contact Email`**. Skip drafting; add to the run log as `match_no_contact` so a partner can update the sheet.
- **CRM row has stale email** (bounce on a previous send, tracked in skill memory). Draft a memo to the partner instead, suggesting they update the contact.
- **Same company mentioned multiple times** in one newsletter (e.g. a feature article + a sidebar). De-duplicate by matched CRM company; pick the strongest news_type. Priority order:
  `funding_round` > `ipo` > `acquisition` > `hire` > `product_launch` > `partnership` > `other` > `departure` > `layoffs` > `lawsuit`.
  The dedup decision is logged ("Stripe: kept ipo over hire") so the audit trail explains why only one draft was created.
- **Contact has changed companies** (`Notes` says "moved to X" or LinkedIn signal). Don't draft against the old company. Flag for CRM update.
- **News is negative** (lawsuit, layoffs, executive ousted). Do **not** draft a "congrats" email. Log as `match_negative_news` and surface it in the morning brief's negative-news watch section instead, suggesting the partner reach out personally if appropriate.
- **Company is a competitor to a portfolio company** (`Notes` contains "competitive to <X>"). Default to drafting a check-in email, but add a flag so the partner sees the conflict before sending.
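The multi-mention dedup above reduces to a priority lookup, sketched here (the mention dict shape is an assumption):

```python
# Priority order from the edge-cases list; lower index wins.
NEWS_PRIORITY = ["funding_round", "ipo", "acquisition", "hire", "product_launch",
                 "partnership", "other", "departure", "layoffs", "lawsuit"]
RANK = {t: i for i, t in enumerate(NEWS_PRIORITY)}

def dedup_mentions(mentions):
    """Keep one mention per matched CRM company: the highest-priority news_type.
    The kept/dropped decision should also be written to the run log."""
    kept = {}
    for m in mentions:
        key = m["crm_company"]
        if key not in kept or RANK[m["news_type"]] < RANK[kept[key]["news_type"]]:
            kept[key] = m
    return kept
```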

---

## Quality bars (non-negotiable)

1. **No silent failures.** Every step writes to the run log, including "0 matches today."
2. **No auto-send.** Drafts only — every email gets human review before going out.
3. **No fabricated detail.** If the email body would require detail not in the newsletter or CRM, omit it. Better a shorter email than an inaccurate one.
4. **Audit-ready.** A partner should be able to ask "why did we not draft to Stripe today?" and the run log answers in one query.

---

## Why this skill is built the way it is

A few decisions that look unusual but are deliberate:

- **LLM only for extraction and drafting, not matching.** The matching step is rule-based because matching is a place LLMs hallucinate (they will confidently match "Notion" to a paragraph that uses the word "notion"). Determinism here is worth the brittleness — we'd rather miss a match (which the partner will catch) than draft a wrong-recipient email.
- **Substring match is two-directional, with asymmetric guards.** The "mention is shorter than CRM name" direction (e.g., `Scale` → `Scale AI`) is permissive but applies the ambiguous-name guard. The "mention is longer than CRM name" direction (e.g., `Figma Software` → `Figma`) is the risky one — different-company-with-similar-name happens far more often than a subsidiary — so it requires domain or context-word corroboration. Caught the false positive where `Figma Software (Pty) Ltd` (a South African accounting vendor) would have matched CRM Figma.
- **News type drives email tone, dedup, and disambiguation.** Acquisition of a portfolio company gets a different draft than an executive hire. When the same CRM company is mentioned twice in one issue (e.g., Stripe in the BFD + Stripe in People Moves), we dedup and keep the higher-priority news_type.
- **Watchlist threshold is 7, not 5.** Partners' time is the scarce resource. Higher threshold = fewer false positives = brief gets read.
- **Banned phrase list in email check.** Removes the "AI smell" that partners would notice and bin.

---

## Failure modes and recovery

| Symptom | Likely cause | Fix |
|---|---|---|
| Draft sent to wrong contact | CRM contact stale | Update `Contact Email`; the next run picks it up |
| No drafts on a day with obvious matches | LLM extraction returned malformed JSON twice | Check run log for `extraction_error`; retry the run manually |
| Draft mentions detail not in the newsletter | Email-quality check missed a hallucination | Tighten the "no fabricated detail" check; report sample for prompt tuning |
| Watchlist brief is empty for a week | Threshold too high or thesis too narrow | Edit `pjp_thesis.md` or lower the threshold to 6 |

---

## Operating notes

- **Run cost.** ~1 LLM extraction call + 1 draft call per match + 1 scoring call per non-match. Typical day: 15-25 LLM calls, well under $0.50 per run.
- **Latency.** Target end-to-end < 90 seconds. If a partner is staring at the screen waiting, something is wrong.
- **Memory.** The skill remembers: prior bounces, contacts who changed companies, last 14 days of run logs. Memory is workspace-scoped, not skill-scoped.
- **Updates to the CRM** (new rows, new contacts) are picked up automatically on the next run — no redeploy needed.

---

## Validation

This skill ships with a four-newsletter regression suite (`tests/`) covering:

- **Normal day** — happy path with strategic mix of news_types and relationship tones (Portfolio, Pipeline, Network).
- **Quiet day** — validates the 0-match path. No errors, no empty drafts, no paging.
- **Adversarial day** — every mention is a name-collision trap (`RAMP Logistics`, `Brexley Industries`, `Striped Capital`, `Figma Software (Pty) Ltd`). Expected: zero drafts, every trap rejected with an audit-log reason.
- **Edge cases** — legal-suffix variants (`Stripe, Inc.`, `Anthropic, PBC`, `Rippling Inc`), partial-name extension with corroboration (`Airtable Holdings`, `Notion AI Mail`), partial-name extension *without* corroboration (`Anthropic Capital` law firm), and same-company multi-mention dedup.

Run with `python3 run_all_tests.py`. Latest pass: **4/4 ✓**. Two real bugs caught while writing the suite (substring direction-B false positive and missing dedup) are now fixed and regression-protected.
