ResearchRecon

Turn any company URL into structured competitive intelligence — company profiles, pricing tiers, hiring signals, Reddit sentiment, and page-change monitoring, all returned as clean JSON.

ResearchRecon scrapes the messy public web and hands you back structured data. Point it at a company homepage, a SaaS pricing page, a careers page, or a Reddit thread, and an LLM extraction pipeline parses the raw HTML into typed JSON fields you can drop straight into a dashboard, CRM, or sales-intel workflow. It solves the part nobody wants to build: fetching pages reliably, stripping the markup, and coaxing consistent structure out of wildly inconsistent layouts.

Endpoints

All extraction and monitoring endpoints are POST and take a JSON body. Each accepts url (required) and an optional use_browser boolean (default false) that switches on a headless Chromium render for JavaScript-heavy pages.

Endpoint	What it returns
`POST /extract/company`	Structured company profile: `company_name`, `description`, `founders[]`, `funding`, `industry`, `headquarters`, `employee_count`, `products[]`. Also returns `url` and a `raw` model-output field.
`POST /extract/pricing`	Structured pricing: `pricing_model`, `tiers[]` (each with `name`, `price`, `features[]`), `free_trial`, `notes`. Also returns `url` and `raw`.
`POST /extract/jobs`	Hiring intelligence: `company`, `total_openings`, `jobs[]` (each with `title`, `location`, `department`, `url`, `description_snippet`), and `hiring_signals[]` (strategic insights inferred from the openings). Also returns `url` and `raw`.
`POST /extract/reddit`	Reddit discussion analysis for a given Reddit URL: `subreddit`, `posts[]` (each with `title`, `url`, `score`, `comments`, `summary`), `sentiment` (positive / negative / neutral / mixed), and `themes[]`. Also returns `url` and `raw`.
`POST /monitor/competitor`	Change detection for a page: `status` (`changed` or `no_change`), `current_hash`, `previous_hash`, and — when content changed — a plain-text `change_summary` describing what's different from the last snapshot.
`GET /health`	Service liveness check (`status`, `service`).
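
As a rough illustration of the request shape described above, here is a minimal Python sketch that builds a POST request for any extraction endpoint. The base URL (api.example.com) and the helper names are placeholders, not part of the service, and authentication is left out because it is not described here.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real host for your account.
BASE_URL = "https://api.example.com"


def build_extract_request(endpoint, page_url, use_browser=False, base_url=BASE_URL):
    """Build a POST request with the documented JSON body: url and use_browser."""
    body = json.dumps({"url": page_url, "use_browser": use_browser}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example: a pricing page that needs JavaScript rendering.
request = build_extract_request(
    "/extract/pricing", "https://example.com/pricing", use_browser=True
)
```

Calling send(request) would return the parsed response as a dict; the same helper should work for /extract/company, /extract/jobs, /extract/reddit, and /monitor/competitor by changing the endpoint path.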

Why this API

One call, structured output. No HTML parsing, no selectors to maintain, no LLM prompt engineering on your end — send a URL, get typed JSON fields back.
Reddit sentiment from the real source. The Reddit endpoint fetches Reddit's own .json API (not a scraped HTML mirror), reads up to 20 posts with their real scores and comment counts, then summarizes sentiment and recurring themes.
JavaScript rendering on demand. Set use_browser: true and the page is loaded in headless Chromium (waiting for network idle) so single-page apps and lazy-loaded content extract correctly.
Built-in change monitoring. /monitor/competitor stores a SHA-256 hash and snapshot of each page; on the next call it detects whether anything changed and, if so, returns an LLM-written summary of the diff.
Cached and fast. Extraction results are cached so repeat lookups of the same URL come back quickly, and page snapshots are retained for a rolling week to power change detection.
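
To make the change-monitoring idea concrete, the sketch below shows one way hash-based change detection can work. It illustrates the technique only and is not the service's actual code: the in-memory dict stands in for the snapshot store, the function name check_page is made up, and the LLM-written summary and snapshot expiry are omitted.

```python
import hashlib


def check_page(snapshots, url, content):
    """Compare a page's SHA-256 hash against the stored snapshot for that URL.

    snapshots is a plain dict used here as a stand-in snapshot store.
    """
    current_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = snapshots.get(url)
    # Always store the latest snapshot so the next call compares against it.
    snapshots[url] = {"hash": current_hash, "content": content}

    if previous is None:
        # First call only establishes the baseline.
        return {
            "status": "no_change",
            "current_hash": current_hash,
            "previous_hash": None,
        }

    changed = previous["hash"] != current_hash
    return {
        "status": "changed" if changed else "no_change",
        "current_hash": current_hash,
        "previous_hash": previous["hash"],
    }


store = {}
baseline = check_page(store, "https://example.com/pricing", "Pro plan: $20")
update = check_page(store, "https://example.com/pricing", "Pro plan: $25")
```

Here baseline reports no_change with no previous hash, and update reports changed because the content differs from the stored snapshot.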

Typical use cases

Enrich a CRM or lead list: feed company homepages to /extract/company for founders, funding, HQ, and product lines.
Competitive pricing tracking: pull a rival's pricing page into structured tiers and feature lists.
Talent-signal scouting: read a competitor's careers page to see what roles they're filling and what that implies about their roadmap.
Brand and product sentiment: analyze Reddit threads for how a product is actually being received.
Change monitoring: watch a competitor's homepage or pricing page and get alerted, with a written summary, when it changes.

Good to know

url is required on every extraction endpoint. The /extract/reddit endpoint expects a Reddit URL (e.g. https://reddit.com/r/python or a thread/search URL), not a free-text company name — it fetches that URL's .json feed directly.
Extraction is LLM-driven, so any field can be null when the page doesn't contain that information or the model can't determine it. Each response also includes a raw field with the model's unparsed output for debugging.
The company endpoint enriches beyond the page. For well-known companies it may fill founders, funding, and employee_count from public knowledge; these values can be approximations, marked with a ~ prefix (e.g. ~$8.7B raised, ~8,000). For obscure companies these fields come back as null rather than invented figures.
Monitoring needs two calls. The first /monitor/competitor call on a URL establishes the baseline and returns no_change with a null previous_hash; subsequent calls compare against the stored snapshot. Snapshots are kept for about a week.
Page content is truncated before analysis (roughly the first several thousand characters per page), so very long pages are summarized from their leading content.
use_browser: true is slower but necessary for JS-rendered pages; the default fast path uses a direct HTTP fetch.
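
Because fields can be null and estimated values carry a ~ prefix, client code may want to normalize them before storing. The helper below is a sketch under assumptions: parse_estimate is a made-up name, and the handling of K/M/B suffixes is a guess based on the example values above, not a documented format.

```python
import re
from typing import Optional, Tuple

# Assumed magnitude suffixes, inferred from examples such as "~$8.7B raised".
_SUFFIX = {"K": 1e3, "M": 1e6, "B": 1e9}
# A number with optional thousands separators, then an optional standalone suffix.
_NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([KMB]\b)?", re.IGNORECASE)


def parse_estimate(value) -> Tuple[Optional[float], bool]:
    """Return (number, is_approximate) for a field such as funding or employee_count.

    Returns (None, False) for null fields and (None, flag) when no number is found.
    """
    if value is None:
        return None, False
    text = str(value).strip()
    approximate = text.startswith("~")
    match = _NUMBER.search(text)
    if match is None:
        return None, approximate
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    return number * _SUFFIX.get(suffix, 1), approximate


headcount, headcount_is_estimate = parse_estimate("~8,000")
missing, _ = parse_estimate(None)
```

Keeping the approximate flag alongside the number lets a dashboard or CRM show estimated figures differently from values read directly off the page.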