ResearchRecon
Turn any company URL into structured competitive intelligence — company profiles, pricing tiers, hiring signals, Reddit sentiment, and page-change monitoring, all returned as clean JSON.
ResearchRecon scrapes the messy public web and hands you back structured data. Point it at a company homepage, a SaaS pricing page, a careers page, or a Reddit thread, and an LLM extraction pipeline parses the raw HTML into typed JSON fields you can drop straight into a dashboard, CRM, or sales-intel workflow. It solves the part nobody wants to build: fetching pages reliably, stripping the markup, and coaxing consistent structure out of wildly inconsistent layouts.
Endpoints
All extraction endpoints are POST and take a JSON body. Every POST endpoint accepts `url` (required) and an optional `use_browser` boolean (default `false`) that switches on a headless Chromium render for JavaScript-heavy pages.
| Endpoint | What it returns |
|---|---|
| `POST /extract/company` | Structured company profile: `company_name`, `description`, `founders[]`, `funding`, `industry`, `headquarters`, `employee_count`, `products[]`. Also returns `url` and a `raw` model-output field. |
| `POST /extract/pricing` | Structured pricing: `pricing_model`, `tiers[]` (each with `name`, `price`, `features[]`), `free_trial`, `notes`. Also returns `url` and `raw`. |
| `POST /extract/jobs` | Hiring intelligence: `company`, `total_openings`, `jobs[]` (each with `title`, `location`, `department`, `url`, `description_snippet`), and `hiring_signals[]` (strategic insights inferred from the openings). Also returns `url` and `raw`. |
| `POST /extract/reddit` | Reddit discussion analysis for a given Reddit URL: `subreddit`, `posts[]` (each with `title`, `url`, `score`, `comments`, `summary`), `sentiment` (`positive` / `negative` / `neutral` / `mixed`), and `themes[]`. Also returns `url` and `raw`. |
| `POST /monitor/competitor` | Change detection for a page: `status` (`changed` or `no_change`), `current_hash`, `previous_hash`, and, when content changed, a plain-text `change_summary` describing what's different from the last snapshot. |
| `GET /health` | Service liveness check (`status`, `service`). |
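As a rough illustration of the request shape described above, the sketch below builds (but does not send) a POST request for an extraction endpoint using only Python's standard library. The base URL `https://api.example.com` and the helper name `build_extract_request` are placeholders invented for this example; substitute the host and any authentication your deployment actually requires.

```python
import json
import urllib.request

# Hypothetical base URL: the real host is not stated in this document.
BASE_URL = "https://api.example.com"


def build_extract_request(endpoint, url, use_browser=False):
    """Build a POST request for an extraction endpoint such as /extract/pricing.

    `url` is required; `use_browser` is optional and defaults to false.
    """
    body = json.dumps({"url": url, "use_browser": use_browser}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_extract_request(
    "/extract/pricing", "https://example.com/pricing", use_browser=True
)
# To send it: pass `req` to urllib.request.urlopen() and json.load() the response.
```

The same helper should work for the other `/extract/...` paths, since they share the `url` and `use_browser` parameters.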
Why this API
- One call, structured output. No HTML parsing, no selectors to maintain, no LLM prompt engineering on your end: send a URL, get typed JSON fields back.
- Reddit sentiment from the real source. The Reddit endpoint fetches Reddit's own `.json` API (not a scraped HTML mirror), reads up to 20 posts with their real scores and comment counts, then summarizes sentiment and recurring themes.
- JavaScript rendering on demand. Set `use_browser: true` and the page is loaded in headless Chromium (waiting for network idle) so single-page apps and lazy-loaded content extract correctly.
- Built-in change monitoring. `/monitor/competitor` stores a SHA-256 hash and snapshot of each page; on the next call it detects whether anything changed and, if so, returns an LLM-written summary of the diff.
- Cached and fast. Extraction results are cached so repeat lookups of the same URL come back quickly, and page snapshots are retained for a rolling week to power change detection.
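To make the hash-based change detection concrete, here is a minimal sketch of the idea: hash the page text with SHA-256, compare it with the previously stored hash, and report `changed` or `no_change`. It is not the service's actual implementation. The in-memory dictionary, the function name `check_for_change`, and the sample page text are assumptions for illustration, and the LLM-written diff summary is omitted.

```python
import hashlib

# In-memory stand-in for the service's snapshot store (assumption for illustration).
_snapshots = {}


def check_for_change(url, page_text):
    """Compare the SHA-256 hash of page_text with the last one stored for url."""
    current_hash = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    previous_hash = _snapshots.get(url)
    _snapshots[url] = current_hash
    # The first call has nothing to compare against, so it reports no_change.
    changed = previous_hash is not None and previous_hash != current_hash
    return {
        "status": "changed" if changed else "no_change",
        "current_hash": current_hash,
        "previous_hash": previous_hash,
    }


first = check_for_change("https://example.com/pricing", "Pro plan: $20/month")
second = check_for_change("https://example.com/pricing", "Pro plan: $25/month")
```

In this sketch the first call establishes the baseline (`no_change`, with `previous_hash` set to `None`), and the second call reports `changed` because the text differs.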
Typical use cases
- Enrich a CRM or lead list: feed company homepages to `/extract/company` for founders, funding, HQ, and product lines.
- Competitive pricing tracking: pull a rival's pricing page into structured tiers and feature lists.
- Talent-signal scouting: read a competitor's careers page to see what roles they're filling and what that implies about their roadmap.
- Brand and product sentiment: analyze Reddit threads for how a product is actually being received.
- Change watching: poll a competitor's homepage or pricing page with `/monitor/competitor` and get a written summary when it changes.
Good to know
- `url` is required on every extraction endpoint. The `/extract/reddit` endpoint expects a Reddit URL (e.g. `https://reddit.com/r/python` or a thread/search URL), not a free-text company name; it fetches that URL's `.json` feed directly.
- Extraction is LLM-driven, so any field can be `null` when the page doesn't contain that information or the model can't determine it. Each response also includes a `raw` field with the model's unparsed output for debugging.
- The company endpoint enriches beyond the page. For well-known companies it may fill `founders`, `funding`, and `employee_count` from public knowledge; these can be approximations (prefixed with `~`, e.g. `~$8.7B raised`, `~8,000`). Obscure companies will return `null` rather than invented figures. Treat estimated values as approximate.
- Monitoring needs two calls. The first `/monitor/competitor` call on a URL establishes the baseline and returns `no_change` with a `null` `previous_hash`; subsequent calls compare against the stored snapshot. Snapshots are kept for about a week.
- Page content is truncated before analysis (roughly the first several thousand characters per page), so very long pages are summarized from their leading content.
- `use_browser: true` is slower but necessary for JS-rendered pages; the default fast path uses a direct HTTP fetch.
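Because fields can be `null` or carry a `~` prefix, client code may want to normalize them before storing. The sketch below shows one possible way to do that. The helper name `split_estimate` and the sample payload are made up for illustration, and it assumes scalar fields such as `funding` and `employee_count` arrive as strings when present.

```python
def split_estimate(value):
    """Return (text, is_estimate) for a field that may be null or '~'-prefixed."""
    if value is None:
        return None, False
    text = str(value).strip()
    if text.startswith("~"):
        # Strip the marker but remember that the value is approximate.
        return text[1:].strip(), True
    return text, False


# Sample payload invented for illustration, shaped like an /extract/company response.
profile = {
    "company_name": "Example Co",
    "funding": "~$8.7B raised",
    "employee_count": None,
}

funding, funding_is_estimate = split_estimate(profile.get("funding"))
headcount, headcount_is_estimate = split_estimate(profile.get("employee_count"))
```

Keeping the estimate flag separate from the value lets a dashboard or CRM label approximate figures instead of presenting them as exact.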