pankaj

the problem

web scraping is fragile. you write CSS selectors, the site changes its markup, your scraper breaks. you fix it, it breaks again. repeat forever.

what it does

define the data you want with a zod schema. pluckr figures out how to extract it. when pages change, it self-heals. no selectors to write or maintain.

how it works

you pass raw HTML and a zod schema describing the shape of data you want
an LLM generates CSS selectors for each field through an agentic tool loop
selectors are tested against the actual page before being committed
working selectors get cached, so repeat extractions reuse them and make zero LLM calls
if a selector breaks because the page changed, pluckr detects it and auto-regenerates
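
the steps above can be sketched roughly like this. it is a minimal illustration, not pluckr's actual code: every name here (Selectors, Deps, applyAll, extract) is hypothetical, the LLM tool loop is replaced by an injected generate function, and everything is synchronous to keep it short (a real LLM call would be async).

```typescript
// Hypothetical sketch: none of these names are pluckr's actual API.
type Selectors = Record<string, string>; // field name -> CSS selector
type Extracted = Record<string, string>; // field name -> extracted text

interface Deps {
  cache: Map<string, Selectors>;
  // Stand-in for the LLM tool loop (synchronous here to keep the sketch short).
  generate: (html: string, fields: string[]) => Selectors;
  // Runs one selector against the page; null means it matched nothing.
  apply: (html: string, selector: string) => string | null;
}

// Apply every selector; give up (null) as soon as one field fails to match.
function applyAll(
  html: string,
  fields: string[],
  selectors: Selectors,
  deps: Deps,
): Extracted | null {
  const out: Extracted = {};
  for (const field of fields) {
    const selector = selectors[field];
    const value = selector === undefined ? null : deps.apply(html, selector);
    if (value === null) return null;
    out[field] = value;
  }
  return out;
}

function extract(key: string, html: string, fields: string[], deps: Deps): Extracted {
  const cached = deps.cache.get(key);
  if (cached !== undefined) {
    const hit = applyAll(html, fields, cached, deps);
    if (hit !== null) return hit; // cached selectors still work: no LLM call
  }
  // Cache miss, or the page changed and a cached selector broke: regenerate.
  const fresh = deps.generate(html, fields);
  const result = applyAll(html, fields, fresh, deps);
  if (result === null) {
    throw new Error("generated selectors did not match the page");
  }
  deps.cache.set(key, fresh); // commit only selectors that were tested on the page
  return result;
}
```

the same check does double duty: a cached selector that no longer matches is treated exactly like a cache miss, which is what makes the regenerate step automatic.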

why sqlite + redis

most scraping runs are one machine, one script, and a local sqlite file is enough for that. distributed setups need a shared cache, so redis is an option too. both ship as separate packages (@pluckr/sqlite, @pluckr/redis).
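
one way to picture swappable backends is a small storage interface that each package implements. this is only a sketch of the idea: SelectorStore and MemoryStore are hypothetical names, and the real interface in @pluckr/sqlite and @pluckr/redis may look different. the methods are async on the assumption that a networked backend like redis cannot answer synchronously.

```typescript
// Hypothetical interface: pluckr's real cache API may look different.
interface SelectorStore {
  get(key: string): Promise<string | null>;
  set(key: string, selector: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// In-memory implementation, useful for tests or a single short-lived script.
// A sqlite- or redis-backed class would expose the same three methods.
class MemoryStore implements SelectorStore {
  private readonly entries = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.entries.get(key) ?? null;
  }

  async set(key: string, selector: string): Promise<void> {
    this.entries.set(key, selector);
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }
}
```

code that reads and writes selectors only needs the interface, so moving from one machine to many would mean changing which store gets constructed, not the extraction logic.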

works with any LLM

built on the vercel AI SDK, so you can plug in openai, anthropic, google, or any compatible provider. the agentic loop works the same regardless of the model behind it.