the problem
web scraping is fragile. you write CSS selectors, the site changes its markup, your scraper breaks. you fix it, it breaks again. repeat forever.
what it does
define the data you want with a zod schema. pluckr figures out how to extract it. when pages change, it self-heals. no selectors to write or maintain.
how it works
- you pass raw HTML and a zod schema describing the shape of data you want
- an LLM generates CSS selectors for each field through an agentic tool loop
- selectors are tested against the actual page before being committed
- working selectors get cached, so repeat extractions are instant: zero LLM calls
- if a selector breaks because the page changed, pluckr detects it and auto-regenerates
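the cache-then-heal flow above can be sketched roughly like this. every name here is illustrative, not pluckr's real internals:

```typescript
type Selectors = Record<string, string>;

// stand-ins for the real pieces: a selector cache, an extractor,
// a schema validator, and the LLM-backed generator (all hypothetical)
interface Deps {
  cached: Selectors | null;
  extract: (html: string, sel: Selectors) => Record<string, unknown> | null;
  isValid: (data: Record<string, unknown> | null) => boolean;
  regenerate: (html: string) => Selectors; // the expensive agentic LLM loop
}

function extractWithHealing(html: string, deps: Deps): Record<string, unknown> {
  // fast path: cached selectors, zero LLM calls
  if (deps.cached) {
    const data = deps.extract(html, deps.cached);
    if (deps.isValid(data)) return data!;
  }
  // slow path: first run, or the page changed -- regenerate and retry
  const fresh = deps.regenerate(html);
  const data = deps.extract(html, fresh);
  if (!deps.isValid(data)) throw new Error("extraction failed even after regeneration");
  return data!;
}
```

the key property: the LLM is only invoked when validation fails, so the steady state costs nothing.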
why sqlite + redis
most scraping runs are one machine, one script, and sqlite is perfect for that. but distributed setups need a shared cache, so redis is supported too. both ship as separate packages (@pluckr/sqlite, @pluckr/redis).
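swappable backends suggest both packages sit behind one small cache contract. a minimal in-memory sketch of that idea (illustrative; not the real @pluckr/* interface):

```typescript
// hypothetical cache contract the sqlite and redis backends would share
interface SelectorCache {
  get(key: string): Promise<string | null>;
  set(key: string, selectors: string): Promise<void>;
}

// in-memory stand-in: @pluckr/sqlite would persist to disk instead,
// @pluckr/redis would share state across machines
class MemoryCache implements SelectorCache {
  private store = new Map<string, string>();
  async get(key: string) { return this.store.get(key) ?? null; }
  async set(key: string, selectors: string) { this.store.set(key, selectors); }
}
```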
works with any LLM
built on the vercel AI SDK, so you can plug in openai, anthropic, google, or any compatible provider. the agentic loop works the same regardless of the model behind it.
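model-agnosticism boils down to the loop depending on a narrow "prompt in, text out" contract rather than any one provider. a conceptual sketch (not the AI SDK's actual types):

```typescript
// minimal stand-in for what the AI SDK abstracts away
type CompleteFn = (prompt: string) => Promise<string>;

async function proposeSelector(field: string, complete: CompleteFn): Promise<string> {
  // the real loop would include the page HTML and iterate with tool calls;
  // this only shows that the model is an injected dependency
  return complete(`write a CSS selector for the field "${field}"`);
}

// any provider -- openai, anthropic, google -- plugs in behind the same shape
const fakeModel: CompleteFn = async () => ".product > h1";
```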