AutomatiQ records a user's browser activity and uses AI to produce executable Python scripts for web automation or data extraction. Hosted on GitHub at StoneSteel27/AutomatiQ (40 stars), the project targets Python 3.11+ and carries an MIT license. It is currently in alpha, so features may break or change; a VISION.md file outlines its longer-term goals.
The tool addresses the gap between human browsing and programmatic repetition. Users often perform tasks manually in a browser (clicking links, filling forms, extracting data) and then struggle to replicate them in code. AutomatiQ automates this translation: it captures the session without requiring the user to inspect page structure or hand-write selectors. An AI agent handles the reverse-engineering, outputting a standalone Python script ready for reuse.
Core workflow
AutomatiQ operates in three phases, as diagrammed in its README:
- Record: Launches Chrome with instrumentation to capture screen video, network requests (including bodies and cookies), and interactions such as clicks, typing, and navigation, all timestamped. Users browse normally and stop with Ctrl+C.
- Compile: Processes the capture. A vision AI examines video clips around each action, while network data is decoded, deduplicated, and organized into a workspace dump.
- Agent: An LLM (any LiteLLM-supported model) analyzes the workspace in a sandboxed IPython environment. It experiments iteratively (up to 60 steps by default), writing and testing code until it produces a functional script.
Keyboard shortcuts aid control across phases:
| Phase | Key | Action |
|---|---|---|
| Recording | Ctrl+C | Stop and save session |
| Compilation | Esc | Skip remaining AI analysis |
| Compilation | y/n | Confirm/deny skip prompt |
| Agent | q | Quit agent session |
| Agent | Esc | Cancel LLM call or execution |
Ctrl+C force-quits at any time. GitHub workflow badges on the main branch show passing tests and lint checks.
CLI flags customize behavior:
| Flag | Description |
|---|---|
| `--model MODEL` | LiteLLM model for the agent |
| `--recorder-model MODEL` | Vision model for clips |
| `--base-url URL` | Custom OpenAI-compatible endpoint |
| `--max-steps N` | Agent iterations (default: 60) |
| `--sandbox-timeout SEC` | IPython cell timeout (default: 60s) |
| `--output-dir PATH` | Output root (default: ./output) |
| `--no-banner` | Skip animation |
| `--verbose` | Detailed logs |
| `-V, --version` | Version info |
| `-h, --help` | Help |
Chrome launches via Chrome DevTools Protocol (CDP) for precise capture of requests, responses, and events.
Getting it running
Installation uses pip:

```shell
pip install automatiq
```

Provide an API key for a LiteLLM-supported model, such as Gemini:

```shell
export GEMINI_API_KEY=your-key-here
```

Launch a session:

```shell
automatiq run https://example.com
```
Chrome opens to the URL. Perform actions, then Ctrl+C. The tool compiles footage, runs the agent, and saves the script to ./output by default. Outputs include video clips, network dumps, and the final Python file. For custom setups, adjust --output-dir or --model (e.g., --model=gpt-4o).
The process generates a workspace with structured data: JSON for requests, MP4 clips for visuals, and agent traces. Scripts use familiar libraries like requests or Playwright, inferred from the session.
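The deduplication step mentioned during compilation might work roughly like this (a sketch under assumptions; the real workspace format is not documented here): collapse repeated requests to the same method and URL, keeping only the most recent entry.

```python
def dedupe_requests(requests: list[dict]) -> list[dict]:
    """Collapse repeated (method, URL) pairs, keeping the most recent entry."""
    latest: dict[tuple[str, str], dict] = {}
    for req in sorted(requests, key=lambda r: r["ts"]):
        latest[(req["method"], req["url"])] = req
    return list(latest.values())

# A toy network dump: the same GET is captured twice.
dump = [
    {"ts": 1.0, "method": "GET", "url": "/api/items", "status": 200},
    {"ts": 2.0, "method": "GET", "url": "/api/items", "status": 304},
    {"ts": 3.0, "method": "POST", "url": "/api/cart", "status": 201},
]
unique = dedupe_requests(dump)
print(len(unique))  # → 2
```

Keeping the latest occurrence matters for sites that re-fetch the same endpoint as state changes: the final response is usually the one the generated script should reproduce.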
Real-world use cases
AutomatiQ suits quick prototyping of repetitive browser tasks. A developer scraping product prices from an e-commerce site browses once (navigating categories, applying filters, copying data) and gets a script that extracts the same information headlessly. Testers record login flows for QA automation without writing locators. Data analysts capture dashboard interactions to build extraction pipelines.
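For the price-scraping case, the generated script may reduce to plain HTML parsing once the agent has worked out where the data lives. A hypothetical illustration (not actual AutomatiQ output) using only the standard library:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect text inside elements whose class attribute is 'price'."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices: list[str] = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self.in_price = True

    def handle_endtag(self, tag):
        self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# In a real run this HTML would be fetched from the recorded site.
page = ('<div class="item"><span class="price">$19.99</span></div>'
        '<div class="item"><span class="price">$4.50</span></div>')
parser = PriceParser()
parser.feed(page)
print(parser.prices)  # → ['$19.99', '$4.50']
```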
It suits non-coders dipping into automation as well as experts who need a fast MVP before refining by hand. The Discord server (discord.gg/8j7dFWMMDA) offers community support. Output scripts run independently, with no AutomatiQ dependency after generation.
Limitations appear on complex sites with heavy JavaScript. Alpha status means vision analysis is incomplete for edge cases such as occluded clicks or dynamic SPAs; the README notes ongoing refinements to browser capture.
How it stacks up
Traditional tools demand upfront coding. Selenium or Playwright recorders exist but output brittle selector-based code requiring fixes for site changes. Puppeteer scripts need manual event mapping. AutomatiQ skips this: AI infers logic from video and network traces, producing resilient code via experimentation.
Browser extensions like Wildfire or iMacros capture macros but lack AI code generation. No-code platforms (e.g., Browserflow) build visual flows, not Python scripts. AutomatiQ bridges to code, heavier on compute (LLM calls, video processing) but lighter on dev time for one-offs.
Open-source peers like Playwright's codegen also turn sessions into code, yet lack AI abstraction: no LLM dedupes requests or hypothesizes the underlying logic. AutomatiQ's agent iterates in a sandbox, self-correcting errors, unlike static recorders.
Caveats and source
Expect instability in alpha: complex sessions may time out or yield partial scripts. --max-steps and --sandbox-timeout help tune reliability, but manual debugging of agent traces is sometimes needed. It is not ready for production pipelines yet; refine outputs by hand.
Full details and issues live at https://github.com/StoneSteel27/AutomatiQ. Check VISION.md for roadmap.