AutomatiQ records a user's browser activity and employs AI to produce executable Python scripts for web automation or data extraction. Hosted on GitHub at StoneSteel27/AutomatiQ with 40 stars, this Python project targets Python 3.11+ and carries an MIT license. Currently in alpha, it warns users that features may break or change, with a VISION.md file outlining its goals.

The tool addresses the gap between human browsing and programmatic repetition. Users often perform tasks manually in a browser (clicking links, filling forms, extracting data) and then struggle to replicate them in code. AutomatiQ automates this translation: it captures the session as the user browses, without requiring anyone to inspect the page's scripts or hand-write selectors first. An AI agent handles the reverse-engineering and outputs a standalone Python script ready for reuse.

Core workflow

AutomatiQ operates in three phases, as diagrammed in its README:

  1. Record: Launches Chrome with instrumentation to capture screen video, network requests (including bodies, cookies), and interactions like clicks, typing, and navigation, all timestamped. Users browse normally and stop with Ctrl+C.

  2. Compile: Processes the capture. Vision AI examines video clips around each action. Network data gets decoded, deduplicated, and organized into a workspace dump.

  3. Agent: An LLM (via LiteLLM-supported models) analyzes the workspace in a sandboxed IPython environment. It experiments iteratively—up to 60 steps by default—writing and testing code until producing a functional script.
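
The deduplication mentioned in the Compile phase can be pictured with a small sketch. The record fields (method, url, body, ts) and the rule of treating two requests as duplicates when method, URL, and body match are assumptions made for illustration; the project's actual workspace format and dedup logic are not described here.

```python
import hashlib
import json


def request_key(record: dict) -> str:
    """Fingerprint a request by method, URL, and body (assumed field names)."""
    payload = json.dumps(
        [record.get("method", "GET").upper(), record.get("url", ""), record.get("body")],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe_requests(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each distinct request, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = request_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hand-written sample records, not real AutomatiQ output.
captured = [
    {"method": "GET", "url": "https://example.com/api/items", "body": None, "ts": 1.0},
    {"method": "GET", "url": "https://example.com/api/items", "body": None, "ts": 2.5},
    {"method": "POST", "url": "https://example.com/api/search", "body": {"q": "shoes"}, "ts": 3.1},
]
unique = dedupe_requests(captured)
print(len(unique))
```

Timestamps are deliberately left out of the fingerprint, so a polling request repeated every few seconds collapses to a single entry.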

Keyboard shortcuts aid control across phases:

Phase | Key | Action
--- | --- | ---
Recording | Ctrl+C | Stop and save session
Compilation | Esc | Skip remaining AI analysis
Compilation | y/n | Confirm/deny skip prompt
Agent | q | Quit agent session
Agent | Esc | Cancel LLM call or execution

Outside the recording phase, Ctrl+C force-quits at any time. GitHub workflows show passing tests and lint status on the main branch.

CLI flags customize behavior:

Flag | Description
--- | ---
--model MODEL | LiteLLM model for agent
--recorder-model MODEL | Vision model for clips
--base-url URL | Custom OpenAI-compatible endpoint
--max-steps N | Agent iterations (default: 60)
--sandbox-timeout SEC | IPython cell timeout (default: 60s)
--output-dir PATH | Output root (default: ./output)
--no-banner | Skip animation
--verbose | Detailed logs
-V, --version | Version info
-h, --help | Help
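
To make the flags and their defaults concrete, here is a minimal argparse sketch that mirrors the table. It is a stand-in, not the project's own parser: the program name, the positional url argument, the None defaults for the model flags, and the version string are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(prog="automatiq-sketch")
    parser.add_argument("url", help="Page to open when recording starts")
    parser.add_argument("--model", default=None, help="LiteLLM model for the agent")
    parser.add_argument("--recorder-model", default=None, help="Vision model for clips")
    parser.add_argument("--base-url", default=None, help="Custom OpenAI-compatible endpoint")
    parser.add_argument("--max-steps", type=int, default=60, help="Agent iterations")
    parser.add_argument("--sandbox-timeout", type=int, default=60, help="Cell timeout in seconds")
    parser.add_argument("--output-dir", default="./output", help="Output root")
    parser.add_argument("--no-banner", action="store_true", help="Skip animation")
    parser.add_argument("--verbose", action="store_true", help="Detailed logs")
    # Placeholder version string; the real value is not given in the source.
    parser.add_argument("-V", "--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_parser().parse_args(["https://example.com", "--max-steps", "30"])
print(args.max_steps, args.sandbox_timeout, args.output_dir)
```

Flags that are not passed fall back to the documented defaults, so only --max-steps changes in this example.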

Chrome launches via Chrome DevTools Protocol (CDP) for precise capture of requests, responses, and events.

Getting it running

Installation uses pip:

pip install automatiq

Provide an API key for a LiteLLM-supported model, such as Gemini:

export GEMINI_API_KEY=your-key-here

Launch a session:

automatiq run https://example.com

Chrome opens to the URL. Perform your actions, then press Ctrl+C. The tool compiles the footage, runs the agent, and saves the script to ./output by default. Outputs include video clips, network dumps, and the final Python file. For custom setups, adjust --output-dir or --model (e.g., --model=gpt-4o).

The process generates a workspace with structured data: JSON for requests, MP4 clips for visuals, and agent traces. Scripts use familiar libraries like requests or Playwright, inferred from the session.
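
A quick way to get oriented in a request dump is to count calls per endpoint. The sketch below assumes a JSON array of records with method, url, and status fields; that schema and the sample data are hypothetical, so adjust the field names to whatever the workspace actually contains.

```python
import json
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical dump contents; a real workspace file may be shaped differently.
SAMPLE_DUMP = """
[
  {"method": "GET", "url": "https://shop.example.com/api/products?page=1", "status": 200},
  {"method": "GET", "url": "https://shop.example.com/api/products?page=2", "status": 200},
  {"method": "POST", "url": "https://shop.example.com/api/cart", "status": 201},
  {"method": "GET", "url": "https://cdn.example.net/logo.png", "status": 200}
]
"""


def endpoint_counts(records: list[dict]) -> Counter:
    """Count requests per (method, host, path), ignoring query strings."""
    counts: Counter = Counter()
    for record in records:
        parts = urlsplit(record["url"])
        counts[(record["method"], parts.netloc, parts.path)] += 1
    return counts


counts = endpoint_counts(json.loads(SAMPLE_DUMP))
for (method, host, path), n in counts.most_common():
    print(f"{n}x {method} {host}{path}")
```

Grouping by path rather than full URL makes paginated API calls stand out, which is often where the data worth extracting lives.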

Real-world use cases

AutomatiQ suits quick prototyping of repetitive browser tasks. A developer scraping product prices from an e-commerce site browses once—navigates categories, filters, copies data—and gets a script extracting the same info headless. Testers record login flows for QA automation without writing locators. Data analysts capture dashboard interactions to build extraction pipelines.

It fits non-coders dipping into automation or experts needing fast MVPs before refining. The Discord server (discord.gg/8j7dFWMMDA) offers community support for tweaks. Output scripts run independently, with no dependency on AutomatiQ once generated.

Limitations show up on complex, JavaScript-heavy sites. In alpha, vision analysis remains incomplete for edge cases such as occluded clicks or dynamic SPAs. The README notes ongoing refinements to browser capture.

How it stacks up

Traditional tools demand upfront coding. Selenium or Playwright recorders exist but output brittle selector-based code requiring fixes for site changes. Puppeteer scripts need manual event mapping. AutomatiQ skips this: AI infers logic from video and network traces, producing resilient code via experimentation.

Browser extensions like Wildfire or iMacros capture macros but lack AI code generation. No-code platforms (e.g., Browserflow) build visual flows, not Python scripts. AutomatiQ produces code instead; it is heavier on compute (LLM calls, video processing) but lighter on developer time for one-off tasks.

Open-source peers like Playwright's codegen translate sessions into code but have no AI layer: no LLM dedupes requests or hypothesizes about the underlying logic. AutomatiQ's agent iterates in a sandbox and corrects its own errors, unlike static recorders.

Caveats and source

Expect instability in alpha; complex sessions may time out or yield partial scripts. The --max-steps and --sandbox-timeout flags tune reliability, though reading the agent traces still helps when a run goes wrong. The tool is not ready for production pipelines yet, so refine its output manually.
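
How a step budget and a per-cell timeout interact can be shown with a toy loop. This is a sketch under stated assumptions, not AutomatiQ's agent: the scripted_propose function stands in for the LLM, and each snippet runs in a fresh Python subprocess instead of the sandboxed IPython environment the tool uses.

```python
import subprocess
import sys


def run_cell(code: str, timeout_s: float) -> tuple[bool, str]:
    """Run one snippet in a fresh interpreter; return (success, output or error)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()


def agent_loop(propose, max_steps: int = 60, timeout_s: float = 60.0):
    """Ask `propose` for code until a cell succeeds or the step budget runs out."""
    feedback = ""
    for step in range(1, max_steps + 1):
        code = propose(step, feedback)
        ok, output = run_cell(code, timeout_s)
        if ok:
            return step, output
        feedback = output  # the error text informs the next attempt
    return None  # budget exhausted without a working cell


def scripted_propose(step: int, feedback: str) -> str:
    """Stand-in for the LLM: the first attempt has a bug, the second fixes it."""
    if step == 1:
        return "print(undefined_name)"
    return "print(6 * 7)"


result = agent_loop(scripted_propose, max_steps=5, timeout_s=10)
print(result)  # (2, '42')
```

In a loop like this, raising the step budget gives the agent more chances to recover from errors, while raising the timeout only helps when individual cells are slow, for example when replaying large downloads.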

Full details and the issue tracker live at https://github.com/StoneSteel27/AutomatiQ. Check VISION.md for the roadmap.