Ayyouboss0011/SherlockMaps: a professional, open-source Google Maps we

A professional, open-source Google Maps web crawler that extracts company information from Google Maps. Built with Playwright for browser automation.

Sherlock Maps

Open-Source Google Maps Webcrawler

Features

Object-Oriented - Cleanly structured with classes, dataclasses, and design patterns
Search - Search Google Maps with any search term
Detailed Company Information extraction:
- Company name
- Category / Industry
- Address
- Phone number
- Website URL
- Rating (stars)
- Number of reviews
- Plus Code
- Opening hours
- Attributes (wheelchair accessibility, etc.)
Deduplication based on company name + website
URL Validation (filters out invalid websites)
Multiple Output Formats: JSON, CSV, Pretty-Print, File, Print
REST API - Asynchronous job queue server with full-featured endpoints
Docker Support - Containerized deployment
Chrome Profile Persistence - Session data persists between runs

Quick Start

Option 1: Docker (Recommended)

The easiest way to get started. Docker handles all dependencies, Playwright, and browser setup automatically.

Using docker-compose (Simplest)

git clone https://github.com/Ayyouboss0011/SherlockMaps.git
cd GoogleMapsCrawler

# Start the API server
docker compose up -d

# The API is now running at http://localhost:8000
# Interactive documentation: http://localhost:8000/docs

Start a crawl via API

curl -X POST http://localhost:8000/crawl \
  -H "Content-Type: application/json" \
  -d '{"prompt": "restaurants berlin"}'

Get results

# Check job status
curl http://localhost:8000/status

# Get all results
curl http://localhost:8000/results

Stop the container

docker compose down

Using Docker CLI (without docker-compose)

# Clone the repository
git clone https://github.com/Ayyouboss0011/SherlockMaps.git
cd GoogleMapsCrawler

# Build the image
cd core && docker build -t sherlock-maps . && cd ..

# Run as API server
docker run -d -p 8000:8000 --name sherlock-maps sherlock-maps

# Run in CLI mode (one-time crawl)
docker run --rm -e PROMPT="restaurants berlin" sherlock-maps python /app/core/main_cli.py

Option 2: Without Docker

Install Python dependencies manually and run the crawler directly.

Prerequisites

Python 3.9 or higher
Git

Installation

# Clone the repository
git clone https://github.com/Ayyouboss0011/SherlockMaps.git
cd GoogleMapsCrawler

# Install Python dependencies
cd core
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

Run the CLI crawler

# Set search term
export PROMPT="restaurants berlin"

# Run the crawler
python main.py

Run the REST API server

# Start the API server on port 8000
python api_main.py

# The API is now running at http://localhost:8000
# Interactive documentation: http://localhost:8000/docs

Start a crawl via API

curl -X POST http://localhost:8000/crawl \
  -H "Content-Type: application/json" \
  -d '{"prompt": "restaurants berlin"}'

Use as a Python library

from core.crawler import run_crawler

results = run_crawler(
    prompt="restaurants berlin",
    headless=False,
    output_format="json"
)

for company in results:
    print(f"{company.name} - {company.website}")

CLI Mode

The crawler can be used directly from the command line. All results are output to stdout as JSON by default.

Output Formats

# JSON to stdout (default)
export PROMPT="restaurants berlin"
python main.py

# Save as JSON file
export PROMPT="restaurants berlin"
export OUTPUT_FORMAT="file"
python main.py

# Save as CSV file
export PROMPT="restaurants berlin"
export OUTPUT_FORMAT="csv"
python main.py

# Formatted output (one company per block)
export PROMPT="restaurants berlin"
export OUTPUT_FORMAT="print"
python main.py

# Human-readable output
export PROMPT="restaurants berlin"
export OUTPUT_FORMAT="pretty"
python main.py

# Headless mode (for production/servers)
export PROMPT="restaurants berlin"
export HEADLESS="true"
python main.py

Output Formats Overview

Format	Description
`json`	JSON array to stdout (default)
`file`	Saves results as `sherlock-maps_YYYYMMDD_HHMMSS.json`
`csv`	Saves results as `sherlock-maps_YYYYMMDD_HHMMSS.csv`
`print`	Each company individually with separator
`pretty`	Human-readable format with aligned fields

Environment Variables

Variable	Description	Default
`PROMPT`	Search term for Google Maps	Required
`OUTPUT_FORMAT`	Output format: `json`, `print`, `file`, `csv`, `pretty`	`json`
`HEADLESS`	Run browser in headless mode	`false`
`GOOGLE_API_KEY`	Optional Google API key	(empty)

Python Library

Simple (Convenience Function)

from core.crawler import run_crawler

results = run_crawler(
    prompt="restaurants berlin",
    headless=False,
    output_format="json"
)

for company in results:
    print(f"{company.name} - {company.website}")

Complete (With Configuration)

from core.models import CrawlerConfig, CompanyData
from core.crawler import GoogleMapsCrawler

# Create configuration
config = CrawlerConfig(
    search_prompt="restaurants berlin",
    headless=False,
    output_format="pretty",
)

# Use crawler with context manager
with GoogleMapsCrawler(config) as crawler:
    results = crawler.crawl()

# Process results
for company in results:
    if isinstance(company, CompanyData):
        print(f"{company.name}: {company.rating} stars ({company.reviews_count} reviews)")

Custom Search at Runtime

from core.models import CrawlerConfig
from core.crawler import GoogleMapsCrawler

config = CrawlerConfig(
    search_prompt="cafes berlin",
    output_format="json",
)

with GoogleMapsCrawler(config) as crawler:
    # First search
    results1 = crawler.crawl()

    # Second search with different term
    results2 = crawler.crawl(prompt="restaurants munich")

REST API

The crawler can run as a persistent service with REST API. The container starts as an API server and can process multiple crawl jobs sequentially.

Start the API

# Build the image
cd core
docker build -t sherlock-maps .

# Start API server (port 8000)
docker run -p 8000:8000 sherlock-maps

# With custom port
docker run -p 8080:8080 -e API_PORT=8080 sherlock-maps

API Endpoints

Health & Status

Method	Path	Description
GET	`/health`	Health check (for Docker orchestrators)
GET	`/status`	Current status (idle/busy), active jobs, queue length
GET	`/stats`	Detailed statistics

Crawler Control

Method	Path	Description
POST	`/crawl`	Start a new crawl job
GET	`/crawl/{job_id}`	Get job status
GET	`/crawl/{job_id}/results`	Get job results
DELETE	`/crawl/{job_id}`	Cancel a running job
GET	`/crawl/history`	List all jobs with pagination

Data Management

Method	Path	Description
GET	`/results`	Get all results
POST	`/results/export`	Export results
DELETE	`/results/clear`	Clear all results

Configuration

Method	Path	Description
GET	`/config`	Get current configuration
PUT	`/config`	Update configuration

Browser

Method	Path	Description
GET	`/browser/info`	Browser information
POST	`/browser/restart`	Restart browser

API Examples

# Start a new crawl job
curl -X POST http://localhost:8000/crawl \
  -H "Content-Type: application/json" \
  -d '{"prompt": "restaurants berlin", "output_format": "json"}'

# Get job status
curl http://localhost:8000/crawl/<job_id>

# Get results
curl http://localhost:8000/crawl/<job_id>/results

# Get all results as CSV
curl "http://localhost:8000/results?format=csv"

# Get status
curl http://localhost:8000/status

# Health check
curl http://localhost:8000/health

# Cancel job
curl -X DELETE http://localhost:8000/crawl/<job_id>

# Job history
curl "http://localhost:8000/crawl/history?limit=10&offset=0"

Request Example

{
  "prompt": "restaurants berlin",
  "output_format": "json",
  "headless": false,
  "locale": "de-DE",
  "max_results": 100
}

Response Example (Job Status)

{
  "job_id": "abc-123-def",
  "status": "completed",
  "prompt": "restaurants berlin",
  "created_at": "2026-01-15T10:30:00Z",
  "completed_at": "2026-01-15T10:31:30Z",
  "results_count": 42,
  "error": null
}

Job Status

Status	Description
`pending`	In the queue
`running`	Currently running
`completed`	Successfully completed
`failed`	Failed
`cancelled`	Cancelled

Interactive API Documentation

When the API server is running, interactive Swagger documentation is available:

http://localhost:8000/docs

How It Works

Search - Navigates to Google Maps with the search term
Scroll - Loads all search results by scrolling
Extract - Navigates to each result's detail page and extracts:
- Company name, category, address, phone, website
- Rating and number of reviews
- Opening hours
- Attributes
Filter - Removes duplicates and validates website URLs
Output - Outputs results in the desired format

Architecture

Sherlock Maps/
├── .gitignore
├── docker-compose.yml
├── README.md
├── public/
│   └── SherlockMaps.png
└── core/
    ├── __init__.py                   # Package exports
    ├── main.py                       # CLI entry point
    ├── main_cli.py                   # CLI logic
    ├── crawler.py                    # Main crawler class
    ├── requirements.txt              # Python dependencies
    ├── api/
    │   ├── __init__.py
    │   ├── models.py                 # API data models
    │   ├── queue_manager.py          # Job queue management
    │   └── server.py                 # FastAPI server
    ├── browser/
    │   ├── __init__.py
    │   └── browser_manager.py        # Browser lifecycle management
    ├── exceptions/
    │   ├── __init__.py
    │   └── crawler_exceptions.py     # Custom exceptions
    ├── extractors/
    │   ├── __init__.py
    │   └── maps_extractor.py         # Google Maps data extraction
    ├── models/
    │   ├── __init__.py
    │   ├── company.py                # CompanyData model
    │   └── crawler_config.py         # CrawlerConfig model
    ├── output/
    │   ├── __init__.py
    │   └── output_handler.py         # Output formats
    └── processors/
        ├── __init__.py
        ├── url_validator.py          # URL validation
        └── deduplication_processor.py # Deduplication

Class Overview

Class	Module	Description
`Sherlock Maps`		Open-Source Google Maps Webcrawler
`GoogleMapsCrawler`	`crawler.py`	Main class, orchestrates the entire crawling process
`BrowserManager`	`browser/browser_manager.py`	Manages Playwright browser lifecycle
`MapsExtractor`	`extractors/maps_extractor.py`	Extracts company data from Google Maps
`CompanyData`	`models/company.py`	Data model for a company
`CrawlerConfig`	`models/crawler_config.py`	Crawler configuration
`URLValidator`	`processors/url_validator.py`	Validates HTTP(S) URLs
`DeduplicationProcessor`	`processors/deduplication_processor.py`	Removes duplicates
`OutputHandler`	`output/output_handler.py`	Formats and outputs results
`CrawlerBaseException`	`exceptions/crawler_exceptions.py`	Base exception class

Configuration

CrawlerConfig Attributes

Attribute	Type	Default	Description
`search_prompt`	`str`	`""`	The search term for Google Maps
`headless`	`bool`	`False`	Run browser in headless mode
`output_format`	`Literal`	`"json"`	Output format
`chrome_profile_path`	`str`	`"Chrome_Profile"`	Path to Chrome user data directory
`viewport`	`ViewPort`	`1920x1080`	Browser viewport dimensions
`locale`	`str`	`"de-DE"`	Browser localization
`page_timeout`	`int`	`30000`	Maximum navigation timeout in ms
`selector_timeout`	`int`	`15000`	Maximum timeout for selectors in ms
`scroll_timeout`	`int`	`45`	Maximum time for scrolling in seconds
`max_scroll_attempts`	`int`	`5`	Number of scroll attempts before stop
`max_retries`	`int`	`3`	Number of navigation retry attempts
`request_timeout`	`int`	`25000`	Request timeout in ms

Example Output

[
  {
    "name": "Restaurant Name",
    "category": "Restaurant",
    "address": "Musterstrasse 1, 10115 Berlin",
    "phone": "+49 30 12345678",
    "website": "https://www.restaurant-example.de",
    "rating": "4.5",
    "reviews_count": "234",
    "plus_code": "GVMF+8H Berlin",
    "opening_hours": "Mon: 12:00-22:00, Tue: 12:00-22:00, ...",
    "attributes": ["Wheelchair accessible entrance"]
  }
]

Limitations

Google Maps UI changes may break selectors (CSS classes like h1.DUwDvf are Google-specific)
Rate limiting: Google may show CAPTCHAs for fast requests
German localization is hardcoded (hl=de), for other languages browser_manager.py must be modified
Requires a display or headless mode for Chromium

Sherlock Maps

Features

Quick Start

Option 1: Docker (Recommended)

Using docker-compose (Simplest)

Start a crawl via API

Get results

Stop the container

Using Docker CLI (without docker-compose)

Option 2: Without Docker

Prerequisites

Installation

Run the CLI crawler

Run the REST API server

Start a crawl via API

Use as a Python library

CLI Mode

Output Formats

Output Formats Overview

Environment Variables

Python Library

Simple (Convenience Function)

Complete (With Configuration)

Custom Search at Runtime

REST API

Start the API

API Endpoints

Health & Status

Crawler Control

Data Management

Configuration

Browser

API Examples

Request Example

Response Example (Job Status)

Job Status

Interactive API Documentation

How It Works

Architecture

Class Overview

Configuration

CrawlerConfig Attributes

Example Output

Limitations

Resources

Comments

Related Posts

kao273183/mk-qa-master: <em>AI 測試大師 — your AI QA loop, from analyze to advise.</em>

WillyEverGreen/acon: <img src="https://raw.githubusercontent.com/WillyEverGreen/acon/main/logo.

darkamenosa/realbrowser: realbrowser is an agent skill and small local CLI for fast target-firs

ltczding-gif/ref-downloader: > **Stop losing an afternoon to chasing dozens of reference PDFs b