🚀 WebRobot Demo

Explore WebRobot's capabilities through interactive demonstrations:

  • Execute Example Pipelines: Run publicly available ETL pipelines from our documentation
  • Generate Pipelines: Use AI to generate new pipelines from natural language descriptions
βš™οΈ This demo runs on the Apache Spark ETL subsystem. The agentic Ray runtime (multi-agent crews, adaptive pipelines, LLM oracle cascade) is now live β€” try it on the Agentic Studio page.
🇪🇺 Everything you trigger here runs on a European, sovereign Kubernetes cluster hosted on Hetzner servers in the EU. Your data stays in Europe, and no third-country processors are in the demo execution path. Object storage (MinIO), Trino, Spark, the Camoufox browser pool and the API control plane all run in the same EU region.
🤖 The pipeline programming model is intentionally a chain of stages rather than full structured programming, because it is designed to be easy for an LLM to author end to end. Stages compose like UNIX pipes, and arguments are flat maps or selectors. Higher-order agentic orchestration (given the right skills and planner) handles the translation from intent to pipeline, so the YAML does not need to be state-of-the-art programming to do non-trivial work.
⚡ Prefer the HTTP-only stages (wget, wgetExplore, wgetJoin) whenever the target site allows it, i.e. server-rendered content without JS-only pagination: they skip the browser automation layer entirely and are significantly faster and cheaper on cluster resources. Use their visit* counterparts only when the page genuinely needs a JS-rendered DOM (single-page apps, infinite scroll, anti-bot challenges). For example: Wikipedia, arXiv and GitHub README pages → wget; eBay search and Amazon detail pages → visit.
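A hypothetical Python helper that encodes this rule of thumb when generating pipelines programmatically. It is not part of any WebRobot SDK; it assumes the browser counterparts are named visit, visitExplore and visitJoin (the text only says visit*) and that a stage entry has the {stage, args} shape of the YAML in step 6.

```python
# Hypothetical helper encoding the "HTTP-only first" rule of thumb.
# ASSUMPTION: the browser counterparts are named visit / visitExplore / visitJoin.
HTTP_TO_BROWSER = {
    "wget": "visit",
    "wgetExplore": "visitExplore",
    "wgetJoin": "visitJoin",
}


def fetch_stage(http_stage: str, needs_js_dom: bool) -> str:
    """Keep the HTTP-only stage unless the page needs a JS-rendered DOM."""
    if http_stage not in HTTP_TO_BROWSER:
        raise ValueError(f"not an HTTP-only stage: {http_stage!r}")
    return HTTP_TO_BROWSER[http_stage] if needs_js_dom else http_stage


def first_stage(url: str, needs_js_dom: bool) -> dict:
    """Build a {stage, args} entry shaped like the YAML example in step 6."""
    return {"stage": fetch_stage("wget", needs_js_dom), "args": [url]}
```

For a Wikipedia article, first_stage(url, needs_js_dom=False) keeps wget; for a single-page app, passing needs_js_dom=True switches to the browser stage.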

πŸ“‹ Execute Example Pipelines

Run publicly available ETL pipelines from our documentation. Previews are limited to 5–10 records.

πŸ› οΈ Build your pipeline

Pick stages from the live catalog and assemble them step by step, in the same shape as webrobot pipeline add-stage on the CLI. No black-box generation.

πŸ“š Stage catalog

🧩 Pipeline

🐍 Python post-processing: DataFrame transformations applied after the pipeline. They always run last, because they consume the assembled DataFrame rather than the row stream.
Three kinds are available: row_transform (a per-row UDF), dataframe_transform (whole DataFrame, driver-side) and sql_query (SQL over the current DataFrame).

πŸ“„ YAML preview

⚠️ Save and Validate stay disabled until the pipeline has a name and at least one stage.

πŸ” Private Demo

Access client-specific demos and plugins. Requires API key authentication. Demo content is customized based on your account.

πŸ“‘ Use the demo from CLI, SDK or plain curl

Every endpoint this page calls is exposed publicly under https://api.webrobot.eu/api/webrobot/api/demo/*. No JWT or API key is needed: the same demo access you have in the browser is available from any HTTP client. The OpenAPI spec is at api.webrobot.eu/api/openapi.json (216 paths, 25 of them demo endpoints).

1. List available demo pipelines

curl -s https://api.webrobot.eu/api/webrobot/api/demo/list \
  | jq '.demos[] | {pipeline_name, is_draft, requires_input_dataset}'

Returns the curated Demo: … entries plus your saved drafts, listed as Generated: … with is_draft: true.
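The same listing from Python, as a minimal sketch using only the standard library. fetch_json, summarize_demos and draft_names are illustrative names, not SDK functions; summarize_demos mirrors the jq projection and returns None wherever jq would print null.

```python
import json
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a public demo endpoint (no auth header needed) and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_demos(payload: dict) -> list:
    """Same projection as jq '.demos[] | {pipeline_name, is_draft, requires_input_dataset}'."""
    keys = ("pipeline_name", "is_draft", "requires_input_dataset")
    # dict.get returns None where jq would print null for a missing field.
    return [{key: demo.get(key) for key in keys} for demo in payload.get("demos", [])]


def draft_names(payload: dict) -> list:
    """Names of your saved drafts, i.e. entries with is_draft: true."""
    return [d["pipeline_name"] for d in summarize_demos(payload) if d["is_draft"]]
```

Against the live API: summarize_demos(fetch_json(f"{BASE}/list")).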

2. Run an existing pipeline

EXEC=$(curl -s -X POST \
  https://api.webrobot.eu/api/webrobot/api/demo/execute/01-wiki-us-presidents \
  -H 'content-type: application/json' \
  -d '{"parameters":{"limit":10}}' \
  | jq -r '.execution_id')
echo "execution_id=$EXEC"

Demo runs are capped at 5–10 records, so a limit outside that range is ignored.
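A standard-library Python sketch of the execute call. execute_request and start_execution are hypothetical names. Clamping limit into 5–10 on the client is an assumption about sensible usage; the server enforces its own cap either way.

```python
import json
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"
DEMO_MIN_RECORDS, DEMO_MAX_RECORDS = 5, 10  # the demo cap stated above


def execute_request(pipeline_id: str, limit: int = 10) -> urllib.request.Request:
    """Build POST /execute/{pipeline_id} with the {"parameters": {"limit": N}} body.

    Clamping is a client-side convenience; the server applies its own cap regardless.
    """
    limit = max(DEMO_MIN_RECORDS, min(DEMO_MAX_RECORDS, limit))
    body = json.dumps({"parameters": {"limit": limit}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/execute/{pipeline_id}",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def start_execution(pipeline_id: str, limit: int = 10) -> str:
    """Send the request and return the execution_id, like jq -r '.execution_id'."""
    with urllib.request.urlopen(execute_request(pipeline_id, limit), timeout=30) as resp:
        return json.load(resp)["execution_id"]
```

start_execution("01-wiki-us-presidents") returns the id you would otherwise store in $EXEC.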

3. Poll status (phase-aware)

watch -n 3 "curl -s \
  https://api.webrobot.eu/api/webrobot/api/demo/executions/$EXEC/status \
  | jq '{phase, status, executors_ready, executors_total, duration_seconds}'"

phase is one of: submitting, starting_driver, pulling_executors, running, completed, failed, lost.
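A polling loop equivalent to the watch command, sketched in Python. It assumes completed, failed and lost are the terminal phases (the list above does not say so explicitly). The status fetcher and the sleep function are passed in as parameters, so the loop itself contains no HTTP code and can be exercised offline.

```python
import time
from typing import Callable

# ASSUMPTION: these are the phases after which the status no longer changes.
TERMINAL_PHASES = {"completed", "failed", "lost"}


def wait_for_execution(
    fetch_status: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll like `watch -n 3` until the execution reaches a terminal phase.

    fetch_status returns the decoded JSON of .../executions/{id}/status.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        phase = status.get("phase")
        if phase in TERMINAL_PHASES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still in phase {phase!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

In a real run, fetch_status would call urllib.request.urlopen on .../executions/{id}/status and json.load the response. A failed or lost phase is the cue to pull the driver logs.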

4. Read the output

# preview rows (via Trino → MinIO/Parquet)
curl -s "https://api.webrobot.eu/api/webrobot/api/demo/executions/$EXEC/output?limit=10" \
  | jq '{source, columns, rows: (.rows | length)}'

The response's source field is "trino" (preferred) or "minio-direct" (fallback). Either way, the endpoint is format-agnostic.
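To turn the preview into records, a Python sketch can zip rows with column names. The response shape here is an assumption: the jq filter only shows that columns and rows exist. The helper therefore accepts either column names or {"name": ...} objects, and rows that are either arrays or already dicts. output_url, rows_as_dicts and read_output are illustrative names.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"


def output_url(execution_id: str, limit: int = 10) -> str:
    """URL of the preview endpoint, e.g. .../executions/{id}/output?limit=10."""
    eid = urllib.parse.quote(execution_id, safe="")
    query = urllib.parse.urlencode({"limit": limit})
    return f"{BASE}/executions/{eid}/output?{query}"


def rows_as_dicts(payload: dict) -> list:
    """Pair each row with its column names.

    ASSUMPTION: columns holds names or {"name": ...} objects, and each row is either a
    list aligned with columns or already a dict.
    """
    names = [c["name"] if isinstance(c, dict) else c for c in payload.get("columns", [])]
    return [r if isinstance(r, dict) else dict(zip(names, r)) for r in payload.get("rows", [])]


def read_output(execution_id: str, limit: int = 10) -> tuple:
    """Return (source, records); source is "trino" or the "minio-direct" fallback."""
    with urllib.request.urlopen(output_url(execution_id, limit), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("source"), rows_as_dicts(payload)
```

Check the OpenAPI spec for the actual columns/rows schema before relying on this.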

5. Tail driver / executor logs

curl -s "https://api.webrobot.eu/api/webrobot/api/demo/executions/$EXEC/logs?tail=200&podType=driver"  | jq -r '.logs'
curl -s "https://api.webrobot.eu/api/webrobot/api/demo/executions/$EXEC/logs?tail=200&podType=executor&executorIndex=1" | jq -r '.logs'

Equivalent to kubectl logs, but sanitized server-side: secrets, pod names and internal classpaths are stripped.
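The log URLs can be built with urllib.parse instead of hand-written query strings. logs_url is a hypothetical helper. It reuses the tail, podType and executorIndex parameters from the curl lines and defaults executorIndex to 1 as the example does, without assuming how executor indices are numbered in general.

```python
import urllib.parse
from typing import Optional

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"


def logs_url(
    execution_id: str,
    pod_type: str = "driver",
    tail: int = 200,
    executor_index: Optional[int] = None,
) -> str:
    """Build .../executions/{id}/logs with the tail, podType and executorIndex parameters."""
    if pod_type not in ("driver", "executor"):
        raise ValueError("pod_type must be 'driver' or 'executor'")
    params = {"tail": tail, "podType": pod_type}
    if pod_type == "executor":
        # Default mirrors the curl example (executorIndex=1); general numbering is not documented here.
        params["executorIndex"] = 1 if executor_index is None else executor_index
    eid = urllib.parse.quote(execution_id, safe="")
    query = urllib.parse.urlencode(params)
    return f"{BASE}/executions/{eid}/logs?{query}"
```

As the jq -r '.logs' calls show, the log text comes back under the logs key of the JSON response.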

6. Save & run your own YAML

cat > /tmp/my.yaml <<'EOF'
pipeline:
  - stage: wget
    args: ["https://en.wikipedia.org/wiki/Apache_Spark"]
  - stage: extract
    args:
      - { selector: "h1#firstHeading", method: "text", as: "title" }
EOF

curl -s -X POST https://api.webrobot.eu/api/webrobot/api/demo/save-generated-pipeline \
  -H 'content-type: application/json' \
  -d "{\"pipeline_name\":\"my-pipe\",\"pipeline_yaml\":$(jq -Rs . < /tmp/my.yaml),\"execute\":true}" \
  | jq '{agent_id, status, execution: .execution.execution_id}'

This is the same YAML schema the wizard emits. The new agent shows up in /demo/list as Generated: my-pipe with is_draft: true.
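The same save call from Python, as a standard-library sketch. json.dumps performs the string escaping that jq -Rs . does in the curl recipe, so the YAML can be embedded verbatim. save_request and summarize_save are illustrative names.

```python
import json
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"

# The same two-stage pipeline as /tmp/my.yaml above.
PIPELINE_YAML = """\
pipeline:
  - stage: wget
    args: ["https://en.wikipedia.org/wiki/Apache_Spark"]
  - stage: extract
    args:
      - { selector: "h1#firstHeading", method: "text", as: "title" }
"""


def save_request(name: str, pipeline_yaml: str, execute: bool = True) -> urllib.request.Request:
    """POST body for save-generated-pipeline; json.dumps escapes the YAML like jq -Rs ."""
    body = {"pipeline_name": name, "pipeline_yaml": pipeline_yaml, "execute": execute}
    return urllib.request.Request(
        f"{BASE}/save-generated-pipeline",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def summarize_save(response: dict) -> dict:
    """Same projection as jq '{agent_id, status, execution: .execution.execution_id}'."""
    execution = response.get("execution") or {}
    return {
        "agent_id": response.get("agent_id"),
        "status": response.get("status"),
        "execution": execution.get("execution_id"),
    }
```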

7. Validate selectors (no Spark needed)

curl -s -X POST https://api.webrobot.eu/api/webrobot/api/demo/wizard/validate \
  -H 'content-type: application/json' \
  -d "{\"yaml\":$(jq -Rs . < /tmp/my.yaml)}" \
  | jq '{valid, record_count, steps: (.steps | map({stage, status}))}'

Opens an ephemeral Camoufox session, replays the fetch trace and samples up to 5 records. It is a cheap dry run before launching the Spark job.
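A Python sketch of the dry run. ready_to_run is a hypothetical heuristic built only on the valid and record_count fields shown in the jq filter. A pipeline that validates but samples zero records probably has a selector problem, which is worth fixing before submitting the Spark job.

```python
import json
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"


def validate_request(pipeline_yaml: str) -> urllib.request.Request:
    """POST {"yaml": ...} to wizard/validate: a Camoufox dry run, no Spark job."""
    return urllib.request.Request(
        f"{BASE}/wizard/validate",
        data=json.dumps({"yaml": pipeline_yaml}).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def summarize_validation(result: dict) -> dict:
    """Same projection as the jq filter: valid, record_count and per-step {stage, status}."""
    return {
        "valid": result.get("valid"),
        "record_count": result.get("record_count"),
        "steps": [{"stage": s.get("stage"), "status": s.get("status")} for s in result.get("steps", [])],
    }


def ready_to_run(result: dict) -> bool:
    """Heuristic gate: valid AND at least one sampled record (zero usually means a bad selector)."""
    return bool(result.get("valid")) and (result.get("record_count") or 0) > 0
```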

8. Live stage catalog

curl -s https://api.webrobot.eu/api/webrobot/api/demo/catalog/stages \
  | jq '.data[] | {stage_name, args: (.arg_schema | map(.name))}'

The catalog is dynamic (currently 62 stages) and is the same source the wizard reads, so new stages show up here without rebuilding any client.
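Because the catalog is the source of truth for stage names, a client can check a pipeline against it before saving. The sketch below is hypothetical Python: stage_args mirrors the jq projection, and unknown_stages flags stage names that the live catalog does not list.

```python
import json
import urllib.request

BASE = "https://api.webrobot.eu/api/webrobot/api/demo"


def fetch_catalog(timeout: float = 30.0) -> dict:
    """GET the live stage catalog as decoded JSON."""
    with urllib.request.urlopen(f"{BASE}/catalog/stages", timeout=timeout) as resp:
        return json.load(resp)


def stage_args(catalog: dict) -> dict:
    """stage_name -> list of argument names, as in the jq projection over .data[]."""
    return {
        s["stage_name"]: [arg["name"] for arg in s.get("arg_schema", [])]
        for s in catalog.get("data", [])
    }


def unknown_stages(pipeline: list, known: dict) -> list:
    """Stage names used by a pipeline (a list of {stage, args} entries) missing from the catalog."""
    return [step["stage"] for step in pipeline if step["stage"] not in known]
```

Since the catalog is read at runtime, unknown_stages(pipeline, stage_args(fetch_catalog())) stays current as stages are added.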

SDKs: github.com/WebRobot-Ltd/sdks generates Python, TypeScript, PHP and Go clients from the same OpenAPI spec (re-run ./generate-sdks.sh after the spec is refreshed).

CLI: github.com/WebRobot-Ltd/webrobot-cli is a Scala CLI for the agent, manifest and bundle surface. A dedicated demo subcommand is on the roadmap; in the meantime, the curl recipes above cover the demo flow end to end.

Released under the MIT License.