Features
WebRobot provides a comprehensive set of features for building and managing agentic ETL pipelines. This page summarizes them by area.
Core Features
🚀 Spark-Native Processing
- Distributed Computing: Leverage Apache Spark's distributed processing capabilities
- Scalability: Handle data from gigabytes to petabytes
- Performance: Optimized for speed and efficiency
- Resource Management: Intelligent resource allocation and optimization
🤖 AI-Powered Intelligence
- Intelligent Stages: LLM-powered stages that adapt to changing web structures
- Natural Language Processing: Convert natural language descriptions to executable pipelines
- Auto-Programming: Python extensions for dynamic stage generation
- Context-Aware Extraction: Intelligent data extraction with minimal configuration
🔌 API-First Architecture
- RESTful API: Complete programmatic control via REST API
- SDK Support: Official SDKs for multiple programming languages
- Webhooks: Real-time notifications for job status and events
- Integration Ready: Easy integration with existing tools and workflows
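As a rough illustration of the Webhooks item above, a receiver for job-status notifications might look like the sketch below. The payload fields (`job_id`, `status`) and the port are assumptions made for this example. They are not taken from the WebRobot API reference, so check the actual webhook schema before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload shape: {"job_id": "...", "status": "..."}.
# These field names are illustrative, not the documented WebRobot schema.


def summarize_event(raw_body: bytes) -> str:
    """Turn a raw webhook body into a one-line log message."""
    event = json.loads(raw_body)  # raises a ValueError subclass on bad JSON
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    job_id = event.get("job_id", "<unknown>")
    status = event.get("status", "<unknown>")
    return f"job {job_id} is now {status}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            message = summarize_event(body)
        except ValueError:
            # Malformed or unexpected payload: reject it.
            self.send_response(400)
            self.end_headers()
            return
        print(message)
        # Acknowledge quickly; heavy work should happen elsewhere.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, handling incoming webhook POSTs on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the parsing in `summarize_event` separate from the HTTP handler makes it easy to test without opening a socket.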
🧩 Maximum Extensibility
- Custom Plugins: Build and deploy custom plugins for technical partners
- Python Extensions: Dynamic row transforms without compilation
- Attribute Resolvers: Custom extraction methods for flexible data extraction
- Custom Actions: Extend browser interactions with custom action factories
🌐 Multi-Source Integration
- Web Sources: Intelligent web scraping with browser automation
- Databases: Connect to PostgreSQL, MySQL, MongoDB, and more
- APIs: REST and GraphQL API integration
- Streaming: Real-time data ingestion from Kafka, MQTT, and more
📊 Enterprise Features
- Monitoring: Comprehensive logging and monitoring capabilities
- Security: Enterprise-grade authentication and authorization
- Multi-tenancy: Support for multiple organizations and projects
- Audit Trail: Complete audit logging for compliance
Advanced Features
Agentic Capabilities
- Pipeline Generation: AI agents that generate pipelines from natural language
- Auto-Setup: Automated configuration and setup of interactive actions
- Context Learning: Agents learn from documentation and examples
- Error Recovery: Intelligent error handling and recovery
Vertical Solutions
- LLM Fine-tuning: Datasets for training and fine-tuning LLMs
- Price Comparison: Real-time price monitoring and comparison
- Sports Betting: Surebet detection and arbitrage opportunities
- Real Estate: Property clustering and market analysis
Developer Experience
- CLI Tools: Command-line interface for pipeline management
- IDE Integration: Support for popular IDEs and editors
- Testing: Built-in testing and validation tools
- Documentation: Comprehensive documentation and examples
AI-Assisted Development
- Claude Code Plugin: MCP server + skill set for AI-assisted pipeline building and administration. Claude Code is our recommended environment for vibe coding, particularly for the development of technical partner plugins.
- Cursor IDE Support: Native MCP tool integration — list jobs, run pipelines, inspect logs from your editor
- Skills: /webrobot-admin, /webrobot-pipeline, /webrobot-plugin-dev, /webrobot-python-extension
- AI Agent Workflow: Generate Python Extensions at runtime, register them via API, and reference them in YAML, with no compilation step
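The generate-and-register workflow for Python Extensions can be pictured with a small local sketch. Everything in it is hypothetical: the `transform(row)` signature, the extension name `normalize_price`, and the shape of the registration payload are illustrative and do not describe WebRobot's actual extension contract or API. The sketch only shows the general idea of loading a row transform from source text at runtime, without a compile step.

```python
import json

# Source text an agent might generate at runtime. Assumed contract (hypothetical):
# a function named `transform` that maps one row dict to a new row dict.
EXTENSION_SOURCE = '''
def transform(row):
    out = dict(row)
    out["price"] = round(float(str(row["price"]).replace("$", "")), 2)
    return out
'''


def load_transform(source: str):
    """Execute extension source in its own namespace and return its transform()."""
    namespace = {}
    exec(source, namespace)  # only run source you trust
    return namespace["transform"]


def registration_payload(name: str, source: str) -> str:
    """Build a JSON body for a hypothetical 'register extension' API call."""
    return json.dumps({"name": name, "language": "python", "source": source})


# Load the transform and apply it row by row.
transform = load_transform(EXTENSION_SOURCE)
rows = [{"sku": "A1", "price": "$19.990"}, {"sku": "B2", "price": "5"}]
cleaned = [transform(r) for r in rows]

# Body that would be sent to the (hypothetical) registration endpoint.
payload = registration_payload("normalize_price", EXTENSION_SOURCE)
```

Because the transform arrives as plain source text, an agent can revise and re-register it without rebuilding or redeploying a JAR, which is the point of the "no compilation" claim.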
Partner Plugin System
- Plugin Marketplace: Technical partners can upload custom ETL and API plugins
- Plugin SDK: Scala traits (WSourceStage, WTransformStage, WSinkStage, WFilterStage, WAggregateStage) plus a Java REST API plugin interface
- CI/CD Integration: Jenkins pipeline with automatic JAR upload to MinIO and DB registration
- Plugin Manifest: Declarative manifest.json with stage schema, Flyway migrations, and org scoping
Ray Platform (coming soon)
WebRobot is extending its backend with a Ray-based distributed computing layer, complementing the existing Spark engine with capabilities tailored for AI workloads and real-time event-driven architectures.
Training & Fine-tuning
Ray Train and Ray Data will power distributed model training and LLM fine-tuning pipelines, integrated with the same project/job model used for ETL workloads.
Inference & Agentic Execution
Ray Serve will host inference endpoints for custom models. Ray's actor model will support distributed agentic workflows — long-running agents that coordinate across multiple nodes, consume events, and drive pipeline executions autonomously.
Distributed Trading Engine
The Ray layer will serve as the backbone for real-time trading and arbitrage use cases, enabling low-latency event processing and coordination across geographically distributed workers.
Sports Betting — Real-Time Odds Pipeline
The surebet detection vertical will use Ray to monitor live odds from multiple bookmakers in real time. Detected events feed a Kafka queue, which in turn drives a Spark Structured Streaming job for continuous arbitrage calculation and alerting.
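The arbitrage calculation itself is standard. For a market with mutually exclusive outcomes, take the best decimal odds offered for each outcome across bookmakers; a surebet exists when the sum of the inverse odds is below 1. The sketch below illustrates that check in plain Python. The function names, bookmaker names, and sample odds are made up for the example and say nothing about how the WebRobot pipeline implements it. It also assumes every bookmaker quotes the same set of outcomes.

```python
def best_odds(quotes: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each outcome, keep the highest decimal odds offered by any bookmaker."""
    best: dict[str, float] = {}
    for odds_by_outcome in quotes.values():
        for outcome, odds in odds_by_outcome.items():
            best[outcome] = max(odds, best.get(outcome, 0.0))
    return best


def surebet_stakes(best: dict[str, float], bankroll: float):
    """Return (stakes per outcome, guaranteed profit), or None if no arbitrage exists."""
    # Sum of implied probabilities; below 1 means the books disagree enough to arbitrage.
    implied = sum(1.0 / odds for odds in best.values())
    if implied >= 1.0:
        return None
    # Stake each outcome in proportion to its implied probability,
    # so every outcome pays out the same amount.
    stakes = {o: bankroll * (1.0 / odds) / implied for o, odds in best.items()}
    profit = bankroll / implied - bankroll
    return stakes, profit


# Illustrative quotes for a two-outcome market (not real data).
quotes = {
    "bookmaker_a": {"home": 2.10, "away": 1.80},
    "bookmaker_b": {"home": 1.90, "away": 2.05},
}
result = surebet_stakes(best_odds(quotes), bankroll=100.0)
```

With these sample quotes the best odds are 2.10 and 2.05, the inverse odds sum to roughly 0.964, and `result` holds stakes that return the same payout whichever outcome occurs. In a streaming job the same check would run per event as new odds arrive.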
Bookmaker APIs → Ray workers (real-time odds collection)
↓
Kafka topic
↓
Spark Structured Streaming
↓
Surebet detection & alerts
What's Next?
Check out our documentation to see all features and improvements.
