Welcome to the Web Scraper Dashboard
This self-hosted web application automates the collection, processing, and visualization of real-time data. It combines web scraping, API integration, and a database-backed API to deliver a reliable, extensible data dashboard, all running on a headless Raspberry Pi server.
🛠️ Tech Stack
Component | Technology Used |
---|---|
Backend | Flask, Python |
Web Scraping | Playwright, plus the Finnhub and CoinMarketCap APIs |
Database | PostgreSQL (Star schema with normalized views) |
API Backend | REST API built with FastAPI (Python), secured with role-based API keys |
Web Server | Nginx + Gunicorn |
Frontend | HTML, Jinja2, CSS (Dark Mode UI) |
Content Management | Custom CMS (Database-driven layouts & assets) |
Hosting | Self-hosted on a Raspberry Pi |
🌐 Web Scraping & API Integration
The system retrieves structured data from websites using Playwright-based scrapers, distributed across Raspberry Pi devices. External APIs provide real-time market data (like stocks and crypto), fetched via scheduled jobs to ensure freshness.
- 🔄 Automated Scraping: Playwright tasks run across multiple Raspberry Pis on a scheduler.
- 📡 Live API Feeds: Data from financial APIs is fetched and stored for dashboard display.
- 🕵️ Error Handling: Built-in retry logic, logging, and cleanup keep ingestion consistent.
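The retry, logging, and cleanup behaviour described above could look roughly like the sketch below. It is an illustration only, not the project's actual code: `run_with_retry`, its parameters, and the `ingest` logger name are hypothetical, and exponential backoff is an assumed retry policy.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("ingest")  # hypothetical logger name


def run_with_retry(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    cleanup: Optional[Callable[[], None]] = None,
) -> T:
    """Run `task`, retrying with exponential backoff if it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
        finally:
            # Runs after every attempt, e.g. to close a browser session.
            if cleanup is not None:
                cleanup()
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    assert last_error is not None
    raise last_error  # out of retries: surface the last failure
```

A scheduled job would wrap its scrape or API call in a zero-argument function and pass it as `task`. The `sleep` parameter is injectable so the backoff can be exercised without real waiting.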
🗄️ Database Architecture & API Backend
Data is stored in a PostgreSQL backend using a denormalized star schema for fast analytics, supported by normalized views to keep data structured and reusable across endpoints. The database is exposed via secure API endpoints with role-based access control.
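As a rough illustration of this layout, the sketch below builds one fact table, one dimension table, and a view that gives endpoints a single stable shape to query. Every table, column, and view name is hypothetical, the rows are placeholder values rather than real market data, and an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

# SQLite in memory is a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_asset (              -- dimension: one row per asset
        asset_id    INTEGER PRIMARY KEY,
        symbol      TEXT NOT NULL UNIQUE,
        asset_class TEXT NOT NULL
    );
    CREATE TABLE fact_price (             -- fact: one row per fetched price
        asset_id   INTEGER NOT NULL REFERENCES dim_asset(asset_id),
        fetched_at TEXT NOT NULL,
        price      REAL NOT NULL
    );
    CREATE VIEW v_price AS                -- view queried by the endpoints
    SELECT a.symbol, a.asset_class, f.fetched_at, f.price
    FROM fact_price AS f
    JOIN dim_asset AS a ON a.asset_id = f.asset_id;
    """
)

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO dim_asset VALUES (?, ?, ?)",
    [(1, "BTC", "crypto"), (2, "AAPL", "stock")],
)
conn.executemany(
    "INSERT INTO fact_price VALUES (?, ?, ?)",
    [
        (1, "2024-01-01T00:00:00", 100.0),
        (1, "2024-01-01T00:05:00", 110.0),
        (2, "2024-01-01T00:00:00", 50.0),
    ],
)


def average_price_by_class(connection: sqlite3.Connection) -> dict:
    """Aggregate through the view instead of joining tables by hand."""
    rows = connection.execute(
        "SELECT asset_class, AVG(price) FROM v_price "
        "GROUP BY asset_class ORDER BY asset_class"
    ).fetchall()
    return dict(rows)
```

Because callers only see `v_price`, the underlying tables could change without every endpoint query having to change with them.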
- 🧱 Star Schema Tables: Denormalized structure for fast aggregate queries and reporting.
- 🧮 Normalized Views: Clean abstraction for consistent, reusable frontend queries.
- 🔐 API Access: Secure endpoints support `GET` and `POST`, with API keys granting `readonly` or `readwrite` access.
- 🚦 Collaborator Ready: API access allows integration with other tools or teams.
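The key-to-role check can be sketched in a few lines. This is a minimal, framework-free illustration and not the project's implementation: the `API_KEYS` store, the example keys, the `authorize` helper, and the choice of 401 and 403 status codes are all assumptions.

```python
from typing import Dict, Optional, Set

# Hypothetical key store; a real deployment would more likely keep keys
# (ideally hashed) in the database rather than in code.
API_KEYS: Dict[str, str] = {
    "reader-key": "readonly",
    "writer-key": "readwrite",
}

# HTTP methods each role may use.
ALLOWED_METHODS: Dict[str, Set[str]] = {
    "readonly": {"GET"},
    "readwrite": {"GET", "POST"},
}


def authorize(api_key: Optional[str], method: str) -> int:
    """Return the HTTP status code a request with this key should receive."""
    role = API_KEYS.get(api_key) if api_key is not None else None
    if role is None:
        return 401  # missing or unknown key
    if method.upper() not in ALLOWED_METHODS[role]:
        return 403  # valid key, but the role does not permit this method
    return 200
```

In a FastAPI app, a check like this would typically sit in a dependency that reads the key from a request header, so that every endpoint shares the same rule.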
🚀 Hosting & Deployment
The entire stack (PostgreSQL, Flask, Nginx, and Gunicorn) runs on a Raspberry Pi. Deployment scripts allow CI/CD-style updates with minimal downtime, and the services start on their own without manual steps.
- 🖥️ Reverse Proxy & SSL: Nginx handles secure access to the Flask app.
- 🧪 Deployment Scripts: Automation scripts ensure consistent builds and updates.
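For the Gunicorn side of this setup, a configuration file might look like the sketch below. The file name follows Gunicorn's convention, but every value here is an illustrative assumption, not the project's actual configuration.

```python
# gunicorn.conf.py (hypothetical values for a small Raspberry Pi deployment)
import multiprocessing

# Listen on localhost only, so Nginx is the sole public entry point.
bind = "127.0.0.1:8000"

# A Raspberry Pi has few cores and limited RAM, so cap the worker count.
workers = min(multiprocessing.cpu_count(), 4)

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 60

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

Gunicorn would load it with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the Flask module and application object. Nginx would then terminate SSL and proxy requests to the `bind` address.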
🧰 CMS & Dashboard Interface
The dashboard interface is built with Flask and Jinja2 templates and offers a responsive layout, pagination, and live filters. It is now powered by a custom CMS: all content (pages, layouts, assets, and components) is loaded dynamically from the PostgreSQL database.
- 📊 Web Dashboard: Users can search, filter, and paginate through real-time data.
- 🗄️ Dynamic CMS: Pages and components are loaded from the database, not the filesystem.
- 📋 Template & Asset Management: HTML, CSS, and JS are stored and served from DB tables.
- 🧩 Reusable Components: Layouts, partials, and macros like pagination are all DB-driven.
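Serving templates from the database rather than the filesystem comes down to a lookup by template name. The sketch below shows that lookup under stated assumptions: the `cms_template` table, its columns, and the sample partial are hypothetical, and in-memory SQLite again stands in for PostgreSQL.

```python
import sqlite3
from typing import Optional

# SQLite in memory is a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cms_template (name TEXT PRIMARY KEY, source TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO cms_template VALUES (?, ?)",
    (
        "partials/pagination.html",
        "<nav>Page {{ page }} of {{ pages }}</nav>",
    ),
)


def load_template_source(name: str) -> Optional[str]:
    """Return the stored template source, or None if no such template exists."""
    row = conn.execute(
        "SELECT source FROM cms_template WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

A function with this shape (template name in, source string or `None` out) is what Jinja2's `FunctionLoader` accepts, which is one way such a lookup could be wired into Flask's template environment.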