I need to analyze CSV files containing school scheduling data and create a complete Python database system. Here's what I need: ## **PROJECT OVERVIEW** I'm building a school scheduling system that migrates from spreadsheets to a database. I have CSV files with school data that needs to be analyzed, transformed, and loaded into a relational database, then queried through a Python application. ## **DATA ANALYSIS PHASE** **1. CSV Structure Analysis:** - Analyze these sample CSV files for structure and content: - `students.csv` (student_id, name, grade, class_group) - `teachers.csv` (teacher_id, name, subject, department) - `schedule.csv` (entry_id, student_id, teacher_id, room, subject, day, time_slot, group_type) - `subjects.csv` (subject_id, subject_name, level, requirements) - `rooms.csv` (room_id, room_number, capacity, equipment) **2. Data Quality Check:** - Identify missing values, duplicates, and inconsistencies - Check data types and formatting consistency - Identify relationships between different CSV files - Note any data normalization needs **3. Schema Design:** - Design a normalized SQL database schema based on CSV analysis - Create ERD showing relationships - Specify primary keys, foreign keys, and constraints ## **DATA MIGRATION PHASE** **4. Python Migration Script:** Create a Python script that: - Reads CSV files using pandas - Cleans and transforms data (handle missing values, format dates, normalize text) - Creates SQLite database with proper schema - Inserts data with error handling and validation - Creates indexes for performance - Generates migration logs and reports **5. Migration Features:** - Incremental data loading - Rollback capability - Data validation rules - Performance optimization - Error logging and reporting ## **DATABASE APPLICATION PHASE** **6. Python Database Application:** Build a Python application with these features: **A. Core Query Functions:** 1. Student queries: - Find student by name/id - Get student schedule - Find student location at specific time - Get student's teachers and subjects 2. Teacher queries: - Find teacher schedule - Get teacher's students - Find available rooms for teacher 3. Schedule queries: - Room availability by time - Schedule conflicts detection - Daily/weekly schedule views **B. Application Structure:** - Database connection manager - Query builder/ORM layer - Business logic layer - CLI interface - API endpoints (optional) - Error handling and logging **C. Advanced Features:** - Search functionality (full-text search) - Export to CSV/Excel - Data validation before insertion - Audit logging - Performance metrics ## **IMPLEMENTATION DETAILS** **7. Technology Stack:** - Python 3.8+ - SQLite (or PostgreSQL for production) - pandas for data processing - SQLAlchemy for ORM (optional) - Flask/FastAPI for web interface (optional) - pytest for testing **8. File Structure:** ``` school_scheduler/ ├── data/ │ ├── raw_csv/ # Original CSV files │ ├── processed/ # Cleaned CSV files │ └── database/ # SQLite database ├── src/ │ ├── migration/ │ │ ├── analyzer.py # CSV analysis │ │ ├── cleaner.py # Data cleaning │ │ └── loader.py # Database loading │ ├── database/ │ │ ├── models.py # SQLAlchemy models │ │ ├── connection.py # DB connection │ │ └── queries.py # SQL queries │ ├── app/ │ │ ├── cli.py # Command line interface │ │ ├── api.py # REST API (optional) │ │ └── scheduler.py # Business logic │ └── utils/ │ ├── validators.py # Data validation │ └── logger.py # Logging setup ├── tests/ │ ├── test_migration.py │ ├── test_queries.py │ └── test_app.py ├── requirements.txt ├── config.yaml # Configuration └── README.md ``` **9. Step-by-Step Implementation:** **Phase 1: Analysis & Design** 1. Analyze CSV file structures 2. Design database schema 3. Create ER diagram 4. Plan data transformation rules **Phase 2: Migration Development** 1. Write CSV analyzer 2. Implement data cleaning functions 3. Create database schema 4. Build data loader with error handling 5. Add validation and logging **Phase 3: Application Development** 1. Set up database connection 2. Implement core query functions 3. Create CLI interface 4. Add search and export features 5. Implement error handling **Phase 4: Testing & Optimization** 1. Write unit tests 2. Test with sample data 3. Optimize queries 4. Add performance monitoring ## **SAMPLE DATA EXPECTATIONS** CSV files may contain: - Student data with multiple group affiliations - Complex scheduling with time slots - Room assignments with capacity constraints - Teacher-subject relationships - Overlapping Perfect! Here's the optimized AI prompt for your specific setup: ## **Optimized AI Prompt for SQLite Database Migration** ```text I'm building a school scheduling system migrating from CSV files to SQLite database. Here's my exact setup: ## **CURRENT SETUP** - **Folder Structure**: `sample_data/` folder containing CSV files - **Database Tool**: DB Browser for SQLite - **Target Database**: SQLite database file (.db or .sqlite) ## **PHASE 1: CSV ANALYSIS & DATABASE DESIGN** **1. Analyze CSV files in `sample_data/` folder:** Please examine these CSV files (provide actual file names if known): - `students.csv` - Student information - `teachers.csv` - Teacher information - `schedule.csv` - Scheduling data - `subjects.csv` - Subject catalog - `rooms.csv` - Room information - Any other CSV files present **2. For each CSV file, analyze:** - Column names and data types - Primary key candidates - Data quality issues (nulls, duplicates, inconsistencies) - Foreign key relationships to other files - Sample data patterns **3. Design SQLite database schema:** - Create normalized tables based on CSV analysis - Specify SQLite-compatible data types (TEXT, INTEGER, REAL, BLOB, NULL) - Design primary keys (INTEGER PRIMARY KEY for autoincrement) - Define foreign key constraints - Create indexes for frequent queries **4. Generate SQL schema for DB Browser SQLite:** Provide CREATE TABLE statements that work in DB Browser SQLite. ## **PHASE 2: PYTHON MIGRATION SCRIPT** **5. Create `migrate_to_sqlite.py` script that:** - Reads all CSV files from `sample_data/` folder - Uses pandas for data analysis and cleaning - Creates SQLite database file - Implements this workflow: ```python # Workflow: 1. Connect to SQLite database 2. Create tables from schema 3. For each CSV file: - Read with pandas - Clean data (handle nulls, format, validate) - Insert into corresponding table - Log errors/success 4. Create indexes 5. Generate migration report ``` **6. Specific requirements for SQLite:** - Use `sqlite3` Python module - Handle SQLite constraints (NOT NULL, UNIQUE, CHECK) - Use parameterized queries to prevent SQL injection - Implement transaction rollback on errors - Generate CSV import summary **7. Data cleaning for SQLite compatibility:** - Convert dates to TEXT in ISO format (YYYY-MM-DD HH:MM:SS) - Handle empty strings vs NULL - Ensure integer fields are numeric - Trim whitespace from TEXT fields ## **PHASE 3: PYTHON QUERY APPLICATION** **8. Create `school_scheduler.py` application with:** - SQLite database connection manager - Query functions for common operations: ```python # Essential Queries: 1. Find student: "Where is [student] right now?" 2. Student schedule: "Show [student]'s daily schedule" 3. Teacher schedule: "Show [teacher]'s classes" 4. Room availability: "Is [room] free at [time]?" 5. Schedule conflicts: "Find overlapping schedules" ``` **9. Application features:** - Command-line interface (CLI) with menu - Interactive query builder - Export results to CSV - Error handling for SQLite errors - Connection management (open/close) **10. SQLite-specific optimizations:** - Use connection pooling - Implement query caching if needed - Add EXPLAIN QUERY PLAN for optimization - Create views for complex queries ## **PHASE 4: DB BROWSER SQLITE INTEGRATION** **11. Generate files for DB Browser SQLite:** - SQL schema file (`.sql`) to import into DB Browser - Sample queries for DB Browser's "Execute SQL" tab - Instructions for using DB Browser with the database **12. Create `db_browser_setup.sql` with:** - All CREATE TABLE statements - Sample INSERT statements - Useful SELECT queries - Index creation statements ## **DELIVERABLES REQUESTED** Please provide: **1. Database Schema:** - Complete SQL CREATE TABLE statements for SQLite - ER diagram showing relationships - Index recommendations **2. Python Scripts:** - `analyze_csv.py` - Analyze sample_data CSV files - `migrate_to_sqlite.py` - Migration script - `school_scheduler.py` - Query application - `requirements.txt` - Dependencies **3. DB Browser Files:** - `schema.sql` - Import into DB Browser - `sample_queries.sql` - Pre-written queries **4. Documentation:** - README with setup instructions - Sample queries and expected results - Troubleshooting guide for SQLite issues ## **SAMPLE WORKFLOW** ```bash # Expected workflow: 1. python analyze_csv.py # Analyze CSV files 2. python migrate_to_sqlite.py # Create and populate database 3. Open school_schedule.db in DB Browser # Visual verification 4. python school_scheduler.py # Run query application ``` ## **ADDITIONAL REQUIREMENTS** **For SQLite Database:** - Database file: `school_schedule.db` - Enable foreign keys: `PRAGMA foreign_keys = ON` - Use WAL journal mode for better concurrency - Implement backup mechanism **For CSV Handling:** - Detect CSV encoding automatically - Handle different delimiters (comma, semicolon, tab) - Skip malformed rows with logging - Generate data quality report **For Python Application:** - Use argparse for CLI arguments - Implement --help documentation - Add verbose mode for debugging - Include unit tests for key functions Please analyze the CSV structure first, then provide the complete implementation. ``` ## **Quick Version (If you just want the code):** ```text Build a Python system that: 1. Analyzes CSV files in `sample_data/` folder 2. Creates SQLite database (`school_schedule.db`) 3. Migrates CSV data to SQLite with cleaning 4. Provides query application for school scheduling Key queries needed: - Find student location by time - Show student/teacher schedules - Check room availability - Detect schedule conflicts Use: Python, sqlite3, pandas, DB Browser SQLite compatible. ``` ## **Best Practices to Include in Your Prompt:** 1. **Be specific about folder paths**: `sample_data/` 2. **Mention your tools**: DB Browser for SQLite 3. **Specify file formats**: `.db` or `.sqlite` extension 4. **List expected CSV files** if you know their names 5. **Mention specific queries** you need 6. **Include error handling requirements** 7. **Ask for DB Browser integration files** This prompt will give you a complete, production-ready solution tailored to your exact setup with SQLite and DB Browser.