361 lines
11 KiB
Plaintext
361 lines
11 KiB
Plaintext
I need to analyze CSV files containing school scheduling data and create a complete Python database system. Here's what I need:
|
|
|
|
## **PROJECT OVERVIEW**
|
|
I'm building a school scheduling system that migrates from spreadsheets to a database. I have CSV files with school data that needs to be analyzed, transformed, and loaded into a relational database, then queried through a Python application.
|
|
|
|
## **DATA ANALYSIS PHASE**
|
|
|
|
**1. CSV Structure Analysis:**
|
|
- Analyze these sample CSV files for structure and content:
|
|
- `students.csv` (student_id, name, grade, class_group)
|
|
- `teachers.csv` (teacher_id, name, subject, department)
|
|
- `schedule.csv` (entry_id, student_id, teacher_id, room, subject, day, time_slot, group_type)
|
|
- `subjects.csv` (subject_id, subject_name, level, requirements)
|
|
- `rooms.csv` (room_id, room_number, capacity, equipment)
|
|
|
|
**2. Data Quality Check:**
|
|
- Identify missing values, duplicates, and inconsistencies
|
|
- Check data types and formatting consistency
|
|
- Identify relationships between different CSV files
|
|
- Note any data normalization needs
|
|
|
|
**3. Schema Design:**
|
|
- Design a normalized SQL database schema based on CSV analysis
|
|
- Create ERD showing relationships
|
|
- Specify primary keys, foreign keys, and constraints
|
|
|
|
## **DATA MIGRATION PHASE**
|
|
|
|
**4. Python Migration Script:**
|
|
Create a Python script that:
|
|
- Reads CSV files using pandas
|
|
- Cleans and transforms data (handle missing values, format dates, normalize text)
|
|
- Creates SQLite database with proper schema
|
|
- Inserts data with error handling and validation
|
|
- Creates indexes for performance
|
|
- Generates migration logs and reports
|
|
|
|
**5. Migration Features:**
|
|
- Incremental data loading
|
|
- Rollback capability
|
|
- Data validation rules
|
|
- Performance optimization
|
|
- Error logging and reporting
|
|
|
|
## **DATABASE APPLICATION PHASE**
|
|
|
|
**6. Python Database Application:**
|
|
Build a Python application with these features:
|
|
|
|
**A. Core Query Functions:**
|
|
1. Student queries:
|
|
- Find student by name/id
|
|
- Get student schedule
|
|
- Find student location at specific time
|
|
- Get student's teachers and subjects
|
|
|
|
2. Teacher queries:
|
|
- Find teacher schedule
|
|
- Get teacher's students
|
|
- Find available rooms for teacher
|
|
|
|
3. Schedule queries:
|
|
- Room availability by time
|
|
- Schedule conflicts detection
|
|
- Daily/weekly schedule views
|
|
|
|
**B. Application Structure:**
|
|
- Database connection manager
|
|
- Query builder/ORM layer
|
|
- Business logic layer
|
|
- CLI interface
|
|
- API endpoints (optional)
|
|
- Error handling and logging
|
|
|
|
**C. Advanced Features:**
|
|
- Search functionality (full-text search)
|
|
- Export to CSV/Excel
|
|
- Data validation before insertion
|
|
- Audit logging
|
|
- Performance metrics
|
|
|
|
## **IMPLEMENTATION DETAILS**
|
|
|
|
**7. Technology Stack:**
|
|
- Python 3.8+
|
|
- SQLite (or PostgreSQL for production)
|
|
- pandas for data processing
|
|
- SQLAlchemy for ORM (optional)
|
|
- Flask/FastAPI for web interface (optional)
|
|
- pytest for testing
|
|
|
|
**8. File Structure:**
|
|
```
|
|
school_scheduler/
|
|
├── data/
|
|
│ ├── raw_csv/ # Original CSV files
|
|
│ ├── processed/ # Cleaned CSV files
|
|
│ └── database/ # SQLite database
|
|
├── src/
|
|
│ ├── migration/
|
|
│ │ ├── analyzer.py # CSV analysis
|
|
│ │ ├── cleaner.py # Data cleaning
|
|
│ │ └── loader.py # Database loading
|
|
│ ├── database/
|
|
│ │ ├── models.py # SQLAlchemy models
|
|
│ │ ├── connection.py # DB connection
|
|
│ │ └── queries.py # SQL queries
|
|
│ ├── app/
|
|
│ │ ├── cli.py # Command line interface
|
|
│ │ ├── api.py # REST API (optional)
|
|
│ │ └── scheduler.py # Business logic
|
|
│ └── utils/
|
|
│ ├── validators.py # Data validation
|
|
│ └── logger.py # Logging setup
|
|
├── tests/
|
|
│ ├── test_migration.py
|
|
│ ├── test_queries.py
|
|
│ └── test_app.py
|
|
├── requirements.txt
|
|
├── config.yaml # Configuration
|
|
└── README.md
|
|
```
|
|
|
|
**9. Step-by-Step Implementation:**
|
|
|
|
**Phase 1: Analysis & Design**
|
|
1. Analyze CSV file structures
|
|
2. Design database schema
|
|
3. Create ER diagram
|
|
4. Plan data transformation rules
|
|
|
|
**Phase 2: Migration Development**
|
|
1. Write CSV analyzer
|
|
2. Implement data cleaning functions
|
|
3. Create database schema
|
|
4. Build data loader with error handling
|
|
5. Add validation and logging
|
|
|
|
**Phase 3: Application Development**
|
|
1. Set up database connection
|
|
2. Implement core query functions
|
|
3. Create CLI interface
|
|
4. Add search and export features
|
|
5. Implement error handling
|
|
|
|
**Phase 4: Testing & Optimization**
|
|
1. Write unit tests
|
|
2. Test with sample data
|
|
3. Optimize queries
|
|
4. Add performance monitoring
|
|
|
|
## **SAMPLE DATA EXPECTATIONS**
|
|
|
|
CSV files may contain:
|
|
- Student data with multiple group affiliations
|
|
- Complex scheduling with time slots
|
|
- Room assignments with capacity constraints
|
|
- Teacher-subject relationships
|
|
- Overlapping
|
|
|
|
Perfect! Here's the optimized AI prompt for your specific setup:
|
|
|
|
## **Optimized AI Prompt for SQLite Database Migration**
|
|
|
|
```text
|
|
I'm building a school scheduling system migrating from CSV files to SQLite database. Here's my exact setup:
|
|
|
|
## **CURRENT SETUP**
|
|
- **Folder Structure**: `sample_data/` folder containing CSV files
|
|
- **Database Tool**: DB Browser for SQLite
|
|
- **Target Database**: SQLite database file (.db or .sqlite)
|
|
|
|
## **PHASE 1: CSV ANALYSIS & DATABASE DESIGN**
|
|
|
|
**1. Analyze CSV files in `sample_data/` folder:**
|
|
Please examine these CSV files (provide actual file names if known):
|
|
- `students.csv` - Student information
|
|
- `teachers.csv` - Teacher information
|
|
- `schedule.csv` - Scheduling data
|
|
- `subjects.csv` - Subject catalog
|
|
- `rooms.csv` - Room information
|
|
- Any other CSV files present
|
|
|
|
**2. For each CSV file, analyze:**
|
|
- Column names and data types
|
|
- Primary key candidates
|
|
- Data quality issues (nulls, duplicates, inconsistencies)
|
|
- Foreign key relationships to other files
|
|
- Sample data patterns
|
|
|
|
**3. Design SQLite database schema:**
|
|
- Create normalized tables based on CSV analysis
|
|
- Specify SQLite-compatible data types (TEXT, INTEGER, REAL, BLOB, NULL)
|
|
- Design primary keys (INTEGER PRIMARY KEY for autoincrement)
|
|
- Define foreign key constraints
|
|
- Create indexes for frequent queries
|
|
|
|
**4. Generate SQL schema for DB Browser SQLite:**
|
|
Provide CREATE TABLE statements that work in DB Browser SQLite.
|
|
|
|
## **PHASE 2: PYTHON MIGRATION SCRIPT**
|
|
|
|
**5. Create `migrate_to_sqlite.py` script that:**
|
|
- Reads all CSV files from `sample_data/` folder
|
|
- Uses pandas for data analysis and cleaning
|
|
- Creates SQLite database file
|
|
- Implements this workflow:
|
|
|
|
```python
|
|
# Workflow:
|
|
1. Connect to SQLite database
|
|
2. Create tables from schema
|
|
3. For each CSV file:
|
|
- Read with pandas
|
|
- Clean data (handle nulls, format, validate)
|
|
- Insert into corresponding table
|
|
- Log errors/success
|
|
4. Create indexes
|
|
5. Generate migration report
|
|
```
|
|
|
|
**6. Specific requirements for SQLite:**
|
|
- Use `sqlite3` Python module
|
|
- Handle SQLite constraints (NOT NULL, UNIQUE, CHECK)
|
|
- Use parameterized queries to prevent SQL injection
|
|
- Implement transaction rollback on errors
|
|
- Generate CSV import summary
|
|
|
|
**7. Data cleaning for SQLite compatibility:**
|
|
- Convert dates to TEXT in ISO format (YYYY-MM-DD HH:MM:SS)
|
|
- Handle empty strings vs NULL
|
|
- Ensure integer fields are numeric
|
|
- Trim whitespace from TEXT fields
|
|
|
|
## **PHASE 3: PYTHON QUERY APPLICATION**
|
|
|
|
**8. Create `school_scheduler.py` application with:**
|
|
- SQLite database connection manager
|
|
- Query functions for common operations:
|
|
|
|
```python
|
|
# Essential Queries:
|
|
1. Find student: "Where is [student] right now?"
|
|
2. Student schedule: "Show [student]'s daily schedule"
|
|
3. Teacher schedule: "Show [teacher]'s classes"
|
|
4. Room availability: "Is [room] free at [time]?"
|
|
5. Schedule conflicts: "Find overlapping schedules"
|
|
```
|
|
|
|
**9. Application features:**
|
|
- Command-line interface (CLI) with menu
|
|
- Interactive query builder
|
|
- Export results to CSV
|
|
- Error handling for SQLite errors
|
|
- Connection management (open/close)
|
|
|
|
**10. SQLite-specific optimizations:**
|
|
- Use connection pooling
|
|
- Implement query caching if needed
|
|
- Add EXPLAIN QUERY PLAN for optimization
|
|
- Create views for complex queries
|
|
|
|
## **PHASE 4: DB BROWSER SQLITE INTEGRATION**
|
|
|
|
**11. Generate files for DB Browser SQLite:**
|
|
- SQL schema file (`.sql`) to import into DB Browser
|
|
- Sample queries for DB Browser's "Execute SQL" tab
|
|
- Instructions for using DB Browser with the database
|
|
|
|
**12. Create `db_browser_setup.sql` with:**
|
|
- All CREATE TABLE statements
|
|
- Sample INSERT statements
|
|
- Useful SELECT queries
|
|
- Index creation statements
|
|
|
|
## **DELIVERABLES REQUESTED**
|
|
|
|
Please provide:
|
|
|
|
**1. Database Schema:**
|
|
- Complete SQL CREATE TABLE statements for SQLite
|
|
- ER diagram showing relationships
|
|
- Index recommendations
|
|
|
|
**2. Python Scripts:**
|
|
- `analyze_csv.py` - Analyze sample_data CSV files
|
|
- `migrate_to_sqlite.py` - Migration script
|
|
- `school_scheduler.py` - Query application
|
|
- `requirements.txt` - Dependencies
|
|
|
|
**3. DB Browser Files:**
|
|
- `schema.sql` - Import into DB Browser
|
|
- `sample_queries.sql` - Pre-written queries
|
|
|
|
**4. Documentation:**
|
|
- README with setup instructions
|
|
- Sample queries and expected results
|
|
- Troubleshooting guide for SQLite issues
|
|
|
|
## **SAMPLE WORKFLOW**
|
|
|
|
```bash
|
|
# Expected workflow:
|
|
1. python analyze_csv.py # Analyze CSV files
|
|
2. python migrate_to_sqlite.py # Create and populate database
|
|
3. Open school_schedule.db in DB Browser # Visual verification
|
|
4. python school_scheduler.py # Run query application
|
|
```
|
|
|
|
## **ADDITIONAL REQUIREMENTS**
|
|
|
|
**For SQLite Database:**
|
|
- Database file: `school_schedule.db`
|
|
- Enable foreign keys: `PRAGMA foreign_keys = ON`
|
|
- Use WAL journal mode for better concurrency
|
|
- Implement backup mechanism
|
|
|
|
**For CSV Handling:**
|
|
- Detect CSV encoding automatically
|
|
- Handle different delimiters (comma, semicolon, tab)
|
|
- Skip malformed rows with logging
|
|
- Generate data quality report
|
|
|
|
**For Python Application:**
|
|
- Use argparse for CLI arguments
|
|
- Implement --help documentation
|
|
- Add verbose mode for debugging
|
|
- Include unit tests for key functions
|
|
|
|
Please analyze the CSV structure first, then provide the complete implementation.
|
|
```
|
|
|
|
## **Quick Version (If you just want the code):**
|
|
|
|
```text
|
|
Build a Python system that:
|
|
1. Analyzes CSV files in `sample_data/` folder
|
|
2. Creates SQLite database (`school_schedule.db`)
|
|
3. Migrates CSV data to SQLite with cleaning
|
|
4. Provides query application for school scheduling
|
|
|
|
Key queries needed:
|
|
- Find student location by time
|
|
- Show student/teacher schedules
|
|
- Check room availability
|
|
- Detect schedule conflicts
|
|
|
|
Use: Python, sqlite3, pandas, DB Browser SQLite compatible.
|
|
```
|
|
|
|
## **Best Practices to Include in Your Prompt:**
|
|
|
|
1. **Be specific about folder paths**: `sample_data/`
|
|
2. **Mention your tools**: DB Browser for SQLite
|
|
3. **Specify file formats**: `.db` or `.sqlite` extension
|
|
4. **List expected CSV files** if you know their names
|
|
5. **Mention specific queries** you need
|
|
6. **Include error handling requirements**
|
|
7. **Ask for DB Browser integration files**
|
|
|
|
This prompt will give you a complete, production-ready solution tailored to your exact setup with SQLite and DB Browser. |