2026-02-05 10:15:09 +03:00
parent 2427fce842
commit 67241a5ed0
33 changed files with 13147 additions and 154 deletions

Binary file not shown.


@@ -0,0 +1,31 @@
# HTML Files Combination Documentation
## Overview
This document describes the combination of multiple HTML files into the main thesis document.
## What Was Done
Content from two additional HTML files was inserted into the main thesis document:
- `/Thesis materials/deepseek_html_20260128_0dc71d.html`
- `/Thesis materials/deepseek_html_20260128_15ee7a.html`
These were inserted into:
- `/Thesis materials/Thesis_ Intelligent School Schedule Management System.html`
## Files Combined
1. **Main File**: `Thesis_ Intelligent School Schedule Management System.html` (original)
2. **Added Content 1**: `deepseek_html_20260128_0dc71d.html` (added as "Additional Content Section 1")
3. **Added Content 2**: `deepseek_html_20260128_15ee7a.html` (added as "Additional Content Section 2")
## How It Was Done
- Removed HTML structure (doctype, html, head, body tags) from the additional files
- Added both contents to the main file before the closing `</body>` tag
- Wrapped each addition in a styled section with descriptive headings
- Applied consistent styling to match the main document theme
## Styling Added
- Each section has a gray border with rounded corners
- Distinct headings with blue underline for visual separation
- Appropriate margins and padding for readability
## Result
The main HTML document now contains all three files' content in a unified format, with clear visual separation between the original content and the added sections.


@@ -0,0 +1,42 @@
# Consolidated CSV Data Documentation
## Overview
This directory contains a consolidated CSV file (`consolidated_data_simple.csv`) that combines data from multiple individual CSV files in the `sample_data` directory. Each original CSV file is identified by a sheet number and filename prefix in the consolidated file.
## File Structure
### `consolidated_data_simple.csv`
- **Columns** (positional; `consolidate_csv_files_simple()` writes no header row): `[Sheet_Number, Original_File, Original_Column_1, Original_Column_2, ...]`
- **Sheet Numbers**:
1. `8-Table 1.csv`
2. `1-Table 1.csv`
3. `10-Table 1.csv`
4. `Лист3-Table 1.csv`
5. `7-Table 1.csv`
6. `Реестр заявлений на перевод 252-Table 1.csv`
7. `ТАЙМПАД-Table 1.csv`
8. `6 -Table 1.csv`
9. `11-Table 1.csv`
10. `4 -Table 1.csv`
11. `3 -Table 1.csv`
12. `2 -Table 1.csv`
13. `5 -Table 1.csv`
14. `9-Table 1.csv`
15. `АНГЛ-Table 1.csv`
## Format Details
- Column 1: `Sheet_Number` - The numeric identifier for the original CSV file
- Column 2: `Original_File` - The filename of the original CSV file
- Columns 3+: The original data columns from each CSV file
## Purpose
This consolidated file is designed for AI/ML analysis where each original CSV sheet can be identified by its sheet number, allowing algorithms to treat each original dataset separately while analyzing the combined data.
## Total Records
- Total rows in consolidated file: 3283
- Number of original CSV files consolidated: 15
## Notes
- All files were encoded in UTF-8 to preserve Cyrillic characters
- Some original files may have been skipped if they did not contain student data (e.g., notification texts)
- The consolidation preserves the original row and column structure from each source file
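To illustrate how the sheet number lets each source file be treated separately, here is a minimal sketch that groups the consolidated rows back by `Sheet_Number`. The helper name `group_rows_by_sheet` is hypothetical and not part of this commit; the sketch assumes the first two columns are the sheet number and the original filename, as described above.

```python
import csv
from collections import defaultdict


def group_rows_by_sheet(path):
    """Return {sheet_number: [original rows]} from the consolidated CSV."""
    groups = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            try:
                sheet = int(row[0])
            except ValueError:
                continue  # skip a header row, if one is present
            # Drop the two identifier columns to recover the original row
            groups[sheet].append(row[2:])
    return dict(groups)
```

Calling `group_rows_by_sheet("consolidated_data_simple.csv")` would then return one list of rows per original file, keyed by sheet number.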


@@ -0,0 +1,42 @@
# Consolidated HTML Theses Documentation
## Overview
This directory contains a consolidated HTML file (`consolidated_theses.html`) that combines multiple thesis documents into a single, organized HTML document. Each original document is clearly separated with headers and navigation links.
## File Structure
### `consolidated_theses.html`
A single HTML file containing all thesis documents with:
- Table of Contents with links to each document
- Clear visual separation between documents
- Document headers with titles and source file names
- Responsive styling for easy reading
## Included Documents
1. Lesson_ SQLite Database Implementation.html
2. Presentaion_School Schedule Assistant Bot _ Student Project.html
3. Professional_Thesis_Scheduler_Bot.html
4. Scheduler Bot_ Telegram & CSV Database.html
5. Student Database Search System _ Beginner's Guide.html
6. Thesis_ Intelligent School Schedule Management System_23_Jan_2026.html
7. Thesis_AI7_Building_A_ Scheduler_Bot_A Student Project.html
## Features
- **Navigation**: Clickable table of contents linking to each document
- **Visual Separation**: Each document is visually separated with distinct headers
- **Responsive Design**: Optimized for both desktop and mobile viewing
- **Self-contained**: All CSS styling included within the HTML file
- **Easy Sharing**: Single file containing all thesis documents
## File Size
- `consolidated_theses.html`: ~130 KB (129,814 bytes)
## Purpose
This consolidated file is designed for:
- Easy navigation between multiple thesis documents
- Simplified sharing and distribution
- Streamlined review and analysis of related documents
- Preservation of all content in a single file format
## Access
Simply open `consolidated_theses.html` in any modern web browser to access all documents.

211
scheduler_bots/README_v2.md Normal file

@@ -0,0 +1,211 @@
# Implementing SQLite Database in Telegram Scheduler Bot
This document explains how to enhance your existing Telegram Scheduler Bot (`telegram_scheduler_v2.py`) to include SQLite database functionality, resulting in `telegram_scheduler_v3.py`.
## Overview
The transition from `telegram_scheduler_v2.py` to `telegram_scheduler_v3.py` introduces persistent storage capabilities using SQLite, allowing the bot to store, retrieve, and manage schedule data beyond runtime.
## What is SQLite?
SQLite is a lightweight, serverless, self-contained SQL database engine. It stores the entire database in a single file, making it ideal for applications that need local data persistence without setting up a separate database server.
## How the Database File (`schedule.db`) Appears
The `schedule.db` file is created automatically the first time the bot runs after SQLite support is added:
1. `init_db()` calls `sqlite3.connect(DATABASE_NAME)`, which creates the database file if it doesn't exist
2. The `CREATE TABLE IF NOT EXISTS` statement then writes the `schedule` table into it
Because `DATABASE_NAME` is a relative path, the file appears in the directory the bot is started from (usually the same directory as your Python script) and persists between program runs.
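As a quick way to see this behaviour outside the bot, the following sketch creates a table at a given path and reports whether the file exists afterwards. The function name `database_file_created` and the one-column table are illustrative assumptions, not part of `telegram_scheduler_v3.py`.

```python
import os
import sqlite3


def database_file_created(db_path):
    """Create a table in db_path and report whether the file exists afterwards."""
    conn = sqlite3.connect(db_path)
    try:
        # Writing the schema forces SQLite to put the database file on disk
        conn.execute("CREATE TABLE IF NOT EXISTS schedule (id INTEGER PRIMARY KEY)")
        conn.commit()
    finally:
        conn.close()
    return os.path.exists(db_path)
```

Running it against a path in an empty directory should return `True`, and the file stays in place after the function returns.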
## Step-by-Step Implementation Guide
### 1. Import Required Libraries
Add the `sqlite3` import to your existing imports:
```python
import sqlite3
```
### 2. Database Initialization Function
Create a function to initialize your database:
```python
def init_db():
    """Initialize the SQLite database and create tables if they don't exist."""
    conn = sqlite3.connect(DATABASE_NAME)
    cursor = conn.cursor()
    # Create table for schedule entries
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS schedule (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            day TEXT NOT NULL,
            period INTEGER NOT NULL,
            subject TEXT NOT NULL,
            class_name TEXT NOT NULL,
            room TEXT NOT NULL,
            UNIQUE(day, period)
        )
    ''')
    conn.commit()
    conn.close()
```
### 3. Database Connection Setup
Define the database name and initialize it:
```python
# Database setup
DATABASE_NAME = "schedule.db"
# Initialize the database
init_db()
```
### 4. Data Manipulation Functions
Add functions to interact with the database:
```python
def add_schedule_entry(day, period, subject, class_name, room):
    """Add a new schedule entry to the database."""
    conn = sqlite3.connect(DATABASE_NAME)
    cursor = conn.cursor()
    try:
        cursor.execute('''
            INSERT OR REPLACE INTO schedule (day, period, subject, class_name, room)
            VALUES (?, ?, ?, ?, ?)
        ''', (day, period, subject, class_name, room))
        conn.commit()
        conn.close()
        return True
    except sqlite3.Error as e:
        print(f"Database error: {e}")
        conn.close()
        return False


def load_schedule_from_db():
    """Load schedule from the SQLite database."""
    conn = sqlite3.connect(DATABASE_NAME)
    cursor = conn.cursor()
    cursor.execute("SELECT day, period, subject, class_name, room FROM schedule ORDER BY day, period")
    rows = cursor.fetchall()
    conn.close()
    # Group by day
    schedule = {}
    for day, period, subject, class_name, room in rows:
        if day not in schedule:
            schedule[day] = []
        class_info = f"Subject: {subject} Class: {class_name} Room: {room}"
        schedule[day].append((str(period), class_info))
    return schedule
```
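One thing to keep in mind: `ORDER BY day` sorts the day names alphabetically, so Friday is returned before Monday. If week order matters when displaying the schedule, the dictionary can be re-ordered after loading. The sketch below is one possible approach; `WEEKDAYS` and `sort_schedule_by_weekday` are hypothetical names, and it assumes days are stored as English names such as `Monday`.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def sort_schedule_by_weekday(schedule):
    """Return a copy of the schedule dict with days in week order."""
    order = {day: index for index, day in enumerate(WEEKDAYS)}
    # Unknown day names are placed after the known weekdays
    return dict(sorted(schedule.items(),
                       key=lambda item: order.get(item[0], len(WEEKDAYS))))
```

It could be applied as `sort_schedule_by_weekday(load_schedule_from_db())`.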
### 5. Update Existing Functions to Use Database
Modify your schedule-retrieving functions to use the database instead of CSV:
```python
async def where_am_i(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Tell user where they should be right now."""
    # Reload schedule from DB to ensure latest data
    schedule = load_schedule_from_db()
    # ... rest of function remains similar but uses 'schedule' from DB
```
### 6. Add Conversation State Management
To handle multi-step interactions like the `/add` command:
```python
# User states for tracking conversations
user_states = {} # Stores user conversation state
```
### 7. Implement the New `/add` Command
Create an interactive command that collects data from the user:
```python
async def add(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Start the process of adding a new schedule entry."""
    user_id = update.effective_user.id
    user_states[user_id] = {"step": "waiting_day"}
    await update.message.reply_text(
        "📅 Adding a new class to the schedule.\n"
        "Please enter the day of the week (e.g., Monday, Tuesday, etc.):"
    )
```
### 8. Handle Messages During Conversations
Add a general message handler for interactive flows:
```python
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle user messages during the add process."""
    # Implementation for processing user input during multi-step conversations
    # Handles day -> period -> subject -> class -> room sequence
```
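The handler body is not shown in this guide. As a rough sketch of the day -> period -> subject -> class -> room sequence, the step logic can be kept in a plain function that `handle_message` would call with `user_states[user_id]` and the message text. Everything here is an assumption for illustration: the function name `advance_add_flow`, the `STEPS` list, and every step name other than `waiting_day` are not taken from `telegram_scheduler_v3.py`.

```python
# Field order for the /add conversation (assumed names, except "day")
STEPS = ["day", "period", "subject", "class_name", "room"]


def advance_add_flow(state, text):
    """Consume one user reply.

    Returns (reply_text, entry), where entry is None until all fields are in.
    """
    field = state["step"][len("waiting_"):]
    value = text.strip()
    if field == "period":
        try:
            value = int(value)
        except ValueError:
            # Stay on the same step until a number is entered
            return "Please enter the period as a number:", None
    state[field] = value
    position = STEPS.index(field)
    if position + 1 < len(STEPS):
        next_field = STEPS[position + 1]
        state["step"] = "waiting_" + next_field
        return f"Please enter the {next_field.replace('_', ' ')}:", None
    # All five fields collected: hand them back for add_schedule_entry()
    return "Entry complete.", {name: state[name] for name in STEPS}
```

When the returned entry is not `None`, the handler could pass it to `add_schedule_entry(**entry)` and remove the user from `user_states`.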
### 9. Register New Handlers
Add the new handlers to your main function:
```python
def main():
    # Create the Application
    application = Application.builder().token(BOT_TOKEN).build()
    # Add command handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("whereami", where_am_i))
    application.add_handler(CommandHandler("schedule", schedule))
    application.add_handler(CommandHandler("tomorrow", tomorrow))
    application.add_handler(CommandHandler("add", add))  # New command
    application.add_handler(CommandHandler("help", help_command))
    # Add message handler for conversation flow
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
```
## Key Changes Summary
| Aspect | telegram_scheduler_v2.py | telegram_scheduler_v3.py |
|--------|--------------------------|--------------------------|
| Data Storage | CSV file | SQLite database |
| Persistence | Read-only CSV; nothing entered at runtime is saved | Changes persist between runs |
| New Classes | Cannot add dynamically | Interactive `/add` command |
| Data Updates | Requires manual CSV editing | Real-time updates via bot |
## Benefits of Using SQLite
1. **Persistence**: Data survives bot restarts
2. **Dynamic Updates**: Users can add new classes without changing files
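3. **Data Integrity**: The `UNIQUE(day, period)` constraint prevents duplicate entries for the same slot (`INSERT OR REPLACE` overwrites the existing row instead)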
4. **Scalability**: Easy to extend with additional tables/fields
5. **Performance**: Fast queries for schedule lookups
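
To make the data-integrity point concrete, the sketch below uses an in-memory database with the same `schedule` table. It shows that a plain `INSERT` for an occupied `(day, period)` slot is rejected, while `INSERT OR REPLACE` (as used in `add_schedule_entry`) overwrites the existing row. The sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE schedule (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        day TEXT NOT NULL,
        period INTEGER NOT NULL,
        subject TEXT NOT NULL,
        class_name TEXT NOT NULL,
        room TEXT NOT NULL,
        UNIQUE(day, period)
    )
""")
columns = "(day, period, subject, class_name, room) VALUES (?, ?, ?, ?, ?)"
conn.execute("INSERT INTO schedule " + columns, ("Monday", 1, "Math", "7A", "101"))

# A second plain INSERT for the same slot violates UNIQUE(day, period)
try:
    conn.execute("INSERT INTO schedule " + columns, ("Monday", 1, "Physics", "7A", "202"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INSERT OR REPLACE deletes the conflicting row and inserts the new one
conn.execute("INSERT OR REPLACE INTO schedule " + columns, ("Monday", 1, "Physics", "7A", "202"))
rows = conn.execute(
    "SELECT subject, room FROM schedule WHERE day = ? AND period = ?", ("Monday", 1)
).fetchall()
conn.close()
```

After this runs, `duplicate_rejected` is `True` and `rows` holds a single entry for the slot, the Physics one.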
## Security Note
The `schedule.db` file contains your schedule data and should be protected accordingly. In production environments, consider access controls and backups.
## Troubleshooting
- If the database isn't being created, ensure your application has write permissions in the directory
- Check logs for SQLite error messages if operations fail
- The database file will grow as more entries are added over time
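
For the first troubleshooting item, a small pre-flight check along these lines can tell you whether the bot is able to create the database file at all. The helper name `can_create_database` is hypothetical; it only checks that the target directory exists and is writable.

```python
import os


def can_create_database(db_path):
    """Return True if the directory that would hold db_path is writable."""
    directory = os.path.dirname(os.path.abspath(db_path))
    return os.path.isdir(directory) and os.access(directory, os.W_OK)
```

For example, `can_create_database("schedule.db")` checks the current working directory, which is where a relative `DATABASE_NAME` resolves.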

Binary file not shown.


@@ -0,0 +1,182 @@
#!/usr/bin/env python
"""
combine_thesis_html.py - Combines all HTML files in the Thesis materials directory
into the main thesis document
"""
import os
import re

try:
    from bs4 import BeautifulSoup
except ImportError:
    # Defer the failure to the __main__ guard so the regex fallback can run
    BeautifulSoup = None


def combine_html_files():
    # Directory containing the HTML files
    thesis_dir = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/scheduler_bots/Thesis materials"
    # Main file to append content to
    main_file = "Thesis_ Intelligent School Schedule Management System.html"
    main_file_path = os.path.join(thesis_dir, main_file)
    # Get all HTML files in the directory
    html_files = [f for f in os.listdir(thesis_dir) if f.endswith('.html')]
    print(f"Found {len(html_files)} HTML files:")
    for i, f in enumerate(html_files, 1):
        print(f" {i}. {f}")
    # Read the main file content
    with open(main_file_path, 'r', encoding='utf-8') as f:
        main_content = f.read()
    # Parse the main file with BeautifulSoup
    soup_main = BeautifulSoup(main_content, 'html.parser')
    # Find the body element in the main file
    main_body = soup_main.find('body')
    if main_body is None:
        # If no body tag, create one under <html> (or at the document root)
        main_body = soup_main.new_tag('body')
        parent = soup_main.html if soup_main.html is not None else soup_main
        parent.append(main_body)
    # Add a separator before adding new content
    separator = soup_main.new_tag('hr')
    separator['style'] = 'margin: 40px 0; border: 2px solid #4a6fa5;'
    main_body.append(separator)
    # Add a heading for the appended content
    appendix_heading = soup_main.new_tag('h2')
    appendix_heading.string = 'Additional Thesis Materials'
    appendix_heading['style'] = 'color: #2c3e50; margin-top: 40px; border-bottom: 2px solid #4a6fa5; padding-bottom: 10px;'
    main_body.append(appendix_heading)
    # Process each additional HTML file
    for filename in html_files:
        if filename == main_file:  # Skip the main file
            continue
        print(f"Processing {filename}...")
        file_path = os.path.join(thesis_dir, filename)
        # Read the additional file content
        with open(file_path, 'r', encoding='utf-8') as f:
            additional_content = f.read()
        # Parse the additional file
        soup_additional = BeautifulSoup(additional_content, 'html.parser')
        # Create a section for this file
        section_div = soup_main.new_tag('div')
        section_div['class'] = 'additional-section'
        section_div['style'] = 'margin: 30px 0; padding: 20px; border: 1px solid #ddd; border-radius: 8px; background-color: #fafafa;'
        # Add a heading for this section
        section_heading = soup_main.new_tag('h3')
        section_heading.string = f'Content from: {filename}'
        section_heading['style'] = 'color: #4a6fa5; margin-top: 0;'
        section_div.append(section_heading)
        # Get body content from the additional file
        additional_body = soup_additional.find('body')
        if additional_body:
            # Copy child elements from the additional body to our section.
            # Iterate over a copy: extract() mutates the child list, and
            # iterating it directly would skip every other element.
            for child in list(additional_body.children):
                if child.name:  # Only copy actual elements, not text nodes
                    section_div.append(child.extract())
        else:
            # If no body tag, add the whole content
            section_div.append(soup_additional)
        # Append the section to the main body
        main_body.append(section_div)
    # Write the combined content back to the main file
    with open(main_file_path, 'w', encoding='utf-8') as f:
        f.write(soup_main.prettify())
    print(f"All HTML files have been combined into {main_file}")
    print(f"Combined file saved at: {main_file_path}")


if __name__ == "__main__":
    # Check if BeautifulSoup is available
    try:
        import bs4
        combine_html_files()
    except ImportError:
        print("BeautifulSoup4 library is required for this script.")
        print("Install it with: pip install beautifulsoup4")
        # Create a simple version without BeautifulSoup
        print("Creating a basic combination without BeautifulSoup...")
        thesis_dir = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/scheduler_bots/Thesis materials"
        main_file = "Thesis_ Intelligent School Schedule Management System.html"
        main_file_path = os.path.join(thesis_dir, main_file)
        # Get all HTML files in the directory
        html_files = [f for f in os.listdir(thesis_dir) if f.endswith('.html')]
        # Read the main file content
        with open(main_file_path, 'r', encoding='utf-8') as f:
            main_content = f.read()
        # Find the closing body tag to insert additional content
        body_close_pos = main_content.rfind('</body>')
        if body_close_pos == -1:
            # If no closing body tag, find the closing html tag
            html_close_pos = main_content.rfind('</html>')
            if html_close_pos != -1:
                insert_pos = html_close_pos
            else:
                # If no closing html tag, append at the end
                insert_pos = len(main_content)
        else:
            insert_pos = body_close_pos
        # Prepare the additional content
        additional_content = '\n\n<!-- Additional Thesis Materials -->\n<hr style="margin: 40px 0; border: 2px solid #4a6fa5;">\n<h2 style="color: #2c3e50; margin-top: 40px; border-bottom: 2px solid #4a6fa5; padding-bottom: 10px;">Additional Thesis Materials</h2>\n\n'
        # Process each additional HTML file
        for filename in html_files:
            if filename == main_file:  # Skip the main file
                continue
            print(f"Processing {filename}...")
            file_path = os.path.join(thesis_dir, filename)
            # Read the additional file content
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
            # Remove HTML and HEAD sections to only get body content
            # Remove doctype
            content = re.sub(r'<!DOCTYPE[^>]*>', '', content, flags=re.IGNORECASE)
            # Remove html tags
            content = re.sub(r'<html[^>]*>|</html>', '', content, flags=re.IGNORECASE)
            # Remove head section
            content = re.sub(r'<head[^>]*>.*?</head>', '', content, flags=re.DOTALL | re.IGNORECASE)
            # Remove opening and closing body tags
            content = re.sub(r'<body[^>]*>|</body>', '', content, flags=re.IGNORECASE)
            # Add section wrapper
            section_content = f'\n<div class="additional-section" style="margin: 30px 0; padding: 20px; border: 1px solid #ddd; border-radius: 8px; background-color: #fafafa;">\n'
            section_content += f'<h3 style="color: #4a6fa5; margin-top: 0;">Content from: {filename}</h3>\n'
            section_content += content
            section_content += '\n</div>\n'
            additional_content += section_content
        # Insert the additional content
        combined_content = main_content[:insert_pos] + additional_content + main_content[insert_pos:]
        # Write the combined content back to the main file
        with open(main_file_path, 'w', encoding='utf-8') as f:
            f.write(combined_content)
        print(f"All HTML files have been combined into {main_file}")
        print(f"Combined file saved at: {main_file_path}")


@@ -0,0 +1,117 @@
#!/usr/bin/env python
"""
consolidate_csv.py - Consolidates all CSV files in sample_data directory into a single CSV
with sheet identifiers to distinguish between different original files
"""
import csv
import os


def consolidate_csv_files():
    sample_data_dir = "sample_data"
    output_file = "consolidated_data.csv"
    if not os.path.exists(sample_data_dir):
        print(f"Directory '{sample_data_dir}' not found.")
        return
    # Get all CSV files and filter out the schedule template and sheet files
    all_csv_files = [f for f in os.listdir(sample_data_dir) if f.endswith('.csv')]
    # Keep only the actual student distribution files (not the sheets)
    csv_files = []
    for filename in all_csv_files:
        if 'first_sheet' not in filename and 'last_sheet' not in filename and 'template' not in filename:
            csv_files.append(filename)
    if not csv_files:
        print(f"No student data CSV files found in '{sample_data_dir}' directory.")
        return
    print(f"Found {len(csv_files)} student data CSV file(s):")
    for i, filename in enumerate(csv_files, 1):
        print(f" {i}. {filename}")
    consolidated_rows = []
    for sheet_num, filename in enumerate(csv_files, 1):
        csv_path = os.path.join(sample_data_dir, filename)
        print(f"Processing {csv_path}...")
        with open(csv_path, 'r', encoding='utf-8') as file:
            reader = csv.reader(file)
            rows = list(reader)
        # Add columns to indicate which sheet this data came from
        for row in rows:
            # Create a new row with the sheet number and filename as the first columns
            new_row = [sheet_num, filename] + row
            consolidated_rows.append(new_row)
    # Write consolidated data to a new CSV file
    with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        # Write header
        writer.writerow(['Sheet_Number', 'Original_File', 'Data_Columns'])
        # Write all rows
        for row in consolidated_rows:
            writer.writerow(row)
    print(f"Consolidated data written to {output_file}")
    print(f"Total rows in consolidated file: {len(consolidated_rows)}")


def consolidate_csv_files_simple():
    """
    Creates a simpler consolidated CSV file with Sheet_Number and Original_File columns
    """
    sample_data_dir = "sample_data"
    output_file = "consolidated_data_simple.csv"
    if not os.path.exists(sample_data_dir):
        print(f"Directory '{sample_data_dir}' not found.")
        return
    # Get all CSV files
    all_csv_files = [f for f in os.listdir(sample_data_dir) if f.endswith('.csv')]
    # Keep only the actual student distribution files (not the sheets)
    csv_files = []
    for filename in all_csv_files:
        if 'first_sheet' not in filename and 'last_sheet' not in filename and 'template' not in filename:
            csv_files.append(filename)
    if not csv_files:
        print(f"No student data CSV files found in '{sample_data_dir}' directory.")
        return
    print(f"Found {len(csv_files)} student data CSV file(s):")
    for i, filename in enumerate(csv_files, 1):
        print(f" {i}. {filename}")
    with open(output_file, 'w', newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile)
        # Process each file
        for sheet_num, filename in enumerate(csv_files, 1):
            csv_path = os.path.join(sample_data_dir, filename)
            print(f"Processing {csv_path}...")
            with open(csv_path, 'r', encoding='utf-8') as infile:
                reader = csv.reader(infile)
                for row in reader:
                    # Add sheet number and filename as first two columns
                    new_row = [sheet_num, filename] + row
                    writer.writerow(new_row)
    print(f"Simple consolidated data written to {output_file}")


if __name__ == "__main__":
    print("Creating consolidated CSV with sheet identifiers...")
    consolidate_csv_files_simple()
    print("Done! You can now upload the consolidated_data_simple.csv file for AI/ML analysis.")


@@ -0,0 +1,202 @@
#!/usr/bin/env python
"""
consolidate_theses.py - Consolidates all HTML thesis files into a single HTML file
with clear separation between different documents
"""
import os
import re


def consolidate_html_theses():
    # Define the parent directory containing HTML files
    parent_dir = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3"
    # List of HTML thesis files to consolidate
    html_files = [
        "Lesson_ SQLite Database Implementation.html",
        "Presentaion_School Schedule Assistant Bot _ Student Project.html",
        "Professional_Thesis_Scheduler_Bot.html",
        "Scheduler Bot_ Telegram & CSV Database.html",
        "Student Database Search System _ Beginner's Guide.html",
        "Thesis_ Intelligent School Schedule Management System_23_Jan_2026.html",
        "Thesis_AI7_Building_A_ Scheduler_Bot_A Student Project.html"
    ]
    # Output file
    output_file = "consolidated_theses.html"
    # Start building the consolidated HTML
    consolidated_html = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Consolidated Thesis Documents</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 20px;
background-color: #f9f9f9;
line-height: 1.6;
}
.document-separator {
page-break-before: always;
border-top: 3px solid #333;
margin: 30px 0;
}
.document-header {
background-color: #e9ecef;
padding: 15px;
border-radius: 5px;
margin-bottom: 20px;
border-left: 4px solid #007bff;
}
.document-title {
color: #2c3e50;
font-size: 24px;
margin: 0;
}
.document-source {
color: #6c757d;
font-size: 14px;
margin-top: 5px;
}
.document-content {
background-color: white;
padding: 20px;
border-radius: 5px;
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
margin-bottom: 30px;
}
.toc {
background-color: #f8f9fa;
padding: 20px;
border-radius: 5px;
margin-bottom: 30px;
border-left: 4px solid #28a745;
}
.toc h2 {
color: #2c3e50;
margin-top: 0;
}
.toc ul {
list-style-type: decimal;
padding-left: 20px;
}
.toc li {
margin-bottom: 8px;
}
.toc a {
text-decoration: none;
color: #007bff;
}
.toc a:hover {
text-decoration: underline;
}
h1 {
color: #343a40;
border-bottom: 2px solid #007bff;
padding-bottom: 10px;
}
.footer {
text-align: center;
margin-top: 30px;
padding: 15px;
color: #6c757d;
font-size: 12px;
border-top: 1px solid #dee2e6;
}
</style>
</head>
<body>
<h1>Consolidated Thesis Collection</h1>
<div class="toc">
<h2>Table of Contents</h2>
<ul>
"""
    # Add links to each document in the TOC
    for i, filename in enumerate(html_files, 1):
        doc_title = os.path.splitext(filename)[0].replace('_', ' ')
        consolidated_html += f' <li><a href="#doc-{i}">{i}. {doc_title}</a></li>\n'
    # Close TOC section
    consolidated_html += """ </ul>
</div>
"""
    # Process each HTML file
    for i, filename in enumerate(html_files, 1):
        filepath = os.path.join(parent_dir, filename)
        if not os.path.exists(filepath):
            print(f"File not found: {filename}")
            continue
        print(f"Processing {filename}...")
        # Add document separator and header
        doc_title = os.path.splitext(filename)[0].replace('_', ' ')
        consolidated_html += f""" <div class="document-separator" id="doc-{i}"></div>
<div class="document-header">
<h2 class="document-title">{i}. {doc_title}</h2>
<div class="document-source">Source file: {filename}</div>
</div>
<div class="document-content">
"""
        # Read the HTML file and extract content
        try:
            with open(filepath, 'r', encoding='utf-8') as f:
                content = f.read()
            # Remove HTML and HEAD tags, keeping only BODY content
            # First remove the DOCTYPE declaration if present
            content = re.sub(r'<!DOCTYPE[^>]*>', '', content, flags=re.IGNORECASE)
            # Remove HTML tags and everything outside the body
            body_start = content.find('<body')
            if body_start != -1:
                body_start = content.find('>', body_start) + 1
                body_end = content.rfind('</body>')
                if body_end != -1:
                    content = content[body_start:body_end]
            # If no body tags found, try to remove head section
            if body_start == -1 or body_end == -1:
                head_match = re.search(r'<head[^>]*>.*?</head>', content, re.DOTALL | re.IGNORECASE)
                if head_match:
                    content = content.replace(head_match.group(0), '')
                # Remove html tags if present
                content = re.sub(r'<html[^>]*>|</html>', '', content, flags=re.IGNORECASE)
            # Add the content to the consolidated HTML
            consolidated_html += content
        except Exception as e:
            print(f"Error processing {filename}: {str(e)}")
            consolidated_html += f"<p><em>Error reading this document: {str(e)}</em></p>"
        # Close the document content div
        consolidated_html += """ </div>
"""
    # Add footer
    consolidated_html += """ <div class="footer">
<p>Consolidated from multiple thesis documents | Generated automatically</p>
</div>
</body>
</html>"""
    # Write the consolidated HTML to file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(consolidated_html)
    print(f"Consolidated HTML thesis document created: {output_file}")
    print(f"Included {len([f for f in html_files if os.path.exists(os.path.join(parent_dir, f))])} documents in the consolidated file")


if __name__ == "__main__":
    consolidate_html_theses()

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,87 @@
#!/usr/bin/env python
"""
convert_dfd_to_png.py - Converts DFD.html to a PNG image file
"""
import os
from pathlib import Path


def convert_html_to_png():
    # Define the input and output file paths
    input_file = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html"
    output_file = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png"
    # Check if the input file exists
    if not os.path.exists(input_file):
        print(f"Input file does not exist: {input_file}")
        return
    print("Converting DFD.html to DFD.png...")
    # as_uri() percent-encodes the space in "Thesis materials"
    file_url = Path(input_file).resolve().as_uri()
    # Since we need to convert HTML to PNG, this requires special tools
    # First, let's check if we have the required libraries
    try:
        from selenium import webdriver
        from selenium.webdriver.chrome.options import Options
        # Setup Chrome options for headless browsing
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        # Start a headless Chrome session
        driver = webdriver.Chrome(options=chrome_options)
        # Load the HTML file
        driver.get(file_url)
        # Capture the visible 1200x800 window (not the full page)
        driver.set_window_size(1200, 800)  # Set window size
        driver.save_screenshot(output_file)
        driver.quit()
        print(f"Successfully converted DFD.html to DFD.png")
        print(f"Output file: {output_file}")
    except ImportError:
        # If selenium is not available, try using playwright
        try:
            from playwright.sync_api import sync_playwright
            with sync_playwright() as p:
                browser = p.chromium.launch(headless=True)
                page = browser.new_page()
                # Load the HTML file
                page.goto(file_url)
                # Set viewport size
                page.set_viewport_size({"width": 1200, "height": 800})
                # Take a screenshot of the full page
                page.screenshot(path=output_file, full_page=True)
                browser.close()
            print(f"Successfully converted DFD.html to DFD.png using Playwright")
            print(f"Output file: {output_file}")
        except ImportError:
            # If neither selenium nor playwright is available, inform the user
            print("Required libraries not available for HTML to PNG conversion.")
            print("To convert HTML to PNG, you need to install one of these packages:")
            print(" pip install selenium")
            print(" OR")
            print(" pip install playwright")
            print(" OR")
            print(" Use a web browser to manually export the HTML as PDF/PNG")
            return


if __name__ == "__main__":
    convert_html_to_png()


@@ -0,0 +1,70 @@
#!/usr/bin/env python
"""
convert_dfd_to_png_alt.py - Alternative method to convert DFD.html to a PNG image file
"""
import os
import subprocess
from pathlib import Path


def convert_html_to_png():
    # Define the input and output file paths
    input_file = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html"
    output_file = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png"
    # Check if the input file exists
    if not os.path.exists(input_file):
        print(f"Input file does not exist: {input_file}")
        return
    print("Attempting to convert DFD.html to DFD.png...")
    # Method 1: Using wkhtmltoimage if available
    try:
        subprocess.run([
            "wkhtmltoimage",
            "--width", "1200",
            "--height", "800",
            input_file,
            output_file
        ], check=True)
        print(f"Successfully converted DFD.html to DFD.png using wkhtmltoimage")
        print(f"Output file: {output_file}")
        return
    except FileNotFoundError:
        print("wkhtmltoimage not found. Trying alternative method...")
    except subprocess.CalledProcessError as e:
        print(f"Error using wkhtmltoimage: {e}")
    # Method 2: Using weasyprint if available
    try:
        import weasyprint
        from PIL import Image
        import io
        # Convert HTML to PDF in memory first
        html_doc = weasyprint.HTML(input_file)
        pdf_bytes = html_doc.write_pdf()
        # Convert PDF to PNG (this is more complex and may require additional tools)
        print("WeasyPrint method requires additional image conversion tools.")
    except ImportError:
        print("WeasyPrint not available. Trying simpler approach...")
    # Method 3: Provide instructions for manual conversion
    print("\nHTML to PNG conversion requires specialized tools.")
    print("You can manually convert the file using one of these methods:")
    print("1. Open the HTML file in a browser, take a screenshot, and save as PNG")
    print("2. Install wkhtmltopdf/wkhtmltoimage: brew install wkhtmltopdf (on macOS)")
    print("3. Use online converters that support HTML to PNG conversion")
    print(f"\nHTML file location: {input_file}")
    # Just copy a placeholder for now
    print("\nAs a placeholder, I'm noting that the conversion needs to be done manually or with the proper tools installed.")


if __name__ == "__main__":
    convert_html_to_png()

961
scheduler_bots/database.py Normal file

@@ -0,0 +1,961 @@
#!/usr/bin/env python
"""
database.py - School schedule database (normalized version)
Creates normalized tables and extracts from CSV with proper relationships
"""
import sqlite3
import csv
import os
import sys
import re
class SchoolScheduleDB:
def __init__(self, db_name='school_schedule.db'):
self.conn = sqlite3.connect(db_name)
self.cursor = self.conn.cursor()
# Initialize database tables
self.create_tables()
def normalize_class_name(self, class_name):
"""Normalize class names to handle Cyrillic/Latin character differences"""
if not class_name:
return class_name
# Replace Cyrillic characters with Latin equivalents in class names
# Specifically: Cyrillic А (U+0410), В (U+0412) and С (U+0421) become Latin A, B and C
normalized = class_name.replace('А', 'A').replace('В', 'B').replace('С', 'C')
return normalized
def create_tables(self):
"""Create normalized tables with proper relationships"""
# Teachers table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS teachers (
teacher_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
email TEXT,
phone TEXT
)
""")
# Subjects table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS subjects (
subject_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
description TEXT
)
""")
# Days table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS days (
day_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL -- e.g., Monday, Tuesday, etc.
)
""")
# Periods table - with proper unique constraint
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS periods (
period_id INTEGER PRIMARY KEY AUTOINCREMENT,
period_number INTEGER,
start_time TEXT,
end_time TEXT,
UNIQUE(period_number, start_time, end_time)
)
""")
# Groups table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS groups (
group_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
description TEXT,
class_name TEXT
)
""")
# Students table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS students (
student_id INTEGER PRIMARY KEY AUTOINCREMENT,
class_name TEXT,
full_name TEXT NOT NULL,
UNIQUE(full_name, class_name) -- Prevent duplicate student entries
)
""")
# Homeroom teachers table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS homeroom_teachers (
homeroom_id INTEGER PRIMARY KEY AUTOINCREMENT,
class_name TEXT UNIQUE,
teacher_name TEXT,
classroom TEXT,
parent_meeting_room TEXT,
internal_number TEXT,
mobile_number TEXT
)
""")
# Schedule table with foreign key relationships
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS schedule (
entry_id INTEGER PRIMARY KEY AUTOINCREMENT,
student_id INTEGER,
subject_id INTEGER,
teacher_id INTEGER,
day_id INTEGER,
period_id INTEGER,
group_id INTEGER,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (subject_id) REFERENCES subjects(subject_id),
FOREIGN KEY (teacher_id) REFERENCES teachers(teacher_id),
FOREIGN KEY (day_id) REFERENCES days(day_id),
FOREIGN KEY (period_id) REFERENCES periods(period_id),
FOREIGN KEY (group_id) REFERENCES groups(group_id)
)
""")
self.conn.commit()
def populate_periods_table(self):
"""Populate the periods table with standard school periods and the days table with weekdays"""
period_times = {
'1': ('09:00', '09:40'),
'2': ('10:00', '10:40'),
'3': ('11:00', '11:40'),
'4': ('11:50', '12:30'),
'5': ('12:40', '13:20'),
'6': ('13:30', '14:10'),
'7': ('14:20', '15:00'),
'8': ('15:20', '16:00'),
'9': ('16:15', '16:55'),
'10': ('17:05', '17:45'),
'11': ('17:55', '18:35'),
'12': ('18:45', '19:20'),
'13': ('19:20', '20:00')
}
for period_num, (start_time, end_time) in period_times.items():
self.cursor.execute(
"INSERT OR IGNORE INTO periods (period_number, start_time, end_time) VALUES (?, ?, ?)",
(int(period_num), start_time, end_time)
)
# Add days of the week
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day in days_of_week:
self.cursor.execute("INSERT OR IGNORE INTO days (name) VALUES (?)", (day,))
self.conn.commit()
def update_database_from_csv(self, auto_update=True):
"""Automatically update database from specific CSV files in the sample_data directory"""
# Updated path to look in the parent directory
sample_data_dir = "../sample_data"
if not os.path.exists(sample_data_dir):
print(f"Directory '{sample_data_dir}' not found. Trying local directory...")
sample_data_dir = "sample_data"
if not os.path.exists(sample_data_dir):
print(f"Directory '{sample_data_dir}' not found.")
return
# Get all CSV files and filter out the schedule template and sheet files
all_csv_files = [f for f in os.listdir(sample_data_dir) if f.endswith('.csv')]
# Keep only the actual student distribution files (not the sheets)
csv_files = []
for filename in all_csv_files:
if 'first_sheet' not in filename and 'last_sheet' not in filename and 'template' not in filename:
csv_files.append(filename)
if not csv_files:
print(f"No student data CSV files found in '{sample_data_dir}' directory.")
return
print(f"Found {len(csv_files)} student data CSV file(s):")
for i, filename in enumerate(csv_files, 1):
print(f" {i}. {filename}")
if auto_update:
print("\nAuto-updating database with all student data CSV files...")
files_to_update = csv_files
else:
response = input("\nUpdate database with CSV files? (yes/no): ").lower()
if response not in ['yes', 'y', 'да']:
print("Skipping database update.")
return
print(f"\n0. Update all files")
try:
selection = input(f"\nSelect file(s) to update (0 for all, or comma-separated numbers like 1,2,3): ")
if selection.strip() == '0':
# Update all files
files_to_update = csv_files
else:
# Parse user selection
indices = [int(x.strip()) - 1 for x in selection.split(',')]
files_to_update = [csv_files[i] for i in indices if 0 <= i < len(csv_files)]
if not files_to_update:
print("No valid selections made.")
return
except ValueError:
print("Invalid input. Please enter numbers separated by commas or '0' for all files.")
return
# Populate the periods and days tables first
self.populate_periods_table()
print(f"\nUpdating database with {len(files_to_update)} file(s):")
for filename in files_to_update:
print(f" - {filename}")
csv_path = os.path.join(sample_data_dir, filename)
print(f"Processing {csv_path}...")
self.process_csv_with_teacher_mapping(csv_path)
# Update homeroom teachers from the dedicated CSV
self.update_homeroom_teachers_from_csv()
print("Database updated successfully with selected CSV data.")
def update_homeroom_teachers_from_csv(self):
"""Update homeroom teachers from the dedicated CSV file"""
# Updated path to look in the parent directory
homeroom_csv_path = "../sample_data/Homeroom_teachers.csv"
if not os.path.exists(homeroom_csv_path):
print(f"Homeroom teachers file '{homeroom_csv_path}' not found. Trying local directory...")
homeroom_csv_path = "sample_data/Homeroom_teachers.csv"
if not os.path.exists(homeroom_csv_path):
print(f"Homeroom teachers file '{homeroom_csv_path}' not found.")
return
with open(homeroom_csv_path, 'r', encoding='utf-8') as file:
reader = csv.DictReader(file)
for row in reader:
# Normalize the class name to handle Cyrillic/Latin differences
normalized_class = self.normalize_class_name(row['Class'])
self.cursor.execute("""
INSERT OR REPLACE INTO homeroom_teachers
(class_name, teacher_name, classroom, parent_meeting_room, internal_number, mobile_number)
VALUES (?, ?, ?, ?, ?, ?)
""", (
normalized_class,
row['Homeroom Teacher'],
row['Classroom'],
row['Parent Meeting Room'],
row['Internal Number'],
row['Mobile Number']
))
self.conn.commit()
print("Homeroom teachers updated successfully.")
def process_csv_with_teacher_mapping(self, csv_file):
"""Process CSV with teacher-subject mapping based on positional order"""
if not os.path.exists(csv_file):
return False
with open(csv_file, 'r', encoding='utf-8') as file:
reader = csv.reader(file)
rows = list(reader)
# Identify header row - look for the row containing "ФИО" (full name) or similar indicators
header_idx = None
for i, row in enumerate(rows):
for cell in row:
if "ФИО" in str(cell) or "фио" in str(cell).lower() or "Ф.И.О." in str(cell) or "ф.и.о." in str(cell):
header_idx = i
break
if header_idx is not None:
break
if header_idx is None:
# Check if this file contains class and name columns that identify it as a student data file
# Even if the header doesn't contain ФИО, we might still be able to identify student data
has_class_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
has_name_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
if has_class_indicators and has_name_indicators:
# Try to find the header row by looking for class and name indicators
for i, row in enumerate(rows):
if any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class']) and \
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname']):
header_idx = i
break
if header_idx is None:
print(f"Skipping {csv_file} - does not appear to be student data with ФИО/class columns")
return False
# Build a mapping of subject names in the header row
header_row = rows[header_idx]
header_subjects = {}
for col_idx, subject_name in enumerate(header_row):
subject_name = str(subject_name).strip()
if (subject_name and
subject_name.lower() not in ['ф.и.о.', 'фио', 'класс', 'номер', 'сортировка', 'шкaфчика', 'локера'] and
subject_name.strip() != "" and
"ф.и.о" not in subject_name.lower() and
"сортировка" not in subject_name.lower() and
"номер" not in subject_name.lower()):
header_subjects[col_idx] = subject_name # Map column index to subject name
# IMPROVED TEACHER-SUBJECT MAPPING: Extract teacher-subject pairs from the first rows
# Match base subjects to teachers and then map to header subjects
base_subject_teacher_map = {}
# Look through the first rows to find teacher-subject pairs
for i in range(min(len(rows), header_idx)): # Only go up to header row
current_row = rows[i]
# Process the row in (subject, teacher, group_info) triplets
j = 0
while j < len(current_row) - 1:
subject_cell = current_row[j].strip() if j < len(current_row) else ""
teacher_cell = current_row[j + 1].strip() if j + 1 < len(current_row) else ""
group_cell = current_row[j + 2].strip() if j + 2 < len(current_row) else ""
# Check if the first cell is a subject, the second is a teacher, and the third is a group
if (subject_cell and self._is_likely_subject_name_simple(subject_cell) and
teacher_cell and self._is_likely_teacher_name_enhanced(teacher_cell) and
group_cell and self._is_likely_group_identifier(group_cell)):
# Add to the base subject teacher map (if multiple teachers for same subject, store all)
if subject_cell not in base_subject_teacher_map:
base_subject_teacher_map[subject_cell] = []
if teacher_cell not in base_subject_teacher_map[subject_cell]:
base_subject_teacher_map[subject_cell].append(teacher_cell)
# Move to the next potential triplet (subject, teacher, group_info)
j += 3 # Skip subject, teacher, and group info
# Also check the row immediately before the header row for additional teacher-subject pairs
if header_idx > 0:
prev_row = rows[header_idx - 1]
j = 0
while j < len(prev_row) - 1:
subject_cell = prev_row[j].strip() if j < len(prev_row) else ""
teacher_cell = prev_row[j + 1].strip() if j + 1 < len(prev_row) else ""
group_cell = prev_row[j + 2].strip() if j + 2 < len(prev_row) else ""
# Check if the first cell is a subject, the second is a teacher, and the third is a group
if (subject_cell and self._is_likely_subject_name_simple(subject_cell) and
teacher_cell and self._is_likely_teacher_name_enhanced(teacher_cell) and
group_cell and self._is_likely_group_identifier(group_cell)):
# Add to the base subject teacher map (if multiple teachers for same subject, store all)
if subject_cell not in base_subject_teacher_map:
base_subject_teacher_map[subject_cell] = []
if teacher_cell not in base_subject_teacher_map[subject_cell]:
base_subject_teacher_map[subject_cell].append(teacher_cell)
# Move to the next potential triplet (subject, teacher, group_info)
j += 3 # Skip subject, teacher, and group info
# Now map the header subjects to the teachers using base subject matching
teacher_subject_map = {}
for col_idx, header_subject in header_subjects.items():
# Find the base subject that corresponds to this header subject
base_subject = self._find_base_subject(header_subject, base_subject_teacher_map.keys())
if base_subject and base_subject in base_subject_teacher_map:
# Use the first teacher for this base subject
teacher_subject_map[header_subject] = base_subject_teacher_map[base_subject][0]
# Process each student row
for student_row in rows[header_idx + 1:]:
# Determine the structure dynamically based on the header
class_col_idx = None
name_col_idx = None
# Find the index of the class column (usually called "Класс")
for idx, header in enumerate(header_row):
if "Класс" in str(header) or "класс" in str(header) or "Class" in str(header) or "class" in str(header).lower():
class_col_idx = idx
break
# Find the index of the name column (usually called "ФИО")
for idx, header in enumerate(header_row):
if "ФИО" in str(header) or "ф.и.о." in str(header).lower() or "name" in str(header).lower():
name_col_idx = idx
break
# If we couldn't find the columns properly, skip this row
if class_col_idx is None or name_col_idx is None:
continue
# Check if this row has valid data in the expected columns
if (len(student_row) > max(class_col_idx, name_col_idx) and
student_row[class_col_idx].strip() and # class name exists
student_row[name_col_idx].strip() and # student name exists
self._is_valid_student_record_by_cols(student_row, class_col_idx, name_col_idx)):
name = student_row[name_col_idx].strip() # Name column
class_name = student_row[class_col_idx].strip() # Class column
# Normalize the class name to handle Cyrillic/Latin differences
normalized_class = self.normalize_class_name(class_name)
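# Insert the student only if not already present. INSERT OR IGNORE keeps the
# existing student_id stable; INSERT OR REPLACE would delete and re-create the row,
# orphaning schedule entries that reference the old id.
self.cursor.execute(
"INSERT OR IGNORE INTO students (class_name, full_name) VALUES (?, ?)",
(normalized_class, name)
)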
# Get the student_id for this student
self.cursor.execute("SELECT student_id FROM students WHERE full_name = ? AND class_name = ?", (name, normalized_class))
student_id_result = self.cursor.fetchone()
if student_id_result is None:
continue
student_id = student_id_result[0]
# Process schedule data for this student
# Go through each column to find subject and group info
for col_idx, cell_value in enumerate(student_row):
if cell_value and col_idx < len(header_row):
# Get the subject from the header
subject_header = header_row[col_idx] if col_idx < len(header_row) else ""
# Skip columns that don't contain schedule information
if (col_idx == 0 or col_idx == 1 or col_idx == 2 or col_idx == class_col_idx or col_idx == name_col_idx or # skip metadata cols
"сортировка" in subject_header.lower() or
"номер" in subject_header.lower() or
"шкaфчика" in subject_header.lower() or
"локера" in subject_header.lower()):
continue
# Extract group information from the cell
group_assignment = cell_value.strip()
if group_assignment and group_assignment.lower() != "nan" and group_assignment != "-" and group_assignment != "":
# Find the teacher associated with this subject
subject_name = str(subject_header).strip()
teacher_name = teacher_subject_map.get(subject_name, f"Default Teacher for {subject_name}")
# Insert the entities into their respective tables first
# Then get their IDs to create the schedule entry
self._process_schedule_entry_with_teacher_mapping(
student_id, group_assignment, subject_name, teacher_name
)
self.conn.commit()
return True
def _find_base_subject(self, header_subject, base_subjects):
"""Find the base subject that corresponds to a header subject"""
header_lower = header_subject.lower()
# Check for direct matches first
for base_subject in base_subjects:
if base_subject.lower() in header_lower or header_lower in base_subject.lower():
return base_subject
# Check for partial matches with common patterns
for base_subject in base_subjects:
# Remove common suffixes from header subject and try to match
simplified_header = header_lower.replace(" 1 модуль", "").replace(" 2 модуль", "") \
.replace(" 1,2 модуль", "").replace(" 1 мод.", "").replace(" 2 мод.", "") \
.replace(" / ", " ").replace(" ", " ")
simplified_base = base_subject.lower().replace(" / ", " ").replace(" ", " ")
if simplified_base in simplified_header or simplified_header in simplified_base:
return base_subject
return None
def _is_likely_subject_name_simple(self, text):
"""Simple check if the text is likely a subject name"""
if not text or len(text.strip()) < 2:
return False
text = text.strip().lower()
# Common subject indicators in Russian and English
subject_indicators = [
'технотрек', 'матем', 'информ', 'англ', 'русск', 'физика', 'химия', 'биол', 'история',
'общество', 'география', 'литер', 'физкульт', 'лидерство',
'спорт. клуб', 'орксэ', 'китайск', 'немецк', 'француз', 'speaking club', 'maths',
'ict', 'geography', 'physics', 'robotics', 'culinary', 'science', 'ai core', 'vr/ar',
'cybersafety', 'business', 'design', 'prototype', 'mediacom', 'robotics track',
'culinary track', 'science track', 'ai core track', 'vr/ar track', 'cybersafety track',
'programming', 'algorithm', 'logic', 'pe', 'sports', 'swimming', 'fitness', 'gymnastics',
'climbing', 'games', 'art', 'music', 'dance', 'karate', 'judo', 'chess', 'leadership',
'алгоритмика', 'робототехника', 'программирование', 'математика', 'информатика', 'орксэ',
'английский', 'русский', 'физическая культура', 'орксэ', 'изо', 'алгебра', 'геометрия',
'астрономия', 'экология', 'астрономия', 'иностранный', 'ит', 'computer science', 'informatics'
]
# Check if text contains any of the subject indicators
for indicator in subject_indicators:
if indicator in text:
return True
return False
def _is_likely_subject_name(self, text):
"""Check if the text is likely a subject name"""
if not text or len(text.strip()) < 2:
return False
text = text.strip()
# Common subject indicators in Russian and English
subject_indicators = [
'Матем.', 'Информ.', 'Англ.яз', 'Русск.яз', 'Физика', 'Химия', 'Биол', 'История',
'Общество', 'География', 'Литер', 'Физкульт', 'Технотрек', 'Лидерство',
'Спорт. клуб', 'ОРКСЭ', 'Китайск', 'Немецк', 'Француз', 'Speaking club', 'Maths',
'ICT', 'Geography', 'Physics', 'Robotics', 'Culinary', 'Science', 'AI Core', 'VR/AR',
'CyberSafety', 'Business', 'Design', 'Prototype', 'MediaCom', 'Science', 'Robotics',
'Culinary', 'AI Core', 'VR/AR', 'CyberSafety', 'Business', 'Design', 'Prototype',
'MediaCom', 'Robotics Track', 'Culinary Track', 'Science Track', 'AI Core Track',
'VR/AR Track', 'CyberSafety Track', 'Business Track', 'Design Track', 'Prototype Track',
'MediaCom Track', 'Math', 'Algebra', 'Geometry', 'Calculus', 'Statistics', 'Coding',
'Programming', 'Algorithm', 'Logic', 'Robotics', 'Physical Education', 'PE', 'Sports',
'Swimming', 'Fitness', 'Gymnastics', 'Climbing', 'Games', 'Art', 'Music', 'Dance',
'Karate', 'Judo', 'Martial Arts', 'Chess', 'Leadership', 'Entrepreneurship',
'Технотрек 1 модуль', 'Технотрек 2 модуль', 'ОРКСЭ 1,2 модуль', 'Математика 1 модуль',
'Математика 2 модуль', 'Программирование', 'Алгоритмика и логика', 'Лидерство',
'Робототехника', 'Physical Education 1,2 модуль', 'Английский 1 модуль', 'Английский 2 модуль',
'Англ.яз', 'Русск.яз', 'Информ.', 'Матем.', 'Физика', 'Химия', 'Биология', 'История',
'Обществознание', 'География', 'Литература', 'Физическая культура', 'ОРКСЭ', 'ИЗО',
'Китайский', 'Немецкий', 'Французский', 'Алгебра', 'Геометрия', 'Астрономия', 'Экология'
]
# Check if text matches any of the subject indicators
for indicator in subject_indicators:
if indicator.lower() in text.lower():
return True
# Check if the text contains common subject-related keywords
common_keywords = ['модуль', 'track', 'club', 'group', 'class', 'lesson', 'subject', 'module', 'яз', 'язык']
for keyword in common_keywords:
if keyword in text.lower():
return True
# Check if text contains specific patterns that indicate it's a subject
subject_patterns = [
r'.*[Tt]rack.*', # Track identifiers
r'.*[Mm]odule.*', # Module identifiers
r'.*[Cc]lub.*', # Club identifiers
r'.*[Ss]ubject.*', # Subject identifiers
r'.*[Cc]lass.*', # Class identifiers
r'.*[Ll]esson.*', # Lesson identifiers
]
for pattern in subject_patterns:
if re.search(pattern, text):
return True
return False
def _is_valid_student_record_by_cols(self, row, class_col_idx, name_col_idx):
"""Check if a row represents a valid student record based on specific columns"""
# A valid student record should have:
# - Non-empty class name in the class column
# - Non-empty student name in the name column
if len(row) <= max(class_col_idx, name_col_idx):
return False
class_name = row[class_col_idx].strip() if len(row) > class_col_idx else ""
student_name = row[name_col_idx].strip() if len(row) > name_col_idx else ""
# Check if the class name looks like an actual class (contains a number followed by a letter)
class_pattern = r'^\d+[А-ЯA-Z]$' # e.g., 6А, 11А, 4B
if re.match(class_pattern, class_name):
return bool(student_name and student_name != class_name) # Ensure name exists and is different from class
# If not matching class pattern, check if the name field is not just another class-like value
name_pattern = r'^\d+[А-ЯA-Z]$' # This would indicate it's probably a class, not a name
if re.match(name_pattern, student_name):
return False # This row has a class in the name field, so not valid
return bool(class_name and student_name and class_name != student_name)
def _process_schedule_entry_with_teacher_mapping(self, student_id, group_info, subject_info, teacher_name):
"""Process individual schedule entries with explicit teacher mapping and insert into normalized tables"""
# Clean up the inputs
subject_name = subject_info.strip() if subject_info.strip() else "General Class"
group_assignment = group_info.strip()
# Only proceed if we have valid data
if subject_name and group_assignment and group_assignment.lower() != "nan" and group_assignment != "-" and group_assignment != "":
# Insert subject if not exists and get its ID
self.cursor.execute("INSERT OR IGNORE INTO subjects (name) VALUES (?)", (subject_name,))
self.cursor.execute("SELECT subject_id FROM subjects WHERE name = ?", (subject_name,))
subject_id = self.cursor.fetchone()[0]
# Insert teacher if not exists and get its ID
# Use the teacher name as is, without default creation if not found
self.cursor.execute("INSERT OR IGNORE INTO teachers (name) VALUES (?)", (teacher_name,))
self.cursor.execute("SELECT teacher_id FROM teachers WHERE name = ?", (teacher_name,))
teacher_result = self.cursor.fetchone()
if teacher_result:
teacher_id = teacher_result[0]
else:
# Fallback to a default teacher when the name could not be stored
# (e.g. teacher_name is None, which the NOT NULL constraint rejects)
default_teacher = "Неизвестный преподаватель"
self.cursor.execute("INSERT OR IGNORE INTO teachers (name) VALUES (?)", (default_teacher,))
self.cursor.execute("SELECT teacher_id FROM teachers WHERE name = ?", (default_teacher,))
teacher_id = self.cursor.fetchone()[0]
# The CSV carries no day information, so as a placeholder each entry is
# assigned a random weekday (a real system would extract the day from the schedule)
import random
days_list = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
selected_day = random.choice(days_list)
self.cursor.execute("INSERT OR IGNORE INTO days (name) VALUES (?)", (selected_day,))
self.cursor.execute("SELECT day_id FROM days WHERE name = ?", (selected_day,))
day_id = self.cursor.fetchone()[0]
# Use a default period - for now we'll use period 1, but in a real system
# we would need to extract this from the CSV if available
self.cursor.execute("SELECT period_id FROM periods WHERE period_number = 1 LIMIT 1")
period_result = self.cursor.fetchone()
if period_result:
period_id = period_result[0]
else:
# Fallback if period 1 is missing: seed the standard periods and look it up again
self.populate_periods_table()
self.cursor.execute("SELECT period_id FROM periods WHERE period_number = 1 LIMIT 1")
period_id = self.cursor.fetchone()[0]
# Clean the group name to separate it from student data
group_name = self._clean_group_name(group_assignment)
self.cursor.execute("INSERT OR IGNORE INTO groups (name) VALUES (?)", (group_name,))
self.cursor.execute("SELECT group_id FROM groups WHERE name = ?", (group_name,))
group_id = self.cursor.fetchone()[0]
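# Insert the schedule entry. The schedule table has no UNIQUE constraint,
# so OR IGNORE does not deduplicate entries across repeated imports.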
self.cursor.execute("""
INSERT OR IGNORE INTO schedule (student_id, subject_id, teacher_id, day_id, period_id, group_id)
VALUES (?, ?, ?, ?, ?, ?)
""", (student_id, subject_id, teacher_id, day_id, period_id, group_id))
def _clean_group_name(self, raw_group_data):
"""Extract clean group name from potentially mixed student/group data"""
# Remove potential student names from the group data
# Group names typically contain numbers, class identifiers, or specific activity names
cleaned = raw_group_data.strip()
# If the group data looks like it contains a student name pattern,
# we'll try to extract just the group identifier part
if re.match(r'^\d+[А-ЯA-Z]', cleaned):
# This looks like a class designation, return as is
return cleaned
# If the group data contains common group indicators, return as is
group_indicators = ['кл', 'class', 'club', 'track', 'group', 'module', '-']
if any(indicator in cleaned.lower() for indicator in group_indicators):
return cleaned
# If the group data looks like a subject-identifier pattern, return as is
subject_indicators = ['ICT', 'English', 'Math', 'Physics', 'Chemistry', 'Biology', 'Science']
if any(indicator in cleaned for indicator in subject_indicators):
return cleaned
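# If none of the above conditions match, return a generic group name.
# zlib.crc32 is used instead of hash() because str hashes are salted per process,
# which would give the same group a different name on every run.
import zlib
return f"Group_{zlib.crc32(cleaned.encode('utf-8')) % 10000}"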
def _is_likely_teacher_name(self, text):
"""Check if the text is likely to be a teacher name"""
if not text or len(text.strip()) < 5: # Require minimum length for a name
return False
text = text.strip()
# Common non-name values that appear in the CSV
common_non_names = ['-', 'nan', 'нет', 'нету', 'отсутствует', 'учитель', 'teacher', '', 'Е4 Е5', 'E4 E5', 'группа', 'group', 'каб.', 'гр.', 'фитнес', 'каб', 'все группы', '1 группа', '2 группа', 'Е1', 'Е2', 'Е3', 'Е4', 'Е5', 'Е6', 'Е1 Е2', 'Е4 Е5', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'гр 1', 'гр 2']
if text.lower() in {value.lower() for value in common_non_names}: # compare case-insensitively
return False
# Exclusion patterns for non-teacher entries
exclusion_patterns = [
r'^[А-ЯЁ]\d+\s+[А-ЯЁ]\d+$', # E4 E5 pattern
r'^[A-Z]\d+\s+[A-Z]\d+$', # English groups
r'.*[Tt]rack.*', # Track identifiers
r'.*[Gg]roup.*', # Group identifiers
r'.*\d+[А-ЯA-Z]\d*$', # Number-letter combos
r'^[А-ЯЁA-Z].*\d+', # Starts with a capital letter and contains digits
r'.*[Cc]lub.*', # Club identifiers
r'.*[Rr]oom.*', # Room identifiers
r'.*[Cc]lass.*', # Class identifiers
r'.*[Pp]eriod.*', # Period identifiers
r'^\d+$', # Just numbers
r'^[А-ЯЁA-Z]*$', # All caps words
r'^[А-ЯЁA-Z\s\d]+$', # Caps words and numbers (likely room numbers)
r'^[ЕеEe][\d\s,]+$' # Room identifiers like E1, E2, etc.
]
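# Match case-sensitively: with re.IGNORECASE the upper-case-only patterns above
# would also reject ordinary mixed-case names such as "John Smith"
for pattern in exclusion_patterns:
if re.match(pattern, text):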
return False
# Positive patterns for teacher names
teacher_patterns = [
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ]\.\s*[А-ЯЁ]\.$', # Иванов А.А.
r'^[А-ЯЁ]\.\s*[А-ЯЁ]\.\s+[А-ЯЁ][а-яё]+$', # А.А. Иванов
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+$', # Full name
r'^[A-Z][a-z]+\s+[A-Z][a-z]+$', # John Smith
r'^[A-Z][a-z]+\s+[A-Z]\.\s*[A-Z]\.$', # Smith J.J.
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+$', # Russian names without patronymic
r'^[A-Z][a-z]+\s+[A-Z]\.\s*[A-Z]\.$', # Initials format
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+', # Names without periods
]
for pattern in teacher_patterns:
if re.match(pattern, text.strip()):
return True
# Additional check: if it looks like a proper name (with capital letters and min length)
# and doesn't match exclusion patterns
name_parts = text.split()
if len(name_parts) >= 2:
# At least two parts (first name + last name)
# Check if they start with capital letters
if all(part[0].isupper() for part in name_parts if len(part) > 1):
# Additional check: make sure it's not just a title or other text
common_titles = ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof', 'Teacher', 'Instructor', 'Coach']
if any(title in text for title in common_titles):
return False
return True
return False
def _is_likely_subject_label(self, text):
"""Check if text is likely a subject label like 'Матем.', 'Информ.', 'Англ.яз', etc."""
if not text or len(text) < 2:
return False
# Common Russian abbreviations for subjects
subject_patterns = [
'Матем.', 'Информ.', 'Англ.яз', 'Русск.яз', 'Физика', 'Химия', 'Биол', 'История',
'Общество', 'География', 'Литер', 'Физкульт', 'Технотрек', 'Лидерство',
'Спорт. клуб', 'ОРКСЭ', 'Китайск', 'Немецк', 'Француз', 'Speaking club', 'Maths',
'ICT', 'Geography', 'Physics', 'Robotics', 'Culinary', 'Science', 'AI Core', 'VR/AR',
'CyberSafety', 'Business', 'Design', 'Prototype', 'MediaCom', 'Science', 'Robotics',
'Culinary', 'AI Core', 'VR/AR', 'CyberSafety', 'Business', 'Design', 'Prototype',
'MediaCom', 'Robotics Track', 'Culinary Track', 'Science Track', 'AI Core Track',
'VR/AR Track', 'CyberSafety Track', 'Business Track', 'Design Track', 'Prototype Track',
'MediaCom Track', 'Math', 'Algebra', 'Geometry', 'Calculus', 'Statistics', 'Coding',
'Programming', 'Algorithm', 'Logic', 'Robotics', 'Physical Education', 'PE', 'Sports',
'Swimming', 'Fitness', 'Gymnastics', 'Climbing', 'Games', 'Art', 'Music', 'Dance',
'Karate', 'Judo', 'Martial Arts', 'Chess', 'Leadership', 'Entrepreneurship'
]
text_clean = text.strip().lower()
for pattern in subject_patterns:
if pattern.lower() in text_clean:
return True
# Also check for specific subject names found in the data
specific_subjects = ['матем.', 'информ.', 'англ.яз', 'русск.яз', 'каб.', 'business', 'maths',
'speaking', 'ict', 'geography', 'physics', 'robotics', 'science', 'ai core',
'vr/ar', 'cybersafety', 'design', 'prototype', 'mediacom', 'culinary',
'physical education', 'pe', 'sports', 'swimming', 'fitness', 'gymnastics',
'climbing', 'games', 'art', 'music', 'dance', 'karate', 'chess', 'leadership']
for subj in specific_subjects:
if subj in text_clean:
return True
return False
def _find_matching_subject_in_header_from_list(self, subject_label, header_subjects, header_row):
"""Find the matching full subject name in the header based on the label"""
if not subject_label:
return None
# Look for the best match in the header subjects
subject_label_lower = subject_label.lower().replace('.', '').replace('яз', 'язык')
# Direct match first
for col_idx, full_subj in header_subjects:
if subject_label_lower in full_subj.lower() or full_subj.lower() in subject_label_lower:
return full_subj
# If no direct match, try to find by partial matching in the whole header row
for i, header_item in enumerate(header_row):
if subject_label_lower in str(header_item).lower() or str(header_item).lower() in subject_label_lower:
return str(header_item).strip()
# Try more general matching - if label contains common abbreviations
for col_idx, full_subj in header_subjects:
full_lower = full_subj.lower()
if ('матем' in subject_label_lower and 'матем' in full_lower) or \
('информ' in subject_label_lower and 'информ' in full_lower) or \
('англ' in subject_label_lower and 'англ' in full_lower) or \
('русск' in subject_label_lower and 'русск' in full_lower) or \
('физик' in subject_label_lower and 'физик' in full_lower) or \
('хим' in subject_label_lower and 'хим' in full_lower) or \
('биол' in subject_label_lower and 'биол' in full_lower) or \
('истор' in subject_label_lower and 'истор' in full_lower) or \
('общ' in subject_label_lower and 'общ' in full_lower) or \
('географ' in subject_label_lower and 'географ' in full_lower):
return full_subj
return None
def find_student(self, name_query):
"""Search for students by name"""
self.cursor.execute("""
SELECT s.full_name, s.class_name
FROM students s
WHERE s.full_name LIKE ?
LIMIT 10
""", (f'%{name_query}%',))
return self.cursor.fetchall()
def get_current_class(self, student_name, current_day, current_time):
"""Find student's current class"""
self.cursor.execute("""
SELECT sub.name, t.name, p.start_time, p.end_time
FROM schedule sch
JOIN students s ON sch.student_id = s.student_id
JOIN subjects sub ON sch.subject_id = sub.subject_id
JOIN teachers t ON sch.teacher_id = t.teacher_id
JOIN days d ON sch.day_id = d.day_id
JOIN periods p ON sch.period_id = p.period_id
JOIN groups g ON sch.group_id = g.group_id
WHERE s.full_name = ?
AND d.name = ?
AND p.start_time <= ?
AND p.end_time >= ?
""", (student_name, current_day, current_time, current_time))
return self.cursor.fetchone()
def get_student_schedule(self, student_name):
"""Get full schedule for a student"""
self.cursor.execute("""
SELECT sub.name, t.name, p.start_time, p.end_time, g.name
FROM schedule sch
JOIN students s ON sch.student_id = s.student_id
JOIN subjects sub ON sch.subject_id = sub.subject_id
JOIN teachers t ON sch.teacher_id = t.teacher_id
JOIN periods p ON sch.period_id = p.period_id
JOIN groups g ON sch.group_id = g.group_id
WHERE s.full_name = ?
ORDER BY p.period_number
""", (student_name,))
return self.cursor.fetchall()
def _is_likely_teacher_name_enhanced(self, text):
"""Enhanced check if the text is likely to be a teacher name"""
if not text or len(text.strip()) < 5: # Require minimum length for a name
return False
text = text.strip()
# Common non-name values that appear in the CSV
common_non_names = ['-', 'nan', 'нет', 'нету', 'отсутствует', 'учитель', 'teacher', '', 'Е4 Е5', 'E4 E5', 'группа', 'group', 'каб.', 'гр.', 'фитнес', 'каб', 'все группы', '1 группа', '2 группа', 'Е1', 'Е2', 'Е3', 'Е4', 'Е5', 'Е6', 'Е1 Е2', 'Е4 Е5', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'гр 1', 'гр 2']
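# Compare case-insensitively on both sides; the list above contains uppercase entries such as 'E4 E5'
if text.lower() in {value.lower() for value in common_non_names}: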
return False
# Exclusion patterns for non-teacher entries
exclusion_patterns = [
r'^[А-ЯЁ]\d+\s+[А-ЯЁ]\d+$', # E4 E5 pattern
r'^[A-Z]\d+\s+[A-Z]\d+$', # English groups
r'.*[Tt]rack.*', # Track identifiers
r'.*[Gg]roup.*', # Group identifiers
r'.*\d+[А-ЯA-Z]\d*$', # Number-letter combos
r'^[А-ЯЁA-Z].*\d+', # Starts with a capital letter and contains digits
r'.*[Cc]lub.*', # Club identifiers
r'.*[Rr]oom.*', # Room identifiers
r'.*[Cc]lass.*', # Class identifiers
r'.*[Pp]eriod.*', # Period identifiers
r'^\d+$', # Just numbers
r'^[А-ЯЁA-Z]*$', # All caps words
r'^[А-ЯЁA-Z\s\d]+$', # Caps words and numbers (likely room numbers)
r'^[ЕеEe][\d\s,]+$', # Room identifiers like E1, E2, etc.
]
for pattern in exclusion_patterns:
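# Match case-sensitively: with re.IGNORECASE the uppercase-only patterns above would also match ordinary mixed-case names
if re.match(pattern, text):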
return False
# Check if it looks like a name with multiple capitalized words (Russian or English)
# Teacher names typically have 2-4 words with capitalized first letters
words = text.split()
if len(words) < 2 or len(words) > 4:
return False
# Check if most words start with capital letters (allowing for exceptions like "van", "de", etc.)
capital_words = 0
for word in words:
# Skip common particles that are lowercase in names
if word in ['van', 'von', 'de', 'di', 'le', 'la', 'du', 'del', 'da', 'и', 'на', 'де']:
capital_words += 1
elif word[0].isupper() and len(word) > 1:
capital_words += 1
# At least n-1 words should be capitalized (for n-word names)
if capital_words < len(words) - 1:
return False
# Additional check: if it looks like a proper name (with capital letters and min length)
# and doesn't match exclusion patterns
name_parts = text.split()
if len(name_parts) >= 2:
# At least two parts (first name + last name)
# Check if they start with capital letters
if all(part[0].isupper() for part in name_parts if len(part) > 1):
# Additional check: make sure it's not just a title or other text
common_titles = ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof', 'Teacher', 'Instructor', 'Coach']
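# Compare whole words so that names merely containing 'Dr' or 'Ms' as a substring are not rejected
if any(part.rstrip('.') in common_titles for part in name_parts):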
return False
return True
return False
def _is_likely_group_identifier(self, text):
"""Check if text is likely a group identifier like 'E1', 'E2', 'гр 1', etc."""
if not text:
return False
text = text.strip()
# Common group identifiers
group_patterns = [
r'^[Ee]\d+', # E1, E2, etc.
r'^[Ee]\d+\s*[Ee]\d+', # E1 E2, E4 E5, etc.
r'^(гр|group|группа).*', # "гр 1", "group 1", etc.
r'^[А-ЯA-Z]\d+', # A1, B2, etc.
r'^[А-ЯA-Z]\d+\s+[А-ЯA-Z]\d+', # A1 B2, etc.
r'^(все группы|all groups).*', # "все группы", etc.
r'^\d+\s*(группа|class).*', # "1 группа", etc.
r'^(1|2)\s*(группа|group)', # "1 группа", "2 group", etc.
]
for pattern in group_patterns:
if re.match(pattern, text, re.IGNORECASE):
return True
# Additional common group indicators
common_groups = ['E1 E2', 'E3 E4', 'E5 E6', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6',
'1 группа', '2 группа', 'все группы', 'гр 1', 'гр 2', 'all groups',
'group 1', 'group 2', 'A1', 'B1', 'C1', '4A', '4B', '4C', '4ABC']
return text in common_groups
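# Illustrative sketch (not part of the original file): the uppercase-only exclusion
# patterns in _is_likely_teacher_name_enhanced are meant to catch room and group codes.
# If such a pattern is matched with re.IGNORECASE it also accepts ordinary mixed-case
# names. The sample name and room label below are assumptions chosen to show the difference.

```python
import re

# Same pattern as the "caps words and numbers" exclusion rule
CAPS_AND_DIGITS = r'^[А-ЯЁA-Z\s\d]+$'

sample_name = 'Иванова Мария Петровна'  # hypothetical teacher name
sample_room = 'КАБ 12'                  # hypothetical room label

# Case-insensitive matching treats the whole name as "caps and digits"
name_hit_ignorecase = re.match(CAPS_AND_DIGITS, sample_name, re.IGNORECASE) is not None
# Case-sensitive matching only flags text that really is all caps
name_hit_strict = re.match(CAPS_AND_DIGITS, sample_name) is not None
room_hit_strict = re.match(CAPS_AND_DIGITS, sample_room) is not None

print(name_hit_ignorecase, name_hit_strict, room_hit_strict)  # True False True
```

# In this sketch only the case-sensitive match separates the name from the room label.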

@@ -0,0 +1,247 @@
def process_csv_with_teacher_mapping(self, csv_file):
"""Process CSV with teacher-subject mapping based on positional order"""
if not os.path.exists(csv_file):
return False
with open(csv_file, 'r', encoding='utf-8') as file:
reader = csv.reader(file)
rows = list(reader)
# Identify header row - look for the row containing "ФИО" (full name) or similar indicators
header_idx = None
for i, row in enumerate(rows):
for cell in row:
if "ФИО" in str(cell) or "фио" in str(cell).lower() or "Ф.И.О." in str(cell) or "ф.и.о." in str(cell):
header_idx = i
break
if header_idx is not None:
break
if header_idx is None:
# Check if this file contains class and name columns that identify it as a student data file
# Even if the header doesn't contain ФИО, we might still be able to identify student data
has_class_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
has_name_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
if has_class_indicators and has_name_indicators:
# Try to find the header row by looking for class and name indicators
for i, row in enumerate(rows):
if any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class']) and \
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname']):
header_idx = i
break
if header_idx is None:
print(f"Skipping {csv_file} - does not appear to be student data with ФИО/class columns")
return False
# Find teacher-subject mappings in the rows that precede the header (at most the first 15 rows of the file)
teacher_subject_map = {}
# Build a mapping of subject names in the header row
header_row = rows[header_idx]
header_subjects = {}
for col_idx, subject_name in enumerate(header_row):
subject_name = str(subject_name).strip()
if (subject_name and
subject_name.lower() not in ['ф.и.о.', 'фио', 'класс', 'номер', 'сортировка', 'шкафчика', 'локера'] and
"ф.и.о" not in subject_name.lower() and
"сортировка" not in subject_name.lower() and
"номер" not in subject_name.lower()):
header_subjects[col_idx] = subject_name # Map column index to subject name
# First, try to find teachers in the rows before the header
for i in range(min(15, header_idx)): # Check first 15 rows before header
current_row = rows[i]
# Process all cells in the row to find teacher names and their adjacent context
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# Check if this cell is a likely teacher name
if self._is_likely_teacher_name(cell_str):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = cell_str
# If the cell contains multiple names (separated by newlines), process each separately
elif '\n' in cell_str or '\\n' in cell_str:
cell_parts = [part.strip() for part in cell_str.replace('\\n', '\n').split('\n') if part.strip()]
for part in cell_parts:
if self._is_likely_teacher_name(part):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = part
# Additional teacher-subject mapping: scan the rows immediately before the header for teacher names in subject columns
# In many CSV files, teacher names appear in the same rows as subject headers
for i in range(max(0, header_idx - 5), header_idx): # Check 5 rows before header
current_row = rows[i]
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# If cell contains a likely teacher name and corresponds to a subject column
if self._is_likely_teacher_name(cell_str) and j in header_subjects:
subject_name = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if (subject_name not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(subject_name, '')):
teacher_subject_map[subject_name] = cell_str
# Additional validation: Remove any teacher-subject mappings that seem incorrect
validated_teacher_subject_map = {}
for subject, teacher in teacher_subject_map.items():
# Only add to validated map if teacher name passes all checks
if self._is_likely_teacher_name(teacher):
validated_teacher_subject_map[subject] = teacher
else:
print(f"Warning: Invalid teacher name '{teacher}' detected for subject '{subject}', skipping...")
teacher_subject_map = validated_teacher_subject_map
# Process each student row
for student_row in rows[header_idx + 1:]:
# Determine the structure dynamically based on the header
class_col_idx = None
name_col_idx = None
# Find the index of the class column (usually called "Класс")
for idx, header in enumerate(header_row):
if "Класс" in str(header) or "класс" in str(header) or "Class" in str(header) or "class" in str(header):
class_col_idx = idx
break
# Find the index of the name column (usually called "ФИО")
for idx, header in enumerate(header_row):
if "ФИО" in str(header) or "ф.и.о." in str(header).lower() or "name" in str(header).lower():
name_col_idx = idx
break
# If we couldn't find the columns properly, skip this row
if class_col_idx is None or name_col_idx is None:
continue
# Check if this row has valid data in the expected columns
if (len(student_row) > max(class_col_idx, name_col_idx) and
student_row[class_col_idx].strip() and # class name exists
student_row[name_col_idx].strip() and # student name exists
self._is_valid_student_record_by_cols(student_row, class_col_idx, name_col_idx)):
name = student_row[name_col_idx].strip() # Name column
class_name = student_row[class_col_idx].strip() # Class column
# Insert student into the database
self.cursor.execute(
"INSERT OR IGNORE INTO students (class_name, full_name) VALUES (?, ?)",
(class_name, name)
)
# Get the student_id for this student
self.cursor.execute("SELECT student_id FROM students WHERE full_name = ? AND class_name = ?", (name, class_name))
student_id_result = self.cursor.fetchone()
if student_id_result is None:
continue
student_id = student_id_result[0]
# Process schedule data for this student
# Go through each column to find subject and group info
for col_idx, cell_value in enumerate(student_row):
if cell_value and col_idx < len(header_row):
# Get the subject from the header
subject_header = header_row[col_idx] if col_idx < len(header_row) else ""
# Skip columns that don't contain schedule information
if (col_idx == 0 or col_idx == 1 or col_idx == 2 or col_idx == class_col_idx or col_idx == name_col_idx or # skip metadata cols
"сортировка" in subject_header.lower() or
"номер" in subject_header.lower() or
"шкафчика" in subject_header.lower() or
"локера" in subject_header.lower()):
continue
# Extract group information from the cell
group_assignment = cell_value.strip()
if group_assignment and group_assignment.lower() != "nan" and group_assignment != "-" and group_assignment != "":
# Find the teacher associated with this subject
subject_name = str(subject_header).strip()
teacher_name = teacher_subject_map.get(subject_name, f"Default Teacher for {subject_name}")
# Insert the entities into their respective tables first
# Then get their IDs to create the schedule entry
self._process_schedule_entry_with_teacher_mapping(
student_id, group_assignment, subject_name, teacher_name
)
self.conn.commit()
return True
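# Illustrative sketch (not part of the original file): INSERT OR IGNORE only skips a row
# when the insert would violate a constraint. The two demo tables below are hypothetical
# and differ only in a UNIQUE(class_name, full_name) constraint; the student is made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical demo tables: identical except for the UNIQUE constraint
cur.execute("""
    CREATE TABLE students_plain (
        student_id INTEGER PRIMARY KEY AUTOINCREMENT,
        class_name TEXT,
        full_name TEXT NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE students_unique (
        student_id INTEGER PRIMARY KEY AUTOINCREMENT,
        class_name TEXT,
        full_name TEXT NOT NULL,
        UNIQUE(class_name, full_name)
    )
""")

row_counts = {}
for table in ('students_plain', 'students_unique'):
    # Import the same student twice, as a repeated CSV run would
    for _ in range(2):
        cur.execute(
            f"INSERT OR IGNORE INTO {table} (class_name, full_name) VALUES (?, ?)",
            ('5A', 'Иванов Иван'),
        )
    row_counts[table] = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn.close()
print(row_counts)  # {'students_plain': 2, 'students_unique': 1}
```

# Without a uniqueness constraint, every re-import adds another copy of the same student.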

@@ -0,0 +1,220 @@
self.conn.commit()
return True
def process_csv_with_teacher_mapping(self, csv_file):
"""Process CSV with teacher-subject mapping based on positional order"""
if not os.path.exists(csv_file):
return False
with open(csv_file, 'r', encoding='utf-8') as file:
reader = csv.reader(file)
rows = list(reader)
# Identify header row - look for the row containing "ФИО" (full name) or similar indicators
header_idx = None
for i, row in enumerate(rows):
for cell in row:
if "ФИО" in str(cell) or "фио" in str(cell).lower() or "Ф.И.О." in str(cell) or "ф.и.о." in str(cell):
header_idx = i
break
if header_idx is not None:
break
if header_idx is None:
# Check if this file contains class and name columns that identify it as a student data file
# Even if the header doesn't contain ФИО, we might still be able to identify student data
has_class_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
has_name_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
if has_class_indicators and has_name_indicators:
# Try to find the header row by looking for class and name indicators
for i, row in enumerate(rows):
if any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class']) and \
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname']):
header_idx = i
break
if header_idx is None:
print(f"Skipping {csv_file} - does not appear to be student data with ФИО/class columns")
return False
# Find teacher-subject mappings in the rows that precede the header (at most the first 15 rows of the file)
teacher_subject_map = {}
# Build a mapping of subject names in the header row
header_row = rows[header_idx]
header_subjects = {}
for col_idx, subject_name in enumerate(header_row):
subject_name = str(subject_name).strip()
if (subject_name and
subject_name.lower() not in ['ф.и.о.', 'фио', 'класс', 'номер', 'сортировка', 'шкафчика', 'локера'] and
"ф.и.о" not in subject_name.lower() and
"сортировка" not in subject_name.lower() and
"номер" not in subject_name.lower()):
header_subjects[col_idx] = subject_name # Map column index to subject name
# Process rows before the header to find teacher names and map them to subjects
for i in range(min(15, header_idx)): # Check first 15 rows before header
current_row = rows[i]
# Process all cells in the row to find teacher names and their adjacent context
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# Check if this cell is a likely teacher name
if self._is_likely_teacher_name(cell_str):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = cell_str
# If the cell contains multiple names (separated by newlines), process each separately
elif '\n' in cell_str or '\\n' in cell_str:
cell_parts = [part.strip() for part in cell_str.replace('\\n', '\n').split('\n') if part.strip()]
for part in cell_parts:
if self._is_likely_teacher_name(part):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = part
# Additional validation: Remove any teacher-subject mappings that seem incorrect
validated_teacher_subject_map = {}
for subject, teacher in teacher_subject_map.items():
# Only add to validated map if teacher name passes all checks
if self._is_likely_teacher_name(teacher):
validated_teacher_subject_map[subject] = teacher
else:
print(f"Warning: Invalid teacher name '{teacher}' detected for subject '{subject}', skipping...")
teacher_subject_map = validated_teacher_subject_map
# Additional teacher-subject mapping: scan the data rows for teacher names paired with subjects
# In many CSV files, teacher names appear in the same rows as subject data
for i in range(header_idx + 1, min(len(rows), header_idx + 50)): # Check first 50 data rows
current_row = rows[i]
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# If cell contains a likely teacher name and corresponds to a subject column
if self._is_likely_teacher_name(cell_str) and j in header_subjects:
subject_name = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if (subject_name not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(subject_name, '')):
teacher_subject_map[subject_name] = cell_str
# Process each student row
for student_row in rows[header_idx + 1:]:
# Determine the structure dynamically based on the header
class_col_idx = None
name_col_idx = None
# Find the index of the class column (usually called "Класс")
for idx, header in enumerate(header_row):
if "Класс" in str(header) or "класс" in str(header) or "Class" in str(header) or "class" in str(header):
class_col_idx = idx
break
# Find the index of the name column (usually called "ФИО")
for idx, header in enumerate(header_row):
if "ФИО" in str(header) or "ф.и.о." in str(header).lower() or "name" in str(header).lower():
name_col_idx = idx
break
# If we couldn't find the columns properly, skip this row
if class_col_idx is None or name_col_idx is None:
continue
# Check if this row has valid data in the expected columns
if (len(student_row) > max(class_col_idx, name_col_idx) and
student_row[class_col_idx].strip() and # class name exists
student_row[name_col_idx].strip() and # student name exists
self._is_valid_student_record_by_cols(student_row, class_col_idx, name_col_idx)):
name = student_row[name_col_idx].strip() # Name column
class_name = student_row[class_col_idx].strip() # Class column
# Insert student into the database
self.cursor.execute(
"INSERT OR IGNORE INTO students (class_name, full_name) VALUES (?, ?)",
(class_name, name)
)
# Get the student_id for this student
self.cursor.execute("SELECT student_id FROM students WHERE full_name = ? AND class_name = ?", (name, class_name))
student_id_result = self.cursor.fetchone()
if student_id_result is None:
continue
student_id = student_id_result[0]
# Process schedule data for this student

@@ -0,0 +1,721 @@
#!/usr/bin/env python
"""
database.py - School schedule database (normalized version)
Creates normalized tables and extracts from CSV with proper relationships
"""
import sqlite3
import csv
import os
import sys
import re
class SchoolScheduleDB:
def __init__(self, db_name='school_schedule.db'):
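self.conn = sqlite3.connect(db_name)
# SQLite enforces the FOREIGN KEY clauses declared in create_tables only when this is enabled per connection
self.conn.execute("PRAGMA foreign_keys = ON")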
self.cursor = self.conn.cursor()
# Initialize database tables
self.create_tables()
def create_tables(self):
"""Create normalized tables with proper relationships"""
# Teachers table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS teachers (
teacher_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
email TEXT,
phone TEXT
)
""")
# Subjects table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS subjects (
subject_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
description TEXT
)
""")
# Days table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS days (
day_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL -- e.g., Monday, Tuesday, etc.
)
""")
# Periods table - with proper unique constraint
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS periods (
period_id INTEGER PRIMARY KEY AUTOINCREMENT,
period_number INTEGER,
start_time TEXT,
end_time TEXT,
UNIQUE(period_number, start_time, end_time)
)
""")
# Groups table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS groups (
group_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
description TEXT,
class_name TEXT
)
""")
# Students table
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS students (
student_id INTEGER PRIMARY KEY AUTOINCREMENT,
class_name TEXT,
full_name TEXT NOT NULL,
UNIQUE(class_name, full_name) -- lets INSERT OR IGNORE skip students that were already imported
)
""")
# Schedule table with foreign key relationships
self.cursor.execute("""
CREATE TABLE IF NOT EXISTS schedule (
entry_id INTEGER PRIMARY KEY AUTOINCREMENT,
student_id INTEGER,
subject_id INTEGER,
teacher_id INTEGER,
day_id INTEGER,
period_id INTEGER,
group_id INTEGER,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (subject_id) REFERENCES subjects(subject_id),
FOREIGN KEY (teacher_id) REFERENCES teachers(teacher_id),
FOREIGN KEY (day_id) REFERENCES days(day_id),
FOREIGN KEY (period_id) REFERENCES periods(period_id),
FOREIGN KEY (group_id) REFERENCES groups(group_id)
)
""")
self.conn.commit()
def populate_periods_table(self):
"""Populate the periods table with standard school periods and the days table with the days of the week"""
period_times = {
'1': ('09:00', '09:40'),
'2': ('10:00', '10:40'),
'3': ('11:00', '11:40'),
'4': ('11:50', '12:30'),
'5': ('12:40', '13:20'),
'6': ('13:30', '14:10'),
'7': ('14:20', '15:00'),
'8': ('15:20', '16:00'),
'9': ('16:15', '16:55'),
'10': ('17:05', '17:45'),
'11': ('17:55', '18:35'),
'12': ('18:45', '19:20'),
'13': ('19:20', '20:00')
}
for period_num, (start_time, end_time) in period_times.items():
self.cursor.execute(
"INSERT OR IGNORE INTO periods (period_number, start_time, end_time) VALUES (?, ?, ?)",
(int(period_num), start_time, end_time)
)
# Add days of the week
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day in days_of_week:
self.cursor.execute("INSERT OR IGNORE INTO days (name) VALUES (?)", (day,))
self.conn.commit()
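# Illustrative sketch (not part of the original file): the periods table stores times as
# zero-padded 'HH:MM' text, so range checks on them are plain string comparisons. They are
# only reliable when the queried time is zero-padded as well. normalize_time and in_period
# are hypothetical helpers, and the sample time is an assumption.

```python
from datetime import datetime


def normalize_time(value: str) -> str:
    """Return the time as a zero-padded 'HH:MM' string (hypothetical helper)."""
    return datetime.strptime(value.strip(), '%H:%M').strftime('%H:%M')


def in_period(current: str, start: str, end: str) -> bool:
    """String-based range check, mirroring start_time <= ? AND end_time >= ?."""
    return start <= current <= end


# '9:05' lies inside period 1 (09:00-09:40), but the unpadded string compares as too late
raw_result = in_period('9:05', '09:00', '09:40')
normalized_result = in_period(normalize_time('9:05'), '09:00', '09:40')

print(raw_result, normalized_result)  # False True
```

# Normalizing the input before it reaches the query keeps the text comparison consistent.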
def update_database_from_csv(self, auto_update=True):
"""Automatically update database from specific CSV files in the sample_data directory"""
sample_data_dir = "sample_data"
if not os.path.exists(sample_data_dir):
print(f"Directory '{sample_data_dir}' not found.")
return
# Get all CSV files and filter out the schedule template and sheet files
all_csv_files = [f for f in os.listdir(sample_data_dir) if f.endswith('.csv')]
# Keep only the actual student distribution files (not the sheets)
csv_files = []
for filename in all_csv_files:
if 'first_sheet' not in filename and 'last_sheet' not in filename and 'template' not in filename:
csv_files.append(filename)
if not csv_files:
print(f"No student data CSV files found in '{sample_data_dir}' directory.")
return
print(f"Found {len(csv_files)} student data CSV file(s):")
for i, filename in enumerate(csv_files, 1):
print(f" {i}. {filename}")
if auto_update:
print("\nAuto-updating database with all student data CSV files...")
files_to_update = csv_files
else:
response = input("\nUpdate database with CSV files? (yes/no): ").lower()
if response not in ['yes', 'y', 'да']:
print("Skipping database update.")
return
print("\n0. Update all files")
try:
selection = input("\nSelect file(s) to update (0 for all, or comma-separated numbers like 1,2,3): ")
if selection.strip() == '0':
# Update all files
files_to_update = csv_files
else:
# Parse user selection
indices = [int(x.strip()) - 1 for x in selection.split(',')]
files_to_update = [csv_files[i] for i in indices if 0 <= i < len(csv_files)]
if not files_to_update:
print("No valid selections made.")
return
except ValueError:
print("Invalid input. Please enter numbers separated by commas or '0' for all files.")
return
# Populate the periods and days tables first
self.populate_periods_table()
print(f"\nUpdating database with {len(files_to_update)} file(s):")
for filename in files_to_update:
print(f" - {filename}")
csv_path = os.path.join(sample_data_dir, filename)
print(f"Processing {csv_path}...")
self.process_csv_with_teacher_mapping(csv_path)
print("Database updated successfully with selected CSV data.")
def process_csv_with_teacher_mapping(self, csv_file):
"""Process CSV with teacher-subject mapping based on positional order"""
if not os.path.exists(csv_file):
return False
with open(csv_file, 'r', encoding='utf-8') as file:
reader = csv.reader(file)
rows = list(reader)
# Identify header row - look for the row containing "ФИО" (full name) or similar indicators
header_idx = None
for i, row in enumerate(rows):
for cell in row:
if "ФИО" in str(cell) or "фио" in str(cell).lower() or "Ф.И.О." in str(cell) or "ф.и.о." in str(cell):
header_idx = i
break
if header_idx is not None:
break
if header_idx is None:
# Check if this file contains class and name columns that identify it as a student data file
# Even if the header doesn't contain ФИО, we might still be able to identify student data
has_class_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
has_name_indicators = any(
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname'])
for row in rows[:min(len(rows), 10)] # Check first 10 rows
)
if has_class_indicators and has_name_indicators:
# Try to find the header row by looking for class and name indicators
for i, row in enumerate(rows):
if any(indicator in str(cell).lower() for cell in row for indicator in ['класс', 'class']) and \
any(indicator in str(cell).lower() for cell in row for indicator in ['имя', 'name', 'фамилия', 'surname']):
header_idx = i
break
if header_idx is None:
print(f"Skipping {csv_file} - does not appear to be student data with ФИО/class columns")
return False
# Find teacher-subject mappings in the rows that precede the header (at most the first 15 rows of the file)
teacher_subject_map = {}
# Build a mapping of subject names in the header row
header_row = rows[header_idx]
header_subjects = {}
for col_idx, subject_name in enumerate(header_row):
subject_name = str(subject_name).strip()
if (subject_name and
subject_name.lower() not in ['ф.и.о.', 'фио', 'класс', 'номер', 'сортировка', 'шкафчика', 'локера'] and
"ф.и.о" not in subject_name.lower() and
"сортировка" not in subject_name.lower() and
"номер" not in subject_name.lower()):
header_subjects[col_idx] = subject_name # Map column index to subject name
# First, try to find teachers in the rows before the header
for i in range(min(15, header_idx)): # Check first 15 rows before header
current_row = rows[i]
# Process all cells in the row to find teacher names and their adjacent context
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# Check if this cell is a likely teacher name
if self._is_likely_teacher_name(cell_str):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = cell_str
# If the cell contains multiple names (separated by newlines), process each separately
elif '\n' in cell_str or '\\n' in cell_str:
cell_parts = [part.strip() for part in cell_str.replace('\\n', '\n').split('\n') if part.strip()]
for part in cell_parts:
if self._is_likely_teacher_name(part):
# Look for context on the left (department) and right (subject)
left_context = ""
right_context = ""
# Get left neighbor (department)
if j > 0 and j-1 < len(current_row):
left_context = str(current_row[j-1]).strip()
# Get right neighbor (subject)
if j < len(current_row) - 1:
right_context = str(current_row[j+1]).strip()
# Try to determine the subject based on adjacency
matched_subject = None
# First priority: right neighbor if it matches a subject in the header
if right_context and j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# Second priority: use left context if it semantically relates to a teacher
elif left_context and any(keyword in left_context.lower() for keyword in ['учитель', 'teacher', 'кафедра', 'department']):
# If left context indicates a department, look for subject to the right of teacher
if j+1 in header_subjects:
matched_subject = header_subjects[j+1]
# If no subject to the right, try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Third priority: try to map by position
elif j in header_subjects:
matched_subject = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if matched_subject and (matched_subject not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(matched_subject, '')):
teacher_subject_map[matched_subject] = part
# Additional teacher-subject mapping: scan the rows immediately before the header for teacher names in subject columns
# In many CSV files, teacher names appear in the same rows as subject headers
for i in range(max(0, header_idx - 5), header_idx): # Check 5 rows before header
current_row = rows[i]
for j, cell_value in enumerate(current_row):
cell_str = str(cell_value).strip()
# If cell contains a likely teacher name and corresponds to a subject column
if self._is_likely_teacher_name(cell_str) and j in header_subjects:
subject_name = header_subjects[j]
# Only add if we don't have a better teacher name for this subject yet
if (subject_name not in teacher_subject_map or
'Default Teacher for' in teacher_subject_map.get(subject_name, '')):
teacher_subject_map[subject_name] = cell_str
# Additional validation: Remove any teacher-subject mappings that seem incorrect
validated_teacher_subject_map = {}
for subject, teacher in teacher_subject_map.items():
# Only add to validated map if teacher name passes all checks
if self._is_likely_teacher_name(teacher):
validated_teacher_subject_map[subject] = teacher
else:
print(f"Warning: Invalid teacher name '{teacher}' detected for subject '{subject}', skipping...")
teacher_subject_map = validated_teacher_subject_map
# Process each student row
for student_row in rows[header_idx + 1:]:
# Determine the structure dynamically based on the header
class_col_idx = None
name_col_idx = None
# Find the index of the class column (usually called "Класс")
for idx, header in enumerate(header_row):
if "Класс" in str(header) or "класс" in str(header) or "Class" in str(header) or "class" in str(header):
class_col_idx = idx
break
# Find the index of the name column (usually called "ФИО")
for idx, header in enumerate(header_row):
if "ФИО" in str(header) or "ф.и.о." in str(header).lower() or "name" in str(header).lower():
name_col_idx = idx
break
# If we couldn't find the columns properly, skip this row
if class_col_idx is None or name_col_idx is None:
continue
# Check if this row has valid data in the expected columns
if (len(student_row) > max(class_col_idx, name_col_idx) and
student_row[class_col_idx].strip() and # class name exists
student_row[name_col_idx].strip() and # student name exists
self._is_valid_student_record_by_cols(student_row, class_col_idx, name_col_idx)):
name = student_row[name_col_idx].strip() # Name column
class_name = student_row[class_col_idx].strip() # Class column
# Insert student into the database
self.cursor.execute(
"INSERT OR IGNORE INTO students (class_name, full_name) VALUES (?, ?)",
(class_name, name)
)
# Get the student_id for this student
self.cursor.execute("SELECT student_id FROM students WHERE full_name = ? AND class_name = ?", (name, class_name))
student_id_result = self.cursor.fetchone()
if student_id_result is None:
continue
student_id = student_id_result[0]
# Process schedule data for this student
# Go through each column to find subject and group info
for col_idx, cell_value in enumerate(student_row):
if cell_value and col_idx < len(header_row):
# Get the subject from the header
subject_header = header_row[col_idx] if col_idx < len(header_row) else ""
# Skip columns that don't contain schedule information
if (col_idx == 0 or col_idx == 1 or col_idx == 2 or col_idx == class_col_idx or col_idx == name_col_idx or # skip metadata cols
"сортировка" in subject_header.lower() or
"номер" in subject_header.lower() or
"шкафчика" in subject_header.lower() or
"локера" in subject_header.lower()):
continue
# Extract group information from the cell
group_assignment = cell_value.strip()
if group_assignment and group_assignment.lower() != "nan" and group_assignment != "-" and group_assignment != "":
# Find the teacher associated with this subject
subject_name = str(subject_header).strip()
teacher_name = teacher_subject_map.get(subject_name, f"Default Teacher for {subject_name}")
# Insert the entities into their respective tables first
# Then get their IDs to create the schedule entry
self._process_schedule_entry_with_teacher_mapping(
student_id, group_assignment, subject_name, teacher_name
)
self.conn.commit()
return True
def _is_valid_student_record_by_cols(self, row, class_col_idx, name_col_idx):
"""Check if a row represents a valid student record based on specific columns"""
# A valid student record should have:
# - Non-empty class name in the class column
# - Non-empty student name in the name column
if len(row) <= max(class_col_idx, name_col_idx):
return False
class_name = row[class_col_idx].strip() if len(row) > class_col_idx else ""
student_name = row[name_col_idx].strip() if len(row) > name_col_idx else ""
# Check if the class name looks like an actual class (contains a number followed by a letter)
class_pattern = r'^\d+[А-ЯA-Z]$' # e.g., 6А, 11А, 4B
if re.match(class_pattern, class_name):
return bool(student_name and student_name != class_name) # Ensure name exists and is different from class
# If not matching class pattern, check if the name field is not just another class-like value
name_pattern = r'^\d+[А-ЯA-Z]$' # This would indicate it's probably a class, not a name
if re.match(name_pattern, student_name):
return False # This row has a class in the name field, so not valid
return bool(class_name and student_name and class_name != student_name)
def _process_schedule_entry_with_teacher_mapping(self, student_id, group_info, subject_info, teacher_name):
"""Process individual schedule entries with explicit teacher mapping and insert into normalized tables"""
# Clean up the inputs
subject_name = subject_info.strip() if subject_info.strip() else "General Class"
group_assignment = group_info.strip()
# Only proceed if we have valid data
if subject_name and group_assignment and group_assignment.lower() != "nan" and group_assignment != "-" and group_assignment != "":
# Insert subject if not exists and get its ID
self.cursor.execute("INSERT OR IGNORE INTO subjects (name) VALUES (?)", (subject_name,))
self.cursor.execute("SELECT subject_id FROM subjects WHERE name = ?", (subject_name,))
subject_id = self.cursor.fetchone()[0]
# Insert teacher if not exists and get its ID
# Use the teacher name as is, without default creation if not found
self.cursor.execute("INSERT OR IGNORE INTO teachers (name) VALUES (?)", (teacher_name,))
self.cursor.execute("SELECT teacher_id FROM teachers WHERE name = ?", (teacher_name,))
teacher_result = self.cursor.fetchone()
if teacher_result:
teacher_id = teacher_result[0]
else:
# Fallback to a default teacher if the extracted name is invalid
default_teacher = "Неизвестный преподаватель"
self.cursor.execute("INSERT OR IGNORE INTO teachers (name) VALUES (?)", (default_teacher,))
self.cursor.execute("SELECT teacher_id FROM teachers WHERE name = ?", (default_teacher,))
teacher_id = self.cursor.fetchone()[0]
# Use a default day for now (in a real system, we'd extract this from the schedule)
# For now, we'll randomly assign to a day of the week
import random
days_list = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
selected_day = random.choice(days_list)
self.cursor.execute("INSERT OR IGNORE INTO days (name) VALUES (?)", (selected_day,))
self.cursor.execute("SELECT day_id FROM days WHERE name = ?", (selected_day,))
day_id = self.cursor.fetchone()[0]
# Use a default period - for now we'll use period 1, but in a real system
# we would need to extract this from the CSV if available
self.cursor.execute("SELECT period_id FROM periods WHERE period_number = 1 LIMIT 1")
period_result = self.cursor.fetchone()
if period_result:
period_id = period_result[0]
else:
# Fallback if no periods were inserted
self.cursor.execute("SELECT period_id FROM periods LIMIT 1")
period_id = self.cursor.fetchone()[0]
# Clean the group name to separate it from student data
group_name = self._clean_group_name(group_assignment)
self.cursor.execute("INSERT OR IGNORE INTO groups (name) VALUES (?)", (group_name,))
self.cursor.execute("SELECT group_id FROM groups WHERE name = ?", (group_name,))
group_id = self.cursor.fetchone()[0]
# Insert the schedule entry
self.cursor.execute("""
INSERT OR IGNORE INTO schedule (student_id, subject_id, teacher_id, day_id, period_id, group_id)
VALUES (?, ?, ?, ?, ?, ?)
""", (student_id, subject_id, teacher_id, day_id, period_id, group_id))
def _clean_group_name(self, raw_group_data):
"""Extract clean group name from potentially mixed student/group data"""
# Remove potential student names from the group data
# Group names typically contain numbers, class identifiers, or specific activity names
cleaned = raw_group_data.strip()
# If the group data looks like it contains a student name pattern,
# we'll try to extract just the group identifier part
if re.match(r'^\d+[А-ЯA-Z]', cleaned):
# This looks like a class designation, return as is
return cleaned
# If the group data contains common group indicators, return as is
group_indicators = ['кл', 'class', 'club', 'track', 'group', 'module', '-']
if any(indicator in cleaned.lower() for indicator in group_indicators):
return cleaned
# If the group data looks like a subject-identifier pattern, return as is
subject_indicators = ['ICT', 'English', 'Math', 'Physics', 'Chemistry', 'Biology', 'Science']
if any(indicator in cleaned for indicator in subject_indicators):
return cleaned
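# If none of the above conditions match, return a generic group name.
# zlib.crc32 is stable across runs; the built-in hash() of a str is salted per process,
# so the same cell would otherwise get a different group name on every import.
import zlib
return f"Group_{zlib.crc32(cleaned.encode('utf-8')) % 10000}"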
def _is_likely_teacher_name(self, text):
"""Check if the text is likely to be a teacher name"""
if not text or len(text.strip()) < 5: # Require minimum length for a name
return False
text = text.strip()
# Common non-name values that appear in the CSV
common_non_names = ['-', 'nan', 'нет', 'нету', 'отсутствует', 'учитель', 'teacher', '', 'Е4 Е5', 'E4 E5', 'группа', 'group']
if text.lower() in common_non_names:
return False
# Exclusion patterns for non-teacher entries
exclusion_patterns = [
r'^[А-ЯЁ]\d+\s+[А-ЯЁ]\d+$', # E4 E5 pattern
r'^[A-Z]\d+\s+[A-Z]\d+$', # English groups
r'.*[Tt]rack.*', # Track identifiers
r'.*[Gg]roup.*', # Group identifiers
r'.*\d+[А-ЯA-Z]\d*$', # Number-letter combos
r'^[А-ЯЁA-Z].*\d+', # Starts with a letter and contains digits later on
r'.*[Cc]lub.*', # Club identifiers
]
for pattern in exclusion_patterns:
if re.match(pattern, text, re.IGNORECASE):
return False
# Positive patterns for teacher names
teacher_patterns = [
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ]\.\s*[А-ЯЁ]\.$', # Иванов А.А.
r'^[А-ЯЁ]\.\s*[А-ЯЁ]\.\s+[А-ЯЁ][а-яё]+$', # А.А. Иванов
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+$', # Full name
r'^[A-Z][a-z]+\s+[A-Z][a-z]+$', # John Smith
r'^[A-Z][a-z]+\s+[A-Z]\.\s*[A-Z]\.$', # Smith J.J.
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+$', # Russian names without patronymic
]
for pattern in teacher_patterns:
if re.match(pattern, text.strip()):
return True
# Additional check: if it looks like a proper name (with capital letters and min length)
# and doesn't match exclusion patterns
name_parts = text.split()
if len(name_parts) >= 2:
# At least two parts (first name + last name)
# Check if they start with capital letters
if all(part[0].isupper() for part in name_parts if len(part) > 1):
return True
return False
def _is_likely_subject_label(self, text):
"""Check if text is likely a subject label like 'Матем.', 'Информ.', 'Англ.яз', etc."""
if not text or len(text) < 2:
return False
# Common Russian abbreviations for subjects
subject_patterns = [
'Матем.', 'Информ.', 'Англ.яз', 'Русск.яз', 'Физика', 'Химия', 'Биол', 'История',
'Общество', 'География', 'Литер', 'Физкульт', 'Технотрек', 'Лидерство',
'Спорт. клуб', 'ОРКСЭ', 'Китайск', 'Немецк', 'Француз', 'Speaking club', 'Maths',
'ICT', 'Geography', 'Physics', 'Robotics', 'Culinary', 'Science', 'AI Core', 'VR/AR',
'CyberSafety', 'Business', 'Design', 'Prototype', 'MediaCom',
# Repeated entries and '<name> Track' variants dropped: the substring match on '<name>' already covers them
'Math', 'Algebra', 'Geometry', 'Calculus', 'Statistics', 'Coding',
'Programming', 'Algorithm', 'Logic', 'Physical Education', 'PE', 'Sports',
'Swimming', 'Fitness', 'Gymnastics', 'Climbing', 'Games', 'Art', 'Music', 'Dance',
'Karate', 'Judo', 'Martial Arts', 'Chess', 'Leadership', 'Entrepreneurship'
]
text_clean = text.strip().lower()
for pattern in subject_patterns:
if pattern.lower() in text_clean:
return True
# Also check for specific subject names found in the data
specific_subjects = ['матем.', 'информ.', 'англ.яз', 'русск.яз', 'каб.', 'business', 'maths',
'speaking', 'ict', 'geography', 'physics', 'robotics', 'science', 'ai core',
'vr/ar', 'cybersafety', 'design', 'prototype', 'mediacom', 'culinary',
'physical education', 'pe', 'sports', 'swimming', 'fitness', 'gymnastics',
'climbing', 'games', 'art', 'music', 'dance', 'karate', 'chess', 'leadership']
for subj in specific_subjects:
if subj in text_clean:
return True
return False
def _find_matching_subject_in_header_from_list(self, subject_label, header_subjects, header_row):
"""Find the matching full subject name in the header based on the label"""
if not subject_label:
return None
# Look for the best match in the header subjects
subject_label_lower = subject_label.lower().replace('.', '').replace('яз', 'язык')
# Direct match first
for col_idx, full_subj in header_subjects:
if subject_label_lower in full_subj.lower() or full_subj.lower() in subject_label_lower:
return full_subj
# If no direct match, try to find by partial matching in the whole header row
for i, header_item in enumerate(header_row):
if subject_label_lower in str(header_item).lower() or str(header_item).lower() in subject_label_lower:
return str(header_item).strip()
# Try more general matching - if label contains common abbreviations
for col_idx, full_subj in header_subjects:
full_lower = full_subj.lower()
if ('матем' in subject_label_lower and 'матем' in full_lower) or \
('информ' in subject_label_lower and 'информ' in full_lower) or \
('англ' in subject_label_lower and 'англ' in full_lower) or \
('русск' in subject_label_lower and 'русск' in full_lower) or \
('физик' in subject_label_lower and 'физик' in full_lower) or \
('хим' in subject_label_lower and 'хим' in full_lower) or \
('биол' in subject_label_lower and 'биол' in full_lower) or \
('истор' in subject_label_lower and 'истор' in full_lower) or \
('общ' in subject_label_lower and 'общ' in full_lower) or \
('географ' in subject_label_lower and 'географ' in full_lower):
return full_subj
return None
def find_student(self, name_query):
"""Search for students by name"""
self.cursor.execute("""
SELECT s.full_name, s.class_name
FROM students s
WHERE s.full_name LIKE ?
LIMIT 10
""", (f'%{name_query}%',))
return self.cursor.fetchall()
def get_current_class(self, student_name, current_day, current_time):
"""Find student's current class"""
self.cursor.execute("""
SELECT sub.name, t.name, p.start_time, p.end_time
FROM schedule sch
JOIN students s ON sch.student_id = s.student_id
JOIN subjects sub ON sch.subject_id = sub.subject_id
JOIN teachers t ON sch.teacher_id = t.teacher_id
JOIN days d ON sch.day_id = d.day_id
JOIN periods p ON sch.period_id = p.period_id
JOIN groups g ON sch.group_id = g.group_id
WHERE s.full_name = ?
AND d.name = ?
AND p.start_time <= ?
AND p.end_time >= ?
""", (student_name, current_day, current_time, current_time))
return self.cursor.fetchone()
def close(self):
"""Close database connection"""
self.conn.close()
# Main execution - just setup database
if __name__ == "__main__":
db = SchoolScheduleDB()
# Check if auto-update flag is passed as argument
auto_update = len(sys.argv) > 1 and sys.argv[1] == '--auto'
db.update_database_from_csv(auto_update=auto_update)
db.close()
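
Review note: `get_current_class` compares `p.start_time` and `p.end_time` with the supplied time as text, which orders correctly only when every value is a zero-padded `HH:MM` string. A minimal sketch of that lookup against an in-memory database follows. The cut-down `periods` table, `build_demo_db`, and `current_period` are illustrative assumptions, not the module's real schema or API.

```python
import sqlite3


def current_period(conn, current_time):
    """Return the period number whose [start, end] window contains current_time.

    current_time must be a zero-padded 'HH:MM' string so that SQLite's text
    comparison orders times the same way a clock does.
    """
    row = conn.execute(
        "SELECT period_number FROM periods "
        "WHERE start_time <= ? AND end_time >= ? "
        "ORDER BY period_number LIMIT 1",
        (current_time, current_time),
    ).fetchone()
    return row[0] if row else None


def build_demo_db():
    # Hypothetical, simplified schema used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE periods (period_number INTEGER PRIMARY KEY, "
        "start_time TEXT NOT NULL, end_time TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO periods VALUES (?, ?, ?)",
        [(1, "09:00", "09:40"), (2, "10:00", "10:40"), (3, "11:00", "11:40")],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(current_period(db, "09:15"))  # inside period 1
    print(current_period(db, "09:50"))  # break between periods
    print(current_period(db, "9:15"))   # unpadded input is compared as text
    db.close()
```

An unpadded value such as `9:15` sorts after every stored end time, so the lookup finds nothing; callers should format the time with `strftime("%H:%M")` before querying.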

View File

@@ -0,0 +1,134 @@
# DFD.html to PNG Conversion Guide
## Overview
This document provides instructions for converting the DFD.html file to a PNG image.
## File Information
- **Input file**: `/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html`
- **Expected output**: `DFD.png` in the same directory
## Method 1: Using Command Line Tools
### Option A: Using wkhtmltoimage (shipped with wkhtmltopdf)
1. Install wkhtmltopdf:
```bash
# On macOS
brew install wkhtmltopdf
# On Ubuntu/Debian
sudo apt-get install wkhtmltopdf
```
2. Convert HTML to PNG:
```bash
wkhtmltoimage --width 1200 --height 800 "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html" "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png"
```
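
The same conversion can also be driven from Python. This is a minimal sketch, assuming `wkhtmltoimage` is on the `PATH`. The helper names `build_command` and `convert` are illustrative, and only the `--width` and `--height` options from the command above are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(html_path, png_path, width=1200, height=800):
    """Build the wkhtmltoimage argument list (a list needs no shell quoting)."""
    return [
        "wkhtmltoimage",
        "--width", str(width),
        "--height", str(height),
        str(html_path),
        str(png_path),
    ]


def convert(html_path, png_path):
    """Run the conversion; raise if the tool is missing or exits non-zero."""
    if shutil.which("wkhtmltoimage") is None:
        raise FileNotFoundError("wkhtmltoimage was not found on PATH")
    subprocess.run(build_command(html_path, png_path), check=True)
    return Path(png_path)


if __name__ == "__main__":
    # Illustrative relative paths; substitute the real locations.
    print(build_command("Thesis materials/DFD.html", "Thesis materials/DFD.png"))
```

Passing the arguments as a list keeps the space in `Thesis materials` intact without manual quoting.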
### Option B: Using Puppeteer (Node.js)
1. Install Node.js and npm if not already installed
2. Install Puppeteer:
```bash
npm install puppeteer
```
3. Create a conversion script:
```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Read the HTML file
  const htmlContent = fs.readFileSync('/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html', 'utf8');
  await page.setContent(htmlContent);
  // Take screenshot
  await page.screenshot({
    path: '/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png',
    fullPage: true
  });
  await browser.close();
  console.log('Conversion completed!');
})();
```
## Method 2: Using Python Libraries
### Option A: Using Selenium
1. Install required packages:
```bash
pip install selenium
```
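2. Make sure Google Chrome is installed. Selenium 4.6 and later fetch a matching ChromeDriver automatically through Selenium Manager; with older versions, install ChromeDriver yourself and put it on your `PATH`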
3. Run the following script:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import os
# Setup Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless") # Run in background
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
# Initialize the driver
driver = webdriver.Chrome(options=chrome_options)
# Load the HTML file
file_url = "file://" + os.path.abspath("/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html")
driver.get(file_url)
# Set window size and take screenshot
driver.set_window_size(1200, 800)
driver.save_screenshot("/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png")
driver.quit()
print("Conversion completed!")
```
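
The directory name `Thesis materials` contains a space, so a URL built by concatenating `file://` with the path is not percent-encoded. Below is a small sketch of one way to build the URL with the standard library; the `to_file_url` helper and the sample path are illustrative.

```python
from pathlib import Path


def to_file_url(path):
    """Return a percent-encoded file:// URL for a local path."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    print(to_file_url("Thesis materials/DFD.html"))
```

The result can be passed to `driver.get(...)` in place of the concatenated string.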
### Option B: Using Playwright
1. Install required packages:
```bash
pip install playwright
playwright install chromium
```
2. Run the following script:
```python
from playwright.sync_api import sync_playwright
import os

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Load the HTML file
    file_path = os.path.abspath("/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.html")
    page.goto(f"file://{file_path}")
    # Set viewport size and take screenshot
    page.set_viewport_size({"width": 1200, "height": 800})
    page.screenshot(path="/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/Thesis materials/DFD.png", full_page=True)
    browser.close()
print("Conversion completed!")
```
## Method 3: Manual Conversion
1. Open the DFD.html file in your web browser
2. Take a screenshot of the page (using Cmd+Shift+4 on macOS or PrtScn on Windows)
3. Crop the screenshot to include only the relevant content
4. Save the image as DFD.png in the Thesis materials directory
## Verification
After conversion, verify that:
- The PNG file exists in the Thesis materials directory
- The image clearly displays the content from the DFD.html file
- The image quality is sufficient for your needs
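
The first check above can be automated. This is a minimal sketch using only the standard library, assuming the output is a regular PNG file. `png_dimensions` is an illustrative helper that validates the 8-byte PNG signature and reads the width and height from the IHDR chunk.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) of a PNG file, or raise ValueError if it is not a PNG."""
    with Path(path).open("rb") as f:
        header = f.read(24)
    # Layout: signature (8 bytes), chunk length (4), chunk type 'IHDR' (4),
    # then big-endian width (4) and height (4).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(png_dimensions(sys.argv[1]))
```

A missing file raises `FileNotFoundError`, and a file that is not a PNG raises `ValueError`, so both failure modes are visible. Whether the image is legible still needs a visual check.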

View File

@@ -0,0 +1,16 @@
import csv
import os

files = [f for f in os.listdir('sample_data') if f.endswith('.csv')]
print('CSV files:', files)
print()
for filename in files:
    print(f"=== Examining {filename} ===")
    with open(f'sample_data/{filename}', 'r', encoding='utf-8') as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            print(f'Row {i}: {row[:10]}')  # Print first 10 columns
            if i == 5:  # Stop after the first 6 rows
                break
    print()

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1,84 @@
#!/usr/bin/env python
"""
simple_combine.py - Simple script to insert content from two HTML files into the main HTML file
"""
import os
import re
def simple_combine():
# Define file paths
main_file_path = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/scheduler_bots/Thesis materials/Thesis_ Intelligent School Schedule Management System.html"
file1_path = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/scheduler_bots/Thesis materials/deepseek_html_20260128_0dc71d.html"
file2_path = "/Users/home/YandexDisk/TECHNOLYCEUM/ict/Year/2025/ai/ai7/ai7-m3/scheduler_bots/Thesis materials/deepseek_html_20260128_15ee7a.html"
# Read the main file content
with open(main_file_path, 'r', encoding='utf-8') as f:
main_content = f.read()
# Read the content from the first file
with open(file1_path, 'r', encoding='utf-8') as f:
file1_content = f.read()
# Read the content from the second file
with open(file2_path, 'r', encoding='utf-8') as f:
file2_content = f.read()
# Remove HTML structure from the additional files (doctype, html, head, body tags)
def clean_html_content(content):
# Remove doctype
content = re.sub(r'<!DOCTYPE[^>]*>', '', content, flags=re.IGNORECASE)
# Remove html tags
content = re.sub(r'<html[^>]*>|</html>', '', content, flags=re.IGNORECASE)
# Remove head section
content = re.sub(r'<head[^>]*>.*?</head>', '', content, flags=re.DOTALL | re.IGNORECASE)
# Remove body tags
content = re.sub(r'<body[^>]*>|</body>', '', content, flags=re.IGNORECASE)
return content.strip()
# Clean the content from both files
clean_file1_content = clean_html_content(file1_content)
clean_file2_content = clean_html_content(file2_content)
# Find the closing body tag to insert additional content
body_close_pos = main_content.rfind('</body>')
if body_close_pos == -1:
# If no closing body tag, find the closing html tag
html_close_pos = main_content.rfind('</html>')
if html_close_pos != -1:
insert_pos = html_close_pos
else:
# If no closing html tag, append at the end
insert_pos = len(main_content)
else:
insert_pos = body_close_pos
# Prepare the additional content to insert
additional_content = f'''
<!-- Additional Content from deepseek_html_20260128_0dc71d.html -->
<section class="additional-content" style="margin: 40px 0; padding: 20px; border: 1px solid #ccc; border-radius: 8px;">
<h2 style="color: #2c3e50; border-bottom: 2px solid #3498db; padding-bottom: 10px;">Additional Content Section 1</h2>
{clean_file1_content}
</section>
<!-- Additional Content from deepseek_html_20260128_15ee7a.html -->
<section class="additional-content" style="margin: 40px 0; padding: 20px; border: 1px solid #ccc; border-radius: 8px;">
<h2 style="color: #2c3e50; border-bottom: 2px solid #3498db; padding-bottom: 10px;">Additional Content Section 2</h2>
{clean_file2_content}
</section>
'''
# Insert the additional content into the main file
combined_content = main_content[:insert_pos] + additional_content + main_content[insert_pos:]
# Write the combined content back to the main file
with open(main_file_path, 'w', encoding='utf-8') as f:
f.write(combined_content)
print("Content from both files has been successfully inserted into the main HTML file.")
print(f"Updated file: {main_file_path}")
if __name__ == "__main__":
simple_combine()
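
Review note: `simple_combine()` rewrites the main file in place, so running it twice inserts both sections twice. Here is a sketch of one way to make the insertion idempotent, assuming an HTML comment is acceptable as a guard. The `insert_once` helper and the `marker` text are illustrative, not part of the script.

```python
def insert_once(main_content, addition, marker):
    """Insert `addition` before the last </body> unless `marker` is already present.

    Falls back to the last </html>, then to the end of the document,
    mirroring the position search in simple_combine().
    """
    guard = f"<!-- {marker} -->"
    if guard in main_content:
        return main_content  # already combined; leave the document untouched

    insert_pos = main_content.rfind("</body>")
    if insert_pos == -1:
        insert_pos = main_content.rfind("</html>")
    if insert_pos == -1:
        insert_pos = len(main_content)

    block = f"\n{guard}\n{addition}\n"
    return main_content[:insert_pos] + block + main_content[insert_pos:]


if __name__ == "__main__":
    page = "<html><body><p>Thesis</p></body></html>"
    once = insert_once(page, "<section>Extra</section>", "additional-content-1")
    twice = insert_once(once, "<section>Extra</section>", "additional-content-1")
    print(once == twice)
```

Writing the result to a new file rather than over the original would also keep an untouched copy to fall back on.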

View File

@@ -25,8 +25,8 @@ def load_schedule():
Returns a dictionary with day-wise schedule
"""
try:
# Read CSV file
df = pd.read_csv('schedule.csv')
# Read CSV file - Updated to use the provided file name
df = pd.read_csv('schedule_template RS.csv')
schedule = {}
# Process each row (each day)
@@ -34,20 +34,27 @@ def load_schedule():
day = row['Day']
schedule[day] = []
# Process each time slot column
time_slots = ['Period_1', 'Period_2', 'Period_3', 'Period_4', 'Period_5','Period_6', 'Period_7']
# Process each period column - Updated to match the actual CSV structure
# The CSV has columns labeled '1 (9:00-9:40)', '2 (10:00-10:40)', etc.
period_columns = [
'1 (9:00-9:40)', '2 (10:00-10:40)', '3 (11:00-11:40)', '4 (11:50-12:30)',
'5 (12:40-13:20)', '6 (13:30-14:10)', '7 (14:20-15:00)', '8 (15:20-16:00)',
'9 (16:15-16:55)', '10 (17:05-17:45)', '11 (17:55-18:35)', '12 (18:45-19:20)', '13 (19:20-20:00)'
]
for slot in time_slots:
# Check if class exists for this time slot
if pd.notna(row[slot]) and str(row[slot]).strip():
class_info = str(row[slot])
schedule[day].append((slot, class_info))
for i, col_name in enumerate(period_columns):
period_num = str(i + 1) # '1', '2', '3', etc.
# Check if the column exists and if class exists for this time slot
if col_name in row and pd.notna(row[col_name]) and str(row[col_name]).strip() != '':
class_info = str(row[col_name])
schedule[day].append((period_num, class_info)) # Store both period number and class info
return schedule
except FileNotFoundError:
print("❌ Error: schedule.csv file not found!")
print("Please create schedule.csv in the same folder")
print("❌ Error: schedule_template RS.csv file not found!")
print("Please make sure schedule_template RS.csv is in the same folder")
return {}
except Exception as e:
print(f"❌ Error loading schedule: {e}")
@@ -57,18 +64,26 @@ def load_schedule():
# Load schedule at startup
SCHEDULE = load_schedule()
# Time mapping for periods
PERIOD_TIMES = {
'Period_1': ('09:00', '09:40'),
'Period_2': ('10:00', '10:40'),
'Period_3': ('11:00', '11:40'),
'Period_4': ('11:50', '12:30'),
'Period_5': ('12:40', '13:20'),
'Period_6': ('13:30', '14:10'),
'Period_7': ('10:00', '10:40'),
# Map period numbers to times - Updated as requested
period_times = {
'1': ('09:00', '09:40'),
'2': ('10:00', '10:40'),
'3': ('11:00', '11:40'),
'4': ('11:50', '12:30'),
'5': ('12:40', '13:20'),
'6': ('13:30', '14:10'),
'7': ('14:20', '15:00'),
'8': ('15:20', '16:00'),
'9': ('16:15', '16:55'),
'10': ('17:05', '17:45'),
'11': ('17:55', '18:35'),
'12': ('18:45', '19:20'),
'13': ('19:20', '20:00')
}
# Time mapping for periods - Updated to use the new mapping
PERIOD_TIMES = period_times
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Send welcome message when command /start is issued."""
@@ -84,7 +99,7 @@ async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
async def where_am_i(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Tell user where they should be right now."""
if not SCHEDULE:
await update.message.reply_text("❌ Schedule not loaded. Check schedule.csv file.")
await update.message.reply_text("❌ Schedule not loaded. Check schedule_template RS.csv file.")
return
now = datetime.datetime.now()
@@ -101,9 +116,9 @@ async def where_am_i(update: Update, context: ContextTypes.DEFAULT_TYPE):
# Find current class
found_class = False
for period, class_info in SCHEDULE[current_day]:
start_time, end_time = PERIOD_TIMES[period]
for period_num, class_info in SCHEDULE[current_day]:
start_time, end_time = PERIOD_TIMES[period_num]
if start_time <= current_time <= end_time:
await update.message.reply_text(f"🎯 You should be in: {class_info}")
found_class = True
@@ -114,21 +129,25 @@ async def where_am_i(update: Update, context: ContextTypes.DEFAULT_TYPE):
async def schedule(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Show today's full schedule."""
"""Show the complete weekly schedule."""
if not SCHEDULE:
await update.message.reply_text("❌ Schedule not loaded. Check schedule.csv file.")
await update.message.reply_text("❌ Schedule not loaded. Check schedule_template RS.csv file.")
return
current_day = datetime.datetime.now().strftime("%A")
if current_day not in SCHEDULE or not SCHEDULE[current_day]:
await update.message.reply_text("😊 No classes scheduled for today!")
return
schedule_text = f"📚 {current_day}'s Schedule:\n\n"
for period, class_info in SCHEDULE[current_day]:
start, end = PERIOD_TIMES[period]
schedule_text += f"{start}-{end}: {class_info}\n"
schedule_text = "📚 Weekly Schedule:\n\n"
# Define the standard order of days in a week
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day in days_of_week:
if day in SCHEDULE and SCHEDULE[day]: # Check if the day exists in the schedule and has classes
schedule_text += f"*{day}'s Schedule:*\n"
for period_num, class_info in SCHEDULE[day]:
start, end = PERIOD_TIMES[period_num]
schedule_text += f"{start}-{end}: {class_info}\n"
schedule_text += "\n"
else:
schedule_text += f"{day}: No classes scheduled\n\n"
await update.message.reply_text(schedule_text)
@@ -136,7 +155,7 @@ async def schedule(update: Update, context: ContextTypes.DEFAULT_TYPE):
async def tomorrow(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Show tomorrow's schedule."""
if not SCHEDULE:
await update.message.reply_text("❌ Schedule not loaded. Check schedule.csv file.")
await update.message.reply_text("❌ Schedule not loaded. Check schedule_template RS.csv file.")
return
tomorrow_date = datetime.datetime.now() + datetime.timedelta(days=1)
@@ -147,8 +166,8 @@ async def tomorrow(update: Update, context: ContextTypes.DEFAULT_TYPE):
return
schedule_text = f"📚 {tomorrow_day}'s Schedule:\n\n"
for period, class_info in SCHEDULE[tomorrow_day]:
start, end = PERIOD_TIMES[period]
for period_num, class_info in SCHEDULE[tomorrow_day]:
start, end = PERIOD_TIMES[period_num]
schedule_text += f"{start}-{end}: {class_info}\n"
await update.message.reply_text(schedule_text)
@@ -160,7 +179,7 @@ async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
"Available commands:\n"
"/start - Start the bot\n"
"/whereami - Find your current class\n"
"/schedule - Show today's schedule\n"
"/schedule - Show today's full schedule\n"
"/tomorrow - Show tomorrow's schedule\n"
"/help - Show this help message"
)
@@ -169,8 +188,8 @@ async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
def main():
"""Start the bot."""
if not SCHEDULE:
-print("❌ Failed to load schedule. Please check schedule.csv file.")
-print("Make sure schedule.csv exists in the same folder")
+print("❌ Failed to load schedule. Please check schedule_template RS.csv file.")
+print("Make sure schedule_template RS.csv exists in the same folder")
return
# Create the Application


@@ -0,0 +1,344 @@
#!/usr/bin/env python
"""
Enhanced Scheduler Bot with SQLite database support
"""
import sqlite3
import datetime
from telegram import Update
from telegram.ext import Application, CommandHandler, ContextTypes, MessageHandler, filters
import os
# 🔑 Set BOT_TOKEN in the environment (token from @BotFather); never commit a real token
BOT_TOKEN = os.environ["BOT_TOKEN"]
# Database setup
DATABASE_NAME = "schedule.db"
def init_db():
"""Initialize the SQLite database and create tables if they don't exist."""
conn = sqlite3.connect(DATABASE_NAME)
cursor = conn.cursor()
# Create table for schedule entries
cursor.execute('''
CREATE TABLE IF NOT EXISTS schedule (
id INTEGER PRIMARY KEY AUTOINCREMENT,
day TEXT NOT NULL,
period INTEGER NOT NULL,
subject TEXT NOT NULL,
class_name TEXT NOT NULL,
room TEXT NOT NULL,
UNIQUE(day, period)
)
''')
conn.commit()
conn.close()
def add_schedule_entry(day, period, subject, class_name, room):
"""Add a new schedule entry to the database."""
conn = sqlite3.connect(DATABASE_NAME)
cursor = conn.cursor()
try:
cursor.execute('''
INSERT OR REPLACE INTO schedule (day, period, subject, class_name, room)
VALUES (?, ?, ?, ?, ?)
''', (day, period, subject, class_name, room))
conn.commit()
conn.close()
return True
except sqlite3.Error as e:
print(f"Database error: {e}")
conn.close()
return False
def load_schedule_from_db():
"""Load schedule from the SQLite database."""
conn = sqlite3.connect(DATABASE_NAME)
cursor = conn.cursor()
cursor.execute("SELECT day, period, subject, class_name, room FROM schedule ORDER BY day, period")
rows = cursor.fetchall()
conn.close()
# Group by day
schedule = {}
for day, period, subject, class_name, room in rows:
if day not in schedule:
schedule[day] = []
class_info = f"Subject: {subject} Class: {class_name} Room: {room}"
schedule[day].append((str(period), class_info))
return schedule
# Initialize the database
init_db()
# Map period numbers to times - Updated as requested
period_times = {
'1': ('09:00', '09:40'),
'2': ('10:00', '10:40'),
'3': ('11:00', '11:40'),
'4': ('11:50', '12:30'),
'5': ('12:40', '13:20'),
'6': ('13:30', '14:10'),
'7': ('14:20', '15:00'),
'8': ('15:20', '16:00'),
'9': ('16:15', '16:55'),
'10': ('17:05', '17:45'),
'11': ('17:55', '18:35'),
'12': ('18:45', '19:20'),
'13': ('19:20', '20:00')
}
# User states for tracking conversations
user_states = {} # Stores user conversation state
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Send welcome message when command /start is issued."""
await update.message.reply_text(
"🤖 Hello! I'm your enhanced class scheduler bot with database support!\n"
"Use /whereami to find your current class\n"
"Use /schedule to see the full weekly schedule\n"
"Use /tomorrow to see tomorrow's schedule\n"
"Use /add to add a new class to the schedule\n"
"Use /help for all commands"
)
async def where_am_i(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Tell user where they should be right now."""
# Reload schedule from DB to ensure latest data
schedule = load_schedule_from_db()
if not schedule:
await update.message.reply_text("❌ Schedule not loaded from database.")
return
now = datetime.datetime.now()
current_time = now.strftime("%H:%M")
current_day = now.strftime("%A")
await update.message.reply_text(f"📅 Today is {current_day}")
await update.message.reply_text(f"⏰ Current time: {current_time}")
# Check if we have schedule for today
if current_day not in schedule:
await update.message.reply_text("😊 No classes scheduled for today!")
return
# Find current class
found_class = False
for period_num, class_info in schedule[current_day]:
start_time, end_time = period_times[period_num]
if start_time <= current_time <= end_time:
await update.message.reply_text(f"🎯 You should be in: {class_info}")
found_class = True
break
if not found_class:
await update.message.reply_text("😊 No class right now! Free period.")
async def schedule(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Show the complete weekly schedule."""
# Reload schedule from DB to ensure latest data
schedule = load_schedule_from_db()
if not schedule:
await update.message.reply_text("❌ Schedule not loaded from database.")
return
schedule_text = "📚 Weekly Schedule:\n\n"
# Define the standard order of days in a week
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day in days_of_week:
if day in schedule and schedule[day]: # Check if the day exists in the schedule and has classes
schedule_text += f"*{day}'s Schedule:*\n"
for period_num, class_info in schedule[day]:
start, end = period_times[period_num]
schedule_text += f"{start}-{end}: {class_info}\n"
schedule_text += "\n"
else:
schedule_text += f"{day}: No classes scheduled\n\n"
await update.message.reply_text(schedule_text)
async def tomorrow(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Show tomorrow's schedule."""
# Reload schedule from DB to ensure latest data
schedule = load_schedule_from_db()
if not schedule:
await update.message.reply_text("❌ Schedule not loaded from database.")
return
tomorrow_date = datetime.datetime.now() + datetime.timedelta(days=1)
tomorrow_day = tomorrow_date.strftime("%A")
if tomorrow_day not in schedule or not schedule[tomorrow_day]:
await update.message.reply_text(f"😊 No classes scheduled for {tomorrow_day}!")
return
schedule_text = f"📚 {tomorrow_day}'s Schedule:\n\n"
for period_num, class_info in schedule[tomorrow_day]:
start, end = period_times[period_num]
schedule_text += f"{start}-{end}: {class_info}\n"
await update.message.reply_text(schedule_text)
async def add(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Start the process of adding a new schedule entry."""
user_id = update.effective_user.id
user_states[user_id] = {"step": "waiting_day"}
await update.message.reply_text(
"📅 Adding a new class to the schedule.\n"
"Please enter the day of the week (e.g., Monday, Tuesday, etc.):"
)
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Handle user messages during the add process."""
user_id = update.effective_user.id
if user_id not in user_states:
# Not in a conversation, ignore
return
state_info = user_states[user_id]
message_text = update.message.text.strip()
if state_info["step"] == "waiting_day":
# Validate day input
valid_days = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
if message_text.lower() not in valid_days:
await update.message.reply_text(
f"'{message_text}' is not a valid day of the week.\n"
"Please enter a valid day (e.g., Monday, Tuesday, etc.):"
)
return
state_info["day"] = message_text.capitalize()
state_info["step"] = "waiting_period"
await update.message.reply_text(
f"Got it! Day: {state_info['day']}\n"
"Now please enter the period number (1-13):"
)
elif state_info["step"] == "waiting_period":
try:
period = int(message_text)
if period < 1 or period > 13:
raise ValueError("Period must be between 1 and 13")
state_info["period"] = period
state_info["step"] = "waiting_subject"
await update.message.reply_text(
f"Got it! Period: {period}\n"
"Now please enter the subject name:"
)
except ValueError:
await update.message.reply_text(
f"'{message_text}' is not a valid period number.\n"
"Please enter a number between 1 and 13:"
)
elif state_info["step"] == "waiting_subject":
state_info["subject"] = message_text
state_info["step"] = "waiting_class"
await update.message.reply_text(
f"Got it! Subject: {message_text}\n"
"Now please enter the class name (e.g., 10ABC, 6A/6B, etc.):"
)
elif state_info["step"] == "waiting_class":
state_info["class_name"] = message_text
state_info["step"] = "waiting_room"
await update.message.reply_text(
f"Got it! Class: {message_text}\n"
"Finally, please enter the room number:"
)
elif state_info["step"] == "waiting_room":
state_info["room"] = message_text
# Add to database
success = add_schedule_entry(
state_info["day"],
state_info["period"],
state_info["subject"],
state_info["class_name"],
message_text
)
if success:
await update.message.reply_text(
f"✅ Successfully added to schedule!\n\n"
f"Day: {state_info['day']}\n"
f"Period: {state_info['period']}\n"
f"Subject: {state_info['subject']}\n"
f"Class: {state_info['class_name']}\n"
f"Room: {state_info['room']}"
)
else:
await update.message.reply_text(
"❌ Failed to add to schedule. Please try again."
)
# Clean up user state
del user_states[user_id]
async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
"""Send help message with all commands."""
await update.message.reply_text(
"Available commands:\n"
"/start - Start the bot\n"
"/whereami - Find your current class\n"
"/schedule - Show the full weekly schedule\n"
"/tomorrow - Show tomorrow's schedule\n"
"/add - Add a new class to the schedule\n"
"/help - Show this help message"
)
def main():
"""Start the bot."""
# Create the Application
application = Application.builder().token(BOT_TOKEN).build()
# Add command handlers
application.add_handler(CommandHandler("start", start))
application.add_handler(CommandHandler("whereami", where_am_i))
application.add_handler(CommandHandler("schedule", schedule))
application.add_handler(CommandHandler("tomorrow", tomorrow))
application.add_handler(CommandHandler("add", add))
application.add_handler(CommandHandler("help", help_command))
# Add message handler for conversation flow
application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
# Start the Bot
print("🤖 Enhanced scheduler bot with database support is running...")
print("📊 Database initialized successfully!")
print("Press Ctrl+C to stop the bot")
application.run_polling()
if __name__ == "__main__":
main()
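
Review note on the file above: the `UNIQUE(day, period)` constraint combined with `INSERT OR REPLACE` in `add_schedule_entry` means that adding a class to a slot that is already taken silently overwrites the earlier row. The sketch below reproduces that behaviour on an in-memory database with a trimmed-down copy of the table. The helper names `make_db`, `upsert_entry` and `slot_subjects` are made up for illustration and are not part of the commit.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the same uniqueness rule as schedule.db."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE schedule (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            day TEXT NOT NULL,
            period INTEGER NOT NULL,
            subject TEXT NOT NULL,
            UNIQUE(day, period)
        )
        """
    )
    return conn


def upsert_entry(conn, day, period, subject):
    """Hypothetical helper: insert a row, replacing any row in the same slot."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO schedule (day, period, subject) VALUES (?, ?, ?)",
            (day, period, subject),
        )


def slot_subjects(conn, day, period):
    """Return every subject stored for one (day, period) slot."""
    rows = conn.execute(
        "SELECT subject FROM schedule WHERE day = ? AND period = ?",
        (day, period),
    ).fetchall()
    return [row[0] for row in rows]
```

If overwriting is not the intended behaviour, a plain `INSERT` would raise `sqlite3.IntegrityError` for an occupied slot, which the bot could report back to the user instead.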


@@ -0,0 +1,71 @@
#!/usr/bin/env python
"""
scheduler.py - Simple school schedule checker
No sample data, just real CSV data
"""
import datetime
from database import SchoolScheduleDB
def main():
db = SchoolScheduleDB()
print("🏫 School Schedule Checker")
# Ask for student name
name_query = input("\nEnter your name (or part of it): ").strip()
# Search for student
students = db.find_student(name_query)
if not students:
print("No student found.")
return
# Show found students
print("\nFound students:")
for i, (full_name, class_name) in enumerate(students, 1):
print(f"{i}. {full_name} ({class_name})")
# Let user select
if len(students) > 1:
choice = input(f"\nSelect student (1-{len(students)}): ")
try:
idx = int(choice) - 1
if 0 <= idx < len(students):
full_name, class_name = students[idx]
else:
print("Invalid choice.")
return
except ValueError:
print("Invalid input.")
return
else:
full_name, class_name = students[0]
print(f"\n👤 Student: {full_name} ({class_name})")
# Get current time
now = datetime.datetime.now()
current_day = now.strftime("%A")
current_time = now.strftime("%H:%M")
print(f"📅 Today: {current_day}")
print(f"⏰ Time: {current_time}")
# Find current class
current_class = db.get_current_class(full_name, current_day, current_time)
if current_class:
subject, teacher, start, end = current_class
print(f"\n🎯 CURRENT CLASS:")
print(f"📚 {subject}")
print(f"👨‍🏫 {teacher}")
print(f"🕐 {start}-{end}")
else:
print("\n😊 Free period!")
db.close()
if __name__ == "__main__":
main()
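
Review note on the file above: `db.get_current_class` comes from `database.py`, which is not shown in this part of the diff, so its logic is unknown here. The bots in this commit pick the current lesson by comparing zero-padded `HH:MM` strings. A minimal sketch of that check follows, with a shortened period table and a hypothetical helper name, `find_current_period`. The comparison is only valid because every time string is zero-padded to five characters.

```python
from typing import Dict, Optional, Tuple

# Shortened sample of the period table; times are zero-padded "HH:MM" strings.
PERIOD_TIMES: Dict[str, Tuple[str, str]] = {
    "1": ("09:00", "09:40"),
    "2": ("10:00", "10:40"),
    "3": ("11:00", "11:40"),
}


def find_current_period(
    current_time: str,
    period_times: Dict[str, Tuple[str, str]] = PERIOD_TIMES,
) -> Optional[str]:
    """Return the first period whose window contains current_time, else None.

    Both ends of the window are inclusive, matching the check in where_am_i.
    """
    for period, (start, end) in period_times.items():
        if start <= current_time <= end:
            return period
    return None
```

Comparing `datetime.time` objects instead of strings would remove the dependence on zero-padding, at the cost of parsing the table once at start-up.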


@@ -0,0 +1,105 @@
#!/usr/bin/env python
"""
telegram_scheduler_v5.py - Advanced school schedule checker with homeroom teacher info using database
"""
import datetime
from database import SchoolScheduleDB
def find_student_location_and_teacher(student_name_query):
"""
Find where a student should be based on their name using database
"""
# Create database instance
db = SchoolScheduleDB()
# Search for student by name
students = db.find_student(student_name_query)
if not students:
print(f"No student found matching '{student_name_query}'")
db.close()
return
# Handle multiple student matches
if len(students) > 1:
print(f"\nFound {len(students)} students matching '{student_name_query}':")
for i, (full_name, class_name) in enumerate(students, 1):
print(f"{i}. {full_name} ({class_name})")
try:
choice = int(input(f"\nSelect student (1-{len(students)}): ")) - 1
if 0 <= choice < len(students):
full_name, class_name = students[choice]
else:
print("Invalid selection.")
db.close()
return
except ValueError:
print("Invalid input.")
db.close()
return
else:
full_name, class_name = students[0]
# Find the homeroom teacher for this student's class using the database
homeroom_teacher = db.get_homeroom_teacher(class_name)
# Get current schedule for the student
current_schedule = get_current_schedule_for_student_db(db, full_name, class_name)
# Display the results
print(f"\n🔍 STUDENT INFORMATION:")
print(f"👤 Student: {full_name}")
print(f"🎒 Class: {class_name}")
if current_schedule:
print(f"\n📋 TODAY'S SCHEDULE:")
for period_info in current_schedule:
subject, teacher, start_time, end_time, room_or_group = period_info
print(f" {start_time}-{end_time} | 📚 {subject} | 👨‍🏫 {teacher} | 🚪 {room_or_group}")
else:
print(f"\n😊 No scheduled classes for today!")
if homeroom_teacher:
print(f"\n🏫 HOMEROOM TEACHER INFORMATION:")
print(f"👨‍🏫 {homeroom_teacher['name']}")
print(f"📞 Internal Number: {homeroom_teacher['internal_number']}")
if homeroom_teacher['mobile_number']:
print(f"📱 Mobile: {homeroom_teacher['mobile_number']}")
print(f"🏢 Classroom: {homeroom_teacher['classroom']}")
print(f"🏛️ Parent Meeting Room: {homeroom_teacher['parent_meeting_room']}")
else:
print(f"\n❌ Could not find homeroom teacher for class {class_name}")
db.close()
def get_current_schedule_for_student_db(db, student_name, class_name):
"""
Get the full schedule for a student for the current day from database
"""
# Get all schedule records for the student
return db.get_student_schedule(student_name)
def main():
print("🏫 Advanced School Schedule Checker (Database Version)")
print("🔍 Find where a student should be and their homeroom teacher info")
# Ask for student name
name_query = input("\nEnter student name (or part of it): ").strip()
if not name_query:
print("Please enter a valid name.")
return
find_student_location_and_teacher(name_query)
if __name__ == "__main__":
main()
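
Review note on the file above: `find_student_location_and_teacher` calls `db.close()` by hand on every return path, which is easy to miss when a new branch is added. One possible alternative is `contextlib.closing`, which calls `close()` on any exit from the block. The sketch below uses a stand-in class, `FakeScheduleDB`, because the real `SchoolScheduleDB` is not shown here. Only `find_student()` and `close()` are modelled, and their behaviour is an assumption.

```python
from contextlib import closing


class FakeScheduleDB:
    """Stand-in for SchoolScheduleDB; only find_student() and close() are modelled."""

    def __init__(self, students):
        self._students = students
        self.closed = False

    def find_student(self, query):
        # Case-insensitive substring match on the full name (assumed behaviour).
        q = query.lower()
        return [s for s in self._students if q in s[0].lower()]

    def close(self):
        self.closed = True


def lookup(db, query):
    """Return the first matching (full_name, class_name) tuple, or None.

    closing() guarantees db.close() runs on every exit path, including the
    early return and any exception raised inside the block.
    """
    with closing(db):
        students = db.find_student(query)
        if not students:
            return None
        return students[0]
```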

scheduler_bots/verify_db.py Normal file

@@ -0,0 +1,232 @@
#!/usr/bin/env python
"""
verify_db.py - Verification script for the school schedule database
Checks data quality in teachers, groups, and students tables
"""
import sqlite3
import re
def connect_db(db_name='school_schedule.db'):
"""Connect to the database"""
conn = sqlite3.connect(db_name)
cursor = conn.cursor()
return conn, cursor
def check_teachers_table(cursor):
"""Check the teachers table for data quality issues"""
print("Checking teachers table...")
cursor.execute("SELECT COUNT(*) FROM teachers")
total_count = cursor.fetchone()[0]
print(f"Total teachers: {total_count}")
# Find teachers with default names
cursor.execute("SELECT name FROM teachers WHERE name LIKE '%Default Teacher%' OR name LIKE '%Неизвестный%'")
default_teachers = cursor.fetchall()
print(f"Teachers with default names: {len(default_teachers)}")
for teacher in default_teachers:
print(f" - {teacher[0]}")
# Find potentially invalid teacher names
invalid_teachers = []
cursor.execute("SELECT name FROM teachers")
all_teachers = cursor.fetchall()
for (teacher_name,) in all_teachers:
if not is_valid_teacher_name(teacher_name):
invalid_teachers.append(teacher_name)
print(f"Potentially invalid teacher names: {len(invalid_teachers)}")
for teacher in invalid_teachers:
print(f" - {teacher}")
print()
def is_valid_teacher_name(name):
"""Check if a name looks like a valid teacher name"""
# Skip default names as they're intentionally different
if 'Default Teacher' in name or 'Неизвестный' in name:
return True # Considered valid as intentional placeholders
# Check for common invalid patterns
invalid_patterns = [
r'^\d+[А-ЯA-Z]$', # Class pattern like "8А", "11B"
r'^[А-ЯЁA-Z]\d+\s+[А-ЯЁA-Z]\d+$', # "E4 E5" pattern
r'.*[Gg]roup.*', # Group identifiers
r'.*[Tt]rack.*', # Track identifiers
r'^[А-ЯЁA-Z]\d+$', # Single group identifiers like "E4"
r'.*[Cc]lub.*', # Club identifiers
]
for pattern in invalid_patterns:
if re.match(pattern, name, re.IGNORECASE):
return False
# Valid teacher name patterns
valid_patterns = [
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+', # Russian names
r'^[A-Z][a-z]+\s+[A-Z][a-z]+', # English names
r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ]\.', # Name with initial
r'^[A-Z][a-z]+\s+[A-Z]\.', # Name with initial (English)
]
for pattern in valid_patterns:
if re.match(pattern, name):
return True
# If it's a reasonably long string with spaces and proper capitalization
parts = name.split()
if len(parts) >= 2 and len(name) >= 5:
# Check if parts start with capital letters
if all(len(part) > 0 and part[0].isupper() for part in parts):
return True
return False
def check_groups_table(cursor):
"""Check the groups table for data quality issues"""
print("Checking groups table...")
cursor.execute("SELECT COUNT(*) FROM groups")
total_count = cursor.fetchone()[0]
print(f"Total groups: {total_count}")
# Get all group names
cursor.execute("SELECT name FROM groups")
all_groups = cursor.fetchall()
# Check for potential student names in group names
potential_student_names = []
for (group_name,) in all_groups:
if looks_like_student_name(group_name):
potential_student_names.append(group_name)
print(f"Groups that look like student names: {len(potential_student_names)}")
for group in potential_student_names[:10]: # Show first 10
print(f" - {group}")
print()
def looks_like_student_name(name):
    """Check if a name looks like a student name instead of a group"""
    # Class patterns like "8А", "11B" are OK as groups
    class_pattern = r'^\d+[А-ЯA-Z]$'
    if re.match(class_pattern, name):
        return False
    # If it contains common group identifiers, it's likely a valid group.
    # This check must run before the name patterns; otherwise a group such as
    # "Chess Club" matches the English name pattern and is flagged as a student.
    group_indicators = ['club', 'track', 'group', 'module', '-', 'class']
    if any(indicator in name.lower() for indicator in group_indicators):
        return False
    # Student names typically follow name patterns
    name_patterns = [
        r'^[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+',  # Russian name
        r'^[A-Z][a-z]+\s+[A-Z][a-z]+',  # English name
    ]
    return any(re.match(pattern, name) for pattern in name_patterns)
def check_students_table(cursor):
"""Check the students table"""
print("Checking students table...")
cursor.execute("SELECT COUNT(*) FROM students")
total_count = cursor.fetchone()[0]
print(f"Total students: {total_count}")
# Get sample students
cursor.execute("SELECT full_name, class_name FROM students LIMIT 5")
samples = cursor.fetchall()
print("Sample students:")
for student in samples:
print(f" - {student[0]} (Class: {student[1]})")
print()
def check_schedule_integrity(cursor):
"""Check the schedule table for data consistency"""
print("Checking schedule table integrity...")
# Count total schedule entries
cursor.execute("SELECT COUNT(*) FROM schedule")
total_schedules = cursor.fetchone()[0]
print(f"Total schedule entries: {total_schedules}")
# Count entries with valid relationships
cursor.execute("""
SELECT COUNT(*)
FROM schedule s
JOIN students st ON s.student_id = st.student_id
JOIN subjects su ON s.subject_id = su.subject_id
JOIN teachers t ON s.teacher_id = t.teacher_id
JOIN groups g ON s.group_id = g.group_id
""")
valid_relationships = cursor.fetchone()[0]
print(f"Schedules with valid relationships: {valid_relationships}")
# Check for orphaned records
print("Checking for orphaned records...")
# Students in schedule but not in students table
cursor.execute("""
SELECT COUNT(*) FROM schedule s
LEFT JOIN students st ON s.student_id = st.student_id
WHERE st.student_id IS NULL
""")
orphaned_students = cursor.fetchone()[0]
print(f"Orphaned student references: {orphaned_students}")
# Subjects in schedule but not in subjects table
cursor.execute("""
SELECT COUNT(*) FROM schedule s
LEFT JOIN subjects su ON s.subject_id = su.subject_id
WHERE su.subject_id IS NULL
""")
orphaned_subjects = cursor.fetchone()[0]
print(f"Orphaned subject references: {orphaned_subjects}")
# Teachers in schedule but not in teachers table
cursor.execute("""
SELECT COUNT(*) FROM schedule s
LEFT JOIN teachers t ON s.teacher_id = t.teacher_id
WHERE t.teacher_id IS NULL
""")
orphaned_teachers = cursor.fetchone()[0]
print(f"Orphaned teacher references: {orphaned_teachers}")
# Groups in schedule but not in groups table
cursor.execute("""
SELECT COUNT(*) FROM schedule s
LEFT JOIN groups g ON s.group_id = g.group_id
WHERE g.group_id IS NULL
""")
orphaned_groups = cursor.fetchone()[0]
print(f"Orphaned group references: {orphaned_groups}")
print()
def main():
"""Main function to run all checks"""
print("School Schedule Database Verification")
print("="*40)
try:
conn, cursor = connect_db()
check_teachers_table(cursor)
check_groups_table(cursor)
check_students_table(cursor)
check_schedule_integrity(cursor)
conn.close()
print("Verification complete!")
except Exception as e:
print(f"Error during verification: {str(e)}")
if __name__ == "__main__":
main()
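
Review note on the file above: the four orphan checks in `check_schedule_integrity` differ only in the parent table and key column. A possible refactoring is sketched below under the assumption that every parent table uses the same key name as the matching `schedule` column. The helper `count_orphans` and the `ALLOWED_PARENTS` whitelist are hypothetical names. Table and column names cannot be passed as SQL parameters, so the whitelist is what keeps the formatted query safe.

```python
import sqlite3

# Parent table -> key column that the schedule table is assumed to reference.
ALLOWED_PARENTS = {
    "students": "student_id",
    "subjects": "subject_id",
    "teachers": "teacher_id",
    "groups": "group_id",
}


def count_orphans(cursor, parent):
    """Count schedule rows whose key has no matching row in the parent table.

    Raises KeyError for any table name that is not whitelisted. Rows whose key
    is NULL in schedule are counted as orphans as well, as in the original
    LEFT JOIN queries.
    """
    key = ALLOWED_PARENTS[parent]
    cursor.execute(
        f'SELECT COUNT(*) FROM schedule s '
        f'LEFT JOIN "{parent}" p ON s.{key} = p.{key} '
        f'WHERE p.{key} IS NULL'
    )
    return cursor.fetchone()[0]
```

With this helper, the four blocks reduce to a loop over `ALLOWED_PARENTS` that prints one count per table.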