Convert a school schedule from PDF format to structured CSV data
Real-world skill: PDF data extraction is a common task in data analysis, administrative work, and automation projects.
Download both files before starting the challenge
Focus: Understanding the data and planning the extraction
Focus: Cleaning data and creating the final CSV
Pro tip: Take notes during Lesson 1 about the PDF structure. This will save time in Lesson 2.
Transform unstructured schedule data from a PDF into a structured CSV file.
Get data out of the PDF file using Python libraries or tools
Organize the messy, unstructured text into logical groups
Convert the data to match the CSV template format
Check that your CSV matches the expected structure
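The final validation step can be sketched with the standard library alone. This is a minimal example, not the official checker: the `check_structure` helper and the `HEADER` column names are assumptions made for illustration, so swap in the first row of the provided CSV template.

```python
import csv
import io


def check_structure(csv_text, expected_header, expected_rows):
    """Return a list of problems; an empty list means the structure matches."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != expected_header:
        problems.append(f"header mismatch: {rows[0]}")
    if len(rows) - 1 != expected_rows:
        problems.append(f"expected {expected_rows} data rows, found {len(rows) - 1}")
    # Every data row should have exactly one cell per header column.
    for number, row in enumerate(rows[1:], start=1):
        if len(row) != len(expected_header):
            problems.append(f"row {number} has {len(row)} cells")
    return problems


# Hypothetical header: replace it with the first row of the provided template.
HEADER = ["Time", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
sample = "Time,Monday,Tuesday,Wednesday,Thursday,Friday\n1,,,,,\n2,,,,,\n"
print(check_structure(sample, HEADER, expected_rows=2))
```

To check a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.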
This PDF contains unstructured school schedule data with:
Challenge: The data is unstructured, so you'll need to find patterns to extract it correctly.
Your goal is to create a CSV file matching this structure:
Note: Class information is formatted as "Subject: ... Class: ... Room: ..."
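One way to keep the "Subject: ... Class: ... Room: ..." cell format consistent is to build and check cells with a pair of small helpers. This is a sketch under assumptions: `format_cell`, `parse_cell`, and the sample values (Math, 7A, 101) are invented for illustration and do not come from the schedule.

```python
import re

# Matches a cell written as "Subject: <subject> Class: <class> Room: <room>".
CELL_PATTERN = re.compile(
    r"Subject:\s*(?P<subject>.*?)\s*Class:\s*(?P<class_name>.*?)\s*Room:\s*(?P<room>.*)$"
)


def format_cell(subject, class_name, room):
    """Build one CSV cell in the required format."""
    return f"Subject: {subject} Class: {class_name} Room: {room}"


def parse_cell(cell):
    """Split a formatted cell back into its parts, or return None if it does not match."""
    match = CELL_PATTERN.match(cell.strip())
    return match.groupdict() if match else None


cell = format_cell("Math", "7A", "101")  # made-up sample values
print(cell)
print(parse_cell(cell))
```

Running `parse_cell` over every non-empty cell of the finished CSV is a quick way to spot entries that drifted from the format.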
Choose from these options for the data extraction:
Basic PDF text extraction
Advanced table extraction
Extract tables from PDF
Data cleaning & CSV export
Visual table extraction tool
Copy-paste & clean in spreadsheet
Suggestion: If you're new to PDF extraction, start with pdfplumber (Python) or the Tabula GUI.
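For the pdfplumber route, a first pass might simply dump every text line so the patterns in the schedule become visible. The sketch below assumes pdfplumber is installed (`pip install pdfplumber`); the helper names and the `schedule.pdf` file name are placeholders, not part of the challenge files.

```python
def extract_lines(pdf):
    """Collect non-empty, stripped text lines from every page of an opened PDF."""
    lines = []
    for page in pdf.pages:
        text = page.extract_text() or ""  # a page without text contributes nothing
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return lines


def extract_lines_from_file(path):
    """Open a PDF with pdfplumber and return its text lines."""
    import pdfplumber  # third-party; imported here so the helper above has no dependency

    with pdfplumber.open(path) as pdf:
        return extract_lines(pdf)


# Example usage (placeholder file name):
# for line in extract_lines_from_file("schedule.pdf"):
#     print(line)
```

Printing the raw lines first, before writing any parsing logic, makes it easier to see where day names, time slots, and class details appear.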
Convert Пн, Вт, Ср, Чт, Пт to Monday, Tuesday, Wednesday, Thursday, Friday
Match class information to the correct time slots (slots 1-13, each with a specific time)
Follow the exact format: "Subject: ... Class: ... Room: ..." in CSV cells
Leave cells empty for time slots with no classes
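The four requirements above can be sketched as one small grid-building step. This is an illustration under assumptions: the `(day, slot, cell)` entry shape, the `build_rows` and `to_csv` helpers, and the `"Slot"` first-column name are hypothetical, and the real template may label slots with their times instead of numbers.

```python
import csv
import io

# Russian day abbreviations from the PDF mapped to English names.
DAY_NAMES = {
    "Пн": "Monday",
    "Вт": "Tuesday",
    "Ср": "Wednesday",
    "Чт": "Thursday",
    "Пт": "Friday",
}
SLOTS = range(1, 14)  # time slots 1-13


def build_rows(entries):
    """entries: iterable of (day_abbreviation, slot_number, cell_text) tuples."""
    days = list(DAY_NAMES.values())
    # Start with an empty cell for every slot/day pair so gaps stay empty.
    grid = {slot: {day: "" for day in days} for slot in SLOTS}
    for abbreviation, slot, cell in entries:
        grid[slot][DAY_NAMES[abbreviation]] = cell
    header = ["Slot"] + days  # hypothetical first-column name
    return [header] + [[slot] + [grid[slot][day] for day in days] for slot in SLOTS]


def to_csv(rows):
    """Serialize rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample entry: Tuesday, slot 3.
rows = build_rows([("Вт", 3, "Subject: Math Class: 7A Room: 101")])
print(to_csv(rows))
```

Pre-filling the grid with empty strings means slots with no class need no special handling; they are written as empty cells automatically.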
After completing the challenge, consider these questions:
Learning outcome: This challenge develops problem-solving, pattern recognition, and data transformation skills applicable to many real-world scenarios.