diff --git a/.DS_Store b/.DS_Store
index b1f811b..21c159c 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/PDF to CSV Challenge_ School Schedule Transformation.html b/PDF to CSV Challenge_ School Schedule Transformation.html
new file mode 100644
index 0000000..dda94d5
--- /dev/null
+++ b/PDF to CSV Challenge_ School Schedule Transformation.html
@@ -0,0 +1,936 @@
+
+
+
+ + +Convert a school schedule from PDF format to structured CSV data
+ +Real-world skill: PDF data extraction is a common task in data analysis, administrative work, and automation projects.
++ Download both files before starting the challenge +
+Focus: Understanding the data and planning the extraction
+Focus: Cleaning data and creating the final CSV
+Pro tip: Take notes during Lesson 1 about the PDF structure. This will save time in Lesson 2.
+Transform unstructured schedule data from a PDF into a structured CSV file.
+ +Get data out of the PDF file using Python libraries or tools
+Organize the messy, unstructured text into logical groups
+Convert the data to match the CSV template format
+Check that your CSV matches the expected structure
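+The final validation step can be partly automated. The following is a minimal sketch, assuming the template CSV contains the same header and the same number of rows as the expected result; the function names and the sample column names in the comments are hypothetical, not taken from the challenge files.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(csv_text)))


def structure_problems(result_text, template_text):
    """Return a list of mismatches; an empty list means the shapes agree.

    Assumption: the template has the same header and row count as the
    expected output. Only the structure is compared, not the cell contents.
    """
    result = read_rows(result_text)
    template = read_rows(template_text)
    if not result or not template:
        return ["one of the files is empty"]

    problems = []
    if result[0] != template[0]:
        problems.append(f"header differs: {result[0]} != {template[0]}")
    if len(result) != len(template):
        problems.append(f"row count differs: {len(result)} != {len(template)}")
    expected_width = len(template[0])
    for line_no, row in enumerate(result[1:], start=2):
        if len(row) != expected_width:
            problems.append(
                f"line {line_no} has {len(row)} cells, expected {expected_width}"
            )
    return problems


# Hypothetical usage with in-memory text instead of real files:
# structure_problems("Slot,Monday\n1,\n", "Slot,Monday\n1,\n")
```

+Reading both files with `open(path, newline="", encoding="utf-8").read()` and passing the text in keeps the check independent of where the files live.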
+This PDF contains unstructured school schedule data with:
+ +Challenge: The data is unstructured, so you'll need to find patterns to extract it correctly.
+Your goal is to create a CSV file matching this structure:
+ +Note: Class information in each cell is formatted as "Subject: ... Class: ... Room: ..."
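+The cell format above can be produced and checked with two small helpers. This is a sketch under assumptions: the three fields are taken to be separated by single spaces, which the template does not confirm, and `format_cell` and `parse_cell` are illustrative names.

```python
import re

# Assumption: the fields appear in this order and are separated by whitespace.
CELL_PATTERN = re.compile(
    r"Subject:\s*(?P<subject>.*?)\s*Class:\s*(?P<class_name>.*?)\s*Room:\s*(?P<room>.*)$",
    flags=re.DOTALL,
)


def format_cell(subject, class_name, room):
    """Build one CSV cell in the 'Subject: ... Class: ... Room: ...' layout."""
    return f"Subject: {subject} Class: {class_name} Room: {room}"


def parse_cell(cell):
    """Split a formatted cell back into its fields, or return None if it does not match."""
    match = CELL_PATTERN.match(cell.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}
```

+Running `parse_cell` over every non-empty cell of the finished CSV is a quick way to spot cells that drifted from the required layout.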
+Choose from these options for the data extraction:
+ +Basic PDF text extraction
+Advanced table extraction
+Extract tables from PDF
+Data cleaning & CSV export
+Visual table extraction tool
+Copy-paste & clean in spreadsheet
+Suggestion: If you're new to PDF extraction, start with pdfplumber (Python) or the Tabula GUI.
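+For the pdfplumber route, a first extraction pass might look like the sketch below. It assumes pdfplumber is installed (`pip install pdfplumber`) and that the schedule is laid out as a table pdfplumber can detect, which may not hold for this PDF; `extract_tables_from_pdf` and `clean_table` are illustrative helper names, and the file name in the usage comment is a placeholder.

```python
def extract_tables_from_pdf(pdf_path):
    """Return every table pdfplumber detects, collected page by page."""
    import pdfplumber  # third-party; imported here so the helpers below work without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list
            # of rows, and each row is a list of cell strings (or None).
            tables.extend(page.extract_tables())
    return tables


def clean_table(table):
    """Turn None cells into "" and collapse newlines and repeated spaces in each cell."""
    return [[" ".join((cell or "").split()) for cell in row] for row in table]


# Hypothetical usage (replace the placeholder with the downloaded PDF's path):
# tables = [clean_table(t) for t in extract_tables_from_pdf("schedule.pdf")]
```

+If no tables are detected, `page.extract_text()` returns the raw page text instead, and the patterns have to be found line by line.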
+Convert Пн, Вт, Ср, Чт, Пт to Monday, Tuesday, Wednesday, Thursday, Friday
+Match class information to the correct time slots (1-13 with specific times)
+Follow the exact format: "Subject: ... Class: ... Room: ..." in CSV cells
+Leave cells empty for time slots with no classes
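+The four requirements above can be combined in one small transformation step. The sketch below assumes the extracted lessons are already available as `(day abbreviation, slot number, cell text)` tuples; the `Slot` header and the function names are hypothetical, so the real header row should be copied from the CSV template.

```python
import csv
import io

# Day abbreviations from the PDF mapped to the English names used in the CSV.
DAY_NAMES = {
    "Пн": "Monday",
    "Вт": "Tuesday",
    "Ср": "Wednesday",
    "Чт": "Thursday",
    "Пт": "Friday",
}
DAYS = list(DAY_NAMES.values())
SLOTS = range(1, 14)  # time slots 1 through 13


def build_grid(lessons):
    """Place each lesson in its (slot, day) cell; untouched cells stay empty."""
    grid = {slot: {day: "" for day in DAYS} for slot in SLOTS}
    for day_abbrev, slot, cell_text in lessons:
        day = DAY_NAMES[day_abbrev.strip()]
        grid[int(slot)][day] = cell_text
    return grid


def grid_to_csv(grid):
    """Serialise the grid as CSV text, one row per time slot."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Slot"] + DAYS)  # assumed header; match the template in practice
    for slot in SLOTS:
        writer.writerow([slot] + [grid[slot][day] for day in DAYS])
    return buffer.getvalue()
```

+Starting from a grid of empty strings means slots without classes are written as empty cells without any special handling.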
+After completing the challenge, consider these questions:
+ +Learning outcome: This challenge develops problem-solving, pattern recognition, and data transformation skills applicable to many real-world scenarios.
+