Natural Language Processing

(a.k.a. Natural Language Understanding)

Course Number: AT82.05
Credits: 3
Links: Slides, Github, Youtube
Similar Course (recommended):   Stanford CS224N

Course Description
This natural language processing course explores deep learning techniques with a focus on language. Students will gain both theoretical understanding and practical experience with state-of-the-art models and methods, including Word Embeddings, LSTM, Transformers, Pretrained Models, Instruction Tuning, Reasoning and Agentic AI, PEFT and Quantization, RAG, Speech, and Multimodal Learning. Through hands-on projects, students will implement and experiment with these architectures to solve real-world problems.

Instructor Information

Instructor: Chaklam Silpasuwanchai
Office: CS101
Email: chaklam@ait.asia
Teaching Assistant: Todsavad Tangtortan (st124859@ait.asia)

Prerequisites
There are no formal prerequisites, but it is recommended to take the following courses beforehand:

  • AT82.03 Machine Learning
  • AT82.01 Computer Programming for Data Science and Artificial Intelligence

Course Objectives
By the end of this course, students will be able to: 

  1. Design, train, test, and deploy natural language processing techniques, including embeddings, RNNs, LSTMs, Transformers, tokenization, and pretrained models.
  2. Develop an end-to-end deep learning project incorporating multiple techniques covered in the course, demonstrating the ability to move from research concepts to practical applications.

Recommended Textbook(s)

  1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin

Grading

Component | Percentage
Quiz | 30%
Project | 30%
Assignments | 40%

The grading scale is based on a normal distribution.
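
For illustration only, the sketch below shows one way a weighted total and a normal-distribution (curved) grade could be computed from the component weights above. The z-score cutoffs and the example scores are hypothetical assumptions; the course does not publish its exact curve.

    import statistics

    # Component weights from the grading table (Quiz 30%, Project 30%, Assignments 40%).
    WEIGHTS = {"quiz": 0.30, "project": 0.30, "assignments": 0.40}

    def weighted_total(scores):
        """Combine component scores (each on a 0-100 scale) into a weighted total."""
        return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

    def curved_grades(totals):
        """Map weighted totals to letter grades via z-scores against the class mean.
        The cutoffs here are hypothetical, purely to illustrate curving."""
        mean, sd = statistics.mean(totals), statistics.stdev(totals)
        cutoffs = [(1.0, "A"), (0.0, "B"), (-1.0, "C")]
        letters = []
        for t in totals:
            z = (t - mean) / sd
            letters.append(next((g for c, g in cutoffs if z >= c), "D"))
        return letters

    # Hypothetical component scores for three students.
    class_scores = [
        {"quiz": 85, "project": 90, "assignments": 80},
        {"quiz": 70, "project": 75, "assignments": 72},
        {"quiz": 60, "project": 65, "assignments": 58},
    ]
    totals = [weighted_total(s) for s in class_scores]
    print(totals, curved_grades(totals))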

Course Schedule

Week | Topic | Readings | Extra Resources | Assignments
1 | Word2Vec, GloVe | Word2Vec, GloVe | PyTorch Word Embeddings Tutorial, Gensim's word2vec, Stanford GloVe, Spacy | A1: Search Engine
2 | RNN, LSTM | LSTM, Sequence Generation with RNNs, Understanding LSTM Networks | PyTorch Seq2Seq Tutorials, PyTorch Translation Tutorial, PyTorch Language Model Example | A2: Language Model
3 | Attention, Transformers | Attention is All You Need, Neural Machine Translation by Attention, The Illustrated Transformer | HuggingFace PyTorch Examples, HuggingFace Course, Annotated Transformer | A3: Machine Translation
4 | Tokenization, FastText, ELMo, Pretrained Models | Deep contextualized word representations (ELMo), Enriching Word Vectors with Subword Information | HuggingFace Tokenizers | A4: Resume Parser
5 | Pretrained Models, Prompt-based Learning | BERT, GPT-3, Prefix-Tuning | HF Transformers, HF Course, Karpathy's GPT tutorial | A5: Sentence Embeddings
6 | Instruction Tuning, RLHF | InstructGPT, Best Practices, Self-Instruct, Constitutional AI, Scaling | SFT Trainer, Ruder's blog on EMNLP 2023, Ruder's blog on Instruction Tuning |
7 | Benchmark & Evaluation | | |
8 | Reasoning, Agents, Agentic AI; Distillation, Pruning | Distillation, Lottery Ticket Hypothesis, LoRA | PEFT Library, PyTorch Quantization, DeepSpeed, SparseGPT | A6: Student Layers
9 | PEFT, Quantization | | |
10 | RAG, Question Answering, Generation | RAG, T5, BART | LangChain, Lilian Weng's blog on hallucination, agents, and prompt engineering, Ruder's blog on LLMs, Chip Huyen's blog on best practices | A7: AITGPT
11 | | | | A8: Instruction Tuning
10 | Multimodal Learning | CLIP, Flamingo, GPT-4o, BLIP | OpenAI CLIP, LLaVA, LAVIS, OpenCLIP, Ruder's blog on multimodal learning, Chip Huyen on multimodal learning |
11 | Project Proposal Presentation | ACL Format, ACL Style Files, Insights by Chip | |
12 | No class | N/A | N/A |
13 | Project Progress (No Presentation) | | |
14 | No class | N/A | N/A |
15 | Final Project Presentation | | |

Note: If you have any good resources, please email me at chaklam@ait.asia! Many of these resources come from students.

Projects

1. Project Objectives
- Demonstrate practical application of NLP techniques
- Develop research and technical writing skills
- Create a scholarly paper following academic conference standards

2. Deliverables
- Report in ACL format (proposal, progress, defense)
- Github link with comprehensive README (progress, defense)
- 1-min demo video (progress, defense)

3. Project Requirements

3.1 Project Scope
- Teams of 2-3 members maximum
- Original research or significant improvement on existing approaches
- Must involve substantive NLP methodology or application

3.2 Technical Expectations
- Implement at least one of the following:
  - Novel NLP model or architecture
  - Innovative application of existing NLP techniques
  - Comprehensive comparative study of NLP methods
  - Dataset creation or significant augmentation
  - Advanced NLP pipeline or system

3.3 Implementation Requirements
- Use modern NLP libraries/frameworks (e.g., spaCy, NLTK, Transformers); see the brief sketch after this list
- Demonstrate understanding of:
  - Appropriate data preprocessing
  - Model selection and justification
  - Comprehensive evaluation metrics
  - Statistical significance of results
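
The following is a minimal sketch, not a required template, of the kind of pipeline this checklist describes, using the Transformers library named above. The toy dataset, the model choice, and the bootstrap significance test are illustrative assumptions only; substitute your own data, model, and analysis.

    import random
    from transformers import pipeline  # pip install transformers

    # Preprocessing: a toy labeled dataset (replace with your real data and cleaning steps).
    texts = ["I loved this movie", "Terrible plot and acting", "Great soundtrack", "Not worth watching"]
    labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
    texts = [t.strip().lower() for t in texts]  # minimal normalization

    # Model selection: an off-the-shelf sentiment classifier (a hypothetical choice for this sketch).
    clf = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

    # Evaluation metric: accuracy on the toy set.
    preds = [out["label"] for out in clf(texts)]
    correct = [int(p == y) for p, y in zip(preds, labels)]
    print(f"accuracy = {sum(correct) / len(correct):.2f}")

    # Statistical significance: a percentile bootstrap confidence interval over per-example outcomes.
    def bootstrap_ci(outcomes, n_resamples=1000, alpha=0.05):
        stats = sorted(
            sum(random.choices(outcomes, k=len(outcomes))) / len(outcomes)
            for _ in range(n_resamples)
        )
        return stats[int(alpha / 2 * n_resamples)], stats[int((1 - alpha / 2) * n_resamples) - 1]

    print("95% bootstrap CI for accuracy:", bootstrap_ci(correct))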

4. ACL Paper Format Guidelines

Papers must follow the standard ACL paper format with the following sections:

Title Page
  • Descriptive title
  • Author names and affiliations
  • Contact information

Abstract
  • 250 words max
  • Concise summary of research
  • Problem statement
  • Key methodology
  • Primary results and contributions

Introduction
  • Research problem context
  • Motivation
  • Research questions/hypotheses
  • Brief overview of approach

Related Work
  • Comprehensive literature review
  • Positioning of current research
  • Gaps in existing approaches

Methodology
  • Detailed description of approach
  • Technical implementation details
  • Algorithmic or model descriptions

Experiments
  • Experimental setup
  • Dataset description
  • Evaluation metrics
  • Statistical analysis
  • Ablation studies (if applicable)

Results
  • Presentation of key findings
  • Comparative analysis
  • Visualizations and tables

Discussion
  • Implications of research
  • Limitations
  • Potential future work

Conclusion
  • Summarize key contributions
  • Broader impact of research

References
  • Minimum of 15-20 academic sources
  • Follow ACL citation style

5. Evaluation Criteria

5.1 Technical Criteria (60%)
- Originality and creativity of approach (20%)
- Technical depth and complexity (20%)
- Experimental methodology (10%)
- Results and analysis (10%)

5.2 Documentation Criteria (40%)
- ACL paper quality (15%)
- Clarity of writing (10%)
- Presentation and visualization (7%)
- References and related work (8%)

Course Policies

Attendance
Physical attendance in class is a crucial component of your learning in this course. Students are required to attend a minimum of 70% of all class sessions in person. Failure to meet this attendance requirement will result in an automatic F grade for the course, regardless of performance in other assessments.

  • Attendance will be taken at the beginning of each class
  • Medical emergencies and university-approved activities must be documented
  • Contact the TA before class when possible for any anticipated absence

Late Work 

  • All assignments must be submitted through the course submission system by the specified due date and time
  • The submission system will automatically close at the deadline
  • No extensions will be granted except for documented emergencies
  • Late work will not be accepted for credit under any circumstances
  • The submission portal will remain accessible after deadlines, and you may continue to submit work for learning purposes and feedback; however, no points will be awarded for any submission after the deadline

Best Practices

  • Start assignments early to avoid technical issues
  • Submit well before the deadline to avoid last-minute complications
  • Check your submission confirmation to ensure successful upload
  • Set up calendar reminders for all due dates listed in the course schedule

Academic Integrity

All submitted work must be your own original work. This includes code, analysis, and documentation.

Violations leading to automatic F grade:

  • Copying or sharing code/solutions
  • Submitting others' work

Acceptable:

  • Discussing concepts with classmates
  • Seeking help from instructors/TAs

Maintain academic integrity - it's fundamental to your learning and professional development.

Note: This syllabus is subject to change. Any modifications will be announced in class and posted online.