EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING
This course is an introduction to data-driven methods applied to natural language processing. The emphasis is on methods, but we will survey applications such as syntactic parsing, text classification, information extraction, tagging, and summarization. The final lectures will deal with statistical machine translation.
Lecturer: Philipp Koehn
TA: Tommy Herbert
Lectures: Monday and Thursday, 5:10pm; location changed to WRB room G.11
Tutorials: Tuesday and Friday, 1pm, AT 4.12
Tutorial group assignments.
- Tutorial 1 was discussed on January 22 (Tuesday) and 25 (Friday).
- Tutorial 2 will be discussed on February 1 (Friday) and 5 (Tuesday).
- Tutorial 3 will be discussed on February 8 (Friday) and 12 (Tuesday).
- The project baseline systems will be presented on February 22 (Friday) and 26 (Tuesday).
- Tutorial 4 will be discussed on March 4 (Tuesday) and 7 (Friday).
- Tutorial 5 will be discussed on March 11 (Tuesday) and 14 (Friday).
A single assessment (worth 30% of the course) will be given out in late January. You will have to turn in your paper and code in class at the end of March. If you have a problem accessing the data from the web site, it is also available at /home/miles/projects/ner/data-eng/ (English) and /home/miles/projects/ner/data-deu/ (German).
The rest of the marks (70%) will go on the exam. Past exam, solutions.
SYLLABUS
Exact dates will change and may move around. Topics may shift and change during flight.
| No | Date | Topic | Slides | Reference |
| 1 | 7 Jan | Introduction (I): Words and probability | display | print | MS chapter 1 K chapter 3 |
| 2 | 10 Jan | Introduction (II): Estimation and information theory | display | print | MS chapter 2 K chapter 3 |
| 3 | 14 Jan | Language modeling (I): From counts to smoothing | display | print | MS chapter 6 JM chapter 6 K chapter 7 |
| 4 | 17 Jan | Language modeling (II): Smoothing and back-off | display | print | MS chapter 6 JM chapter 6 K chapter 7 |
| 5 | 21 Jan | Tagging (I): Part-of-speech tagging with HMM | display | print | MS chapter 9/10 JM chapter 8 |
| 6 | 25 Jan | Tagging (II): Transformation-Based Learning | display | print | MS chapter 10 |
| 7 | 28 Jan | Tagging (III): Maximum Entropy Models | display | print | Ratnaparkhi [1996] Berger et al. [1996] |
| 8 | 31 Jan | Parsing (I): Context-free grammars and chart parsing | display | print | JM chapter 9/12 |
| 9 | 4 Feb | Project | display | print | - |
| 10 | 7 Feb | Parsing (II): Lexicalised and probabilistic parsing | display | print | JM chapter 12 |
| 11 | 11 Feb | Word sense disambiguation | display | print | JM section 17.2, MS chapter 7 Yarowsky [1995] |
| 12 | 14 Feb | Text categorization and clustering | display | print | MS chapter 14/16 |
| 13 | 18 Feb | Semantics and discourse | display | print | Carlson et al. [2001] Pang and Lee [2005] |
| 14 | 21 Feb | Machine translation (I): Introduction | display | print | - |
| 15 | 25 Feb | Machine translation (II): Word-based models and the EM algorithm | display | print | K chapter 4 Brown et al. [1993] |
| - | 28 Feb | NO CLASS | ||
| 16 | 3 Mar | Machine translation (III): Decoding | display | print | K chapter 6 Koehn [2004] |
| 17 | 6 Mar | Machine translation (IV): Phrase-based models | display | print | K chapter 5 Koehn et al. [2003] Och and Ney [2002] |
| 18 | 10 Mar | Machine translation (V): Syntax-based models | display | print | K chapter 11 Yamada and Knight [2002] Chiang [2005] Collins et al. [2005] |
| 19 | 13 Mar | Machine translation (VI): Advanced topics | display | print | - |
| 20 | 17 Mar | Review | - | - |
MS refers to "Manning and Schütze", JM refers to "Jurafsky and Martin", and K refers to "Koehn", the three textbooks listed below.
REFERENCES
When possible, online papers will be made available. As for books, the key references are:
- Foundations of Statistical Natural Language Processing. Christopher Manning and Hinrich Schütze. Available online.
- Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Daniel Jurafsky and James H. Martin.
- Statistical Machine Translation. Philipp Koehn. Not yet published; chapter copies will be handed out.