About this Course
This course covers techniques and algorithms for associating relatively surface-level structures and information with natural language corpora.
Topics include:
- Word segmentation/tokenization
- Morphological analysis
- Part-of-speech tagging
- Language modeling
- Named entity recognition
- Chunk parsing
- Linguistic resources that can be leveraged for these tasks (e.g., treebanks)
These techniques allow you to locate items of interest (e.g., product names, diagnoses, proper names) in running text, correlate their occurrences with each other, and normalize text for further processing.
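As a rough illustration of what locating, correlating, and normalizing can look like in practice, here is a minimal Python sketch. It is not taken from the course materials: the gazetteer entries, the labels, and the function names (`tokenize`, `find_entities`, `cooccurrences`) are all hypothetical, and a dictionary lookup stands in for the statistical methods a real named entity recognizer would more likely use.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer, keyed by normalized (lowercased) token tuples.
# The entries are invented for illustration only.
GAZETTEER = {
    ("acme", "phone"): "PRODUCT",
    ("type", "2", "diabetes"): "DIAGNOSIS",
    ("jane", "doe"): "PERSON",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(token):
    """Normalize a token for lookup; here this is just lowercasing."""
    return token.lower()


def find_entities(tokens):
    """Greedy longest-match lookup of token spans in the gazetteer."""
    norm = [normalize(t) for t in tokens]
    found = []
    i = 0
    while i < len(norm):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(norm) - i), 0, -1):
            label = GAZETTEER.get(tuple(norm[i:i + n]))
            if label:
                found.append((" ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            i += 1  # no entity starts at this position
    return found


def cooccurrences(sentences):
    """Count how often pairs of entity labels appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        labels = sorted({label for _, label in find_entities(tokenize(sentence))})
        counts.update(combinations(labels, 2))
    return counts
```

Calling `find_entities(tokenize("Jane Doe bought an Acme Phone."))` under these assumptions yields the spans `Jane Doe` and `Acme Phone` with their labels, and `cooccurrences` tallies which label pairs share a sentence. Matching on lowercased tokens is the simplest possible normalization; it trades precision for recall, since it will also match capitalization variants that are not names.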