Extraction of structured information from unstructured documents
Large amounts of valuable information are hidden in unstructured documents (e.g., Web pages, e-mails, office documents). This thesis should start with a survey of information extraction approaches based on parsing domain-specific, restricted vocabularies. Afterwards, a tool should be designed and implemented that allows users (a) to describe the relevant information at a conceptual level and (b) to generate an extractor from that conceptual description. A proof of concept should be developed for the domain of scientific conference announcements (distributed via Web pages and e-mails).
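To make the intended two-step workflow more concrete, the following is a minimal sketch in Python, not a prescribed design. It assumes that the conceptual description can be reduced to a list of fields, each with a small restricted vocabulary of cue phrases and a pattern for the value, and that the generated extractor is a set of compiled regular expressions. The names Field, generate_extractor, CONFERENCE and the sample announcement are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One attribute of the concept (hypothetical description format)."""
    name: str    # attribute name, e.g. "deadline"
    cues: tuple  # restricted vocabulary that introduces the value
    value: str   # regular expression describing the value itself


def generate_extractor(fields):
    """Compile a conceptual description into an extraction function."""
    compiled = []
    for f in fields:
        # Any cue phrase, an optional ':' or '-', then the value pattern.
        cue_alt = "|".join(re.escape(c) for c in f.cues)
        pattern = re.compile(
            rf"(?:{cue_alt})\s*[:\-]?\s*(?P<value>{f.value})",
            re.IGNORECASE,
        )
        compiled.append((f.name, pattern))

    def extract(text):
        # Return the first match per field; unmatched fields are omitted.
        result = {}
        for name, pattern in compiled:
            match = pattern.search(text)
            if match:
                result[name] = match.group("value").strip()
        return result

    return extract


MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE = rf"(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}"

# Assumed conceptual description for conference announcements.
CONFERENCE = [
    Field("deadline", ("submission deadline", "paper due"), DATE),
    Field("location", ("location", "venue"), r"[^\n]+"),
]

announcement = """Call for Papers
Location: Vienna, Austria
Submission deadline: March 15, 2024
"""

extract = generate_extractor(CONFERENCE)
print(extract(announcement))
```

In this sketch the description (CONFERENCE) is kept separate from the generated extractor, so a new domain would only need a new list of fields. A real tool would likely require a richer description language and a proper parser rather than plain regular expressions.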