This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights commonalities in both general principles and practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; detailed coverage of longer-term research and more theoretical treatments should be sought elsewhere. There are many pointers at the ends of the chapters that the reader can follow to explore the literature. The book also maintains a strong emphasis on evaluation in every chapter, in terms of both methodology and the results of controlled experimentation.
Preface to the 2nd edition
Chapter 1. Natural language processing
  1.1 What is NLP?
  1.2 NLP and linguistics
  1.3 Linguistic tools
  1.4 Plan of the book
Chapter 2. Document retrieval
  2.1 Information retrieval
  2.2 Indexing technology
  2.3 Query processing
  2.4 Evaluating search engines
  2.5 Attempts to enhance search performance
  2.6 The future of Web searching
Chapter 3. Information extraction
  3.1 The message understanding conferences
  3.2 Regular expressions
  3.3 Finite automata in FASTUS
  3.4 Context-free grammars
  3.5 Limitations of current technology and future research
  3.6 Summary of information extraction
Chapter 4. Text categorization
  4.1 Overview of categorization tasks
  4.2 Handcrafted rule based methods
  4.3 Inductive learning for text classification
  4.4 Nearest neighbor algorithms
  4.5 Combining classifiers
  4.6 Evaluation of text categorization systems
Chapter 5. Text mining
  5.1 What is text mining?
  5.2 Resolving reference and coreference
  5.3 Automatic summarization
  5.4 Testing of automatic summarization programs
  5.5 Prospects for text mining and NLP
References
Index

Product details

ISBN: 9789027249920
Published: 2007-06-05
Edition: 2nd edition
Publisher: John Benjamins Publishing Co
Weight: 590 g
Height: 245 mm
Width: 164 mm
Audience level: U, P, 05, 06
Language: English
Format: Hardcover