Notes: Information Retrieval
Definition:
Information Retrieval: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
Chapter 1:
Classification is often approached by first manually classifying some documents and then hoping to be able to classify new documents automatically.
Three prominent scales:
- web search
  Distinctive issues: gathering documents for indexing, building systems that work efficiently at this enormous scale, and handling particular aspects of the web.
- personal information retrieval
- enterprise, institutional, and domain-specific search
Chapter 6: Scoring, term weighting and the vector space model
Term frequency and weighting:
A document or zone that mentions a query term more often has more to do with that query and therefore should receive a higher score.
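A minimal sketch of this idea, assuming the simplest possible scheme: the score of a document is the sum, over the distinct query terms, of how many times each term occurs in it. The helper names (`tokenize`, `tf_score`), the whitespace tokenizer, and the toy documents are illustrative assumptions, not something taken from the book.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_score(query, document):
    """Sum the raw term frequency of each distinct query term in the document."""
    counts = Counter(tokenize(document))
    # Counter returns 0 for terms that do not occur in the document.
    return sum(counts[term] for term in set(tokenize(query)))


# Hypothetical toy collection, used only for illustration.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the cat chased the other cat",
    "d3": "dogs bark",
}

# Rank document ids by descending score for the query "cat".
ranked = sorted(docs, key=lambda d: tf_score("cat", docs[d]), reverse=True)
print(ranked)
```

Here d2 mentions "cat" twice, d1 once, and d3 not at all, so d2 ranks first. Raw counts treat every term as equally important, which is one reason the notes' heading pairs term frequency with weighting.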
Reference:
https://nlp.stanford.edu/IR-book/