Practical Apache Lucene 8 : uncover the search capabilities of your application / Atri Sharma.

Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications. Starting with the basics of Lucene and searching, you wil...

Bibliographic Details
Main Author: Sharma, Atri (Author)
Format: eBook
Language: English
Published: [Berkeley, CA] : Apress, [2020]
Table of Contents:
  • Intro
  • Table of Contents
  • About the Author
  • About the Technical Reviewer
  • Acknowledgments
  • Introduction
  • Chapter 1: Hola, Lucene!
  • Key Features of Lucene
  • Information Retrieval Basics
  • Linear Scan
  • Stop List
  • Stemming
  • Term
  • Term-Document Incidence Matrix
  • Serving Queries Using a Term-Document Incidence Matrix
  • Basic Terminology
  • Heart of Lucene's Data Representation
  • Lucene's Inverted Index Structure
  • On-Disk Representation of a Lucene Index
  • Terms Dictionary
  • Frequencies File
  • Positions File
  • Queries on Lucene
  • Structure of a Lucene Query
  • Fields
  • Types of Queries in Lucene
  • Lucene vs. Relational Databases
  • Chapter 2: Hello World: The Lucene Way
  • Indexing Data in Lucene
  • Document
  • Analyzers
  • StandardAnalyzer
  • StopAnalyzer
  • SimpleAnalyzer
  • IndexWriter
  • Directory
  • Create Documents
  • Create Index and Write Documents
  • Adding Data to the Index
  • Bringing It All Together
  • TestClass
  • Document Search
  • QueryParser
  • TopDocs
  • IndexSearcher
  • IndexReader
  • Searching
  • Boolean Model
  • What Is Relevance?
  • Scoring Algorithms
  • TF/IDF
  • Vector Space Model
  • Scoring Example
  • Lucene's Scoring Model
  • Fields
  • Similarity
  • Boosting
  • Collectors
  • Chapter 3: Core Search Fundamentals
  • Codecs
  • DocValues
  • Phrase Queries
  • Term Vectors
  • BooleanQuery
  • MultiTermQuery
  • QueryCache
  • Scorer as Part of the Search Process
  • Chapter 4: Spatial Indexing
  • Spatial Module
  • What Are Geohashes?
  • Quad Trees
  • K-D Trees
  • BKD Trees
  • Using Spatial Indexing
  • Chapter 5: Location-Aware Search Engines
  • Why Use a Search Engine for Geographic Searches?
  • Range Queries
  • Function Queries
  • Geospatial Basics
  • Representing Spatial Data
  • Tiered Design for Storage
  • Geohashes
  • Spatial Data with Text Search
  • Distance Calculations
  • Bounding Box Filter
  • A Point on Distance Calculation
  • Chapter 6: Introducing Machine Learning with Apache Mahout
  • Origin of Apache Mahout
  • Why Apache Mahout?
  • Introduction to Machine Learning
  • Learning
  • Collaborative Filtering
  • Clustering
  • Categorization
  • Converting from Lucene Components to Mahout Components
  • Integrating Lucene with Mahout
  • lucene.vector
  • Lucene2seq
  • Java Version of Lucene2seq
  • Putting It All Together
  • Chapter 7: Improving Lucene's Performance
  • Increase Indexing Speed
  • Reuse Field Instances
  • The Curious Case of Large Commits
  • Reuse Tokens in Analyzers
  • Tuning Flush Intervals
  • Increase mergeFactor
  • Choosing the Correct Analyzers
  • Use Multiple Threads with One IndexWriter
  • Index into Separate Indexes and Then Merge
  • Improve Search Performance
  • Use the Latest Version of Lucene
  • Use IndexReader with the readOnly Attribute Equal to True
  • Use MMapDirectory/NIOFSDirectory
  • Decrease mergeFactor
  • Ignore First Query's Performance
  • Avoid Reopening IndexSearcher Instances
  • Share IndexSearcher Instances