Version: Next

Lucene Configuration and Indexing

The indexing administration console allows you to manage the search index lifecycle. It provides complete control over the Lucene engine and enables you to perform the maintenance operations necessary for optimal performance.

Accessing Indexing Tools

Access Path: Tools > AdminTools menu > AdminFulltextIndex

Full-Text Index Management

Index Status

Real-time Statistics

Display: Dashboard showing the current indexing status

Available Information:

  • Indexed documents: Number indexed versus total number of documents
  • Indexed files: Processed attachments
  • Index size: Disk space occupied
  • Last update: Timestamp

Interpretation:

Status | Meaning | Action
Indexed documents < Total | Indexing behind | Force indexing
Size > 10 GB | Large index | Optimization recommended
Last update > 1h | Task blocked | Check ProcessTimers
100% indexed | Healthy status | No action needed
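
The interpretation rules above can be expressed as a small check. This is only a sketch: the IndexStats structure and the recommended_actions function are hypothetical names chosen for illustration, not part of the product, and the values would have to be read from the dashboard by hand or by whatever means your installation offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GB = 1024 ** 3


@dataclass
class IndexStats:
    """Hypothetical snapshot of the values shown on the dashboard."""
    indexed_documents: int
    total_documents: int
    index_size_bytes: int
    last_update: datetime


def recommended_actions(stats: IndexStats, now: datetime) -> list:
    """Apply the interpretation rules and return the suggested actions."""
    actions = []
    if stats.indexed_documents < stats.total_documents:
        actions.append("Force indexing")            # indexing is behind
    if stats.index_size_bytes > 10 * GB:
        actions.append("Optimization recommended")  # large index
    if now - stats.last_update > timedelta(hours=1):
        actions.append("Check ProcessTimers")       # task may be blocked
    return actions or ["No action needed"]
```

Several rules can match at once, so the function returns a list instead of a single status.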

Indexing Operations

Force Reindexing

Button: "Force reindexing"

Function: Launches a global index update

Types:

Incremental indexing:

  • Indexes only new or modified documents
  • Fast (a few minutes)
  • Usage: Daily updates

Complete indexing:

  • Rebuilds the index from scratch
  • Slow (several hours on large volumes)
  • Usage: After major configuration changes

When to use it:

  • After modifying indexed fields
  • After bulk imports
  • If documents cannot be found
  • After enabling attachment indexing
  • Following an indexing error

Procedure:

  1. Choose the type (incremental vs complete)
  2. Click "Force"
  3. Wait for completion (success message)
  4. Check the statistics

Precautions:

  • Perform during off-peak hours
  • May temporarily slow down the application
  • For large volumes (> 10,000 documents), schedule outside production hours

Index Optimization

Button: "Optimize index"

Function: Defragments and speeds up searches

Principle:

  • Indexing tends to fragment the index over time
  • Optimization consolidates segments
  • Improves search performance by 20 to 50%
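
To illustrate why consolidating segments helps, here is a toy model, not the Lucene implementation: each segment is a plain dictionary mapping a term to the set of document IDs containing it. The search and merge_segments functions are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def search(segments: list, term: str) -> set:
    """Look the term up in every segment: cost grows with the segment count."""
    hits = set()
    for segment in segments:
        hits |= segment.get(term, set())
    return hits


def merge_segments(segments: list) -> list:
    """Consolidate all segments into a single one (toy 'optimization')."""
    merged = defaultdict(set)
    for segment in segments:
        for term, doc_ids in segment.items():
            merged[term] |= doc_ids
    return [dict(merged)]


# Three small segments, as produced by three successive indexing runs.
fragmented = [{"iso": {1}}, {"iso": {2}, "audit": {2}}, {"audit": {3}}]
optimized = merge_segments(fragmented)
```

Both indexes return the same results, but the optimized one answers with a single lookup instead of one per segment.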

Recommended frequency:

  • Small databases (< 10,000 documents): Monthly
  • Medium databases (10,000 - 100,000): Weekly
  • Large databases (> 100,000): Daily or via scheduled task
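
As a sketch, the frequency guideline above maps directly to a lookup on the document count. The function name optimization_frequency is hypothetical.

```python
def optimization_frequency(document_count: int) -> str:
    """Return the recommended optimization frequency for a database size."""
    if document_count < 10_000:
        return "monthly"   # small database
    if document_count <= 100_000:
        return "weekly"    # medium database
    return "daily"         # large database, ideally via a scheduled task
```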

Procedure:

  1. Click "Optimize"
  2. Wait for completion (may take several minutes)
  3. Check the success message
  4. Test search performance

Automatic scheduling:

  • Configure via AdminTools > ProcessTimers
  • Create an "Index optimization" task
  • Schedule it to run at night

Explore Document Index

Function: Allows technical verification of indexed content for a specific document

Usage: Debug why a document is not found

Procedure:

  1. Enter the document identifier (ID)
  2. Click "Explore"
  3. Review the results:
    • Indexed fields
    • Content of each field
    • Extracted terms
    • Indexing status

Diagnosis:

  • If document doesn't appear: Not indexed
  • If a field is missing: Configuration needs adjustment
  • If content is empty: Extraction problem
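
The diagnosis rules above can be sketched as follows. This assumes the Explore results have been transcribed into a dictionary mapping each field name to its indexed content, with None standing for a document that does not appear at all. Both that representation and the diagnose function are hypothetical.

```python
from typing import Optional


def diagnose(explored: Optional[dict], expected_fields: list) -> list:
    """Return one finding per problem detected in the explored document."""
    if explored is None:
        return ["Not indexed"]
    findings = []
    for name in expected_fields:
        if name not in explored:
            findings.append(f"{name}: field missing, configuration needs adjustment")
        elif not explored[name].strip():
            findings.append(f"{name}: content empty, extraction problem")
    return findings
```

An empty list means that every expected field is present and has content.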

Performance Optimization

Best Practices

1. Field targeting:

  • Index only relevant fields
  • Avoid technical fields (ID, GUID)
  • Prioritize rich textual fields

2. Attachment limitations:

  • Very large files (> 50 MB): Consider exclusion
  • Check IFilter availability
  • Monitor index size

3. Regular optimization:

  • Schedule automatic optimization
  • Frequency adapted to volume
  • Monitor performance improvements

4. Size monitoring:

  • Index > 10 GB: Urgent optimization
  • Index > 50 GB: Configuration review
  • Periodic cleanup of obsolete documents
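
If you have file-system access to the index folder, its size can also be measured outside the console. The sketch below assumes the index is stored as files under a single directory whose path you know. The function names and the "OK" label are hypothetical.

```python
import os

GB = 1024 ** 3


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under the given directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def size_recommendation(size_bytes: int) -> str:
    """Apply the size thresholds from the best practices above."""
    if size_bytes > 50 * GB:
        return "Configuration review"
    if size_bytes > 10 * GB:
        return "Urgent optimization"
    return "OK"
```

Running the measurement from a scheduled script and logging the result gives a simple growth history for the monthly size analysis.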

Performance Indicators

Search time:

  • Target: < 1 second
  • If > 3 seconds: Optimization needed
  • Test with real queries

Indexing time:

  • Simple document: < 1 second
  • Document with attachments: 2-5 seconds
  • If higher: Investigation needed
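
To test with real queries, search time can be measured with a small timing wrapper. In this sketch, run_query stands for whatever callable executes a search in your environment (an assumption, no such function is documented here), and the label for the band between 1 and 3 seconds is also an assumption, since the indicators above only define the target and the optimization threshold.

```python
import time


def time_query(run_query, query: str) -> float:
    """Run one query and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def classify_search_time(seconds: float) -> str:
    """Compare a measured search time with the indicators above."""
    if seconds < 1.0:
        return "on target"
    if seconds > 3.0:
        return "optimization needed"
    return "acceptable, keep monitoring"  # assumed label for the 1-3 s band
```

Timing several representative queries and classifying the slowest one gives a more reliable picture than a single measurement.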

Business Case: "Invisible" Document

Problem: A Quality Manager just uploaded a procedure "PRO-2024.pdf" and cannot find it in search 2 minutes later.

Diagnosis:

  1. Check indexing status:

    • Access AdminTools > AdminFulltextIndex
    • Review statistics
    • Check "Last update"
  2. Check scheduled task:

    • AdminTools > ProcessTimers
    • "Incremental indexing" task
    • Check last execution

Solution:

  1. If task hasn't run:
    • Force manual Incremental Indexing
  2. If task OK but document missing:
    • Check configuration ("Procedure" form indexed?)
    • Check filters (document excluded?)
  3. Wait for processing to complete
  4. Test the search

Result: Once indexing completes, the document appears in search results.

Business Case: Searching Inside PDF Attachments

Context: The Quality department receives dozens of material certificates in PDF daily. Technicians need to find a certificate by searching for the batch number inside the PDF.

Configuration:

  1. FullText search setup:

    • "Material certificate" form
    • ☑ Index attachments
    • Fields: Title, Reference, Supplier, Date
  2. IFilters verification:

    • Verify PDF IFilters are installed
    • Test text extraction
  3. Complete rebuild:

    • AdminTools > AdminFulltextIndex
    • Launch Complete reindexing
    • Wait (may take 1-2h depending on volume)
  4. Testing:

    • Search for a known batch number
    • Verify the certificate appears
    • Test with multiple terms

Result: Users can now type "LOT-2024-A456" and instantly find the corresponding certificate, even if this number only appears in the PDF.

Preventive Maintenance

Regular Controls

Daily:

  • Verify indexing task is running
  • Check indexing errors

Weekly:

  • Check statistics (indexed documents vs total)
  • Run optimization
  • Test search performance

Monthly:

  • Analyze index size
  • Clean up obsolete documents
  • Check configuration (new forms?)
  • Export statistics

Quarterly:

  • Complete configuration review
  • Adjust indexed fields
  • Train users on advanced search
  • Performance analysis

Alerts

Configure alerts for:

  • Indexing task failing > 3 times
  • Indexed documents < 90% of total for > 24h
  • Index size > critical threshold
  • Search time > 5 seconds
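
The alert conditions can be sketched as a set of rules evaluated against collected metrics. This assumes the second rule is meant to fire when fewer than 90% of documents have been indexed for more than 24 hours. The IndexHealth structure, the active_alerts function and the alert labels are hypothetical, and the critical size is left as a parameter because no value is specified here.

```python
from dataclasses import dataclass


@dataclass
class IndexHealth:
    """Hypothetical metrics collected by your own monitoring."""
    task_failures: int
    indexed_ratio: float          # indexed documents / total, between 0 and 1
    hours_below_90_percent: float
    index_size_gb: float
    search_seconds: float


def active_alerts(health: IndexHealth, critical_size_gb: float) -> list:
    """Return the alerts whose condition is currently met."""
    alerts = []
    if health.task_failures > 3:
        alerts.append("Indexing task failing")
    if health.indexed_ratio < 0.90 and health.hours_below_90_percent > 24:
        alerts.append("Indexing coverage below 90%")
    if health.index_size_gb > critical_size_gb:
        alerts.append("Index size above critical threshold")
    if health.search_seconds > 5:
        alerts.append("Search too slow")
    return alerts
```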

Troubleshooting

Non-indexed Documents

Symptoms: Some documents cannot be found in search

Diagnosis:

  1. Check configuration (form indexed?)
  2. Check filters (document excluded?)
  3. Explore specific document index
  4. Review indexing logs

Solutions:

  • Adjust configuration
  • Force reindexing
  • Check file access permissions

Slow Indexing

Symptoms: Indexing takes a long time

Causes:

  • Large attachments
  • Slow IFilters
  • Fragmented database
  • Insufficient server resources

Solutions:

  • Limit indexed file sizes
  • Optimize database
  • Schedule indexing during off-peak hours
  • Increase server resources

Slow Searches

Symptoms: Results take > 5 seconds to display

Causes:

  • Non-optimized index
  • Fragmented index
  • Overly broad query

Solutions:

  • Run optimization
  • Reduce number of indexed fields
  • Refine user queries

IFilters

What is an IFilter?

An IFilter is a Windows component that extracts text from a specific file format so that the content can be indexed.

Required IFilters:

  • PDF: Adobe PDF IFilter or equivalent
  • Office: Included in Office (docx, xlsx, pptx)
  • Legacy Office: IFilter for doc, xls, ppt
  • Others: As needed (CAD, images with OCR, etc.)

Installation: See dedicated article on IFilters

Verification

Test extraction:

  1. Create a test document with an attachment
  2. Force indexing
  3. Explore the document index
  4. Verify attachment content is present

If extraction fails, possible causes:

  • IFilter not installed
  • Incompatible IFilter
  • Corrupted file

Support

For advanced configurations (distributed index, replication, performance on very large volumes), contact Avanteam support.