Fieldbox automates requirement extraction from heterogeneous documents

The challenge

 

In the infrastructure and construction sector, managing large quantities of data and responding effectively to calls for tender are essential to success. Egis, one of the world’s leading consultancy, engineering and construction services companies, faced significant challenges in managing project requirements because information had to be extracted manually from various document formats such as PDFs. This labor-intensive process involved over 160 project managers and represented over 1,000 working days per year, just to read, understand and classify requirements. The lack of a centralized information system compounded these difficulties, preventing the reuse of information and the standardization of processes.

 

Fieldbox proposition

 

Recognizing these challenges, Fieldbox partnered with Egis to drive its digital transformation through AI-based solutions. The cornerstone of this partnership is an AI service that automates the detection and classification of programmatic requirements from heterogeneous files, using either a Large Language Model (LLM) or other well-established Natural Language Processing (NLP) techniques.

The traditional NLP-based classification model was trained on a dataset of 4,500 sentences across various categories, extracted from PDF documents and validated by industry experts. The LLM-based classification model used a subset of this data and was adapted through few-shot learning, i.e. by including examples in the prompt.
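
As an illustration of the few-shot approach, here is a minimal sketch of how examples could be placed in a classification prompt. It is not the prompt used in the project: the `LabeledExample` type, the `build_few_shot_prompt` function, the prompt wording and the category names are all hypothetical, and the call to the LLM itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    """One expert-validated sentence with its category (hypothetical type)."""
    sentence: str
    category: str


def build_few_shot_prompt(examples, categories, sentence):
    """Assemble a plain-text prompt: instruction, worked examples, then the query.

    The wording and layout are assumptions made for illustration only.
    """
    lines = [
        "Classify the requirement into one of the following categories: "
        + ", ".join(categories) + ".",
        "",
    ]
    # Each worked example shows the model the expected input/output format.
    for example in examples:
        lines.append(f"Requirement: {example.sentence}")
        lines.append(f"Category: {example.category}")
        lines.append("")
    # The sentence to classify comes last, with the answer left open.
    lines.append(f"Requirement: {sentence}")
    lines.append("Category:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical categories and sentences, not taken from the Egis dataset.
    demo_examples = [
        LabeledExample("Walls must provide 45 dB sound insulation.", "acoustics"),
        LabeledExample("Doors must resist fire for 30 minutes.", "fire safety"),
    ]
    print(build_few_shot_prompt(
        demo_examples,
        ["acoustics", "fire safety", "accessibility"],
        "Ramps must not exceed a 5% slope.",
    ))
```

Because the examples live in the prompt rather than in model weights, adding a category in such a setup amounts to extending the category list and supplying a few examples for it.
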
A hold-out set of 20% was reserved for testing. It confirmed the effectiveness of both models, with a top-3 precision of 80%, a level that supports productive use in the operational environment.
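
The evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's evaluation code: it assumes that "top-3 precision" means the share of test sentences whose true category appears among the three highest-ranked predictions, and the function names `train_test_split` and `top_k_hit_rate` are hypothetical.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and reserve a fraction of them as a hold-out set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # First n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


def top_k_hit_rate(ranked_predictions, true_labels, k=3):
    """Fraction of samples whose true label is among the k best-ranked labels.

    ranked_predictions[i] lists candidate categories for sample i,
    ordered from most to least likely.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty test set")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    # Toy data with placeholder category names.
    ranked = [
        ["a", "b", "c", "d"],
        ["b", "a", "c", "d"],
        ["d", "c", "b", "a"],
    ]
    truth = ["a", "c", "a"]
    print(top_k_hit_rate(ranked, truth, k=3))
```

If the 20% split is taken over the full 4,500-sentence dataset, the hold-out set would contain 900 sentences. A top-3 measure fits a workflow in which the tool suggests a short list of categories and the project manager confirms one of them.
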

Several difficulties were addressed during the implementation:

  • Handling diverse sources and document formats: extracting information from varied document formats, including PDFs with images and tables, posed significant challenges.
  • Class labeling and performance metrics: deciding on the number of classes and choosing appropriate performance indicators were critical for ensuring effective categorization.
  • Tool and language limitations: the limited availability of NLP tooling for processing French text, which is less represented than English, required innovative solutions.

 

Results and benefits

 

The introduction of AI techniques and dedicated classification models embedded within a business application has revolutionized these processes for Egis, leading to several transformative outcomes:

  •   Accelerated Document Review Processes
      The AI-driven tools have significantly expedited the review of voluminous and technical documents.
  •   Standardization of Practices
      The adoption of a uniform approach across the group enhances consistency and efficiency.
  •   Market Expansion
      Improved capabilities have enabled Egis to pursue new markets with enhanced competitiveness.
  •   Domain Flexibility
      LLMs simplify data preparation, enabling easier domain expansion and on-demand category addition.

Three core strategies drove the project’s success: 

  • User-Centric Tool Design: focused on ergonomic and intuitive use, enhancing user engagement and adoption.
  • Rigorous Project Methodologies: implemented proven frameworks like Design Thinking and Scrum to ensure projects stayed on time and within budget.
  • Effective Change Management: comprehensive support for user adoption, including training and ongoing assistance, facilitated a smooth transition to the new systems.

Through its partnership with Egis, Fieldbox has not only addressed significant operational challenges in the construction and engineering industry but also set a new standard for integrating AI into business processes. This case study shows how targeted digital transformation strategies, driven by advanced generative AI services, can lead to substantial improvements in efficiency, accuracy, and market competitiveness. It also highlights the trade-offs involved in choosing between modern LLMs and more established NLP approaches.