How to build Embedded AI Analyst for Enterprise. Part 1. Architecture

Are you looking forward to AI generating analytics and businesses acting on the insights derived from them? Sentiments like “AI is taking our jobs” or “Gen AI is taking over everything” are increasingly common in discussions about the future of work. While the rise of AI is undeniable, it has also exposed a significant challenge for organizations: despite the advances in AI, business users still find working with data difficult. This leaves a substantial gap in business operations, as the dependency on tech and data teams for analytical insights remains.

The AI Analyst is a high-tech, interactive tool that streamlines data analysis and reporting. It delivers real-time insights and updates, and integrates easily with a variety of platforms. Operating much like a senior human analyst, it can create, validate, and modify reports while engaging with organizational data.

The AI Analyst is the next generation of this tooling, delivering on the promises of automation made by RPA and no-code platforms, and Datrics is working diligently to eliminate this dependency. The goal is to empower businesses to generate business insights automatically, without the need for extensive technical knowledge or a dedicated data team.

Datrics is a next-gen, AI-driven data intelligence platform that empowers business teams and analysts to perform data exploration, derive insights, and build data/AI applications without the need for coding.

In this article, Kirill Kirikov, the CTO of Datrics, provides a unique perspective, drawing on his experience and the lessons learned while building and launching the AI Analyst for early adopters. He recognizes that the transition to AI-driven analytics is not without its challenges, but believes in the potential of AI to revolutionize the way businesses operate.

Let’s dive in.

Outline:

  1. Introduction
  2. Foundational Architecture Principles
  3. Early-Stage Decisions
  4. AI Analyst Layered Architecture
  5. Input Guardrails
  6. Output Guardrails
  7. Feedback Loop
  8. Summary of Considerations
  9. Conclusion

Introduction

With businesses contending with escalating data volumes and operational complexities, the AI Analyst becomes a crucial asset, providing specialized expertise to navigate challenges. Its significance lies in bridging the gap for business teams dealing with vast data sets and diverse sources. In essence, the AI Analyst enhances efficiency and decision-making in contemporary, data-driven environments.

The critical role of AI Analysts is becoming clearer than ever now, as businesses are looking to harness the power of Artificial Intelligence to remain competitive and relevant in an increasingly dynamic digital world.

This article delves into the development of an enterprise-level AI Data Analyst. Such a system demands a scalable computing infrastructure capable of conducting hundreds of data analysis operations concurrently. It requires seamless integration with various databases, ensuring that data from disparate sources can be accessed and utilized efficiently. Additionally, an effective AI Data Analyst architecture must abstract away technical complexities like handling missing values or varying data types, making the tool more accessible to business users. We will explore how to architect a solution that meets these challenges, providing valuable insights into building a robust and scalable AI Data Analyst for enterprise applications.

Foundational Architecture Principles

This article outlines the architecture of an AI Data Analyst, which is akin to enterprise-grade knowledge assistants yet distinct in key aspects. The AI Analyst is engineered for relentless, 24/7 data operations, underpinned by a suite of features: continuous operational capability, dynamic scalability in response to data volume fluctuations, computation reliability, exemplary code quality, and stringent security and access control measures.

To meet these stringent operational standards, the AI Analyst rests on five pivotal architecture principles:

  1. Scalability: Designed to ensure that the computational infrastructure robustly scales, handling the fluctuating data volumes needed for timely and accurate responses.
  2. Security: Committed to data safety, this principle embeds multi-layered encryption and stringent access controls, addressing the critical need for information security in the modern era.
  3. Transparency: Focused on providing clear insights into system operations, usage metrics, and cost implications, thereby enhancing user understanding and trust in the system.
  4. Modularity: Emphasizing the ease of upgrades and modifications, this principle ensures the AI Analyst remains abreast of technological advancements through a plug-and-play architecture.
  5. Compatibility: Ensuring that the AI Data Analyst seamlessly integrates with modern enterprise data stacks, such as data warehouses like Snowflake or Athena, and scheduling frameworks like Airflow, enhancing its utility and applicability in varied enterprise environments.

By following these principles, the AI Data Analyst emerges as a robust, adaptable solution, poised to address the dynamic needs of enterprise data analysis.

Early-Stage Decisions Enhancing AI Analysts’ Foundational Principles

  1. Emphasis on Quality: This approach prioritizes quality over initial cost savings. Greater investment during development, for example in token usage and robust system construction, leads to enhanced performance and long-term efficiency.
  2. Adaptable LLM Models: The system is designed for easy LLM model switching to keep up with rapid advancements, notably in code generation. It allows for quick model updates and meets various business security requirements.
  3. Prioritizing In-Memory Computation: The system centers on fast, in-memory computations, with the capability to switch to other methods like Spark or Snowpark for external data processing, maintaining flexibility without major platform changes.
  4. Container-Based Architectural Choice: The architecture uses Docker containers for their scalability and adaptability in diverse environments, from multiple machines to single-machine setups. Kubernetes integration ensures scalability and cloud-agnostic capability, responding quickly to different customer requirements.
  5. Hybrid LLM Strategy: Initially employing a service-based LLM, the system is also evolving to include custom open-source models for scenario planning and code generation. This dual strategy is especially valuable for sectors requiring high data security, like finance, offering options such as a pure service-based LLM, a proprietary LLM for scenario planning combined with a service-based LLM for code generation, or fully proprietary LLM use. [More details here]
  6. Precreated ETL over Direct SQL Queries: To simplify the process and reduce errors, the system uses precreated ETL processes instead of direct SQL queries. Direct access to SQL databases adds complexity and potential failure points in both code and SQL generation. By aggregating data into fewer tables through ETL, the system minimizes these risks, streamlining data handling and shortening the feedback and correction cycle. The ETL layer can be removed in the future.

AI Analyst Layered Architecture

AI Analyst Layered Architecture by Datrics.ai

Data Layer

The data layer includes an ETL process that reformats data for easier understanding by the AI agent. It combines a data warehouse with enterprise data or various databases merged through an ETL process into a singular database accessible to the Agent. While ETL provides a workaround for direct database queries, it limits granular data access control and does not support row-based access rights. To balance security and data duplication, multiple ETL processes with differing access rights are implemented in the early stages.
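As a minimal sketch of what such an ETL step might look like, assuming pandas and hypothetical source tables and connection strings, the snippet below denormalizes several operational tables into a single wide table the agent can query without planning joins:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection strings; substitute your own warehouse credentials.
source = create_engine("postgresql://user:pass@orders-db/prod")
target = create_engine("postgresql://user:pass@analyst-db/agent")

# Pull the operational tables the agent will be asked about.
orders = pd.read_sql("SELECT * FROM orders", source)
customers = pd.read_sql("SELECT * FROM customers", source)

# Denormalize into one wide table so the agent never has to generate joins.
wide = orders.merge(customers, on="customer_id", how="left")

# Materialize under an access-scoped name; run one such ETL per access profile
# to approximate the differing access rights described above.
wide.to_sql("agent_orders_full", target, if_exists="replace", index=False)
```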

The agent’s algorithm for working with multiple data tables will be described in a separate article (coming soon).

Knowledge Layer

The Knowledge Layer is crucial to the AI Analyst’s “enterprise-specific” intelligence. It features a vital vector store, containing scenarios, code examples, their embeddings, and metadata. This vector store plays a key role in enabling semantic similarity searches across large data volumes, ensuring the system’s performance scales effectively. Importantly, this store continuously evolves, incorporating new examples as the AI Analyst addresses more queries.
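As an illustration of how such a vector store could work, here is a minimal sketch using the open-source chromadb client; the scenario texts and metadata fields are assumptions for demonstration:

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("analyst_scenarios")

# Each entry pairs a past question with the code that answered it.
collection.add(
    ids=["scn-001"],
    documents=["Monthly revenue by region as a bar chart"],
    metadatas=[{"code_example": "df.groupby('region')['revenue'].sum().plot.bar()"}],
)

# At query time, retrieve the most semantically similar past scenarios
# to use as few-shot examples in the code generation prompt.
results = collection.query(
    query_texts=["Show revenue per region for last month"],
    n_results=3,
)
```

New question/answer pairs are appended to the collection over time, which is how the store continuously evolves as described above.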

Additionally, the Knowledge Layer holds vital data information, including metadata for tables, column descriptions, details on categorical and missing values, and the relationships between tables. This rich repository of information is essential for the AI Analyst to accurately respond to user inquiries. These aspects are elaborated in Part 2 of our discussion.

Moreover, the layer maintains a comprehensive record of user-agent conversations. This complete context allows the AI Analyst to provide answers not only based on initial data from the data warehouse (DWH) but also on data or visualizations generated in response to earlier questions.

Another crucial function of the Knowledge Layer is storing feedback on previous answers. This feedback, analyzed manually by data analysts, is instrumental in enriching the AI Analyst’s capabilities. It contributes to refining examples for few-shot and chain-of-thought approaches and fine-tuning the Datrics LLM. This continuous learning and adaptation process is central to enhancing the AI Analyst’s precision and relevance in enterprise contexts.

LLM Layer

The LLM Layer, the brain of the AI Analyst, manages language model requests. At its core is the LLM API Gateway, which connects to various LLM vendors for flexibility, switching between services like ChatGPT for code generation or Datrics LLM for classification. This adaptability extends to replacing OpenAI’s ChatGPT with Microsoft Azure’s version for specific installations.

The LLM API Gateway tracks usage costs (tokens, subscriptions) for budget management and logs interactions for insights. This layer provides robust language processing while keeping the underlying LLMs replaceable and decoupled from the other layers.
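A simplified sketch of what such a gateway abstraction might look like; the class names, task routing, and flat token rate are illustrative assumptions rather than Datrics internals:

```python
from abc import ABC, abstractmethod

class LLMBackend(ABC):
    """Common interface so vendors can be swapped per task."""

    @abstractmethod
    def complete(self, prompt: str) -> tuple[str, int]:
        """Return (completion_text, tokens_used)."""

class LLMGateway:
    def __init__(self, backends: dict[str, LLMBackend], usd_per_1k_tokens: float):
        self.backends = backends       # e.g. {"code_gen": ..., "classify": ...}
        self.rate = usd_per_1k_tokens  # illustrative flat rate for cost tracking
        self.spent_usd = 0.0

    def complete(self, task: str, prompt: str) -> str:
        # Route each task type to its configured vendor.
        text, tokens = self.backends[task].complete(prompt)
        # Track spend so the Reporting Layer can surface cost analytics.
        self.spent_usd += tokens / 1000 * self.rate
        return text
```

Swapping ChatGPT for its Azure-hosted version then amounts to registering a different backend under the same task key.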

Application Layer: Orchestrating AI Analyst’s Response

Application Layer: Orchestrating AI Analyst’s Response by Datrics.ai

The Application Layer serves as the orchestrator, comprising the API and the agent’s logic. It receives requests from applications like browser apps or chatbots (Slack or MS Teams) and navigates through multiple steps, from intent recognition to response formation, connecting all backend rails.

Backend

Importantly, this layer doesn’t handle data directly. Data transformation, a resource-intensive task, is separated to ensure efficient scaling. As outlined in the early-stage decisions, initial in-memory data processing can evolve to modules executing SQL queries or running Spark tasks. Thus, the API and agent logic are decoupled from the data itself.

The agent logic unfolds through several steps:

1. Request Handler

The Application Layer of the AI Analyst activates upon receiving a user query through an API. The Request Handler manages chat interactions and retrieves the last few messages in a conversation from the Conversation History Store. Additionally, the Request Handler loads the current configurations for the LLMs used in the application.
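A minimal sketch of a request handler along these lines; the history and configuration stores are hypothetical interfaces, not a prescribed API:

```python
from dataclasses import dataclass

@dataclass
class ChatContext:
    conversation_id: str
    messages: list[dict]  # the last few turns, oldest first
    llm_config: dict      # the LLM settings currently active

def handle_request(conversation_id: str, history_store, config_store,
                   window: int = 6) -> ChatContext:
    # Retrieve only the recent turns that fit the prompt budget.
    messages = history_store.last_messages(conversation_id, limit=window)
    # Load the current LLM configuration for this deployment.
    llm_config = config_store.current()
    return ChatContext(conversation_id, messages, llm_config)
```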

2. Input Guardrails

Once the necessary database operations are completed, the Input Guardrails apply. In any specific context, input guardrails encompass a selection of policies, business rules, and validations designed to ensure that incoming requests meet predefined criteria before they proceed. The primary objective of these guardrails is to prevent users from using the system in ways that deviate from its intended purpose. For instance, in a flight booking app, customers should not be able to inquire about passengers beyond the scope of their own valid booking.

Guardrails are essentially a stack of functions arranged in a predetermined order. Each function evaluates the incoming request and its metadata and takes one of three possible actions: “pass”, meaning the guardrail approves the request without issue; “update”, meaning the request requires modification before being allowed through; or “reject”, meaning the request failed the guardrail and cannot continue, which terminates the process and returns a rejection reason to the requester. Rejecting failing requests early, and handling required modifications before a request travels further, both enforces the intended use cases and keeps request processing efficient and reliable.
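A minimal sketch of such a guardrail stack in Python; the scope rule is a toy example in the spirit of the flight booking scenario above:

```python
from enum import Enum
from typing import Callable

class Action(Enum):
    PASS = "pass"
    UPDATE = "update"
    REJECT = "reject"

# A guardrail inspects the request and returns (action, new_text_or_reason).
Guardrail = Callable[[str], tuple[Action, str]]

def scope_check(request: str) -> tuple[Action, str]:
    # Toy rule: block questions about other passengers' data.
    if "other passengers" in request.lower():
        return Action.REJECT, "Requests outside your own booking are not allowed."
    return Action.PASS, request

def run_guardrails(request: str, stack: list[Guardrail]) -> tuple[bool, str]:
    for guard in stack:
        action, payload = guard(request)
        if action is Action.REJECT:
            return False, payload  # fail fast with the rejection reason
        if action is Action.UPDATE:
            request = payload      # continue with the modified request
    return True, request
```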

The next step is to understand the context of the question.

3. Context Enrichment

This component understands the question’s context, connects it to initial data or previous answers, and enriches prompts with metadata from the Knowledge Layer. It is crucial to determine which data the question relates to: the initial data in the DWH, the answer to a previous question, or an attached file.

This component is also crucial for adapting domain-specific terminology to enhance later retrieval processes. In many industries and businesses, queries include niche jargon, abbreviations, and phrases unique to the industry. These can be obscure or have different meanings in general language. To address this, the implementation should include a comprehensive glossary of company-specific terms and abbreviations. This glossary “translates” and modifies user queries for optimal retrieval. For instance, it could remove trailing punctuation and expand acronyms in the queries (e.g. “MVP” is reformulated as “MVP (Minimum Viable Product)”). Such alterations significantly boost the retrieval effectiveness of proprietary data.
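A short sketch of such a glossary-based normalization step; the glossary entries here are assumed examples:

```python
import re

# Hypothetical company glossary; in practice this is maintained per customer.
GLOSSARY = {
    "MVP": "MVP (Minimum Viable Product)",
    "GMV": "GMV (Gross Merchandise Value)",
}

def normalize_query(query: str) -> str:
    query = query.strip().rstrip("?!.")  # remove trailing punctuation
    # Expand known acronyms so retrieval matches the stored descriptions.
    for term, expansion in GLOSSARY.items():
        query = re.sub(rf"\b{re.escape(term)}\b", expansion, query)
    return query

print(normalize_query("What was our GMV for the MVP launch?"))
# -> What was our GMV (Gross Merchandise Value) for the MVP (Minimum Viable Product) launch
```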

4. Intent Recognition

This step identifies the user’s intent: whether the answer involves text, structured data, or charts. Datrics LLM and OpenAI’s ChatGPT play key roles in generating response plans. LLMs perform much better when they think out loud, which is why a good approach is to create a plan for answering the question. This plan can include code generation and the use of pre-built functions (which speeds up code generation and adds stability).
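As a hedged sketch of how such a plan-first step might be prompted, assuming a simple completion callable and an illustrative label set:

```python
import json

INTENT_PROMPT = """You are a data analyst assistant.
Classify the user's question as one of: text_answer, table, chart.
Then write a short numbered plan for producing the answer,
reusing pre-built functions where possible.

Question: {question}
Available tables: {tables}

Respond as JSON: {{"intent": "...", "plan": ["..."]}}"""

def recognize_intent(llm, question: str, tables: list[str]) -> dict:
    # `llm` is any completion callable, e.g. the gateway sketched earlier.
    prompt = INTENT_PROMPT.format(question=question, tables=", ".join(tables))
    return json.loads(llm(prompt))
```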

5. Code Generation

Generating code aligned with the planned answer, using OpenAI’s ChatGPT or its Azure-hosted version in the initial release.

6. Execution

During the execution phase, it is crucial to verify that the code generated in the preceding step is error-free, so the code is first applied to a chunk of data. This chunk, a stratified sample of 10,000 rows, is typically sufficient for testing the code. If the code encounters errors during execution, these errors are collected, prompting the agent to retry the code generation step with the received feedback. This iterative process allows for up to 5 retries, a configurable value. It is also significant that data processing executes within a sandbox container, with resource management optimized by allocating ample RAM or CPU for the container within the Kubernetes cluster. Moreover, this approach lays the foundation for future integration with third-party systems, such as Spark, or for executing SQL queries on the Data Warehouse, without necessitating changes to the agent’s code.
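The retry loop reduces to a few lines; in this sketch, generate_code and run_in_sandbox stand in for the code generation LLM call and the sandboxed container execution:

```python
MAX_RETRIES = 5  # configurable, as noted above

def execute_with_retries(generate_code, run_in_sandbox, sample_df):
    feedback = ""
    for attempt in range(MAX_RETRIES):
        code = generate_code(feedback)  # regenerate, including error feedback
        try:
            # Test against the stratified 10,000-row sample in a sandbox.
            return run_in_sandbox(code, sample_df)
        except Exception as err:
            feedback = f"Attempt {attempt + 1} failed: {err}"
    raise RuntimeError("Code generation failed after all retries. " + feedback)
```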

7. Summary Generation

When generating summaries, keep in mind that the AI Analyst is a tool for business users, whose primary focus is business results rather than technical details. Hence, it is crucial to transform the answer into a format comprehensible to business users, such as a text summary or a presentation. Notably, because the data cannot be sent directly to the LLM, the approach employs code templates for summary generation. This ensures that the insights provided align with the business context and are readily understandable by the intended audience.
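A minimal sketch of the template idea: aggregates are computed in code, and only the resulting small numbers, never the raw data, are interpolated into the text. The column names are hypothetical:

```python
# Only computed aggregates reach the summary text; raw rows never leave the sandbox.
SUMMARY_TEMPLATE = (
    "Revenue for {period} was ${total:,.0f}, "
    "{direction} {change:.1%} versus the previous period. "
    "The top region was {top_region}."
)

def summarize_revenue(df, period: str) -> str:
    total = df["revenue"].sum()
    prev = df["prev_revenue"].sum()
    change = (total - prev) / prev
    return SUMMARY_TEMPLATE.format(
        period=period,
        total=total,
        direction="up" if change >= 0 else "down",
        change=abs(change),
        top_region=df.groupby("region")["revenue"].sum().idxmax(),
    )
```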

8. Artifacts Saving

Results, including text, charts, and datasets or Excel files, are stored on block storage. MinIO is a great tool for this: it allows sharing data between containers, is cloud-agnostic, can work as a proxy for GCS or S3, or can run as standalone block storage.
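Uploading an artifact with the MinIO Python client is straightforward; the endpoint, credentials, and object paths below are placeholder assumptions:

```python
from minio import Minio

# Placeholder connection settings; point these at your MinIO deployment.
client = Minio("minio.internal:9000",
               access_key="ACCESS_KEY", secret_key="SECRET_KEY", secure=False)

bucket = "analyst-artifacts"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a generated chart; the object path is then linked in the response.
client.fput_object(bucket, "conv-42/revenue_chart.png", "/tmp/revenue_chart.png")
```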

9. Response Formation

Sending the response with the summary and links to artifacts back to the client. To enhance user experience, technical details are streamed to the client while the agent works.

This intricate process ensures that the AI Analyst not only generates technically sound results but also presents them in a comprehensible and valuable manner for business users.

Frontend

The AI Analyst’s front end is a user-friendly web interface meticulously crafted using Vue and JavaScript. The design prioritizes simplicity, enabling users to effortlessly pose questions, receive answers, and access guidelines for effective interaction with the AI Analyst. Incorporating a feedback feature is essential, providing users with a channel to contribute to refining the AI Analyst’s performance.

The modular design of the AI Analyst’s architecture is pivotal, allowing for potential frontend substitutions in the future, such as a mobile app or an instant messaging platform (e.g. MS Teams or Slack). This adaptability stems from backend interactions occurring through APIs, facilitating seamless integration with various frontends while upholding consistent functionality and user experience.

As users engage with the data, the frontend must offer robust data preview functionality, featuring sorting, filtering, and paging capabilities. Interactive charts with zoom-in and zoom-out functionality and legends should be seamlessly integrated to enhance data visualization. Additionally, users should be able to quote artifacts from previous messages, providing clearer context for data-related questions or enabling joint operations on the results of multiple past answers. This feature enhances user understanding and facilitates more intricate data interactions within the frontend.

Reporting Layer

AI Analyst Layered Architecture: Reporting Layer

The Reporting Layer plays a pivotal role in the architecture of the AI Analyst, offering transparency across critical dimensions: costs, usage, and data analytics. Meticulously crafted, this layer provides a holistic view of the AI Analyst’s operational dynamics, serving as an indispensable tool for management and continuous enhancement.

A primary function of the Reporting Layer is cost analysis, meticulously tracking and analyzing expenses tied to the AI Analyst’s operations. This includes costs related to token consumption by LLMs, data processing, and other computational resources. By providing detailed insights into these expenditures, the Reporting Layer facilitates effective budget management and identifies opportunities for cost optimization.

Another critical facet is usage monitoring, keeping a vigilant eye on how the AI Analyst is utilized across the organization. This monitoring encompasses various metrics, such as user interactions, peak usage times, and query types processed. Understanding these usage patterns is crucial for scaling the AI Analyst effectively and ensuring it aligns with the evolving needs of the enterprise.

In addition, the Reporting Layer delves into data analytics, offering a comprehensive examination of the AI Analyst’s performance and effectiveness. This includes analyzing response accuracy, user satisfaction, and overall operational efficiency. Such analytics play a pivotal role in guiding future improvements, ensuring the AI Analyst remains at the forefront as a cutting-edge tool for the enterprise.

Output Guardrails

After receiving the final chat completion from the LLM, the next step post-processes it in the Output Handler. No matter how carefully you engineer the prompt and the steps in front of the model, there always remains a residual risk of hallucinations and undesirable information being shown to users. To mitigate this risk, there should be a set of Output Guardrails in place: asynchronously executed checks on the model’s response that include a content filter and a hallucination detector. The content filter detects and removes biased and harmful language as well as any personally identifiable information (PII). The hallucination detector checks whether the response contains information that is not given in the retrieved context. Both guardrails are themselves based on LLMs. Besides mitigating risk, they also inform future development and troubleshooting efforts.
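A hedged sketch of how these two checks could run concurrently; here `llm` is any async completion callable, and both prompts are illustrative assumptions:

```python
import asyncio

async def content_filter(llm, answer: str) -> str:
    # LLM-based rewrite that strips PII and harmful or biased language.
    return await llm(f"Rewrite the text, removing PII and harmful language:\n{answer}")

async def hallucination_check(llm, answer: str, context: str) -> bool:
    verdict = await llm(
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Does the answer contain claims not supported by the context? Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")

async def output_guardrails(llm, answer: str, context: str) -> str:
    # Run both checks asynchronously, as described above.
    filtered, flagged = await asyncio.gather(
        content_filter(llm, answer),
        hallucination_check(llm, answer, context),
    )
    if flagged:
        filtered += "\n\nNote: parts of this answer could not be verified against the data."
    return filtered
```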

Feedback Loop

The AI Analyst must grow alongside the enterprise it serves, which necessitates adding new knowledge to its database. When a user finds an answer helpful and clicks the “thumbs up” button, that answer should be integrated into the knowledge database. However, caution must be exercised, because adding new answers can impact overall quality. Therefore, it is a technical requirement to introduce new entries into the knowledge base manually at first and to test them, with LangSmith serving as a suitable framework.

Furthermore, it is imperative to maintain visibility into the AI Analyst’s overall performance, which makes implementing performance metrics a technical requirement. These metrics encompass factors such as the percentage of correct answers, the number of code generation errors, response time, average token usage, and the cost per answer. Monitoring these metrics daily is a technical necessity to ensure optimal performance.
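A small sketch of how these metrics might be represented and checked daily; the thresholds are assumed placeholders to tune against your own baseline:

```python
from dataclasses import dataclass

@dataclass
class DailyMetrics:
    """The performance metrics named above, aggregated per day."""
    correct_answer_rate: float   # share of answers marked correct (0..1)
    codegen_errors: int          # code generation failures
    avg_response_time_s: float
    avg_tokens_per_answer: float
    cost_per_answer_usd: float

def alert_if_degraded(m: DailyMetrics) -> list[str]:
    # Placeholder thresholds; tune to your baseline.
    alerts = []
    if m.correct_answer_rate < 0.85:
        alerts.append("Answer accuracy below 85%")
    if m.cost_per_answer_usd > 0.50:
        alerts.append("Cost per answer above $0.50")
    return alerts
```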

Summary of Considerations

We summarize some of the considerations covered earlier in this article.

  • Computation Platform: The importance of a scalable computation platform cannot be overstated, as it enables the concurrent execution of numerous data pipelines without conflicts.
  • Guardrails: The implementation of input/output guardrails is vital for risk mitigation and upholding the reputation of AI applications in enterprise environments. These guardrails should be adaptable to meet the organization’s risk criteria.
  • Configuration Database: Maintaining a database table to track LLM configurations facilitates efficient monitoring, potential rollbacks, and the association of specific model versions with user feedback and errors.
  • Search Optimization: Fine-tuning the search algorithm involves a combination of semantic similarity search algorithms, precise keyword matching, and metadata filtering. Continuous optimization, based on user feedback, is essential to achieve the right balance of quantity, quality, and source diversity in search results.
  • Effective Prompting: Collaborative prompt engineering with users or experts is a key factor in the success of an AI application.
  • Controlling LLMs: Introducing features like intent recognition or a similar deterministic split enhances control flow, providing developers with greater control over the behavior of LLM applications.
  • Data Preparation: Implementing an ETL process significantly enhances the quality of answers. Additionally, detailed table descriptions, connections, and information about categories and missing values contribute to improved answer quality.
  • Domain Knowledge: Incorporating specific industry or company knowledge, whether through glossaries, prompt engineering, or fine-tuning, is essential for LLMs to understand domain-specific information effectively.
  • UX and UI: Ensuring a user-friendly interface with features for quoting text, pointing to data, or referencing charts in the dialogue greatly enhances interaction with the agent, reducing errors and improving the overall user experience.

Conclusion

This marks just the initial phase of an era where a portion of analysts’ tasks can be automated. It’s important to note that having a fleet of AI Analysts within an enterprise won’t replace human analysts but will significantly reduce the volume of routine tasks they handle. Picture a scenario where business users can receive answers to their questions from existing reports 24/7 within minutes instead of hours. Achieving this level of automation necessitates the presence of pre-created reports, continuous monitoring of metrics, and ongoing enhancement of the AI Analyst’s knowledge. These responsibilities fall on human analysts, who can then allocate more of their time to searching for valuable insights, rather than performing routine EBITDA calculations.

The early era of AI Analysts has already begun to yield significant business Return on Investment (ROI), and we project that by 2024, these tools will reach early adoption in finance, marketing, and sales. We are witnessing a paradigm shift from expensive, manual Data Science/Analytics routines to no-code/low-code platforms, and ultimately to AI Analysts, our virtual employees. Today, we have the benefit of copilots, albeit with human interaction. However, the landscape of the Analytics function is on the cusp of a significant transformation.

The end vision is a future where these virtual employees communicate with each other, akin to a virtual analyst liaising with a virtual Project Manager, or a virtual admin requesting access. It’s plausible to envision a scenario where even a virtual CEO is part of the equation. We are thrilled to be at the forefront of this exciting transition, actively shaping and building this future right now.

Want to dive deeper into the world of AI Analysts? Feel free to share your thoughts in the comments section, or reach out to us for more detailed information.

Do you want to discover more about Datrics?

Read more

Can Chat-Based UIs Redefine How We Analyze Data?

Chat-based UIs are transforming data analysis by making it simpler yet powerful. This innovative approach combines user-friendly conversations with sophisticated data analysis tools.

A Bank’s Journey to Simplified Data Analytics

Our Client, a leading European bank, faced a challenge. Despite having a large analytics team, they had only a handful of data scientists.

5 Reasons Retailers Should Implement AI in their Business

Artificial intelligence is making a massive impact in the retail industry, with retailers using technology to meet evolving customer expectations.