A Custom Chatbot Leveraging GPT Capabilities



By Diego Olaya – article on Medium

The most recent ChatGPT chatbot, developed by OpenAI, has demonstrated the ability of AI to empower and support a diverse spectrum of users across many applications. Use cases range from enriching the learning experience at all educational levels and supporting programmers with debugging and code explanations, to enabling content creators to craft immersive and captivating narratives. This article aims to bridge the gap between the often technical discussions about ChatGPT and the practical needs of businesses. To accomplish this, I first present the fundamentals of ChatGPT and then show a custom application that leverages its capabilities for a specific case. By the end of this article, you will have a solid understanding of ChatGPT and how it can be adapted to suit your unique requirements.

Understanding ChatGPT

Large language models (LLMs) have become an essential component of modern chatbots and virtual assistant technologies due to their ability to analyse and understand natural language. In essence, a language model is a probabilistic technique trained on large volumes of text data with the purpose of learning patterns within language, such as syntax and semantics. Among the modelling strategies that have been proposed, neural network architectures such as recurrent neural networks and transformers have led to outstanding results. In particular, the transformer architecture achieves superior performance at lower computational cost thanks to its encoder-decoder structure and self-attention mechanisms.
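To make the idea of a probabilistic language model concrete, here is a toy bigram model: it simply counts which word follows which in a training corpus and turns those counts into next-word probabilities. This is vastly simpler than a transformer, and the corpus below is invented for illustration, but the underlying principle of learning a next-word distribution from text is the same.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word follows another in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(model, word):
    """Probability distribution over the next word, given the previous one."""
    following = model[word]
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

corpus = [
    "the model answers the question",
    "the model generates the answer",
]
model = train_bigram_model(corpus)
print(next_word_probs(model, "the"))
```

A transformer replaces these raw counts with a learned neural representation conditioned on the entire preceding context, but it is still, at its core, predicting a distribution over the next token.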

Generative Pre-trained Transformer (GPT) models, including the one behind ChatGPT, use the transformer architecture and incorporate a reinforcement learning mechanism in their training. Although these models share the same architecture, they differ in their intended usage. GPT models are general-purpose LLMs: they are trained for a broad range of natural language processing tasks, including text classification, question answering, and text summarisation. ChatGPT, by contrast, was specifically trained to hold conversations, making it well suited for chatbot applications. It is worth noting that GPT-3 is one of the largest LLMs to date, with 175 billion parameters, a dramatic increase over the 117 million parameters of the original GPT model.

Putting it all together, ChatGPT is an LLM built on a transformer architecture and trained with reinforcement learning from human feedback, specialised for text-based conversational applications such as chatbots and dialogue systems. As a consequence of this specialisation, it does not cover as wide a range of linguistic phenomena and language tasks as the general-purpose GPT models. Moreover, the model's training data is sourced from a vast amount of written content from the internet, with a cut-off in 2021. As a result, the model cannot provide information about events beyond this period.

Creating a custom chatbot with GPT using your own data

As mentioned above, GPT models rely on pre-defined datasets during training. Since this training data does not necessarily contain information relevant to a particular context, customisation is required. It allows the models to learn and adapt to the unique context of an organisation, so that they generate more accurate and context-specific responses.

At present, the GPT-3.5 model cannot be fine-tuned, which restricts the extent to which it can be customised. However, other models within the GPT family can be tailored using either of the following strategies:

Using the GPT model as-is and providing context-specific information

This approach also known as Prompt Engineering consists of designing and optimising prompts with information that the model uses to formulate a response. A prompt refers to the set of instructions received by the model, which can take many forms, such as a question, statement, or command. For example, in the case of a company, specific offers or products can be used as examples to feed base GPT models with information to answer customer queries.
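In code, this approach amounts to assembling the context-specific information and the user's question into a single prompt string. The instruction wording and the company snippets below are illustrative, not an official template:

```python
def build_prompt(context_snippets, question):
    """Assemble a prompt that grounds the model in context-specific information."""
    context = "\n".join(f"- {snippet}" for snippet in context_snippets)
    return (
        "Answer the customer query using only the information below.\n"
        f"Company information:\n{context}\n"
        f"Customer query: {question}\n"
        "Answer:"
    )

# Hypothetical company snippets for illustration.
snippets = [
    "Acme Cloud offers managed Kubernetes hosting in the EU.",
    "Acme Cloud support is available 24/7 via chat.",
]
prompt = build_prompt(snippets, "Do you offer Kubernetes hosting?")
print(prompt)
```

The resulting string is then sent to the model as the prompt of a completion request; the model's answer is grounded in the supplied snippets rather than in its general training data.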

However, this method is limited to a maximum number of text chunks known as tokens, which can range in length from a single character to a word (4,096 tokens for the GPT-3.5 model). It may therefore be impractical for large amounts of information spread across multiple text files, unless combined with other strategies.
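The token limit means context must be budgeted. A minimal sketch of such budgeting is shown below; it uses a very rough characters-per-token heuristic, whereas a real application would count tokens with the model's own tokenizer:

```python
def rough_token_count(text):
    """Very rough heuristic: roughly 1 token per 4 characters of English text.
    A real application would use the model's own tokenizer instead."""
    return max(1, len(text) // 4)

def fit_snippets(snippets, max_tokens):
    """Greedily keep context snippets until the token budget is exhausted."""
    kept, used = [], 0
    for snippet in snippets:
        cost = rough_token_count(snippet)
        if used + cost > max_tokens:
            break
        kept.append(snippet)
        used += cost
    return kept
```

Note that the budget must cover the instructions, the context, the question, and the model's response together, so in practice only part of the 4,096 tokens is available for context.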

Fine-tuning base GPT-3 models

Base GPT-3 models can be fine-tuned with custom data, which means continuing their training on a smaller, context-specific dataset. This can lead to better performance on custom tasks without training from scratch. In addition, the model learns the relevant information from the data itself, which removes the need for long custom prompts and saves tokens, since the prompt is reduced to the user's query.

The process of fine-tuning involves creating a custom training dataset and launching a training job. Once the model is fine-tuned, it can be used by specifying it as a parameter within an API call. For instance, a company can use past support call centre dialogues to fine-tune a GPT-3 model to generate responses that align with its services and philosophy.
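Preparing the training dataset is the main practical step. The legacy GPT-3 fine-tuning endpoint expects a JSONL file of prompt/completion pairs; the separator and stop-marker conventions below follow OpenAI's guidance for that format, though the exact markers are a matter of choice, and the example dialogue is invented:

```python
import json

def to_finetune_jsonl(dialogues):
    """Serialise (question, answer) pairs into the JSONL prompt/completion
    format used by the legacy GPT-3 fine-tuning endpoint."""
    lines = []
    for question, answer in dialogues:
        record = {
            "prompt": f"{question}\n\n###\n\n",  # fixed separator marks the end of the prompt
            "completion": f" {answer} END",       # leading space plus a stop marker
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

pairs = [("How do I reset my password?",
          "Open Settings, choose Security, then select Reset password.")]
print(to_finetune_jsonl(pairs))
```

The resulting file is uploaded and referenced when launching the fine-tuning job; once training completes, the fine-tuned model name is passed as the model parameter in the API call.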

The choice of strategy depends on the use case and the costs associated with each method, as requirements such as the size of the search space and the expected usage vary per application.

The custom chatbot case

The client is a large network of over 600 companies, which can make it challenging for sales personnel to identify the company within the group that matches a specific customer request. To address this issue and facilitate the interaction between sales personnel and their customers, a custom chatbot was built to support sales in finding the most suitable company for each case.

Since base GPT models lack knowledge of the client's companies, a custom chatbot application was developed with the right information on each company's offering and core competencies. The chatbot engages with the user in a human-like fashion and provides accurate information about the companies.

Figure 1 illustrates the inability of the base GPT-3 model (text-davinci-003) to provide the right information. The text highlighted in green is the model's answer to our query: “Please tell me about MbarQ”. For readers unfamiliar with MbarQ: it is a Belgian start-up that provides AI services and is not involved in the hospitality industry.

Figure 1. Base GPT-3 model response to: “Please tell me about MbarQ”.

Next, the base GPT-3 model was fed with information about MbarQ so that it could accurately respond to our query. As shown in Figure 2, incorporating this knowledge base yields a response that closely aligns with the vision and services of MbarQ.

Figure 2. Custom knowledge base GPT-3 model response to: “Please tell me about MbarQ”.

A custom chatbot was built leveraging the capabilities of OpenAI's GPT-3 model and Power Apps, as illustrated in Figure 3. First, a dataset was created with descriptions of the services and capabilities of each company; this provides the necessary context-specific information to the base model and ensures accurate responses to queries. Then, a custom flow and app were implemented within Power Apps to integrate the GPT-3 model and the custom data behind a user-friendly interface. An example of the application is displayed in Figure 4.

Figure 3. Architecture of the custom chatbot.
Figure 4. App preview.
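The selection step at the heart of such an architecture can be sketched in its simplest form as below. The company names and descriptions are invented, and the matching is plain word overlap; a production system would typically rank companies with text embeddings before handing the best match to the model as context:

```python
def best_matching_company(companies, query):
    """Pick the company whose description shares the most words with the query.
    Word overlap is a deliberately naive stand-in for embedding-based retrieval."""
    query_words = set(query.lower().split())

    def overlap(item):
        _, description = item
        return len(query_words & set(description.lower().split()))

    return max(companies.items(), key=overlap)[0]

# Hypothetical companies for illustration.
companies = {
    "DataWorks": "machine learning and data engineering consultancy",
    "BuildCo": "industrial construction and facility maintenance",
}
print(best_matching_company(companies, "who can help with machine learning"))
```

The description of the selected company is then injected into the prompt, so the model answers the sales query grounded in that company's actual offering.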

Final reflection

ChatGPT has transformed the way people interact with AI. It has brought AI closer to the general public and has inspired many people to apply its capabilities to a wide range of applications. The base capabilities of language models like GPT are powerful and versatile, and with the appropriate customisation they can unlock a whole new range of opportunities for companies.

One should not ignore the limitations, potential risks, and biases that make AI technologies prone to producing harmful content and misleading information. However, I strongly believe that stand-alone AI models, fine-tuned to meet specific business requirements, have the potential to form the basis of responsible AI solutions.

I would like to thank my colleagues at MbarQ for their valuable contributions to this article, especially Stefan Schoonbrood, Steven Van Goidsenhoven, and Pieter van der Deen.
