Introduction
In the rapidly evolving field of Conversational AI, OpenAI's groundbreaking function-calling feature, unveiled in June 2023, has emerged as a game-changer.
Whether you are new to chatbots, a curious enthusiast looking to explore the depths of this feature, a developer seeking to optimize its usage, or someone who has run into context limitations while leveraging function calling with OpenAI, you've come to the right place.
In this article, we will delve into the world of function calling, discussing its fundamental concepts, implementation, and the challenges posed by token size limitations, and explore potential solutions to overcome them.
Prerequisites
Familiarity with OpenAI, ChatGPT, LLMs (Large Language Models), tokens, and token limits is recommended.
Additionally, you will need an OpenAI API key to implement or test the code on your own.
Introduction to Function Calling
Function calling within OpenAI has revolutionized the capabilities of conversational AI, empowering chatbots to go beyond generating text and enabling them to perform a wide array of tasks.
With function calling, chatbots transform into intelligent virtual assistants capable of executing diverse actions and integrating with external systems.
At its core, function calling enables chatbots to leverage external APIs or execute predefined functions, expanding their scope of operations. By defining the necessary parameters and desired outcomes, developers can harness the power of function calling to obtain structured JSON outputs from user queries.
These JSON outputs are a foundation for making suitable API or function calls, unlocking a world of possibilities.
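For example, when a user asks about the weather, the model can respond not with prose but with a structured function-call message along these lines (the function name and arguments here are hypothetical):

```python
# Shape of the assistant message when the model decides to call a function;
# "arguments" arrives as a JSON-encoded string for us to parse and execute.
assistant_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"location": "Boston, MA", "unit": "celsius"}',
    },
}
```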
Imagine asking the chatbot about the current weather and watching it seamlessly fetch data from a third-party weather API, or requesting a summary of your email and having the chatbot retrieve and process your messages into the overview you need.
For enterprises, the chatbot could become a one-stop solution where customers inquire about their credit scores, bills, and more. The potential applications are limitless.
Function Call Implementation (Python)
I have skipped various implementation details here; the specifics can be found in the linked Python notebook.
Let's start by writing the external API functions and the JSON Schema function descriptions that we will call later.
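A minimal sketch follows; the function names (`get_current_weather`, `summarize_emails`) and their stubbed bodies are illustrative assumptions standing in for the notebook's real integrations:

```python
import json

def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Fetch the current weather for a location (stubbed here; a real
    implementation would call a third-party weather API)."""
    # Dummy payload standing in for an external API response.
    return json.dumps({"location": location, "temperature": 22, "unit": unit})

def summarize_emails(count: int = 5) -> str:
    """Summarize the user's most recent emails (stubbed)."""
    return json.dumps({"summary": f"You have {count} new emails; nothing urgent."})

# JSON Schema descriptions the model uses to decide when and how to call us.
function_descriptions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. Boston, MA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
    {
        "name": "summarize_emails",
        "description": "Summarize the user's most recent emails",
        "parameters": {
            "type": "object",
            "properties": {
                "count": {"type": "integer", "description": "How many emails to summarize"},
            },
            "required": [],
        },
    },
]
```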
Now, let's handle OpenAI's API call to process user queries using LLM:
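A sketch of that loop, reusing the functions defined above and the 2023-era `openai` 0.x SDK (`openai.ChatCompletion.create` with the `functions` parameter); the notebook may differ in details:

```python
import json
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # supply your own key

AVAILABLE_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "summarize_emails": summarize_emails,
}

def run_conversation(messages):
    """Send the conversation to the model; if it requests a function,
    execute it, append the result, and let the model compose a reply."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=function_descriptions,
        function_call="auto",
    )
    message = response["choices"][0]["message"]

    if message.get("function_call"):
        name = message["function_call"]["name"]
        args = json.loads(message["function_call"]["arguments"])
        result = AVAILABLE_FUNCTIONS[name](**args)

        # Feed the function result back so the model can answer in prose.
        messages.append(message)
        messages.append({"role": "function", "name": name, "content": result})
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )
        message = response["choices"][0]["message"]

    messages.append(message)
    return message["content"]
```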
Here is a sample conversation:
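The original transcript isn't reproduced here, but an illustrative exchange might look like this:

```python
# Illustrative run; the actual reply depends on the model and live data.
messages = [{"role": "user", "content": "What's the weather like in Boston right now?"}]
print(run_conversation(messages))
# Hypothetical output: "It's currently about 22 degrees Celsius in Boston, MA."
```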
The conversation size significantly increases with repeated calls and function descriptions.
For instance, the example conversation in the original notebook contains about 152 tokens.
Additionally, the two function descriptions together occupy around 330 tokens (in the original code). With just two or three successive questions, the token count approaches 1k, and the context window grows rapidly unless it is managed effectively.
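If you want to measure this growth yourself, tiktoken gives a rough count (a sketch; it ignores the per-message formatting overhead the API adds, so real counts run slightly higher):

```python
import json
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def rough_token_count(messages, functions):
    """Approximate prompt size: message contents plus the serialized
    function descriptions sent with every request."""
    text = "".join(m.get("content") or "" for m in messages)
    text += json.dumps(functions)
    return len(encoding.encode(text))

print(rough_token_count(messages, function_descriptions))
```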
Moreover, integrating numerous services with extensive function descriptions poses a challenge that requires careful consideration and handling.
Effective Management of Context and Services
You can consider several solutions to address the aforementioned issue:
- Limiting the number of functions and function description size (definitely not what we want).
- Employing a more advanced model with a higher context window (e.g., 8k, 16k, 32k token window options).
- Utilizing embeddings for finding the most relevant function descriptions and then using this subset.
We will explore the third option in detail: leveraging another kind of model (a word embedding model) to efficiently manage function descriptions and token size.
Function Calling Optimized with Word Embedding
Embedding
An embedding represents data as a point in a vector space, where semantically similar data (data with similar meaning) is stored in close proximity.
I won't go into the details here; the articles linked below do a great job of explaining what embeddings are, what they can do, and how to use them effectively.
Consider This Example for a Simple Understanding
Imagine representing any line of text in a 2D plane as a point. Texts with similar meanings would be grouped closely, while those with different meanings would be farther apart.
This concept is known as embedding. Its practical applications include finding the most relevant text to a given query by representing the query in the same 2D plane, identifying its closest points, and then tracing back the texts that these points represent.
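Here is a toy version of that 2D picture with hand-made vectors (real embedding models produce vectors with hundreds or thousands of dimensions):

```python
import numpy as np

# Hand-made 2D "embeddings": the two weather sentences sit near each other,
# far from the email sentence.
points = {
    "How hot is it outside?":      np.array([1.0, 0.0]),
    "What's today's temperature?": np.array([0.7, 0.3]),
    "Summarize my inbox.":         np.array([0.0, 1.0]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1])  # stand-in embedding for "Is it warm today?"
best = max(points, key=lambda text: cosine_similarity(points[text], query))
print(best)  # -> "How hot is it outside?"
```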
This technology is utilized in search engines to gather relevant pages matching your query, to create personalized assistants based on your custom dataset, and in PrivateGPT.
Utilizing Embeddings for our Use Case
We begin by embedding our function descriptions in a vector space (remember to keep all your functions distinct, as similar functions can confuse the model).
When a user poses a query, we embed it into the same vector space, allowing us to identify the closest points to the query's embedding. We then collect the corresponding function descriptions of these points and utilize only this smaller set to be processed by GPT in response to the user's query.
This approach reduces the token size by utilizing only relevant function descriptions rather than including all descriptions each time.
Implementation of an Optimized Function Call with Word Embedding (Python)
You can refer to the linked notebook for full details. As before, we start by writing the external API functions to call later; this time, we also embed their descriptions in a vector space.
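A sketch of that step, reusing the `function_descriptions` defined earlier and the 0.x SDK's `openai.Embedding.create`; the choice of `text-embedding-ada-002` is an assumption, and the notebook may use a different model:

```python
import json
import numpy as np
import openai

EMBEDDING_MODEL = "text-embedding-ada-002"

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text."""
    response = openai.Embedding.create(model=EMBEDDING_MODEL, input=text)
    return np.array(response["data"][0]["embedding"])

# Embed each function description once, up front
# (keep the descriptions distinct so similar functions don't collide).
function_embeddings = [
    (description, embed(json.dumps(description)))
    for description in function_descriptions
]
```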
Next, here's the code to find the relevant functions in the embedding vector space for a given user query (it uses the cosine distance between two vectors):
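A minimal version of that lookup, building on the `embed` helper and `function_embeddings` above and using SciPy's cosine distance:

```python
from scipy.spatial.distance import cosine

def relevant_functions(query: str, top_k: int = 1):
    """Embed the query and return the top_k function descriptions whose
    embeddings have the smallest cosine distance to it."""
    query_embedding = embed(query)
    ranked = sorted(
        function_embeddings,
        key=lambda pair: cosine(pair[1], query_embedding),
    )
    return [description for description, _ in ranked[:top_k]]
```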
Finally, let's handle OpenAI's API call to process user queries with the LLM, now optimized with the embedding model:
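A sketch of the optimized loop; it mirrors `run_conversation` from earlier but sends only the descriptions returned by `relevant_functions` (again assuming the hypothetical functions defined above):

```python
def run_conversation_optimized(messages):
    """Same flow as before, but only the most relevant function
    descriptions are sent with the request."""
    user_query = messages[-1]["content"]
    subset = relevant_functions(user_query, top_k=1)

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=subset,  # only the relevant description(s), not all of them
        function_call="auto",
    )
    message = response["choices"][0]["message"]

    if message.get("function_call"):
        name = message["function_call"]["name"]
        args = json.loads(message["function_call"]["arguments"])
        result = AVAILABLE_FUNCTIONS[name](**args)
        messages.append(message)
        messages.append({"role": "function", "name": name, "content": result})
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages
        )
        message = response["choices"][0]["message"]

    messages.append(message)
    return message["content"]
```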
Here is a sample conversation (focus on the relevant function and note how similar it is to the user query; in this OpenAI API call, only that one relevant function description was sent with the user query, not all of them):
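Illustratively, with hypothetical output:

```python
messages = [{"role": "user", "content": "Do I need an umbrella in Seattle today?"}]

# Only the weather function's description is selected and sent to the model.
print(relevant_functions(messages[-1]["content"]))
# Hypothetical output: [{'name': 'get_current_weather', ...}]

print(run_conversation_optimized(messages))
# Hypothetical output: "It's 22 degrees Celsius and clear in Seattle,
# so no umbrella needed."
```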
Pros
Let me list some pros of using embeddings:
- The token limit is no longer an issue, at least for function descriptions.
- The embedding API is much cheaper than the GPT model API, and although the embedding API has its own token limit, a single function description is rarely anywhere near that large.
- Since we send less text in each API call to GPT, processing is also relatively faster.
Wrapping Up
In this blog, we have explored the revolutionary function-calling feature offered by OpenAI, which empowers chatbots with limitless capabilities in Conversational AI and integrated services.
However, as we delved into the implementation details, we encountered challenges related to token size limitations, leading to rapidly growing context windows and the need for effective management.
To address this issue, we explored the utilization of embeddings to optimize function calling.
Embeddings represent data in a vector space, where semantically similar data is stored close together.
By embedding function descriptions and user queries into the same vector space, we could efficiently identify relevant function descriptions and significantly reduce the token size used during LLM processing.
This optimized approach ensures that only relevant function descriptions are processed, allowing chatbots to handle diverse tasks and integrate with numerous external services, APIs, and functions without running up against token constraints.
Appendix and Helpful Links
- Google Colaboratory notebook with the full code (colab.research.google.com)
- OpenAI Platform: developer resources, tutorials, and API docs (platform.openai.com)
- openai-cookbook: How_to_call_functions_with_chat_models.ipynb (github.com)
- openai-cookbook: Embedding_Wikipedia_articles_for_search.ipynb (github.com)
- Embeddings | Machine Learning | Google for Developers (developers.google.com)