Streaming vs Blocking Conversational UIs with the Vercel AI SDK

Conversational user interfaces like chatbots have exploded in popularity recently thanks to large language models like GPT-3.5. However, these models can be slow to generate responses, especially long ones, which poses a real challenge when building real-time conversational UIs.


Should you use a streaming or blocking interface? What are the tradeoffs? In this post, we'll explore these questions and how to implement both types of UIs using the Vercel AI SDK.

The Problem with Blocking Interfaces

A blocking conversational UI waits for the full AI-generated response before displaying anything to the user. This can mean staring at loading indicators for many seconds before seeing the result.

Here's an example blocking interface prompting GPT-3.5 for the first 200 characters of Harry Potter:

[GIF: the blocking UI approach. Note the long pause after submitting, then the flash when the full response arrives.]

The user has to wait over 10 seconds before seeing any output. This poor experience is unacceptable for conversational products.

Streaming to the Rescue

With a streaming interface, parts of the AI response are sent to the user as soon as they are generated. This provides a much more natural conversational feel.

Let's look at that same Harry Potter example with a streaming interface:

[GIF: the streaming UI approach. The response is spelled out as it's generated, so the user starts seeing output almost immediately after clicking.]

Now the output starts appearing after just a second or two, with more being appended in real-time as it's generated.

Streaming conversational interfaces lead to much better user experiences. The Vercel AI SDK is designed to make building them easy.

Guide: Implementing Streaming-based UI with Vercel AI SDK

Let's look at a step-by-step guide to implementing streaming in a test Next.js app. We'll use a chatbot as the example.

Step 1: Set Up Your Next.js App

Start by creating a new Next.js application and installing the necessary dependencies. In your project directory, use the following commands:

npx create-next-app my-ai-app
cd my-ai-app
npm install ai openai

Step 2: Add Your OpenAI API Key

Your OpenAI API key is obviously pretty crucial for accessing the language model! Create a .env.local file in your project root and add your key:

OPENAI_API_KEY=your_api_key_here
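
Next.js loads this file automatically, and since the variable isn't prefixed with NEXT_PUBLIC_, it stays on the server and is never shipped to the browser.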

Step 3: Create a Route Handler

Create a Next.js Route Handler that will utilize the Edge Runtime to generate a chat completion via OpenAI. This completion will be streamed back to your Next.js application. Create a file named route.ts in the app/api/chat directory:

import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

// Create an OpenAI API client using the key from .env.local
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Run on the Edge Runtime for low-latency, streaming-friendly responses
export const runtime = 'edge';

export async function POST(req: Request) {
  // useChat sends the full conversation history in the request body
  const { messages } = await req.json();

  // Ask OpenAI for a streaming chat completion
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages,
  });

  // Convert the raw response into a text stream the client can consume
  const stream = OpenAIStream(response);

  return new StreamingTextResponse(stream);
}
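
Before wiring up the UI, you can sanity-check the endpoint with a small standalone script. Here's a minimal sketch (assuming Node 18+ for the global fetch, and the dev server running on port 3000; the script itself is illustrative, not part of the SDK) that reads the streamed text chunk by chunk:

// Minimal sketch: consume the streaming endpoint without the useChat hook.
// Assumes `npm run dev` is serving the app on localhost:3000.
async function streamChat() {
  const res = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Say hello in five words.' }],
    }),
  });

  if (!res.ok || !res.body) {
    throw new Error(`Request failed: ${res.status}`);
  }

  // Read the response body chunk by chunk as it arrives
  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk is a piece of the generated text
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

streamChat().catch(console.error);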

Step 4: Wire Up the UI

Create a UI component that interacts with the streaming completion, letting users send messages and watch the chat history update in real time. The useChat hook from the ai/react package handles the message list, input state, and form submission for you. Because it relies on React hooks, it must be a Client Component (note the 'use client' directive); create it in app/page.tsx:

'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  // useChat targets /api/chat by default and streams responses into `messages`
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <main className="mx-auto w-full h-screen max-w-lg p-24 flex flex-col">
      <section className="mb-auto">
        {messages.map(m => (
          <div className="mb-4" key={m.id}>
            {m.role === 'user' ? 'User: ' : 'AI: '}
            {m.content}
          </div>
        ))}
      </section>
      <form className="flex space-x-4" onSubmit={handleSubmit}>
        <input
          className="rounded-md p-2 text-black"
          value={input}
          onChange={handleInputChange}
          placeholder="Say something..."
        />
        <button
          className="border-solid border-2 border-white p-2 rounded-md"
          type="submit"
        >
          Send
        </button>
      </form>
    </main>
  );
}
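
Run npm run dev and open http://localhost:3000 to try it out. You should see the AI's reply stream in word by word as it's generated, just like the streaming GIF earlier.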

By following these steps, you've implemented a streaming-based UI for an example chatbot using the Vercel AI SDK. Partial responses reach users as soon as they're generated, which makes the chat feel quicker and more natural, and improves user satisfaction (we hope!).

When to Avoid Streaming

While streaming UIs improve experiences with large, slow models, they aren't always necessary. If you can satisfy your use case with a smaller, faster model, a blocking interface may suffice.

Streaming also introduces some extra complexity on the front end to smoothly handle the incremental data. So if streaming isn't needed to achieve quick response times, it's often best avoided.
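
For comparison, here's a minimal sketch of what a blocking version of the earlier route handler might look like: drop stream: true, await the complete response, and return it as JSON. (This is an illustrative variant, not a pattern prescribed by the SDK.)

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Without `stream: true`, this promise resolves only after the
  // entire completion has been generated.
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages,
  });

  // The client sees nothing until this single response arrives.
  return Response.json({
    content: completion.choices[0].message.content,
  });
}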

Putting It All Together

In summary:

  • Blocking conversational UIs suffer from poor loading experiences as users wait for responses.
  • Streaming UIs improve this by sending partial responses as they are generated.
  • The Vercel AI SDK makes building streaming interfaces easy.
  • But streaming may be overkill depending on your model and use case.

I hope this overview helps you decide how to architect the best conversational interface! The Vercel AI SDK provides the flexibility to implement both streaming and blocking designs.

Subscribe or follow me on Twitter for more content like this!