Streaming
Some LLMs provide a streaming response. This means that instead of waiting for the entire response to be returned, you can start processing it as soon as it's available. This is useful if you want to display the response to the user as it's being generated, or if you want to process the response as it's being generated.
Using .stream()
The easiest way to stream is to use the .stream() method. This returns an readable stream that you can also iterate over:
- npm
- Yarn
- pnpm
npm install @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
import { OpenAI } from "@langchain/openai";
const model = new OpenAI({
  maxTokens: 25,
});
const stream = await model.stream("Tell me a joke.");
for await (const chunk of stream) {
  console.log(chunk);
}
/*
Q
:
 What
 did
 the
 fish
 say
 when
 it
 hit
 the
 wall
?
A
:
 Dam
!
*/
API Reference:
- OpenAI from @langchain/openai
For models that do not support streaming, the entire response will be returned as a single chunk.
Using a callback handler
You can also use a CallbackHandler like so:
import { OpenAI } from "@langchain/openai";
// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});
const response = await model.invoke("Tell me a joke.", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }
Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/
API Reference:
- OpenAI from @langchain/openai
We still have access to the end LLMResult if using generate. However, token_usage is not currently supported for streaming.