
How I put ChatGPT into a WYSIWYG editor


With all the hype going on, AI (or rather Machine Learning (ML) and Large Language Models (LLMs)) is everywhere. Personally, I don't use ChatGPT (and similar solutions) much, but I sure do rely on the likes of GitHub Copilot (for intelligent autocompletion in VS Code) or Grammarly (for editing my blog posts) daily.

I think we're still quite a few breakthroughs away from AGI, and current technology won't be enough to get us there (luckily or not). That said, we're already deep into the era of "AI-enhanced" apps, where the apps on top may not have the best AI systems, but they integrate them in the best way possible.

That's why it was an interesting process, exploring OpenAI's API and trying to integrate it into the Rich Text Editor (RTE) of Vrite, my open-source headless CMS.



Extending the WYSIWYG Editor

For those unfamiliar, Vrite, in a nutshell, is a headless CMS for technical content, like programming blogs or software docs. It can be seen as two apps in one: a Kanban dashboard for content management and a WYSIWYG editor for writing, with additional dev-friendly features like an embedded code snippet editor and formatter.

The latest big addition to Vrite is an early extension system, for easily building integrations and extending what Vrite can do. This, to me, seemed like the perfect way to introduce ChatGPT into the editor: as an extension.



Block Actions

To be able to use the extension system for integrating ChatGPT into the editor, a new API had to be introduced. I called it the Block Actions API, since it's specifically meant for adding quick actions to the editor that operate on top-level content blocks, like paragraphs, headings, or images, as highlighted below:

Content blocks in Vrite editor - highlighted

With the Block Actions API, extensions can read the JSON content of the active block and update it with HTML-formatted content, just like it's done in the Vrite API (on one end, parsing JSON output is simpler, while, on the other, HTML is more suitable to transform content into).

From the UI side, Block Actions are displayed as buttons next to the actively-selected block. They can either invoke an action directly on click or, like with ChatGPT, open a dropdown menu to prompt the user for more details.

GPT-3.5 extension's Block Action dropdown

The buttons had to be positioned precisely, which required both a custom TipTap extension and tapping deeper into the underlying ProseMirror (both libraries powering the Vrite editor).

The process mostly came down to figuring out the position and size of the block node, given a selection of either a whole top-level node or just its child node (source code):

// ...
const BlockActionMenuPlugin = Extension.create({
  // ...
  onSelectionUpdate() {
    const { selection } = this.editor.state;
    const isTextSelection = selection instanceof TextSelection;
    const selectedNode = selection.$from.node(1) || selection.$from.nodeAfter;

    if (!selectedNode) {
      box.style.display = 'none';
      return;
    }

    const { view } = this.editor;
    const node =
      view.nodeDOM(selection.$from.pos) ||
      view.nodeDOM(selection.$from.pos - selection.$from.parentOffset) ||
      view.domAtPos(selection.$from.pos)?.node;

    if (!node) return;

    const blockParent = getBlockParent(node);
    const parentPos = document
      .getElementById('pm-container')
      ?.getBoundingClientRect();
    const childPos = blockParent?.getBoundingClientRect();

    if (!parentPos || !childPos) return;

    const relativePos = {
      top: childPos.top - parentPos.top,
      right: childPos.right - parentPos.right,
      bottom: childPos.bottom - parentPos.bottom,
      left: childPos.left - parentPos.left,
    };

    let rangeFrom = selection.$from.pos;
    let rangeTo = selection.$to.pos;

    box.style.top = `${relativePos.top}px`;
    box.style.left = `${relativePos.left + parentPos.width}px`;
    box.style.display = 'block';

    if (isTextSelection) {
      try {
        const p = findParentAtDepth(selection.$from, 1);
        rangeFrom = p.start - 1;
        rangeTo = p.start + p.node.nodeSize - 1;
      } catch (e) {
        box.style.display = 'none';
      }
    }

    // ...
  },
});

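The `getBlockParent` helper referenced in the snippet isn't shown there. As a sketch (my assumption, not Vrite's actual implementation), it could walk up the DOM tree until it reaches the element sitting directly inside the `#pm-container` editor wrapper, i.e. the DOM element of the top-level block:

```javascript
// Hypothetical sketch of the getBlockParent helper used above (not Vrite's
// exact code): walk up from a possibly deeply nested DOM node until we
// reach the direct child of the "#pm-container" wrapper, which is the
// element of the top-level block.
function getBlockParent(node) {
  let current = node;

  while (current && current.parentElement) {
    // The direct child of the editor container is the block-level element.
    if (current.parentElement.id === 'pm-container') return current;
    current = current.parentElement;
  }

  return null;
}
```

With the block element in hand, `getBoundingClientRect()` provides the coordinates used for positioning the action button.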



Replacing Editor Content

The second part involved handling the actual process of replacing the block's content with the newly provided one. The trickiest thing to figure out here was getting the correct range (the start and end position in ProseMirror) of the block node. This was necessary to then use TipTap's commands to properly replace the range.

If you've taken a closer look at the last code snippet, the code for that was already there. The block's range was updated, together with the Block Action UI positioning, on every selection update.

The actual replacement of the range with new content was much easier to do. All there was to it was converting the HTML to Schema-adherent JSON and invoking the proper commands (source code):

// ...
const replaceContent = (content) => {
  unlock.clear();
  setLocked(true);
  if (range()) {
    let size = 0;
    const nodeOrFragment = createNodeFromContent(
      content,
      props.state.editor.schema
    );

    if (nodeOrFragment instanceof PMNode) {
      size = nodeOrFragment.nodeSize;
    } else {
      size = nodeOrFragment.size;
    }
    props.state.editor
      .chain()
      .focus()
      .insertContentAt(
        range()!,
        generateJSON(content, props.state.editor.extensionManager.extensions)
      )
      .scrollIntoView()
      .focus()
      .run();
    setRange({ from: range()!.from, to: range()!.from + size - 1 });
    computeDropdownPosition()();
  }
  unlock();
};
// ...


The replaceContent() function could then be called remotely, from the extension's sandbox, by sending a proper message to the main frame.

To enable use-cases like ChatGPT integrations, where the content will be updated (i.e. replaced) multiple times in a row before the process is finished, the function also locked the editor for a short time on each call, and updated the range and UI positioning every time. But why was this required?
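The short-lived lock can be sketched roughly like this (a simplified illustration under my own assumptions, not Vrite's actual code): every call re-locks the editor and schedules an unlock shortly after, so a rapid burst of streamed replacements keeps the editor locked until the burst ends.

```javascript
// Simplified sketch of a short-lived editor lock (assumed implementation):
// each call marks the editor as locked and (re)schedules an unlock, so
// back-to-back content replacements keep it locked continuously.
function createEditorLock(setLocked, delayMs = 200) {
  let timer = null;

  return () => {
    if (timer) clearTimeout(timer);
    setLocked(true);
    timer = setTimeout(() => {
      timer = null;
      setLocked(false);
    }, delayMs);
  };
}
```

Locking here keeps the user's own edits from racing with the streamed replacements.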



Integrating With OpenAI’s API

The process of integrating OpenAI's API is pretty well-documented in its official docs. Given that an official SDK is available, the entire process can be done in just a few lines of code:

async ({ ctx, input }) => {
  const configuration = new Configuration({
    apiKey: ctx.fastify.config.OPENAI_API_KEY,
    organization: ctx.fastify.config.OPENAI_ORGANIZATION,
  });
  const openai = new OpenAIApi(configuration);
  const response = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: input.prompt }],
  });
};


Now, all that's true, but only if you're willing to wait what is often 20s or more for a single response! That's a lot for a single request. Nothing worked, from changing the server location to optimizing the request by limiting max_tokens or customizing other parameters. It all comes down to the fact that current-day LLMs (at least those on the level of GPT-3) are still rather slow.

With that said, the ChatGPT app still manages to be perceived as fairly fast and responsive. That's thanks to the use of streaming and Server-Sent Events (SSEs).



Streaming ChatGPT Response

The chat completion and other endpoints of OpenAI's API support streaming via Server-Sent Events, essentially maintaining an open connection through which new tokens are sent as soon as they're available.

Unfortunately, the official Node.js SDK doesn't support streaming and requires workarounds to get it working, resulting in much more code just to connect with the API (source code):

async ({ ctx, input }) => {
  const configuration = new Configuration({
    apiKey: ctx.fastify.config.OPENAI_API_KEY,
    organization: ctx.fastify.config.OPENAI_ORGANIZATION,
  });
  const openai = new OpenAIApi(configuration);
  const response = await openai.createChatCompletion(
    {
      model: 'gpt-3.5-turbo',
      stream: true,
      messages: [{ role: 'user', content: input.prompt }],
    },
    { responseType: 'stream' }
  );
  ctx.res.raw.writeHead(200, {
    ...ctx.res.getHeaders(),
    'content-type': 'text/event-stream',
    'cache-control': 'no-cache',
    connection: 'keep-alive',
  });

  return new Promise<void>((resolve) => {
    const responseData = response.data as unknown as {
      on(event: string, handler: (data: string) => void): void;
    };

    responseData.on('data', (data) => {
      const lines = data
        .toString()
        .split('\n')
        .filter((line) => line.trim() !== '');
      for (const line of lines) {
        const message = line.replace(/^data: /, '');
        if (message === '[DONE]') {
          ctx.res.raw.end();
          resolve();
          continue;
        }
        try {
          const parsed = JSON.parse(message);

          const content = parsed.choices[0].delta.content || '';

          if (content) {
            ctx.res.raw.write(`data: ${encodeURIComponent(content)}`);
            ctx.res.raw.write('\n\n');
          }
        } catch (error) {
          console.error('Could not JSON parse stream message', message, error);
        }
      }
    });
  });
};

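The line handling inside the stream's data callback can be isolated into a small pure function. This is an illustrative re-packaging of the same logic (the function name is mine), which makes the `data: ` prefix stripping and `[DONE]` sentinel handling easier to see:

```javascript
// Illustrative, self-contained version of the SSE chunk handling above:
// split a raw chunk into non-empty lines, strip the "data: " prefix,
// and flag completion when OpenAI's "[DONE]" sentinel arrives.
function parseSSEChunk(chunk) {
  const messages = [];
  let done = false;

  const lines = chunk
    .toString()
    .split('\n')
    .filter((line) => line.trim() !== '');

  for (const line of lines) {
    const message = line.replace(/^data: /, '');
    if (message === '[DONE]') {
      done = true;
      continue;
    }
    messages.push(message);
  }

  return { messages, done };
}
```

Note that a production parser would also have to buffer partial lines split across chunks; the snippet above assumes whole lines per chunk, as the original code does.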

On top of that, you also have to support streaming on your end, between your API server and web client, which, in the case of Vrite, meant integrating SSEs with Fastify and tRPC. Not the cleanest solution, but fairly stable nonetheless.

From the frontend (the extension sandbox, to be precise), a connection with the new streaming endpoint has to be established and the incoming data correctly processed (source code):

import { fetchEventSource } from "@microsoft/fetch-event-source";
// ...

const generate = async (context: ExtensionBlockActionViewContext): Promise<void> => {
  const includeContext = context.temp.includeContext as boolean;
  const prompt = context.temp.prompt as string;

  let content = "";

  context.setTemp("$loading", true);
  window.currentRequestController = new AbortController();
  window.currentRequestController.signal.addEventListener("abort", () => {
    context.setTemp("$loading", false);
    context.refreshContent();
  });
  await fetchEventSource("https://extensions.vrite.io/gpt", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Accept": "text/event-stream"
    },
    body: JSON.stringify({
      prompt: includeContext ? `"${gfmTransformer(context.content)}"\n\n${prompt}` : prompt
    }),
    signal: window.currentRequestController?.signal,
    async onopen() {
      return;
    },
    onerror(error) {
      context.setTemp("$loading", false);
      context.refreshContent();
      context.notify({ text: "Error while generating content", type: "error" });
      throw error;
    },
    onmessage(event) {
      const partOfContent = decodeURIComponent(event.data);

      content += partOfContent;
      context.replaceContent(marked.parse(content));
    },
    onclose() {
      context.setTemp("$loading", false);
      context.refreshContent();
    }
  });
};

The EventSource Web API for handling SSEs (built into most modern browsers) unfortunately supports only GET requests, which was quite limiting when a POST request with a larger JSON body was required. As an alternative, you have to use the Fetch API or a ready-made library like Microsoft's Fetch Event Source.

Again, with streaming enabled, you'll now receive new tokens as soon as they're available. Given that OpenAI's API uses Markdown in its response format, a full message has to be put together from the incoming tokens and parsed to HTML, as accepted by the replaceContent function. For this purpose, I've used the Marked.js parser.

Now, with each new token, the larger response is built up. Every time a new token comes in, the full Markdown is parsed and the content updated, making for a nice "typing-like effect".
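Stripped of the editor specifics, the accumulation loop looks roughly like this (an illustrative sketch; `parseMarkdown` stands in for `marked.parse` and `onHtml` for `context.replaceContent`):

```javascript
// Sketch of the token-accumulation pattern described above: append every
// incoming token to the Markdown buffer and re-parse the whole buffer,
// producing the "typing-like effect" as the rendered HTML grows.
function createTypingRenderer(parseMarkdown, onHtml) {
  let markdown = '';

  return (token) => {
    markdown += token;
    onHtml(parseMarkdown(markdown));
  };
}
```

Re-parsing the whole buffer, rather than just the new token, is what keeps multi-token constructs like code fences rendered correctly at every step.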

While this process does have some overhead, it's not noticeable in use, and the Markdown does have to be re-parsed with each new token, as the token might contain e.g. the closing of a code block or the end of a formatted section. So, while this process could potentially be optimized, it wouldn't lead to any noticeable performance improvement in the majority of cases.

Finally, worth noting is the use of AbortController, which can be used to stop the stream at any time the user chooses. That's especially useful for longer responses.
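In isolation, the cancellation pattern amounts to this (a minimal standalone illustration, not tied to Vrite's code):

```javascript
// Minimal illustration of stream cancellation with AbortController: the
// same signal both aborts the in-flight request (when passed to fetch or
// fetchEventSource) and triggers cleanup listeners.
const controller = new AbortController();

controller.signal.addEventListener('abort', () => {
  // Cleanup would go here (e.g. clearing the loading state).
  console.log('stream stopped');
});

// A "stop generating" button only needs this single call:
controller.abort();
```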

GPT-3.5 Block Action... in action



Bottom Line

Overall, I'm very happy with how this turned out. Data streaming, a nice typing effect, and good integration with the editor's existing content blocks thanks to Markdown parsing all came together to create a compelling User Experience.

Now, there's certainly room for improvement. The Block Actions API, as well as Vrite Extensions as a whole, still have a lot of development work ahead of them before they can be created by other users. Other UI/UX improvements to consider, like operating on multiple blocks at once (e.g. for more context for ChatGPT) and displaying the UI inline (much like Notion AI) so as not to obscure the view, are just a few examples of what I was considering. That said, it'll take some more time to implement these ideas well.

Vrite Kanban dashboard

Vrite is much more than just a GPT-enhanced editor. It's a full, open-source CMS focused on technical content like programming blogs, with a code editor, an API, a Kanban management dashboard, and easy publishing integrations included. So, if you're interested in trying it out and possibly using it to power your blog, definitely check it out!
