Self-hosted generative AI: How I built Stable Diffusion into a Discord Bot

Generative AI is taking the tech world by storm.

I’m very late to the AI game, having just started playing with Stable Diffusion. I recently downloaded the project and started running it on my Mac Studio. This allowed me to use a web interface to create AI images.

Another thing I love to do is find interesting and creative ways to build custom Discord bots to do things for me, so I thought, “Could I integrate this into a Discord bot?”

Using the Stable Diffusion API

To start, I needed to figure out how to use the Stable Diffusion API.

I needed to start the project with the switch that enables the API. I then used Postman to send a simple prompt to the /sdapi/v1/txt2img endpoint, just to see what the response looked like.

It turns out the image is returned as a base64 string in the JSON payload. That was all I needed to know that I could pull this off.

Creating the bot

I started by creating the shell of a Discord Bot in Go.

The following is code for a simple bot that takes in any text as a prompt and sends it to the Stable Diffusion API. I used logging initially to make sure that things were happening, but could also see that images were being generated and saved in the project folder on my Mac Studio.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"

    "github.com/bwmarrin/discordgo"
    "github.com/joho/godotenv"
)

type Txt2ImgRequestBody struct {
    Prompt string `json:"prompt"`
}

func main() {
    godotenv.Load()

    token := os.Getenv("BOT_TOKEN")
    sess, err := discordgo.New("Bot " + token)
    if err != nil {
        log.Fatal(err)
    }

    sess.AddHandler(func(s *discordgo.Session, m *discordgo.MessageCreate) {
        if m.Author.ID == s.State.User.ID {
            return
        }

        b := Txt2ImgRequestBody{
            Prompt: m.Content,
        }

        jbytes, err := json.Marshal(b)
        if err != nil {
            log.Fatal(err)
        }

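        // POST the prompt to the Stable Diffusion txt2img endpoint
        req, err := http.NewRequest("POST", fmt.Sprintf("%v/sdapi/v1/txt2img", os.Getenv("SDW_BASE")), bytes.NewBuffer(jbytes))
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("Content-Type", "application/json")

        c := http.Client{}
        res, err := c.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        // The response is ignored for now, but the body still has to be closed
        res.Body.Close()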
    })

    sess.Identify.Intents = discordgo.IntentsAll

    err = sess.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer sess.Close()

    fmt.Println("the bot is online!")

    sc := make(chan os.Signal, 1)
    signal.Notify(sc, syscall.SIGINT, syscall.SIGTERM, os.Interrupt)
    <-sc
}

Now that I could see images were being saved, I needed to figure out a way to get them into Discord. Using the ChannelFileSend method, I could pass in an io.Reader (which represents the file) and send the image back to the channel the prompt came from.

// This was added to the above code (it also needs "encoding/base64" in the import block)
c := http.Client{}
res, err := c.Do(req)
if err != nil {
    log.Fatal(err)
}

defer res.Body.Close()

// Decode the response body into a struct that I can work with
var rb Txt2ImgResponseBody
err = json.NewDecoder(res.Body).Decode(&rb)
if err != nil {
    log.Fatal(err)
}

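// Guard against an empty result before indexing into it
if len(rb.Images) == 0 {
    log.Println("no images returned")
    return
}

// Decode the base64 image into a []byte type
decoded, err := base64.StdEncoding.DecodeString(rb.Images[0])
if err != nil {
    log.Fatal(err)
}

// Create the io.Reader
r := bytes.NewReader(decoded)

// Send the file to the channel the message came from
if _, err := s.ChannelFileSend(m.ChannelID, "generated.png", r); err != nil {
    log.Println(err)
}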

Here is the Txt2ImgResponseBody struct that represents the response body from Stable Diffusion:

type Txt2ImgResponseBody struct {
    Images []string `json:"images"`
}

With this in place, the bot drops the generated image into chat!

By default, generated images are 512x512, but I wanted to be able to customize the size from chat. I decided to use a slash command with options for setting the dimensions of the generated image.

I started by defining the slash command.

imgCommand := discordgo.ApplicationCommand{
    Name:        "img",
    Description: "Generate an AI image",
    Options: []*discordgo.ApplicationCommandOption{
        {
            Name:        "prompt",
            Description: "The Stable Diffusion prompt",
            Type:        discordgo.ApplicationCommandOptionString,
            Required:    true,
        },
        {
            Name:        "height",
            Description: "Image height in pixels (default: 512)",
            Type:        discordgo.ApplicationCommandOptionInteger,
        },
        {
            Name:        "width",
            Description: "Image width in pixels (default: 512)",
            Type:        discordgo.ApplicationCommandOptionInteger,
        },
    },
}

Next, I moved all the Stable Diffusion bits to a separate handler function, just to keep it a bit more organized.

func ImgCommandHandler(s *discordgo.Session, i *discordgo.InteractionCreate) {
    s.InteractionRespond(i.Interaction, &discordgo.InteractionResponse{
        Type: discordgo.InteractionResponseChannelMessageWithSource,
        Data: &discordgo.InteractionResponseData{
            Content: "Got it! I'll drop a message back here when it's done.",
        },
    })

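    // Defaults; Txt2ImgRequestBody needs Height and Width fields (int64, to match IntValue) for this to compile
    b := Txt2ImgRequestBody{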
        Prompt: "",
        Height: 512,
        Width:  512,
    }

    data := i.ApplicationCommandData()
    opts := data.Options

    for _, el := range opts {
        if el.Name == "prompt" {
            b.Prompt = el.StringValue()
        }
        if el.Name == "height" {
            b.Height = el.IntValue()
        }
        if el.Name == "width" {
            b.Width = el.IntValue()
        }
    }
    jbytes, err := json.Marshal(b)
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST", fmt.Sprintf("%v/sdapi/v1/txt2img", os.Getenv("SDW_BASE")), bytes.NewBuffer(jbytes))
    if err != nil {
        log.Fatal(err)
    }
    c := http.Client{}
    res, err := c.Do(req)
    if err != nil {
        log.Fatal(err)
    }

    defer res.Body.Close()
    var rb Txt2ImgResponseBody
    err = json.NewDecoder(res.Body).Decode(&rb)
    if err != nil {
        log.Fatal(err)
    }

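    // Guard against an empty result before indexing into it
    if len(rb.Images) == 0 {
        log.Println("no images returned")
        return
    }

    decoded, err := base64.StdEncoding.DecodeString(rb.Images[0])
    if err != nil {
        log.Fatal(err)
    }

    r := bytes.NewReader(decoded)
    if _, err := s.ChannelFileSend(i.ChannelID, "generated.png", r); err != nil {
        log.Println(err)
    }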
}
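The handler above passes height and width straight from chat to the API, so anyone in the channel can ask for any size. A small sketch of sanitizing those values first is shown below. The normalizeDim helper and its limits are hypothetical: the 64 to 1024 range is an arbitrary choice to tune for your hardware, and rounding to a multiple of 8 is an assumption about what Stable Diffusion accepts.

```go
package main

import "fmt"

// Assumed limits, not taken from the original bot; adjust to your hardware.
const (
	minDim  = 64
	maxDim  = 1024
	dimStep = 8 // assumption: dimensions should be multiples of 8
)

// normalizeDim clamps v into [minDim, maxDim] and rounds it down to a
// multiple of dimStep.
func normalizeDim(v int64) int64 {
	if v < minDim {
		return minDim
	}
	if v > maxDim {
		v = maxDim
	}
	return v - v%dimStep
}

func main() {
	for _, v := range []int64{512, 513, 10, 5000} {
		fmt.Println(v, "->", normalizeDim(v))
	}
}
```

In the option loop this would wrap the values read from chat, for example b.Height = normalizeDim(el.IntValue()).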

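Finally, I registered the command's handler after the bot session had been created (the command definition itself also has to be registered with Discord, for example through the session's ApplicationCommandCreate method):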

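// Route slash-command interactions to the handler
sess.AddHandler(func(s *discordgo.Session, i *discordgo.InteractionCreate) {
    // ApplicationCommandData is only valid for application command interactions
    if i.Type != discordgo.InteractionApplicationCommand {
        return
    }
    if i.ApplicationCommandData().Name == "img" {
        ImgCommandHandler(s, i)
    }
})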

The result is that I can generate an image directly from within Discord by using /img with a prompt and, optionally, custom dimensions. I have expanded mine a bit since originally writing this, so that it now drops the image in a new thread, but otherwise the code is very similar.