LLM Chat
AI chat products are apps that manage our interactions with LLMs. They look much like any other chat application, but the replies are generated by an LLM and streamed to us.
Most popular services are hosted online or available through a mobile app, though some can be run locally on our own machines.
You probably already use some of these services, but here are a few curated hosted ones that we can use in these tutorials:
ChatGPT
The famous chat application that started the craze, released in November 2022 by OpenAI.
You can sign up for a free account here: https://chatgpt.com/
Perplexity.AI
Perplexity.AI takes an interesting approach to AI chat, attempting to address LLMs' tendency to "hallucinate," or generate plausible-sounding results without regard for their truthfulness. Perplexity.AI combines text generation with web search to find credible sources relevant to a user's prompt.
It's available for public use here: https://www.perplexity.ai/
Hugging Chat
Hugging Face hosts its own chat service that lets you pick from a variety of models. It's a good free, flexible option for testing models of different kinds.
Sign in with a free Hugging Face account: https://huggingface.co/chat/
Meta AI Chat
Meta AI Chat is backed by Meta's LLM family, Llama. Llama is one of the most popular open-weight models available; it is downloadable in different sizes and flavors and can be run locally on your own machine, given sufficient hardware.
Find it here: https://www.meta.ai/
Google Gemini
"Gemini" is the name for both Google's LLM collection and their chat service. It's the successor to "Bard," which you may have heard of.
Find it here: https://gemini.google.com/app
Google Gemini via Google Vertex AI
Vertex AI is Google Cloud's machine-learning platform. It lets you run text completion on multiple models with multiple kinds of input, and it exposes more of the inference parameters, such as temperature.
Find it here: https://console.cloud.google.com/vertex-ai/generative/multimodal/create/text
Anthropic's Claude
Anthropic is a significant company in the AI space, with their flagship LLM called "Claude."
Anthropic is a public-benefit corporation that claims to be focused on AI safety. In 2023, they announced an upcoming investment from Amazon of up to $4 billion USD.
Find it here: https://claude.ai/
How LLM Chats Work
LLM chat applications are just web apps that do the following:
- manage a list of "messages"
- submit prompts to a hosted LLM API
- receive completions from the hosted LLM API
- manage a set of "tools" for functionality beyond text generation
And that's pretty much it. It's relatively easy to code your own LLM chat app if you have access to an API for performing the LLM completion itself.
Whenever you submit a prompt, it is wrapped in a "message" object and added to the managed list of messages, and then the entire list is sent to the LLM API.
That last point is key: every message in a chat session, all the way back to the start, is sent with each new prompt. The LLM then generates a completion one token at a time. This is how the LLM can base completions on the entire conversation, up to the limit of its context window.
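The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `fake_completion` is a hypothetical stand-in for a real hosted LLM API call, and the `{"role": ..., "content": ...}` message shape is a common convention, not a requirement of any particular API.

```python
def fake_completion(messages):
    # Hypothetical stand-in for a hosted LLM API. A real app would
    # POST the full `messages` list here and return the generated
    # assistant message.
    last = messages[-1]["content"]
    return {"role": "assistant", "content": f"You said: {last}"}

class ChatSession:
    def __init__(self, system_prompt="You are a helpful assistant."):
        # All messages for this chat accumulate in this list and are
        # re-sent in full on every turn.
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text):
        # Wrap the prompt in a "message" object...
        self.messages.append({"role": "user", "content": user_text})
        # ...then send the ENTIRE history, not just the new prompt.
        reply = fake_completion(self.messages)
        self.messages.append(reply)
        return reply["content"]

chat = ChatSession()
print(chat.send("Hello!"))   # → You said: Hello!
print(len(chat.messages))    # → 3 (system + user + assistant)
```

Note that the session object itself holds all the conversational "memory"; the completion call is stateless, which is why the whole list must travel with each request.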
This also means that prompting techniques such as asking the LLM to expound on its reasoning first can often produce more relevant, higher-quality completions, since earlier generated tokens condition the tokens that follow.
The effect of seeing each token appear in near real time is called "streaming," and it's a standard feature of LLM APIs and SDKs.
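A toy sketch of the client side of streaming, with no real API involved: `stream_completion` is a hypothetical generator standing in for an API that yields tokens one at a time (real APIs typically deliver them over HTTP, e.g. as server-sent events), and the "UI" just accumulates tokens as they arrive.

```python
def stream_completion(text):
    # Hypothetical stand-in for a streaming LLM API: instead of
    # returning the whole completion at once, yield one token at a time.
    for token in text.split():
        yield token + " "

def render_stream(token_iter):
    # A chat UI appends each token to the display as it arrives;
    # here we just accumulate into a string.
    rendered = ""
    for token in token_iter:
        rendered += token
    return rendered.rstrip()

print(render_stream(stream_completion("Hello there, how can I help?")))
```

The point of the generator is that the consumer can show partial output immediately rather than waiting for the full completion, which is exactly the effect you see in hosted chat apps.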