Other Resources
Bonus Resources!
HyperTech Core v0.1.0 ☄️
- GitHub
- VSCode
- Jupyter
- HuggingFace
- LM Studio
- Cog
- llama.cpp
- koboldcpp
- exllama
- text-generation-webui
- stable-diffusion-webui
HYPERION 🪐
- (coming soon!)
Resources ✨
- FOSAI ▲ XYZ
- Machine Learning
- OpenAI Cookbook
- Pinecone Examples
- NVIDIA NeMo
- LangChain
- LlamaIndex
YouTube 📺
- Matthew Berman
- Nicholas Renotte
- Dave Ebbelaar
- James Briggs
- SentDex
- AI Jason
- IBM
Build 🏗️
- GitHub
- VSCode
- Jupyter
- Colab
- Hex
- Vercel
- Replicate
- Cerebrium
Compute ⚡
- RunPod
- VastAI
- Lambda
- watsonx
- SageMaker
- Azure
R&D 🧪
Bonus 🛸
- GitHub Projects
- Attention is All You Need
- Python Programming
Looking for all of the other cool technologies being developed in the space? Check out my GitHub Stars for tons of really interesting projects that are FOSS & FOSAI.
Awesome-LLM
The content below is from Awesome-LLM.
Base & Instruction-Finetuned LLMs
- LLaMA - A foundational, 65-billion-parameter large language model. See also: LLaMA.cpp, Lit-LLaMA.
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. See also: Alpaca.cpp, Alpaca-LoRA.
- Flan-Alpaca - Instruction tuning from humans and machines.
- Baize - An open-source chat model trained with LoRA, using 100k dialogs generated by letting ChatGPT chat with itself.
- Cabrita - A Portuguese instruction-finetuned LLaMA.
- Vicuna - An open-source chatbot impressing GPT-4 with 90% ChatGPT quality.
- Llama-X - Open academic research on improving LLaMA to a SOTA LLM.
- Chinese-Vicuna - A Chinese instruction-following LLaMA-based model.
- GPTQ-for-LLaMA - 4-bit quantization of LLaMA using GPTQ.
- GPT4All - Demo, data, and code to train open-source, assistant-style large language models based on GPT-J and LLaMA.
- Koala - A dialogue model for academic research.
- BELLE - Be Everyone's Large Language model Engine.
- StackLLaMA - A hands-on guide to training LLaMA with RLHF.
- RedPajama - An open-source recipe to reproduce the LLaMA training dataset.
- Chimera - Latin Phoenix.
- WizardLM|WizardCoder - A family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder.
- CaMA - A Chinese-English bilingual LLaMA model.
- Orca - Microsoft's finetuned LLaMA model, reportedly matching GPT-3.5, trained on roughly 5M instruction examples generated with ChatGPT and GPT-4.
- BayLing - An English/Chinese LLM with advanced language alignment, showing superior capability in English/Chinese generation, instruction following, and multi-turn interaction.
- UltraLM - Large-scale, informative, and diverse multi-round chat models.
- Guanaco - A QLoRA-tuned LLaMA; a minimal LoRA finetuning sketch follows this list.
- BLOOM - The BigScience Large Open-science Open-access Multilingual Language Model. See also: BLOOM-LoRA.
- BLOOMZ&mT0 - A family of models capable of following human instructions in dozens of languages zero-shot.
- Phoenix
- T5 - Text-to-Text Transfer Transformer.
- T0 - Multitask Prompted Training Enables Zero-Shot Task Generalization.
- OPT - Open Pre-trained Transformer Language Models.
- UL2 - A unified framework for pretraining models that are universally effective across datasets and setups.
- GLM - A general language model pretrained with an autoregressive blank-filling objective; it can be finetuned on various natural language understanding and generation tasks.
- ChatGLM-6B - An open-source, Chinese-English bilingual dialogue language model based on the General Language Model (GLM) architecture, with 6.2 billion parameters.
- ChatGLM2-6B - An open bilingual chat LLM.
- RWKV - A parallelizable RNN with Transformer-level LLM performance.
- ChatRWKV - Like ChatGPT, but powered by the RWKV (100% RNN) language model.
- StableLM - Stability AI language models.
- YaLM - A GPT-like neural network for generating and processing text, freely usable by developers and researchers worldwide.
- GPT-Neo - An implementation of model- and data-parallel GPT-3-like models using the mesh-tensorflow library.
- GPT-J - A 6-billion-parameter, autoregressive text generation model trained on The Pile.
- Dolly - A cheap-to-build LLM that exhibits a surprising degree of the instruction-following capability seen in ChatGPT.
- Pythia - Interpreting autoregressive transformers across time and scale.
- Dolly 2.0 - The first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- OpenFlamingo - An open-source reproduction of DeepMind's Flamingo model.
- Cerebras-GPT - A family of open, compute-efficient large language models.
- GALACTICA - Models trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - A state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- h2oGPT
- PanGu-α - A 200B-parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, the MindSpore team, and Peng Cheng Laboratory.
- MOSS - An open-source dialogue language model supporting Chinese-English bilingual conversation and a variety of plugins.
- Open-Assistant - A project meant to give everyone access to a great chat-based large language model.
- HuggingChat - Powered by Open Assistant's latest model and the Hugging Face Inference API.
- StarCoder - Hugging Face's LLM for code.
- MPT-7B - An open LLM for commercial use, by MosaicML.
- Falcon - TII's foundational large language model, with 40 billion parameters trained on one trillion tokens.
- XGen - Salesforce's open-source LLMs with 8k sequence length.
- baichuan-7B - An open-source, commercially usable large-scale pretrained language model developed by Baichuan Intelligence.
- Aquila - The Wudao Aquila language model: an open-source LLM with Chinese-English bilingual knowledge that supports a commercial license and meets Chinese data-compliance requirements.
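Several of the models above (Alpaca-LoRA, Baize, Guanaco) were produced with LoRA-style parameter-efficient finetuning, which trains small low-rank adapter matrices instead of the full weights. Below is a minimal sketch of that workflow using Hugging Face's PEFT library; the base model name, target modules, and hyperparameters are illustrative assumptions, not the settings any of these projects actually used.

```python
# Minimal LoRA finetuning setup with Hugging Face PEFT.
# The model name and hyperparameters below are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder base model

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# ...train with a standard Trainer or training loop, then save only the
# small adapter weights rather than a full model checkpoint:
model.save_pretrained("./lora-adapter")
```

QLoRA (as used for Guanaco) follows the same pattern but loads the frozen base model in 4-bit precision first, so it fits in far less GPU memory.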
LLM Training Frameworks
- DeepSpeed - A deep learning optimization library that makes distributed training and inference easy, efficient, and effective (a minimal usage sketch follows this list).
- Megatron-DeepSpeed - The DeepSpeed version of NVIDIA's Megatron-LM, adding support for features such as MoE model training, curriculum learning, and 3D parallelism.
- FairScale - A PyTorch extension library for high-performance, large-scale training.
- Megatron-LM - Ongoing research training transformer models at scale.
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient training for big models.
- Mesh TensorFlow - Model parallelism made easier.
- maxtext - A simple, performant, and scalable JAX LLM.
- Alpa - A system for training and serving large-scale neural networks.
- GPT-NeoX - An implementation of model-parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
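Despite different internals, most of these frameworks share the same usage shape: wrap an existing model and optimizer in the framework's engine, then drive the training loop through that engine. Here is a hedged sketch of the DeepSpeed version, assuming you already have a PyTorch `model` and `train_loader`; the ZeRO-2/fp16 config values are illustrative, not tuned.

```python
# Hedged DeepSpeed sketch: wrap an existing PyTorch model in a DeepSpeed
# engine. All config values here are illustrative, not tuned settings.
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state and gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# `model` and `train_loader` are assumed to be defined elsewhere.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for batch in train_loader:
    loss = engine(batch)   # forward pass (assumes the model returns a loss)
    engine.backward(loss)  # handles fp16 loss scaling and gradient sharding
    engine.step()          # optimizer step, LR schedule, and zeroing grads
```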
Tools for Deploying LLMs
- FastChat - A distributed multi-model LLM serving system with a web UI and OpenAI-compatible RESTful APIs.
- SkyPilot - Run LLMs and batch jobs on any cloud, with maximum cost savings, highest GPU availability, and managed execution, all through a simple interface.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list).
- Text Generation Inference - A Rust, Python, and gRPC server for text generation inference, used in production at HuggingFace to power the LLM api-inference widgets.
- Haystack - An open-source NLP framework that lets you use LLMs and transformer-based models from Hugging Face, OpenAI, and Cohere to interact with your own data.
- Sidekick - A data integration platform for LLMs.
- LangChain - Building applications with LLMs through composability.
- wechat-chatgpt - Use ChatGPT on WeChat via wechaty.
- promptfoo - Test your prompts: evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
- Agenta - Easily build, version, evaluate, and deploy your LLM-powered apps.
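To make the "serving engine" idea concrete, here is a minimal offline-inference sketch with vLLM; the model choice and sampling parameters are placeholder assumptions. For production serving, vLLM also ships an OpenAI-compatible HTTP server.

```python
# Minimal vLLM offline-inference sketch. The model and sampling values
# are illustrative placeholders, not recommendations.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The key idea behind continuous batching is"], params)
for out in outputs:
    print(out.outputs[0].text)
```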
Tutorials About LLMs
- [Andrej Karpathy] State of GPT (video)
- [Hyung Won Chung] Instruction finetuning and RLHF lecture (Youtube)
- [Jason Wei] Scaling, emergence, and reasoning in large language models (slides)
- [Susan Zhang] Open Pretrained Transformers (Youtube)
- [Ameet Deshpande] How Does ChatGPT Work? (slides)
- [Yao Fu] Pretraining, Instruction Finetuning, Alignment, Specialization: On the Sources of Large Language Models' Abilities (Bilibili)
- [Hung-yi Lee] Dissecting How ChatGPT Works (Youtube)
- [Jay Mody] GPT in 60 Lines of NumPy (link)
- [ICML 2022] Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models (link)
- [NeurIPS 2022] Foundational Robustness of Foundation Models (link)
- [Andrej Karpathy] Let's build GPT: from scratch, in code, spelled out. (video | code)
- [DAIR.AI] Prompt Engineering Guide (link)
- [邱锡鹏] Capability Analysis and Applications of Large Language Models (slides | video)
- [Philipp Schmid] Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers (link)
- [HuggingFace] Illustrating Reinforcement Learning from Human Feedback (RLHF) (link)
- [HuggingFace] What Makes a Dialog Agent Useful? (link)
- [张俊林] The Road to AGI: Technical Essentials of Large Language Models (LLMs) (link)
- [大师兄] ChatGPT/InstructGPT Explained in Detail (link)
- [HeptaAI] ChatGPT's Core: InstructGPT and PPO Reinforcement Learning from Instruction Feedback (link)
- [Yao Fu] How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources (link)
- [Stephen Wolfram] What Is ChatGPT Doing … and Why Does It Work? (link)
- [Jingfeng Yang] Why did all of the public reproduction of GPT-3 fail? (link)
- [Hung-yi Lee] How ChatGPT Was (Probably) Made: The Socialization of GPT (video)
- [Keyvan Kambakhsh] A pure Rust implementation of a minimal Generative Pretrained Transformer (code)
Courses About LLMs
- [DeepLearning.AI] ChatGPT Prompt Engineering for Developers (homepage)
- [Princeton] Understanding Large Language Models (homepage)
- [OpenBMB] Open Course on Big Models (homepage)
- [Stanford] CS224N, Lecture 11: Prompting, Instruction Finetuning, and RLHF (slides)
- [Stanford] CS324: Large Language Models (homepage)
- [Stanford] CS25: Transformers United V2 (homepage)
- [Stanford Webinar] GPT-3 & Beyond (video)
- [李沐] A Close Reading of the InstructGPT Paper (Bilibili | Youtube)
- [陳縕儂] OpenAI InstructGPT: Learning from Human Feedback, the Predecessor of ChatGPT (Youtube)
- [李沐] HELM: A Holistic Evaluation of Language Models (Bilibili)
- [李沐] A Close Reading of the GPT, GPT-2, and GPT-3 Papers (Bilibili | Youtube)
- [Aston Zhang] The Chain of Thought Paper (Bilibili | Youtube)
- [MIT] Introduction to Data-Centric AI (homepage)
Opinions about LLMs
- A Stage Review of Instruction Tuning [2023-06-29] [Yao Fu]
- LLM Powered Autonomous Agents [2023-06-23] [Lilian]
- Why you should work on AI AGENTS! [2023-06-22] [Andrej Karpathy]
- Google "We Have No Moat, And Neither Does OpenAI" [2023-05-05]
- AI competition statement [2023-04-20] [petergabriel]
- My Worldview on Large Models [2023-04-23] [陆奇]
- Prompt Engineering [2023-03-15] [Lilian]
- Noam Chomsky: The False Promise of ChatGPT [2023-03-08] [Noam Chomsky]
- Is ChatGPT 175 Billion Parameters? Technical Analysis [2023-03-04] [Owen]
- Towards ChatGPT and Beyond [2023-02-20] [Zhihu] [欧泽彬]
- The Difficulties of Catching Up with ChatGPT, and Its Alternatives [2023-02-19] [李rumor]
- A Conversation with Megvii Research's Xiangyu Zhang: ChatGPT's Research Value May Be Even Greater [2023-02-16] [Zhihu] [旷视科技]
- Conjectures on Eight Technical Questions about ChatGPT [2023-02-15] [Zhihu] [张家俊]
- ChatGPT: Development History, Principles, Technical Architecture, and Industry Future [2023-02-15] [Zhihu] [陈巍谈芯]
- Twenty Observations on ChatGPT [2023-02-13] [Zhihu] [熊德意]
- ChatGPT: What I Saw, Heard, and Felt [2023-02-11] [Zhihu] [刘聪NLP]
- The Next Generation Of Large Language Models [2023-02-07] [Forbes]
- Large Language Model Training in 2023 [2023-02-03] [Cem Dilmegani]
- What Are Large Language Models Used For? [2023-01-26] [NVIDIA]
- Large Language Models: A New Moore's Law [2021-10-26] [Huggingface]
Other Awesome Lists
- LLMsPracticalGuide - A curated (and still actively updated) list of practical guide resources for LLMs.
- Awesome ChatGPT Prompts - A collection of prompt examples to be used with the ChatGPT model.
- awesome-chatgpt-prompts-zh - A Chinese collection of prompt examples to be used with the ChatGPT model.
- Awesome ChatGPT - A curated list of resources for ChatGPT and GPT-3 from OpenAI.
- Chain-of-Thoughts Papers - A trend started by "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".
- Instruction-Tuning-Papers - A trend started by Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).
- LLM Reading List - A paper & resource list of large language models.
- Reasoning using Language Models - A collection of papers and resources on reasoning with language models.
- Chain-of-Thought Hub - Measuring LLMs' reasoning performance.
- Awesome GPT - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLMs, and more.
- Awesome GPT-3 - A collection of demos and articles about the OpenAI GPT-3 API.
- Awesome LLM Human Preference Datasets - A collection of human preference datasets for LLM instruction tuning, RLHF, and evaluation.
- RWKV-howto - Possibly useful materials and tutorials for learning RWKV.
- ModelEditingPapers - A paper & resource list on model editing for large language models.
- Awesome LLM Security - A curation of awesome tools, documents, and projects about LLM security.
Other Useful Resources
- Arize-Phoenix - An open-source tool for ML observability that runs in your notebook environment. Monitor and fine-tune LLM, CV, and tabular models.
- Emergent Mind - The latest AI news, curated & explained by GPT-4.
- ShareGPT - Share your wildest ChatGPT conversations with one click.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Cohere Summarize Beta - A new endpoint for text summarization.
- chatgpt-wrapper - An open-source, unofficial Python API and CLI that lets you interact with ChatGPT.
- Open-evals - A framework extending OpenAI's Evals for different language models.
- Cursor - Write, edit, and chat about your code with a powerful AI.
- AutoGPT - An experimental open-source application showcasing the capabilities of the GPT-4 language model.
- OpenAGI - When LLM meets domain experts.
- HuggingGPT - Solving AI tasks with ChatGPT and its friends in Hugging Face.
- EasyEdit - An easy-to-use framework for editing large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT that enhances user privacy by making it easy to hide and unhide chat history. Ideal for privacy during screen shares.
Other Papers
If you're interested in the field of LLMs, the milestone papers collected in Awesome-LLM are a helpful way to explore its history and state of the art. However, each direction of LLM research offers its own insights and contributions, which are essential to understanding the field as a whole. For detailed paper lists in various subfields, see the topics below (note that subfields may overlap):
- Analyses of different LLMs in different fields with respect to different abilities
- Hardware and software acceleration for LLM training and inference
- Using LLMs to do some really cool stuff
- Augmenting LLMs in different aspects, including faithfulness, expressiveness, and domain-specific knowledge
- Detecting LLM-generated text from text written by humans
- Aligning LLMs with human preferences
- Chain of thought (a series of intermediate reasoning steps) significantly improves the ability of large language models to perform complex reasoning; a prompt sketch follows this list.
- Large language models (LLMs) demonstrate an in-context learning (ICL) ability: learning from a few examples provided in the context.
- A good prompt is worth 1,000 words
- Finetuning a language model on a collection of tasks described via instructions
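The chain-of-thought and in-context-learning entries above describe prompting techniques rather than libraries, so a small illustration may help. This sketch contrasts a standard few-shot prompt with a chain-of-thought prompt, using the worked example from the original chain-of-thought paper; the strings are plain prompts you could send to any completion-style model.

```python
# Standard few-shot prompting (in-context learning): the exemplar shows
# only question/answer pairs, so the model tends to answer directly.
few_shot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""

# Chain-of-thought prompting: the same exemplar, but its answer spells out
# the intermediate reasoning steps, nudging the model to do the same.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""

print(cot_prompt)  # send either prompt to any text-completion LLM endpoint
```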