Reading Notes on "How AI Works"
2024-12-21
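These are my reading notes on "How AI Works". They were recorded as a mind map and converted to markdown by some means, so images cannot be displayed. They are kept for future reference and review.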
- AN AI OVERVIEW
- Correlation
- Machine learning
- model
- point of making the model
- training
- Training a model is fundamentally different from programming
- Deep learning
- machine learning algorithm
- vector
- the four measurements of each iris flower mean we can represent the flower as a string of four numbers, say, (4.5, 2.3, 1.3, 0.3)
- The flower described by this vector has a sepal length of 4.5 cm, sepal width of 2.3 cm, petal length of 1.3 cm, and petal width of 0.3 cm. By grouping these measurements together, we can refer to them as a single entity.
- features
- two-dimensional iris training data
- Matrices
- Interpolation; extrapolation
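
To make the vector and matrix notes above concrete, here is a minimal Python sketch. Only the first flower's values come from the notes. The other two rows, the variable names, and the choice of the two petal features for the two-dimensional view are assumptions made for illustration.

```python
# One iris flower as a feature vector:
# (sepal length, sepal width, petal length, petal width), all in cm.
flower = (4.5, 2.3, 1.3, 0.3)

# A dataset is a matrix: one row per flower, one column per feature.
# The second and third rows are hypothetical values, not taken from the book.
dataset = [
    flower,
    (6.1, 2.9, 4.7, 1.4),
    (5.9, 3.0, 5.1, 1.8),
]

# Keeping only two features (assumed here: petal length and petal width)
# gives two-dimensional data that can be drawn on a flat plot.
two_dimensional = [(row[2], row[3]) for row in dataset]

n_rows, n_cols = len(dataset), len(dataset[0])
print(n_rows, n_cols)      # 3 4
print(two_dimensional[0])  # (1.3, 0.3)
```
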
- A HISTORY OF AI
- symbolic AI and connectionism
- with the advent of deep learning it's safe to say that the connectionists have won the day
- symbolic AI
- connectionism
- three main branches of machine learning
- Reinforcement learning
- supervised learning
- unsupervised learning
- Factors behind connectionism's gradual rise
- Speed
- Performance differences between CPUs and GPUs
- a neural network is ideally suited to what a GPU can do
- Algorithm
- The first approaches to training neural networks were primitive and unable to take advantage of their true potential. Algorithmic innovations changed that.
- Data
- Before the World Wide Web, collecting, labeling, and processing datasets of the magnitude necessary to train a deep neural network proved difficult
- CLASSICAL MODELS: OLD-SCHOOL MACHINE LEARNING
- This chapter discusses three classical models
- Nearest neighbors
- nearest neighbor classifiers are the simplest of models—so simple that there’s no model to train; the training data is the model
- To assign a class label to a new, unknown input, find the training sample closest to the unknown sample and return that sample's label
- This example uses two-dimensional feature vectors, x0 and x1, so we can visualize the process
- As a natural extension to the nearest neighbor model, locate the k training samples nearest the unknown sample. k is often a number like 3, 5, or 7, though it can be any number. This type of model uses a majority voting system, so the assigned class label is the one that's most common among the k training samples
- data often lives in a lower-dimensional space than the dimensionality of the data itself
- Nearest neighbor models aren't used often these days
- slow
- increasing the number of training samples increases the time it takes to use the classifier.
- Separated inputs
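
The nearest neighbor notes above (find the closest training samples, then take a majority vote) can be sketched in a few lines of Python. This is an illustrative sketch, not the book's code: the function name `knn_predict`, the sample points, and the labels are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, unknown, k=3):
    """Return the majority label among the k training points nearest to `unknown`."""
    # There is no training step: the stored training data is the model.
    distances = [
        (math.dist(point, unknown), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-dimensional training data (x0, x1) with two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # A
print(knn_predict(points, labels, (3.9, 4.1), k=1))  # B
```

Every prediction measures the distance to every stored sample, which is why adding training samples makes the classifier slower to use.
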
- Random forests
- A random forest is a collection of decision trees, each randomly different from the others. The forest's prediction is a combination of its trees' predictions.
- Three steps go into growing a random forest: bagging (also called bootstrapping), random feature selection, and ensembling
- Bagging
- The phrase "with replacement" means we might select a training sample more than once or not at all. This technique is used in statistics to understand a measurement's bounds.
- feature selection
- ensembling
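
The three random forest steps noted above can be illustrated without growing any actual trees. The sketch below only shows the sampling and voting mechanics. The helper names, the integer stand-ins for training samples, and the per-tree predictions are hypothetical.

```python
import random

def bootstrap_sample(samples, rng):
    """Bagging: draw len(samples) items with replacement."""
    return [rng.choice(samples) for _ in samples]

def majority_vote(predictions):
    """Ensembling: return the most common prediction among the trees."""
    return max(set(predictions), key=predictions.count)

rng = random.Random(0)          # fixed seed so runs are repeatable
training_set = list(range(10))  # stand-ins for ten training samples

# Bagging: some samples may appear more than once, others not at all.
bag = bootstrap_sample(training_set, rng)
never_selected = set(training_set) - set(bag)
print(sorted(bag))
print(sorted(never_selected))

# Random feature selection: each tree sees only a random subset of features.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
chosen_features = rng.sample(feature_names, 2)
print(chosen_features)

# Ensembling: hypothetical predictions from five trees for one input.
print(majority_vote(["cat", "dog", "cat", "cat", "dog"]))  # cat
```
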
- Support vector machines; SVM
- four concepts: margins, support vectors, optimization, and kernels
- margin
- Find the maximum margin
- Example
- Support vectors
- support vector; optimization; kernel
- kernel
- training a support vector machine means locating good values for the parameters related to the kernel used
- SVMs are binary classifiers
- two options
- one-versus-rest or one-versus-all
- one-versus-one
- Genetic programming
- the neural network
- support vector machines, decision trees, and random forests use data to generate functions according to a carefully crafted algorithm designed by a human
- NEURAL NETWORKS: BRAIN-LIKE AI
- Biological neuron; fire
- Think of a biological neuron like a light switch. It's off until there is a reason (sufficient input) to turn it on.
- artificial neuron
- An artificial neuron has more than two states; like a dial, it can be adjusted across many levels
- Summary
- inputs to the neuron
- activation function
- accept input to the neuron and produce an output value
- weights
- bias
- sigmoid
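
The neuron summary above (inputs, weights, bias, activation function) maps onto a very small function. The sketch below assumes the standard form, a weighted sum plus a bias passed through a sigmoid. The input, weight, and bias values are hypothetical.

```python
import math

def sigmoid(x):
    """Activation function: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """Multiply each input by its weight, add the bias, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

# Hypothetical values chosen so the weighted sum is exactly zero.
output = neuron(inputs=[0.5, -1.0], weights=[2.0, 1.0], bias=0.0)
print(output)  # 0.5
```

Unlike an on/off switch, the output moves smoothly between 0 and 1 as the weighted sum changes.
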
- hidden layers
- the networks each have one hidden layer, with 2, 3, and 8 nodes respectively
- Example
- Neural networks don't tell us the actual class label for the input, but only their confidence in one label relative to another.
- Neural networks are randomly initialized, such that repeated training leads to differently performing models even when using the same training data
- Recap
- The fundamental unit of a neural network is the neuron, also called a node.
- backpropagation and gradient descent
- The general training algorithm
- Training is like a journey from point a to point b
- architecture
- average error
- good enough
- overfitting
- Several ways to address overfitting
- overfitting is addressed in several ways, the best of which is acquiring more training data
- more training data means a better representation of that data collection
- weight decay
- data augmentation
- Data augmentation takes the existing training data and mutates it to produce new data
- gradient descent
- Summary
- where do the gradients come from
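- The basic idea of gradient descent:
  1. Initialize the parameters: choose initial values for the model's parameters (such as the weights), which determine how the model behaves.
  2. Compute the gradient: compute the gradient of the loss function with respect to each parameter (that is, the partial derivatives). The gradient is a vector pointing in the direction in which the function increases fastest.
  3. Update the parameters: reduce the loss by moving the parameters in the direction opposite the gradient. The step size is set by the learning rate, a hyperparameter chosen before training begins.
  4. Iterate: repeat the steps above until a stopping condition is met, for example the gradient is small enough, a preset number of iterations is reached, or the loss no longer decreases significantly.
- Gradient descent in two dimensions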
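
The four gradient descent steps noted above can be tried on a toy problem. The sketch below minimizes a hypothetical one-parameter loss, (w - 3)^2, whose minimum is known to be at w = 3. The loss, starting value, learning rate, and stopping thresholds are illustrative assumptions, not values from the book.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0              # step 1: initialize the parameter
learning_rate = 0.1  # hyperparameter fixed before training starts

for step in range(1000):         # step 4: iterate up to a preset limit
    g = gradient(w)              # step 2: compute the gradient
    if abs(g) < 1e-6:            # stop once the gradient is small enough
        break
    w -= learning_rate * g       # step 3: move against the gradient

print(round(w, 4))  # 3.0
```

In a neural network the same loop runs over every weight and bias at once, with backpropagation supplying the gradient values.
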
- Backpropagation
- Backpropagation gives us the "speed" representing how the network's error changes with a change in any weight or bias value
- principal takeaways
- CONVOLUTIONAL NEURAL NETWORKS: AI LEARNS TO SEE
- CNN; Convolutional neural networks
- Convolution
- Fortunately for us, convolution is a straightforward operation in digital images
- Convolutional layers
- Pooling layers
- Example
- CNNs thrive on structure in their inputs, which is the complete opposite of classical machine learning models
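
To show why convolution is straightforward on a digital image, here is a minimal sketch that slides a small kernel over a grid of pixel values. The image, the kernel, and the function name `convolve2d` are hypothetical. Like most CNN layers, it skips the kernel flip of textbook convolution, and it uses no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products at each position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the edge sits, which is the kind of spatial structure a convolutional layer picks up.
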
- GENERATIVE AI: AI GETS CREATIVE
- diffusion models
- large language models
- generative adversarial networks
- Generative adversarial networks; GAN
- generator
- Directions in a controllable GAN after training
- A generator that, once trained, can control facial features
- discriminator
- to learn how to differentiate between fake and real inputs
- mode collapse
- conditional GANs
- Generative adversarial networks rely on competition between the generator and the discriminator to learn to create fake outputs similar to the training data
- Diffusion models
- training a diffusion model involves teaching it to predict noise added to a training image
- the original image is destroyed at the end of the process, leaving only noise
- In effect, sampling from the diffusion model moves from right to left using the trained network to predict noise that is then subtracted to produce the previous image. Repeating this process for all the steps in the schedule completes the noise-to-image generation process.
- The forward algorithm trains the diffusion model, and the reverse algorithm samples from a trained model during inference to produce output images
- forward algorithm
- Like a GAN, a diffusion model is a generative model
- Conditional GAN training
- reverse algorithm
- summary
- GANs
- Conditional GANs
- Controllable GANs
- Diffusion models
- Conditional diffusion models
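
The forward (noising) process in the diffusion notes above can be sketched on a toy "image" of four numbers. This is an assumption-heavy illustration, not the book's code. The mixing rule is the common square-root blend of the previous image and fresh Gaussian noise, and the ten-step schedule is exaggerated so that the original is fully destroyed by the last step.

```python
import math
import random

def forward_step(pixels, beta, rng):
    """Blend the current pixels with fresh Gaussian noise; `beta` sets the noise share."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(1.0 - beta) * p + math.sqrt(beta) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise

rng = random.Random(0)                         # fixed seed so runs are repeatable
image = [1.0, -1.0, 0.5, 0.0]                  # hypothetical four-pixel image
schedule = [(t + 1) / 10 for t in range(10)]   # hypothetical schedule ending at 1.0

current = image
for beta in schedule:
    # During training, a network would be asked to predict `added_noise`
    # given `current`; here we only run the noising itself.
    current, added_noise = forward_step(current, beta, rng)

# With the final beta equal to 1.0, nothing of the original image remains.
print(current == added_noise)  # True
```

Sampling runs the other way: starting from pure noise, the trained network's noise prediction is subtracted step by step to recover an image.
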
- LARGE LANGUAGE MODELS: TRUE AI AT LAST
- ANI; artificial narrow intelligence
- AGI; artificial general intelligence
- Large language models; LLMs
- Large language models use a new class of neural network, the transformer
- recurrent networks
- Transformer networks
- Transformer networks typically include an encoder and a decoder