Swift's Blog

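U-Net: Principles and Code Implementation

U-Net is a go-to tool for semantic segmentation in medical imaging. With the boom in AIGC, U-Net has also become the backbone of diffusion models, so it is worth documenting in detail.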
2024-08-08 02:12:35
CV
U-Net

CNN
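A Walkthrough of the Mixtral MoE Code

I have long been curious about sparse expert networks: when an expert is not selected, is its gradient zero? If an expert is selected in one step and receives a gradient, but is not selected in the next step and receives none, can the model still train to convergence?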
2024-08-06 02:42:13
NLP
LLM

Sparse MOE
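Challenges of Training on a Thousand GPUs

I have never run training at this scale myself, but it is still worth seeing how it is done: the answer by 你的真实姓名; and, on how much thousand-GPU training experience is really worth, the answer by Frossmann.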
2024-08-05 01:44:00
Machine Learning
GPU
Common Financial Terms

https://m.cfa.cn/cfa/2413.html
2024-07-16 01:36:26
Finance
Financial Terms
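Feature Interaction in DSSM Two-Tower Models

The traditional DSSM two-tower model cannot perform feature interaction between the user side and the item side at an early stage, which degrades model performance to some extent. We want fine-grained feature interaction in the two-tower model while preserving the decoupling that allows the vector index to be built offline. Two pieces of work in this direction are introduced below.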
2024-07-09 01:00:42
Search, Ads & Recommendation
Feature Interaction

Dual Tower
Learn To Rank

In information retrieval, given a query, a search engine retrieves a set of relevant documents, ranks them, and finally returns the top N documents.
2024-07-07 02:50:12
Search, Ads & Recommendation
Algorithm

Neural Networks
Two Neural Network Parameter Initialization Methods

This post focuses on Xavier and Kaiming initialization:
2024-06-21 01:28:57
Machine Learning
Algorithm

Neural Networks
LLM Inference Performance Engineering

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
2024-06-12 01:15:00
NLP
LLM

Inference

Throughput
LLaMA2 Explained

A breakdown of the LLaMA2 model architecture:
2024-06-02 02:19:35
NLP
LLM

LLaMA
GPU Utilization

NVIDIA's official definition of GPU utilization is as follows:
$\text{GPU utilization} = \frac{\text{number of active SMs}}{\text{total number of SMs}} \times 100\%$
2024-05-19 14:21:32
Machine Learning
GPU
