Term frequency: $$TF_{i,j}=\frac{n_{i,j}}{\sum_k n_{k,j}}$$ Inverse document frequency: $$IDF(t)=\log\left(\frac{D}{DF(t)}\right)$$ where \(n_{i,j}\) is the count of term \(i\) in document \(j\), \(D\) is the total number of documents, and \(DF(t)\) is the number of documents containing term \(t\).
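A minimal sketch of how these two formulas combine into a TF-IDF weight; the toy documents and helper function below are illustrative only and not part of the repo's code.

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)             # n_{i,j} / sum_k n_{k,j}
    df = sum(1 for d in docs if term in d)         # DF(t): documents containing the term
    idf = math.log(len(docs) / df) if df else 0.0  # log(D / DF(t))
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # rarer terms that are frequent in a document score higher
```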

Static Embedding: VectorProcessor.py

The tokens are vectorized by a shallow neural network. It can be expressed compactly as $$S^{dict}_{out}=W^{wv}_{out,in}V^{word}_{in}$$ where \(in\) is the dimension of each word's vector, \(out\) is the size of the dictionary, and \(S^{dict}_{out}\) is the softmax output of a Word2Vec model.
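A minimal numeric sketch of that equation, assuming illustrative dimensions (10 dictionary entries, 4-dimensional word vectors) and random weights rather than trained ones:

```python
import numpy as np

vocab_size, embed_dim = 10, 4                   # out = dictionary size, in = word-vector dimension
W_out = np.random.randn(vocab_size, embed_dim)  # W^{wv}_{out,in}: output projection matrix
v_word = np.random.randn(embed_dim)             # V^{word}_{in}: one word's vector

logits = W_out @ v_word                         # scores over the dictionary
S = np.exp(logits) / np.exp(logits).sum()       # softmax -> S^{dict}_{out}
print(S.shape, S.sum())                         # (10,) 1.0
```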

Word2Vec has two training modes: continuous bag of words (CBOW) and skip-gram (SG). When a large corpus is not available, we can load a pretrained model via the Gensim library and fine-tune it to meet specific needs.
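A short sketch of loading pretrained static embeddings through Gensim's downloader; the model name below is one of the gensim-data sets and is only an example (fetching it requires network access):

```python
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # returns KeyedVectors with pretrained static vectors
print(wv["king"].shape)                    # (300,) one fixed vector per word
print(wv.most_similar("king", topn=3))     # nearest neighbours in the embedding space
```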

Dynamic Embedding: DynamicProcessor.py

In this part, you will see some basic untrained models, purely for the purpose of studying the structures and principles of those models. Usually, with limited data and computation resources, we directly use pretrained models from Hugging Face via the transformers library.
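A sketch of obtaining contextual (dynamic) embeddings from a pretrained Hugging Face checkpoint; "bert-base-uncased" is only an example model choice:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Unlike static Word2Vec vectors, each token's embedding here depends on its sentence context.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```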