I watched a scary movie last night and couldn’t get to sleep because I was so afraid
I went out with my friends to a bar and slept poorly because I drank too much
How are these sentences represented with bag of words?
binary, count, TF, TF-IDF (a short sketch follows this list)
What are the problems with these approaches?
What is the benefit of stemming, stop-word removal, etc. with BoW (and n-grams)?
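A minimal sketch (assuming scikit-learn is available; the library choice is mine, not stated in the notes) of the binary and count representations for the two example sentences, plus the effect of dropping stop words:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "I watched a scary movie last night and couldn't get to sleep because I was so afraid",
    "I went out with my friends to a bar and slept poorly because I drank too much",
]

# Count representation: one column per vocabulary word, values are raw counts.
count_vec = CountVectorizer()
print(count_vec.fit_transform(docs).toarray())
print(count_vec.get_feature_names_out())

# Binary representation: 1 if the word occurs in the document, 0 otherwise.
print(CountVectorizer(binary=True).fit_transform(docs).toarray())

# Removing English stop words shrinks the vocabulary and keeps more informative terms.
print(CountVectorizer(stop_words="english").fit_transform(docs).toarray())
```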
TF - number of times a word/term appears in a document divided by the total number of words in that document
IDF - log(number of documents in the corpus divided by the number of documents containing the word/term)
TF-IDF = TF * IDF (worked example below)
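A small worked example in Python applying these definitions directly to the two example sentences at the top of the section (note that libraries such as scikit-learn use smoothed and normalized variants, so their numbers differ slightly):

```python
import math

docs = [
    "I watched a scary movie last night and couldn't get to sleep because I was so afraid".lower().split(),
    "I went out with my friends to a bar and slept poorly because I drank too much".lower().split(),
]

def tf(term, doc):
    # term frequency: times the term appears in the doc / total words in the doc
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # inverse document frequency: log(number of docs / number of docs containing the term)
    return math.log(len(corpus) / sum(1 for doc in corpus if term in doc))

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tf_idf("scary", docs[0], docs))    # only in the first doc -> positive weight
print(tf_idf("because", docs[0], docs))  # in both docs -> idf = log(1) = 0
```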
How are n-grams different?
capture some context/word order
but still no relationship/meaning or similarity between terms
even higher dimensionality (see the sketch below)
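A short sketch (again assuming scikit-learn) of how adding bigrams captures some order while growing the vocabulary:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "I watched a scary movie last night",
    "I went out with my friends to a bar",
]

unigrams = CountVectorizer(ngram_range=(1, 1)).fit(docs)
bigrams = CountVectorizer(ngram_range=(1, 2)).fit(docs)

# The bigram vocabulary is noticeably larger: some order is captured,
# but dimensionality grows and terms still have no notion of similarity.
print(len(unigrams.vocabulary_), len(bigrams.vocabulary_))
```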
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
Meaningful features? (based on domain expertise)
Includes sentiment
Lower dimensional
Very limited breadth
Can add domain-specific dictionaries (sketched below)
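A hand-rolled sketch of dictionary-based features in the spirit of LIWC; the categories and word lists here are hypothetical placeholders, not the actual (proprietary) LIWC dictionaries:

```python
# Hypothetical dictionaries standing in for LIWC-style categories.
dictionaries = {
    "negative_emotion": {"afraid", "scary", "poorly"},
    "social": {"friends", "bar"},
}

def dictionary_features(text, dictionaries):
    tokens = text.lower().split()
    # Proportion of tokens falling in each category: a low-dimensional,
    # interpretable feature vector built from domain expertise.
    return {cat: sum(t in words for t in tokens) / len(tokens)
            for cat, words in dictionaries.items()}

print(dictionary_features(
    "I watched a scary movie last night and couldn't get to sleep because I was so afraid",
    dictionaries,
))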
word2vec (Google) for embeddings: CBOW and skip-gram
Components of word2vec
fastText (Facebook)
GloVe (Stanford)
(gensim sketch below)
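A minimal word2vec sketch using gensim (library choice assumed; the tiny corpus and parameters are placeholders, real training needs far more text):

```python
from gensim.models import Word2Vec

sentences = [
    "i watched a scary movie last night".split(),
    "i went out with my friends to a bar".split(),
]

# sg=0 -> CBOW (predict the center word from its context);
# sg=1 -> skip-gram (predict the context from the center word).
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["scary"].shape)             # 50-dimensional dense vector
print(skipgram.wv.most_similar("movie"))  # nearest neighbours in embedding space
```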
BERT is pre-trained on a large corpus of text using two main tasks:
Masked Language Model (MLM): Randomly masks some of the words in the input text and trains the model to predict the masked words from the surrounding context. This helps BERT learn bidirectional representations by considering both the left and the right context.
Next Sentence Prediction (NSP): Trains the model to predict whether a given pair of sentences is consecutive in the original text. This helps BERT understand the relationships between sentences.
Uses 12 hidden layers (transformer blocks) with 768 units each (BERT-base)
Uses 12 attention heads per layer (BERT uses self-attention to weigh the importance of different words in a sentence, so the model understands each word's context by considering its relationships with the other words); see the sketch below
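A sketch of the masked language modelling objective using the Hugging Face transformers library (assumed available) with bert-base-uncased, which matches the 12-layer / 768-hidden / 12-head configuration noted above:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token from both the left and the right context.
for prediction in fill_mask("I watched a scary movie and could not get to [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```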
Advantages