Reviewing my old Live Journal posts

I’m about to BARE MY SOUL to the internet. Well, the soul of my teenage self. Get ready!

Live Journal, you may remember, was (and is, it does still exist) a blogging site from before we really knew what blogging was. It was both a place to put diary entries and those quizzes that got passed around back in the day. It was also used for fanfic and community gathering. I did NOT use my (main) account for fan gathering. (Though I did write some excellent/terrible fanfics. Didn’t everyone have a Lord of the Rings self-insert character?)

I don’t really remember when I started this idea, but I thought it would be fun to see just how emo I was in 2006. Let’s go!

Step 0: Load the libraries

library(tidyverse)
library(lubridate)
library(tidytext)
library(hunspell)
library(ggrepel)
library(cowplot)

Step 1: Clean the data

I downloaded all my past Live Journal entries to a folder on my desktop in the same CSV format, so that I could easily load them in for analysis. I am pleasantly surprised that Live Journal made it so easy to download my history like this! I did have to click the same button a ton of times - but I did get all my data.
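The loading step itself isn't shown in this post, so here is a minimal sketch of one way to stack a folder of CSV exports into a single data frame. The folder, the file names, and the two tiny example entries are all made up so the sketch runs anywhere; only the column names come from the real export.

```r
library(tidyverse)

# Hypothetical setup: write two tiny export files to a temp folder.
# The real folder would hold the CSVs downloaded from Live Journal.
export_dir <- file.path(tempdir(), "lj_exports")
dir.create(export_dir, showWarnings = FALSE)
write_csv(tibble(itemid = 1, eventtime = "2006-01-05 20:15:00", event = "first post"),
          file.path(export_dir, "2006-01.csv"))
write_csv(tibble(itemid = 2, eventtime = "2006-02-11 23:40:00", event = "second post"),
          file.path(export_dir, "2006-02.csv"))

# Read every CSV in the folder as text columns, then stack them row-wise
lj_df <- list.files(export_dir, pattern = "\\.csv$", full.names = TRUE) %>%
  map(~ read_csv(.x, col_types = cols(.default = col_character()))) %>%
  bind_rows()
```

Reading everything as character first keeps the files stackable even if one month's export has an odd value in a column; the dates get parsed later anyway.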

The next step was to take every journal entry and separate out the individual words using ‘tidytext.’

lj_words <- lj_df %>%
  select(itemid, eventtime, logtime, subject, current_music, current_mood, event) %>%
  mutate(across(c(eventtime, logtime), ymd_hms),
         year = year(logtime),
         month = month(logtime)) %>%
  mutate(event = str_remove_all(event, "'")) %>%
  unnest_tokens(word, event, token = "words", format = "html", strip_url = FALSE)
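To see what that tokenizing step does, here is a small sketch on two made-up entries. It uses the default plain-text tokenizer rather than the `format = "html"` option above, and `toy_entries` is just an illustration, not my real data.

```r
library(tidyverse)
library(tidytext)

# Two made-up journal entries
toy_entries <- tibble(
  itemid = c(1, 2),
  event = c("Today was the WORST day ever.", "Can't wait for the weekend!")
)

toy_words <- toy_entries %>%
  mutate(event = str_remove_all(event, "'")) %>% # "Can't" becomes "Cant"
  unnest_tokens(word, event)                     # lowercase, strip punctuation, one word per row

toy_words
```

Each entry turns into one row per word, and the other columns (here `itemid`) are carried along, which is what makes the per-year counting later so easy.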

One step is not included in this blog post: to protect the privacy of my teenage friends, I replaced their names and some locations with placeholders. For example, instead of the name “Linda” you may see “nameofsister”. The code below picks up from that anonymized version of the data, ‘lj_words_protect’.
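For the curious, a replacement like this can be done with a small lookup table. This is only a sketch of the general idea, not my actual cleaning code: the lookup table, the town name, and the "nameoftown" placeholder are invented for the example.

```r
library(tidyverse)

# Hypothetical lookup table: real name (lowercase, like the tokens) -> placeholder
name_lookup <- tibble(
  word = c("linda", "springfield"),
  placeholder = c("nameofsister", "nameoftown")
)

toy_words <- tibble(word = c("linda", "drove", "to", "springfield"))

toy_words_protect <- toy_words %>%
  left_join(name_lookup, by = "word") %>%
  mutate(word = coalesce(placeholder, word)) %>% # keep the original word when there is no placeholder
  select(-placeholder)

toy_words_protect$word
```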

I was (and continue to be) terrible at spelling words correctly and also terrible at checking what I’ve typed after the fact. I use ‘Hunspell’ here in an attempt to fix some of the most common issues. Does this spell check get everything? No! But alas, I am a terrible speller and we move on in life.

lj_words_spell_check <- lj_words_protect %>%
  anti_join(my_stop_words, by = "word") %>%
  count(word) %>%
  rowwise() %>%
  mutate(spell_check = hunspell(word)) %>%
  filter(length(spell_check) >= 1) %>%
  mutate(suggest = hunspell_suggest(spell_check))

lj_correct <- lj_words_spell_check %>%
  filter(length(suggest) > 0) %>%
  mutate(suggest_pick = pluck(suggest, 1)) %>% # just pick the first one because I am lazy
  ungroup() %>%
  unnest(suggest_pick) %>%
  select(word, suggest_pick)

lj_words_corrected <- lj_words_protect %>%
  left_join(lj_correct, by = "word") %>%
  mutate(word = coalesce(suggest_pick, word)) %>%
  unnest_tokens(output = "word", input = word) # used because sometimes the correction is actually 2+ words now
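If you haven't used ‘Hunspell’ before, this small sketch shows what the two functions in the pipeline above return. The example words are mine, and the suggestions you get depend on the dictionary installed, so I'm not showing any expected output.

```r
library(hunspell)

# hunspell() returns, for each input string, the words it thinks are misspelled
hunspell("definitely") # nothing flagged for a correctly spelled word
hunspell("definately") # the misspelled word comes back

# hunspell_suggest() returns candidate corrections, best guess first
suggestions <- hunspell_suggest("definately")[[1]]
first_pick <- suggestions[1] # the "just pick the first one" shortcut
```

Taking the first suggestion is quick, but it can swap a typo for the wrong real word, which is one reason the spell check doesn't catch everything.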

Step 2: Now we move on to analysis!

The data is clean, or at least as clean as it is going to get today.

Word counts

I start with TF-IDF. The goal here is to see what I was talking about each year and how it may differ as I got older. As a reminder, I have changed the names of all my friends and family for privacy. That way you don’t know who “nameofbestfriend” is and why I stopped mentioning “nameofbestfriend” in 2006. (We had a bit of a falling out at the end of HS.)

tfidf <- lj_words_corrected %>%
  count(year, word) %>% 
  tidytext::bind_tf_idf(word, year, n) %>%
  anti_join(stop_words, by = "word") %>%
  group_by(year) %>%
  top_n(n = 10, wt = tf_idf) %>%
  ungroup() %>%
  filter(n >= 2)
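If TF-IDF is new to you, here is a toy sketch with made-up counts. A word used in every year gets a score of zero, while a word that only shows up in one year floats to the top.

```r
library(tidyverse)
library(tidytext)

# Made-up word counts for two years
toy_counts <- tribble(
  ~year, ~word,    ~n,
  2005,  "school", 4,
  2005,  "prom",   2,
  2006,  "school", 3,
  2006,  "odin",   3
)

# Each year is treated as one "document"
toy_tfidf <- toy_counts %>%
  bind_tf_idf(word, year, n)

toy_tfidf
```

Here "school" appears in both years, so its idf is log(2 / 2) = 0 and its tf-idf is 0. "odin" only appears in 2006, so it gets tf = 3 / 6 times idf = log(2 / 1).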

Look at 2009 - clearly my only entries were about my Norse mythology college class. I remember I put a few of my class papers on my Live Journal.

We can look at the differences between TF-IDF and a regular word count, while accounting for stop words.

wordcount <- lj_words_corrected %>%
  count(year, word) %>%
  anti_join(stop_words, by = "word") %>%
  group_by(year) %>%
  top_n(n = 10, wt = n) %>%
  ungroup() %>%
  filter(n >= 3)

Sentiment

Next I look at sentiment. I remember using Live Journal to be super angsty. I assumed that I would largely see negative sentiment and words across the years.

df_plot <- lj_words_corrected %>%
  left_join(get_sentiments("bing"), by = "word") %>%
  rename(bing_sentiment = sentiment) %>%
  left_join(get_sentiments("nrc"), by = "word") %>%
  rename(nrc_sentiment = sentiment) %>%
  pivot_longer(cols = c(bing_sentiment, nrc_sentiment),
               names_to = "sentiment_type", values_to = "sentiment") %>%
  count(sentiment_type, year, sentiment) %>%
  filter(!is.na(sentiment)) %>%
  rename(count = n) %>%
  group_by(sentiment_type, year) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  mutate(percent = count / total,
         year_month = ymd(str_c(year, "01", "01", sep = "-")))
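Here is a stripped-down sketch of the same percent calculation on a handful of made-up words. It only uses the "bing" lexicon, and it uses `inner_join()` to drop unmatched words instead of the `left_join()` plus `filter()` approach above.

```r
library(tidyverse)
library(tidytext)

# Made-up words; "pizza" has no sentiment in the bing lexicon
toy_words <- tibble(
  year = c(2005, 2005, 2005, 2006, 2006),
  word = c("happy", "awful", "love", "sad", "pizza")
)

toy_sentiment <- toy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>% # keep only words with a sentiment
  count(year, sentiment, name = "count") %>%
  group_by(year) %>%
  mutate(percent = count / sum(count)) %>%            # share of each sentiment within the year
  ungroup()

toy_sentiment
```

Dividing by the yearly total matters because I wrote far more in some years than others, so raw counts would mostly show how chatty I was.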

Instead, it seems my words were largely more positive than negative. (The exception is 2009 - either that comes from my anxiety attacks that year, or Norse mythology is just super depressing.) Not as angsty as I remember!

Ah, but did I mark my “current mood” / “how are you feeling” part as positively as my words suggest? You be the judge.

End :)

So there you have it. Was teenage Erin as emo as I thought? Maybe not! Or maybe I wrote all the most emo journals in my physical diary. The world will never know (because those diaries have been lost).