library(tidyverse)
library(readr)
library(USAboundaries)
library(sf)
library(leaflet)
library(RColorBrewer)
library(leaflet.extras2)
library(geojsonsf)
Hello everyone! I am still working on a project with the Wilford Woodruff Papers. In this post, I will talk more about my process of creating an interactive graph for a location analysis of places mentioned in Wilford’s writings. Specifically, I will cover how I added dates to my data so that I can categorize Wilford’s journals by the date he wrote them. I will be using the following packages in R: tidyverse, readr, USAboundaries, sf, leaflet, RColorBrewer, leaflet.extras2, and geojsonsf.
Getting Data and Problems
The data that I was given from the Wilford Woodruff Papers does not include any dates in the table. However, in the Consulting Class this semester, there is a group that is also working with the Wilford Woodruff Papers. They were able to pull dates from the text data and turn them into a column. I was able to get this data from them so that I could use it in my project.
There was one major problem with this data, however: there was no key that I could use to join the date column to my table. To get the dates, the consulting group had also split the text column into one row per day, so their rows no longer lined up with mine.
My first idea for solving this problem was to use a package called fuzzyjoin. This package would allow me to do a partial match on the text so that I could match the text column from the consulting group to the original text column. This seemed like a simple solution at first but proved to be more difficult than I expected. When I first tried to join the data, my R session would abort and restart every time I ran the code. I then tried to take a sample of the text and fuzzy join that, but ran into several other problems, including open brackets in the text. After a lot of trial and error, I decided to take a different route to joining the tables.
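A likely cause of the bracket trouble, stated here as an assumption rather than a confirmed diagnosis, is that regex-based matching reads an unmatched `[` as the start of a character class. The base R sketch below shows the failure and a literal-matching workaround on made-up text; the `pages` and `days` tables and the `find_page()` helper are hypothetical and are not part of the project data.

```r
# Made-up example data; `pages` and `days` are hypothetical stand-ins
pages <- data.frame(
  id = c(1, 2),
  text = c("Sept 10 [1836 rode to town. Sept 11 stayed in.",
           "Oct 5 walked ten miles."),
  stringsAsFactors = FALSE
)
days <- data.frame(
  text = c("Sept 10 [1836 rode to town.",
           "Sept 11 stayed in.",
           "Oct 5 walked ten miles."),
  stringsAsFactors = FALSE
)

# Treating the snippet as a regular expression fails on the unmatched "["
regex_result <- tryCatch(
  suppressWarnings(grepl(days$text[1], pages$text)),
  error = function(e) "regex error"
)

# fixed = TRUE matches the snippet literally, so brackets are harmless
find_page <- function(snippet) {
  hits <- which(grepl(snippet, pages$text, fixed = TRUE))
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Look up the page id of the first page that contains each snippet
days$id <- pages$id[vapply(days$text, find_page, integer(1), USE.NAMES = FALSE)]
```

Literal matching like this only works when the per-day text is an exact substring of the page text, which is one reason tagging each page with its id is the sturdier route.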
I ended up joining the dates table to the original data by reusing the code that the consulting group used. On the original text, I used paste0() to add the id for each page at the end of the text. I wrapped the id in a unique marker (%%%%) on both sides so that once I separated the text using the code from the consulting group, I would be able to easily pull out the id. Here is what this code looked like:
sorted_data <- journal[order(journal$id),]
# Append the page id, wrapped in '%%%%', to the end of each page's text
sorted_data$text_transcript <- paste0(sorted_data$text_transcript, '%%%%', sorted_data$id, '%%%%')
# Split on the marker so that the id lands in its own column
papers2 <- papers %>%
  separate(text, c("text", "id", "extra"), "%%%%")
After applying the code from the consulting group and then separating the data, not all of the rows contained an id number, so I dropped those from the table. I decided to drop these rows because each row in the original data set contains text from multiple days, so I will just be using the first date mentioned.
library(lubridate)
# Keep only rows that carry an id, parse the date, and keep id and day
papers3 <- papers2 %>%
  drop_na() %>%
  subset(select = -c(extra)) %>%
  mutate(day = mdy(date)) %>%
  select(id, day)
head(papers3)
  id        day
1  1 1836-09-10
2  2 1836-09-19
3  4 1836-09-21
4  5 1836-10-05
5  6 1836-10-12
6  7 1836-10-21
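The tag-and-recover steps above can be tried end to end on toy data. This is only a sketch: the consulting group's splitting code is not reproduced, so the `papers` table here is a hand-written, hypothetical stand-in for its output, and the rows and dates are made up.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical stand-in for the original journal table
journal <- data.frame(
  id = c(1, 2),
  text_transcript = c("Rode to town.", "Preached."),
  stringsAsFactors = FALSE
)

# Step 1: append the page id, wrapped in a marker unlikely to occur in the text
journal$text_transcript <- paste0(journal$text_transcript, "%%%%", journal$id, "%%%%")

# Hypothetical stand-in for the consulting group's output:
# one row per dated entry, where only some rows still carry the id tag
papers <- data.frame(
  date = c("9/10/1836", "9/11/1836", "9/19/1836"),
  text = c(journal$text_transcript[1], "Stayed in.", journal$text_transcript[2]),
  stringsAsFactors = FALSE
)

# Step 2: split on the marker; untagged rows get NA for id and extra
papers2 <- papers %>%
  separate(text, c("text", "id", "extra"), sep = "%%%%", fill = "right")

# Step 3: drop untagged rows, parse the date, and keep id and day
papers3 <- papers2 %>%
  drop_na() %>%
  select(-extra) %>%
  mutate(day = mdy(date)) %>%
  select(id, day)

print(papers3)
```

Note that separate() returns the id as a character column, which matters when joining it back to a table whose id is numeric.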
After creating the table with the dates and id, I was able to join this to the original data set by id.
# The id in papers3 is a character column after separate(), so match the type before joining
journal$id <- as.character(journal$id)
data_dates <- journal %>% inner_join(papers3, by = 'id')
Adding Dates to Graph
After adding the dates to my data set, the next step was to add the dates to my graph. I wanted to do this by adding a time slider to the top of my leaflet graph. I decided to use the addTimeslider function from the R package leaflet.extras2 to accomplish this.
In my previous blog post, I talked about how I found the geometries for the different counties mentioned in Wilford’s journals. The addTimeslider function, however, does not accept those county geometries and can only use data of type point or linestring. In order to get point data, I found city data where I can get the latitude and longitude of each row instead of county geometries. I did lose a little bit of data by doing this, but I feel that the time slider is an important enough element of my graph to make the trade-off worthwhile.
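As a sketch of that switch to point data: assuming the city table has longitude and latitude columns, sf::st_as_sf() can turn them into a POINT geometry column. The `cities` table, its column names (`lon`, `lat`), and its rows are made up for illustration and are not the project data.

```r
library(sf)

# Made-up rows; the column names `lon` and `lat` are assumptions
cities <- data.frame(
  place = c("Place A", "Place B"),
  day = as.Date(c("1836-09-10", "1836-10-05")),
  lon = c(-81.4, -91.4),
  lat = c(41.6, 40.5)
)

# Build an sf object with one POINT per row (WGS 84 longitude/latitude)
pts <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Every geometry should now be a POINT
print(st_geometry_type(pts))
```

In this sketch the geometry column gets the default name `geometry`, and the `day` column is carried along so it can serve as the time attribute.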
Below you can see how the time slider was implemented in my graph.
data <- sf::st_as_sf(city_data)
# Keep only the date and the point geometry for the slider layer
data2 <- city_data %>%
  select(day, point)
data2 <- sf::st_as_sf(data2)
leaflet() %>%
  addProviderTiles('CartoDB.Positron') %>%
  # Center the map on the continental United States
  setView(-98.5795, 39.8283, zoom = 3) %>%
  addTimeslider(data = data2, radius = 4, weight = 5, options = timesliderOptions(position = "topright", timeAttribute = "day", range = TRUE))
My next steps toward finalizing my graph are to add back the features it previously had, including a legend, colors, and popups. After doing this, I want to look into the stories of Wilford Woodruff and provide statistics on the places that Wilford has been, including the locations that are mentioned most often and where he traveled over time.