---
title: Simple Static Site Generator with R, Python, and GNU Make
date: 2018-05-15
output:
  html_document:
    css: style.css
    highlight: pygments
---

# Motivation

I'm hoping to do more blogging in the future, and needed a place to post things.
Over the years I've used a few different blogging platforms, including LiveJournal, Blogger, and most recently, [Jekyll](https://jekyllrb.com/).
I really enjoyed the Jekyll approach of writing plain text and compiling it to a series of fixed (i.e., static) html pages.
There are a number of these so-called static-site generators out there.
Since I mainly work with `R`, [blogdown](https://github.com/rstudio/blogdown) is a good choice and no doubt would have worked just fine.
However, I decided I wanted to attempt to build a basic framework myself.

# Features to Include

1. Version control.

2. Write posts in plain text, ideally with good support for inline code, and then generate decent-looking HTML.

3. An organized, efficient approach to the generation process: keep track of page dependencies and re-build a page to HTML only when a dependency has changed.

4. Auto-generation of an index/table of contents page.

5. Auto-generation of an RSS feed.
   
# Version Control

The whole site is in a `git` repository (you can see it [here](https://github.com/kenkellner/blog)).
Only plain text files are tracked; generated files such as HTML and XML are not.
The site is organized into two primary folders: folder `src` contains raw text input files, and folder `build` contains the corresponding output HTML files that make up the actual viewable website.

# Writing Posts

For me (an `R` user), [R markdown](https://rmarkdown.rstudio.com/) was the obvious choice for converting plain text to HTML.
Using a simple syntax you can mix text and nicely-formatted code blocks, run `R` code in the background, and generate visualizations that appear in the HTML file.
It's an amazing tool for creating reproducible analyses.
You can also run Python code in `Rmarkdown` using the `reticulate` package (something I hope to do more of in the future).

I made a simple R script `build_Rmd.R` to convert `Rmd` files to `html` with the `rmarkdown` package:

```{r,eval=FALSE}
args = commandArgs(trailingOnly = TRUE)
rmarkdown::render(args[1],output_file=paste('../',args[2],sep=''))
```

The `commandArgs` function allows this script to read arguments passed on the command line (in this case the input and output file names).
This will be needed later.

I kept all the detailed `Rmarkdown` options in the yaml header of each `Rmd` file, which looks basically like this:

```yaml
title: A title
date: 2018-01-01
output:
  html_document:
    theme: journal
    css: style.css
    highlight: pygments
```

The header allows me to specify a basic HTML theme (`journal`) which I modified with a custom CSS stylesheet (`style.css`).
The `src` directory also holds a file, `_navbar.yml`, with the links for a simple navigation bar that is inserted at the top of each page when I compile it with `Rmarkdown`.

# Building the Site

I wanted to explicitly organize the dependency structure of my site. 
For example, building `example-post.html` depends on `example-post.Rmd`, `_navbar.yml` and `style.css`.
Also, `Rmarkdown` can take a while to compile `Rmd` to `html`, especially if the document is complicated.
Thus, I wanted to avoid constantly re-compiling all pages and only re-compile a page when its dependencies have changed.

This is a common problem in programming, and a time-tested solution is [GNU Make](https://www.gnu.org/software/make/).
`Make` uses a kind of recipe, a `makefile`, to construct output files from a set of dependencies based on rules you define.
If the dependencies of a particular output file change (e.g., you edit one) and you run `make` again, the output will be re-created.
On the other hand, if the dependencies are unchanged and the output already exists, `make` won't waste time generating it again.
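
To make that timestamp logic concrete, here is a minimal Python sketch of the kind of check `make` performs for a single output file. It is only an illustration under simple assumptions, not how `make` is actually implemented; the function name `needs_rebuild` and the file names are made up for this example.

```python
import os
import tempfile
import time


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


# Small demonstration in a throwaway directory (file names are hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "post.Rmd")
    output = os.path.join(tmp, "post.html")

    with open(source, "w") as fh:
        fh.write("# A post\n")
    missing_output = needs_rebuild(output, [source])  # output does not exist yet

    with open(output, "w") as fh:
        fh.write("<h1>A post</h1>\n")
    # Set explicit timestamps so the result does not depend on clock resolution
    now = time.time()
    os.utime(source, (now - 10, now - 10))
    os.utime(output, (now, now))
    fresh_output = needs_rebuild(output, [source])  # output newer than source

    os.utime(source, (now + 10, now + 10))
    stale_output = needs_rebuild(output, [source])  # source edited after build

print(missing_output, fresh_output, stale_output)
```

The three values correspond to the three situations described above: the output is missing, the output is up-to-date, and a dependency was edited after the last build.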

In the first part of the `makefile` for my static site, I set up variables `$SRC` and `$BUILD` corresponding to folders containing text files (`src`) and output html (`build`) respectively.

```{sh, eval=FALSE}
SRC=src
BUILD=build
```

Next the makefile makes a list of the names of all `Rmd` files in the `src` directory, saving it into variable `$RMD_IN`. 
By changing the file extension on each of these `Rmd` files to `html`, and changing their directory from `src` to `build`, the makefile now also has a corresponding list of all the output HTML files it needs to generate, saved in `$RMD_OUT`.

```{sh, eval=FALSE}
RMD_IN = $(wildcard $(SRC)/*.Rmd)
RMD_OUT := $(patsubst $(SRC)/%.Rmd,$(BUILD)/%.html,$(RMD_IN))
```
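
For readers less familiar with `patsubst`, the pattern substitution can be sketched in Python. This is a rough equivalent for illustration only; the helper `rmd_to_html` and the example file names are hypothetical, and the real work is done by `make`.

```python
SRC = "src"
BUILD = "build"


def rmd_to_html(rmd_path, src=SRC, build=BUILD):
    """Mimic $(patsubst $(SRC)/%.Rmd,$(BUILD)/%.html,...) for one path."""
    prefix, suffix = src + "/", ".Rmd"
    if rmd_path.startswith(prefix) and rmd_path.endswith(suffix):
        # The part matched by % is kept and placed into the new pattern
        stem = rmd_path[len(prefix):-len(suffix)]
        return build + "/" + stem + ".html"
    # Like patsubst, leave words that do not match the pattern unchanged
    return rmd_path


# Hypothetical names standing in for the result of $(wildcard $(SRC)/*.Rmd)
rmd_in = ["src/index.Rmd", "src/static-site.Rmd"]
rmd_out = [rmd_to_html(p) for p in rmd_in]
print(rmd_out)
```

Each input path keeps its base name, and only the directory and extension change.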

Next I set up a rule in the `makefile` to tell `make` how to generate a given `html` file (represented with the wildcard notation `%`) from its corresponding dependencies - the matching `Rmd` file, the navbar file, and the CSS file:

```{sh,eval=FALSE}
$(BUILD)/%.html: $(SRC)/%.Rmd $(SRC)/_navbar.yml $(SRC)/style.css 
	Rscript build_Rmd.R $< $@
```

If either (1) the desired HTML file doesn't exist, or (2) one of the dependencies (specified after the colon `:`) has been modified more recently than the HTML file, then `make` will execute the second line, which runs the `R` script I described earlier for running `Rmarkdown`. Here `$<` stands for the first dependency (the `Rmd` file) and `$@` for the target (the HTML file).
Otherwise, nothing will happen - the HTML file already exists and is up-to-date.
No need to build it again!
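
The command line in that rule uses two of `make`'s automatic variables: `$<` expands to the first dependency and `$@` to the target. As a small illustration (not part of the site's code; `expand_recipe` is a hypothetical helper), the expansion for one post could be mimicked in Python like this:

```python
def expand_recipe(recipe, target, prerequisites):
    """Substitute the automatic variables $@ and $< in a recipe line."""
    return recipe.replace("$@", target).replace("$<", prerequisites[0])


target = "build/example-post.html"
prerequisites = ["src/example-post.Rmd", "src/_navbar.yml", "src/style.css"]
command = expand_recipe("Rscript build_Rmd.R $< $@", target, prerequisites)
print(command)
```

The printed command is what `make` would run for `example-post.html`: the `Rmd` file is passed as the input and the HTML file as the output.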

The code above generates a *given* HTML file. 
I want to build all the HTML files in `$RMD_OUT` when I run `make`, or at least all the HTML files that don't exist yet or have updated dependencies.
For that I added another rule called `all`:

```{sh,eval=FALSE}
all: $(RMD_OUT)
	@echo "Done"
```

This is often called a "phony" `make` rule because its target, `all`, is just a label rather than a file that gets generated. (Declaring it with `.PHONY: all` tells `make` this explicitly, so a stray file named `all` cannot interfere.)
However, the rule does have dependencies: the list of all required output HTML files (`$RMD_OUT`).
Thus, running `make all` causes every HTML file to be generated or updated (if necessary) according to the rule above.

# Generating an Index Page

I needed a simple landing page for the blog (`index.html`; see it [here](https://kenkellner.com/blog)) that would list all the posts I've written so far.
I wanted this page to contain two pieces of information: the title of each post (with a link to the page), and the date it was posted, in descending order.
To keep things simple and consistent, I decided to use `R` and `Rmarkdown` for this task as well.

The basic steps are as follows:

1. Search the `src` directory for all `Rmd` files:

    ```{r, eval=FALSE}
    f = list.files(pattern='\\.Rmd$') #Match only names ending in .Rmd
    f = f[f!='index.Rmd'] #Exclude this file
    ```

2. Extract the post date and title from each file:

    ```{r,eval=FALSE}
    library(stringr)
    
    get_date = function(filepath){
    	ln = grep('date:',readLines(filepath),value=TRUE)
    	dt = strsplit(ln, ': ')[[1]][2]
    	dt
    }
    
    get_title = function(filepath){
    	ln = grep('title:',readLines(filepath),value=TRUE)
    	t = str_split(ln, ': ', 2)[[1]][2]
    	t = gsub("^'","",t)
    	t = gsub("'$","",t)
    	t
    }

    dates = sapply(f,FUN=get_date)
    titles = sapply(f,FUN=get_title)
    ```

3. Generate links to the HTML pages in markdown format:

    ```{r,eval=FALSE}
    filenames = sub('\\.Rmd$','.html',f)
    links = paste('[',titles,'](',filenames,')',sep='')
    ```

4. Format and print the dates and linked titles in a neat HTML table:

    ```{r, eval=FALSE}
    tab = data.frame(Date=dates,Title=links,row.names=NULL)
    tab = tab[order(tab$Date,decreasing=TRUE),]
    knitr::kable(tab,row.names=FALSE)
    ```
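
For comparison, the same four steps can be sketched in Python. This is only an illustration of the idea, not the code the site uses: the post contents are held in a hypothetical in-memory dictionary rather than read from `src`, and `read_header_field` is a made-up helper that assumes each field sits on its own line at the start of that line.

```python
def read_header_field(text, field):
    """Return the value of the first 'field: value' line, without quotes."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            value = line.split(":", 1)[1].strip()
            return value.strip("'\"")
    return None


# Hypothetical posts: file name -> file contents
posts = {
    "first-post.Rmd": "---\ntitle: 'First post'\ndate: 2018-01-01\n---\nText\n",
    "second-post.Rmd": "---\ntitle: Second post\ndate: 2018-05-15\n---\nText\n",
}

rows = []
for name, text in posts.items():
    date = read_header_field(text, "date")
    title = read_header_field(text, "title")
    link = "[{}]({})".format(title, name.replace(".Rmd", ".html"))
    rows.append((date, link))

# ISO dates (YYYY-MM-DD) sort correctly as plain strings; newest first
rows.sort(reverse=True)

# Build a markdown table with one row per post
table = ["|Date|Title|", "|---|---|"]
table += ["|{}|{}|".format(date, link) for date, link in rows]
print("\n".join(table))
```

Sorting works on the raw strings only because the dates are written as year-month-day; another date format would need to be parsed first.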

I decided it would be easiest to generate the index file every time I built the site, regardless of whether there were any new posts to add.
Therefore, I added a command to build it to the `all` recipe in the `makefile`:

```{sh,eval=FALSE}
all: $(RMD_OUT)
	@Rscript build_Rmd.R $(SRC)/index.Rmd $(BUILD)/index.html
	@echo "Done"
```

The index page is built in the same way as the actual blog posts (using `build_Rmd.R`), and thus has the same navigation bar, styling, etc.

# Generating an RSS feed

I first thought I'd try to create an RSS feed for my blog in `R`, for consistency. 
However, though I found several packages to *read* RSS files, I couldn't find one to *create* an RSS file.

My next choice was a Python module called [python-feedgen](https://github.com/lkiesow/python-feedgen), which I've used in the past. Much like with building the `index.html` page, the approach here was to gather dates, titles, and links for each post and feed them into the module.
I added a second script, `build_rss.py`, to do this, containing the steps below.

1. Load required modules:

    ```{python, eval=FALSE}
    from feedgen.feed import FeedGenerator
    import os
    from datetime import datetime
    ```

2. Find all current posts (i.e., all HTML files in the build directory excluding the index):

    ```{python, eval=FALSE}
    links = [f for f in os.listdir('build')
             if f.endswith('.html') and f != 'index.html']
    ```

3. Create sorted lists of dates, links, and titles for each post. Dates and titles are extracted from the `Rmd` source file.

    ```{python, eval=FALSE}
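    # Map each HTML file back to its Rmd source
    src = []
    for i in links:
        src.append('src/'+i.replace('.html','.Rmd'))

    # Keep only the first match per file, so a later line containing
    # 'date: ' or 'title: ' (e.g., in a code block) adds no extra entry
    dates = []
    for i in src:
        for line in open(i):
            if line.startswith('date: '):
                dates.append(line.split(': ')[1].strip('\n'))
                break

    titles = []
    for i in src:
        for line in open(i):
            if line.startswith('title: '):
                titles.append(line.split(': ',1)[1].strip('\n'))
                break

    # Sort all three lists together by date
    dates, links, titles = zip(*sorted(zip(dates, links, titles)))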
    ```

4. Initialize the RSS feed object (class `FeedGenerator`) and add metadata:

    ```{python, eval=FALSE}
    leader = 'https://kenkellner.com/blog/'  # base URL of the site

    fg = FeedGenerator()
    fg.id(leader)
    fg.link(href=leader+'feed.xml', rel='self')
    fg.title('Ken Kellner\'s Blog')
    fg.subtitle(' ')
    fg.language('en')

5. For each post, add an RSS entry to the object using the dates, links, and titles obtained above:

    ```{python, eval=FALSE}
    leader='https://kenkellner.com/blog/' 

    for i in range(len(dates)):
        fe = fg.add_entry()
        fe.title(titles[i])
        fe.id(leader+links[i])
        fe.link(href=leader+links[i])
        fe.author({'name': 'Ken Kellner', 'email': 'contact@kenkellner.com'})
        fe.description('')
        date_raw = dates[i]+' -0500'
        fe.published(datetime.strptime(date_raw, '%Y-%m-%d %z'))
    ```

6. Output the RSS feed as an XML file (`feed.xml`):

    ```{python, eval=FALSE}
    fg.rss_file('build/feed.xml')
    ```
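
One detail in step 5 is worth isolating: the post headers contain only a date, so a fixed UTC offset is appended before parsing to get a timezone-aware `datetime`. The short sketch below uses only the standard library to show what that parsing step produces; the date is just an example value.

```python
from datetime import datetime, timedelta

# Date as it appears in a post header, plus a fixed UTC offset
date_raw = "2018-05-15" + " -0500"
published = datetime.strptime(date_raw, "%Y-%m-%d %z")

print(published.isoformat())
print(published.utcoffset() == timedelta(hours=-5))
```

Because the offset is hard-coded, every post is stamped at midnight in the same zone regardless of daylight saving time, which is a simplification rather than an exact publication time.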

I added another command to the `all` recipe in the `makefile` to generate the RSS feed each time I built the site:

```{sh, eval=FALSE}
all: $(RMD_OUT)
	@Rscript build_Rmd.R $(SRC)/index.Rmd $(BUILD)/index.html > /dev/null 2>&1 
	@python3 build_rss.py
	@echo "Done"
```

# Deploying the Site

I host my websites on a Digital Ocean droplet to which I have full filesystem access.
The fastest way to deploy the site is to simply `rsync` the entire local `build` directory to the `blog` directory of the site on the droplet.
I added another "phony" recipe to the `makefile` to do this:

```{sh, eval=FALSE}
deploy:
	@rsync -r --progress --delete --update build/ \
		kllnr.net:/var/www/kenkellner.com/blog/
```
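
Because `--delete` removes files on the server that are no longer in `build`, it can be reassuring to preview a deployment first. The sketch below is an optional illustration, not part of the site's `makefile`: a hypothetical Python helper, `rsync_command`, assembles the same `rsync` call and adds the `--dry-run` flag by default, which makes `rsync` report what it would do without changing anything.

```python
def rsync_command(source, destination, dry_run=True):
    """Build the rsync argument list used for deployment."""
    cmd = ["rsync", "-r", "--progress", "--delete", "--update"]
    if dry_run:
        # Report what would be transferred or deleted without doing it
        cmd.append("--dry-run")
    cmd += [source, destination]
    return cmd


preview = rsync_command("build/", "kllnr.net:/var/www/kenkellner.com/blog/")
print(" ".join(preview))
```

The resulting list could be passed to `subprocess.run` to actually execute the preview; calling the helper with `dry_run=False` yields the same arguments as the `deploy` recipe.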

So, each time I complete a post I run `make all` and then `make deploy`, and my blog is up-to-date!