article-epub
============

Description
-----------

A command-line tool written in Python that converts scientific articles available as HTML into ePub files for reading on a supported e-reader.
It uses a plugin system with a "recipe" for each supported scientific publisher.
It takes an article URL, title, or (ideally) DOI as input.
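
The recipe-per-publisher idea can be pictured roughly as follows. This is only an illustrative sketch, not the project's actual code: the names `Recipe`, `register`, `find_recipe`, and `ExamplePublisher`, as well as the `example.org` domain, are all hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class Recipe(ABC):
    """Hypothetical base class: one subclass per supported publisher."""

    # Domains this recipe claims to handle (illustrative values only).
    domains: tuple = ()

    @abstractmethod
    def extract(self, html: str) -> str:
        """Return the article body extracted from the page HTML."""


# Registry of recipe classes, filled in by the decorator below.
_REGISTRY = []


def register(cls):
    """Class decorator that adds a recipe class to the registry."""
    _REGISTRY.append(cls)
    return cls


@register
class ExamplePublisher(Recipe):
    domains = ("example.org",)

    def extract(self, html: str) -> str:
        # A real recipe would parse the publisher's markup here.
        return html


def find_recipe(url: str) -> Recipe:
    """Return an instance of the first recipe whose domain matches the URL host."""
    host = urlparse(url).hostname or ""
    for cls in _REGISTRY:
        if any(host == d or host.endswith("." + d) for d in cls.domains):
            return cls()
    raise LookupError(f"no recipe for {host!r}")
```

With a layout like this, supporting a new publisher would mean adding one decorated subclass, and a broken recipe could be fixed without touching the others.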

You must have legal access to any article you want to convert, e.g. via a university library.

As with most web-scraping tools, the provided recipes are liable to break frequently.

Currently, the following publishers are supported:

* ScienceDirect (Elsevier)
* Springer
* Wiley
* Oxford
* BioOne
* Royal Society
* PLoS ONE
* National Institutes of Health (NIH)
* NRC Research Press
* Taylor & Francis

Dependencies
------------

* Linux environment required
* [Calibre](https://calibre-ebook.com/) (to access `ebook-convert`)
* Firefox with headless support
* [Geckodriver](https://github.com/mozilla/geckodriver/releases) installed somewhere in `$PATH`
* [Pandoc](http://pandoc.org/)

Python packages (available with `pip`):

* [Selenium](http://selenium-python.readthedocs.io/)
* [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [pypandoc](https://github.com/bebraw/pypandoc)
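
Before a first run, it can help to confirm that everything listed above is actually installed. The following is a minimal sketch that is not part of article-epub; the executable names (`firefox`, `geckodriver`, `pandoc`) and the helper `missing_dependencies` are assumptions and may need adjusting for your distribution.

```python
import importlib.util
import shutil

# Executable names are assumptions based on the dependency list above.
REQUIRED_COMMANDS = ("ebook-convert", "firefox", "geckodriver", "pandoc")
# Import names of the Python packages; BeautifulSoup 4 is imported as "bs4".
REQUIRED_MODULES = ("selenium", "bs4", "pypandoc")


def missing_dependencies(commands=REQUIRED_COMMANDS, modules=REQUIRED_MODULES):
    """Return the names of commands not on $PATH and modules that cannot be found."""
    missing = [c for c in commands if shutil.which(c) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only confirms that each command and module can be located; it does not verify versions or that Firefox supports headless mode.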

Usage
-----

```
usage: article-epub [-h] [-u URL] [-d DOI] [-t TITLE] [-o FILE] [-p]

optional arguments:
  -h, --help  show this help message and exit
  -u URL      URL of article
  -d DOI      DOI of article
  -t TITLE    Title of article
  -o FILE     Name of output file
  -p          List supported publishers
```
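
For readers curious how an interface like the one above is declared, here is a minimal `argparse` sketch that reproduces the same flags. It is an assumption about structure, not the tool's actual source; the function name `build_parser`, the attribute names (`url`, `doi`, `title`, `output`, `publishers`), and the DOI value in the usage note are placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the usage text above."""
    parser = argparse.ArgumentParser(prog="article-epub")
    parser.add_argument("-u", metavar="URL", dest="url", help="URL of article")
    parser.add_argument("-d", metavar="DOI", dest="doi", help="DOI of article")
    parser.add_argument("-t", metavar="TITLE", dest="title", help="Title of article")
    parser.add_argument("-o", metavar="FILE", dest="output", help="Name of output file")
    parser.add_argument("-p", action="store_true", dest="publishers",
                        help="List supported publishers")
    return parser
```

Calling `build_parser().parse_args(["-d", "10.0000/placeholder", "-o", "out.epub"])` yields a namespace with `doi` and `output` set, and `url` and `title` left as `None`.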