Learn how to reformat ePUB Public Domain books by splitting HTML files to put in TOC divisions

·@rosatravels·
##  Contribution to the Open Source Project: Calibre 


![calibrehome2.jpg](https://cdn.utopian.io/posts/96427b29ffe7608a3019fbffa9f0d270abe8calibrehome2.jpg)

### Github Repository:

https://github.com/kovidgoyal/calibre


### Learn how to use Calibre to **split HTML** files of public domain books, so you can reformat and polish the ePUB with correct divisions for the TOC


#### What Will I Learn?

#### 6 Major Concepts:

- Learn to find public domain books on the Project Gutenberg site (http://www.gutenberg.org/). There are over 57,000 free public domain ebooks for you to download and read with Calibre or any eReader you might have.

- Some public domain books are thick and difficult to navigate as ePUBs because they are not properly formatted with a TOC. The HTML files do not line up with the chapter divisions, so you are going to find the right location in the HTML to split at each chapter division.

- You are going to learn how to split an HTML file into 2 sections at the right location to create proper sections for your chapters.

- You are going to learn how Calibre generates the HTML code for the newly split file to keep the formatting of your book consistent. This is the magic of Calibre. If you know HTML, you could key this code in yourself, but even if you don't, you need to know what is happening in the editing screen as you make these changes.

- You are going to learn how these divisions let you generate the TOC with the .toc.ncx that I taught you in the last tutorial.

- Finally, to give each chapter division more of a fresh-page feel, you will manually add a break tag ```<br/>``` to the HTML section to make your ePUB book even more pleasurable to read.


###  System  Requirements

1.  System Requirements:  Install **Calibre Software 3.23 (updated on May 4, 2018)**
2.  OS Support:  
- Windows (Vista, 7, 8 and 10)
- Linux (32-bit and 64-bit Intel)
- Mac OS X (10.9 Mavericks and higher)

Read the Calibre page and download the software onto your computer.
After the download, run the installer and start using the software by following today's tutorial.


###  Resources about Calibre:
- Website:  https://calibre-ebook.com/
- Github Code Link:  https://github.com/kovidgoyal/calibre
- License:  GNU General Public License
- Translator Activity: https://www.transifex.com/calibre/calibre/
- 10 Years in Development since 2008
- Services supporting Calibre: Launchpad, Transifex, fosshub.com and github.com provide bug, file, translation and code hosting for calibre.



### Difficulty

Intermediate (it helps if you know a little HTML)

### Tutorial Requirements

- [Calibre Tutorial #2:  Turn a PDF eBook to ePub for Mobile Device](https://steemit.com/utopian-io/@rosatravels/use-calibre-software-turn-a-pdf-ebook-to-epub-for-mobile-device)
- [Calibre Tutorial #7:  Learn How To Create a .TOC.NCX for an ePUB](https://steemit.com/utopian-io/@rosatravels/learn-how-to-create-a-toc-ncx-for-an-epub-book-with-calibre)

Please study the above 2 tutorials, because I will not explain their technical features again when I use them in this tutorial.


### Description

There are a lot of public domain books that are freely distributed on the internet. The copyright on these books has expired, so a team of volunteers has digitized them and made them available for public use. In the past, these books only came in PDF format. But over the last 10 years Project Gutenberg has recruited many volunteers to format these books into ePUB and Kindle mobi formats so that people can freely download them and read them on their devices.

Many of these great works of literature are a pleasure to read, but the problem is that the books are machine formatted without proper structure. Often the book is sectioned at the wrong place, so a table of contents cannot be created. The content of the book is there, but the ePUB is not user friendly at all.

Many great literary works are also very long, which makes the ePUB difficult to navigate. What is needed is to **split** the HTML files into smaller chunks so that you can put the book into the right chapter divisions.

So in this tutorial, I will show you how to take a thick 'public domain' book and split it into sections by using the HTML splitting tool. Only when the HTML files are split at the right locations will you be able to put the TOC into the .toc.ncx.

There are quite a few technical points that I need to explain so that you understand the concept behind each feature. This video is a bit longer than my usual videos because these concepts need a longer explanation.


--------------

## Step 1  

Download a public domain book from the Project Gutenberg site.

![gutenberg.jpg](https://ipfs.busy.org/ipfs/QmUCWVVUFkx1uWPS9Cv9rTt1yMjzpYSatgFK342cY3zMxR)

I have downloaded the ePUB version of the Bible, a very thick book, onto my computer to use as the example in my video demo.


## Step 2

Add the thick book to Calibre to see what the ePUB looks like.

![split2.jpg](https://ipfs.busy.org/ipfs/QmcjJf4Fu3AMbtjaMK6kTtrLUHz3vS52XMxDahknY8hFZm)

- as you can see, the book is unformatted
- no clear metadata
- no book cover
- very difficult to navigate


## Step 3 

Customize the book with the correct metadata and book cover. I will not teach this here, as I have covered these steps in a previous tutorial. Still, I need to go through this step before we move on to the next one.

The result of this step should look like the following:


![split3.jpg](https://ipfs.busy.org/ipfs/Qmaa2omWzHFTeGns6EV7a8MFQGrgKBzAPwY5BtnKwWH2nv)


## Step 4

Use the Calibre Editor feature. Many people do not know that they can edit a book inside Calibre. Once you get used to it, you will find that it is much easier to edit any book you have in Calibre than to edit the source files manually.

![split4.jpg](https://ipfs.busy.org/ipfs/QmSmdD5XZjG7ZwfBNrhVetwSJNj2ExzcM5K4KxVq4zMfP7)

- You can either use 'Edit Book' on the tool bar
- Or you can use the shortcut: right click on the book and choose 'Edit Book'


## Step 5 

Find the HTML file that contains the division where the new chapter begins.

![split4.jpg](https://ipfs.busy.org/ipfs/QmRhoBXy6RaYePdS2gsdU4Aye7tLaPiAbK5AMUuZLbQixA)

If you don't find the right HTML file, you will make the split in the wrong place.

You want to split the file exactly where one chapter ends and the next chapter begins. You need to locate that dividing line first.
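If you prefer to check from a script which HTML file holds which chapter heading, here is a minimal Python sketch. It is not part of Calibre; it only assumes that an ePUB is a ZIP archive of HTML files and that chapter titles sit in `<h1>` to `<h6>` tags. The function name `headings_by_file` is my own.

```python
import re
import zipfile

# Matches <h1>...</h1> through <h6>...</h6>, including attributes on the tag.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def headings_by_file(epub):
    """Map each HTML file inside the ePUB to the heading texts it contains.

    `epub` may be a file path or a file-like object opened in binary mode.
    """
    result = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".htm", ".html", ".xhtml")):
                continue  # skip CSS, images, metadata files
            text = zf.read(name).decode("utf-8", errors="replace")
            titles = []
            for _level, inner in HEADING_RE.findall(text):
                # Drop nested tags such as <a id="..."/> and tidy whitespace.
                title = " ".join(TAG_RE.sub("", inner).split())
                if title:
                    titles.append(title)
            result[name] = titles
    return result
```

Calling `headings_by_file("book.epub")` (the file name is only an example) returns a dictionary, so a file that lists several book titles is a candidate for splitting.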

## Step 6

Split the file using the 'specified location' option. This is important.


![split5.jpg](https://ipfs.busy.org/ipfs/QmeBuyTL1edR7owS39MSBAdMYzQEa4kW4n8Q5rn6s15QXi)


When the dividing line is set, you can use this tab to instruct Calibre to divide the book here.

- Click the upper part of the tab to split.
- The lower part of the tab undoes the split.


## Step 7

Now, click inside the preview panel. This is the step most people do not know about, so go slowly here.

![split6.jpg](https://ipfs.busy.org/ipfs/QmT1Xse316z8oaRTNypKZ9TyhTYuDWr95HDqg3UXND9EP8)

Please take note that all 3 panels have to line up with each other.

The HTML file selected on the left has to match the HTML code of the book's content in the editor and the preview of the ePUB book:

-  ``` @public@vhost@g@gutenberg@html@files@10@10-h@10-h-3.htm ``` 

-  ``` <h3 id="pgepubid00003"><a id="The_Second_Book_of_Moses_Called_Exodus"/>```

-  ```The Second Book of Moses Called Exodus```

All 3 panels have to point at the same place before you split.



## Step 8

The file splits **ABOVE** this location

Why above?

Because that marks the end of a chapter  and the  beginning of a new chapter.

What does the HTML say?

``` <hr class="c3"/> ```

The **hr** tag draws a horizontal line, which confirms that this is the right division break between the 2 chapters.
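The same check can be sketched in Python: look for an `<hr>` that is immediately followed by a heading. This is only an illustration of the idea, under the assumption that the book marks chapter breaks the way this one does; the function name `split_offsets` is hypothetical and is not a Calibre feature.

```python
import re

# An <hr .../> followed, after optional whitespace, by a heading tag.
BREAK_RE = re.compile(r"<hr\b[^>]*/?>\s*(?=<h[1-6]\b)", re.IGNORECASE)


def split_offsets(xhtml):
    """Return the character offsets where a heading directly follows an <hr>.

    Each offset points at the start of the heading tag, so splitting
    there happens above the new chapter title.
    """
    return [match.end() for match in BREAK_RE.finditer(xhtml)]
```

An `<hr>` that is followed by ordinary text rather than a heading is ignored, which helps avoid splitting on decorative rules inside a chapter.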

## Step 9

- Look for the green line in the preview section.
- Click on the green line to make the split.

## Step 10

New code is now generated for the 2 divided files.

This part is a bit technical, but if you understand HTML, you will understand why Calibre is magic. Calibre's developer has already built in the logic that generates the required code for the newly split HTML file. Each HTML file needs certain code at the beginning. We can't just chop the HTML in half and expect the new file to work without that opening code.

ePUB books are HTML based, so if you know HTML, you could put in the code yourself.

The new file gets the suffix `_split1` because it was split from the file ending in `10-h-3.htm`.

```@public@vhost@g@gutenberg@html@files@10@10-h@10-h-3.htm_split1```

When the split occurs here, Calibre generates this HTML code so that the next new chapter is formatted correctly:

```xml
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>

<title>The Project Gutenberg eBook of The Old Testament Of The King James Version Of The Bible.</title>




<link type="text/css" href="0.css" rel="stylesheet"/>
<link type="text/css" href="1.css" rel="stylesheet"/>
<link type="text/css" href="pgepub.css" rel="stylesheet"/>
<meta name="generator" content="Ebookmaker 0.4.0a5 by Marcello Perathoner &lt;webmaster@gutenberg.org&gt;"/>
</head>
<body>
```
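To show why the second file needs its own header, here is a minimal Python sketch of the idea. It is not Calibre's actual implementation; it simply assumes the document is one XHTML string with a single `<body>` that ends with `</body></html>`, and the function name `split_with_header` is my own.

```python
def split_with_header(xhtml, offset):
    """Split an XHTML string at `offset`, repeating the header in part two.

    The header is everything up to and including the opening <body> tag,
    so both halves remain complete documents.
    """
    body_open = xhtml.index("<body")
    header_end = xhtml.index(">", body_open) + 1  # end of the <body ...> tag
    if offset < header_end:
        raise ValueError("split point must be inside <body>")
    header = xhtml[:header_end]
    # Close the first half; the second half keeps the original closing tags.
    first = xhtml[:offset].rstrip() + "\n</body>\n</html>\n"
    second = header + "\n" + xhtml[offset:]
    return first, second
```

Both returned strings carry the XML declaration, the `<head>` with its stylesheet links and an open `<body>`, which is why the second chapter keeps the same look as the first.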

Remember to SAVE your work after you have done all the division markings.


---------

## Step 11 

Create the .toc.ncx

I taught this in the previous tutorial, so I will not explain the concept again here. Nevertheless, I need to carry out this step before step 12.
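For readers who want to see what the entries in a .toc.ncx look like, here is a minimal Python sketch that builds `<navPoint>` elements from a list of chapter titles and file names. Calibre generates these for you; the function name `nav_points`, the `navPoint-N` ids and the short file names in the usage note are assumptions for illustration only.

```python
from xml.sax.saxutils import escape, quoteattr


def nav_points(chapters):
    """Build NCX <navPoint> entries from (title, href) pairs in reading order."""
    entries = []
    for order, (title, href) in enumerate(chapters, start=1):
        entries.append(
            f'<navPoint id="navPoint-{order}" playOrder="{order}">\n'
            f"  <navLabel><text>{escape(title)}</text></navLabel>\n"
            f"  <content src={quoteattr(href)}/>\n"  # quoteattr adds the quotes
            f"</navPoint>"
        )
    return "\n".join(entries)
```

For example, `nav_points([("Genesis", "10-h-3.htm"), ("Exodus", "10-h-3.htm_split1")])` produces two entries, one per split file. This is why the split has to come first: each TOC entry points at its own file.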

## Step 12  Final Polish

To make the book read even better, with each new chapter set off clearly at the top of its page, you can do this:

- Add ```<br/>``` before the ```<h3 id="pgepubid00004">```
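If you have many chapters, the same edit can be sketched as a small Python helper instead of typing each tag by hand. This is only an illustration and assumes the heading carries an `id` attribute like the one above; the function name `add_break_before_heading` is hypothetical.

```python
import re


def add_break_before_heading(xhtml, heading_id):
    """Insert <br/> on its own line just before the heading with `heading_id`.

    The text is returned unchanged if no heading has that id.
    """
    pattern = re.compile(
        r'<h[1-6]\b[^>]*\bid="' + re.escape(heading_id) + r'"[^>]*>'
    )
    return pattern.sub(lambda m: "<br/>\n" + m.group(0), xhtml, count=1)
```

The self-closing form `<br/>` is used because ePUB content files are XHTML, where every tag has to be closed.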

----------------



# Video Tutorial 

https://www.youtube.com/watch?v=Hksn5wnrtZg



--------

# Supplementary Resources:

#1.  You can download my ePUB formatted book of the Bible: [Click here](https://github.com/rosatravels/Calibre/blob/19d44b2c8b9287f309abee8945e2a41ca42a68b5/The%20King%20James%20Version%20of%20the%20B%20-%20God.epub)


#2.  Other Resources on Calibre on my Github site:
https://github.com/rosatravels/Calibre


-------


## Curriculum:

Please follow the Series of Videos on Calibre:

- [Calibre Tutorial #1:  Create Clickable Table of Content](https://steemit.com/utopian-io/@rosatravels/step-by-step-video-tutorial-how-to-create-ebooks-with-clickable-toc-using-calibre)

- [Calibre Tutorial #2:  Turn a PDF eBook to ePub for Mobile Device](https://steemit.com/utopian-io/@rosatravels/use-calibre-software-turn-a-pdf-ebook-to-epub-for-mobile-device)

- [Calibre Tutorial #3:  Turn News Magazines from Web into ePUB Books](https://steemit.com/utopian-io/@rosatravels/learn-how-to-turn-news-magazines-from-web-into-epub-book-to-read-on-our-mobile-device-offline)

- [Calibre Tutorial #4: Use Calibre to Search for Authors and to purchase the Best eBook with the lowest price, many formats with no DRM restrictions](https://steemit.com/utopian-io/@rosatravels/use-calibre-to-search-for-author-and-the-best-ebook-in-49-ebookstores-with-lowest-price-no-drm-many-formats-and-languages)

- [Calibre Tutorial #5:   Learn how to create a Python  File for the Calibre software to Turn a Blog into an ePub Book](https://steemit.com/utopian-io/@rosatravels/learn-how-to-create-python-coding-file-to-turn-a-blog-into-an-epub-book)

- [Calibre Tutorial #6:  Learn How To Put Together a Few ePub Books into One ePUB Anthology](https://steemit.com/utopian-io/@rosatravels/learn-how-to-put-together-a-few-epub-books-into-one-epub-anthology)

- [Calibre Tutorial #7:  Learn How To Create a .TOC.NCX for an ePUB](https://steemit.com/utopian-io/@rosatravels/learn-how-to-create-a-toc-ncx-for-an-epub-book-with-calibre)

- [Calibre Tutorial #8:  Learn How To Create a .MOBI Kindle book from an ePUB format](https://steemit.com/utopian-io/@rosatravels/learn-how-to-create-a-mobi-amazon-kindle-from-epub-book-with-right-formatting)


-----------------

Thank you for your time and kind attention,

Rosa






