
Home » Amazon KDP Support » Ask the Community » Formatting

Thread: Best way to generate clean HTML from Word?


This question is answered.


Replies: 10 - Pages: 1 - Last Post: Oct 3, 2010 2:49 PM - Last Post By: cebpubs
tomwood2

Posts: 20
Registered: 09/20/10
Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 12:01 PM
 
I want to end up with just <ptag>text text text</ptag> on a single line for every paragraph. (The forum keeps interpreting tags as real tags here; how do you write HTML tags so they display as plain text?)

I'll add back the header and style sheet myself; I just need to get the body of the book into clean HTML.

I've tried saving the Word file as .htm and then using View Source to get at the HTML, but the source is word-wrapped. How can I get single-line paragraphs with <ptag> and </ptag> at each end, so I can copy them into a plain text editor and edit them as HTML?

Thanks!

Tom

Message was edited by: tomwood2

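For reference, here is a minimal Python sketch of the conversion Tom describes. It assumes Word's "Save as Web Page" output, where each body paragraph is a <p class=MsoNormal> element wrapped around <span> tags, and it uses only the standard library's html.parser. The class and function names are made up for illustration. It keeps only paragraph text, so italics and other inline formatting are lost, and Word's empty paragraphs (written as &nbsp;) are dropped.

```python
from html import escape
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, ignoring all other markup.

    Hypothetical helper; assumes Word-style HTML where paragraphs are <p>
    elements. Inline tags (<span>, <i>, <b>, <o:p>) are skipped, so their
    formatting is lost and only their text survives.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None means "not currently inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace, including the line breaks Word
            # puts into the source and &nbsp;, into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def word_html_to_paragraphs(source):
    """Return one '<p>...</p>' line per non-empty paragraph."""
    parser = ParagraphExtractor()
    parser.feed(source)
    parser.close()
    # Re-escape &, < and > because the parser decoded entities.
    return "\n".join(f"<p>{escape(p, quote=False)}</p>" for p in parser.paragraphs)
```

Usage: read the .htm file into a string, pass it to word_html_to_paragraphs(), and paste the result between your own header and closing tags.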
notjohn

Posts: 15,241
Registered: 01/06/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 12:08 PM   in response to: tomwood2
 
I have found that the easiest way to get decent html is to use Google Docs or Gmail. In the latter case, mail your Word doc to your Gmail account, then go to the account and click on Read On Web (or something like that). Then View Source and copy to Clipboard (assuming you use Windows).

I use Word 2000. I believe that 2003 and later give you the option to Save As / Filtered / html, and that that gives a fairly good result.
bevehoward

Posts: 243
Registered: 09/22/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 12:13 PM   in response to: tomwood2
 
Another option is to copy the contents from the Word editing window and paste them into a good HTML editor... formatting is retained, but the machine-generated tags in the background are not.

Note: for large works, the copy process will take a long time... wait for the hourglass to clear before proceeding to "paste".

just plain text <<

A slight modification of the above...

After copying, paste the contents into a good plain text editor that won't choke on large amounts of text.

After it pastes, select all, copy again, and paste into a good html editor... all of the word formatting will have been removed.

The text editor's end-of-line settings will affect this... for example, if the text editor is set to place end-of-line characters and you have a single paragraph ("Enter" in Word), you may lose your paragraph breaks.

Beverly/Howard
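Once the text has been through a plain text editor, each paragraph still has to be wrapped in paragraph tags. A minimal Python sketch, assuming the editor kept exactly one line per Word paragraph (as noted above, end-of-line settings can break that); the function name is hypothetical. Since the formatting is already gone, nothing beyond the paragraph breaks is recovered.

```python
from html import escape


def text_to_html_paragraphs(text):
    """Wrap each non-blank line of plain text in <p>...</p>.

    Assumes one line per paragraph; blank lines are skipped and
    &, <, > in the text are escaped so they display literally.
    """
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(
        f"<p>{escape(line, quote=False)}</p>" for line in lines if line
    )
```

For example, text_to_html_paragraphs("First paragraph.\n\nSecond & last.\n") returns two <p> lines, with the ampersand written as &amp;.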
tomwood2

Posts: 20
Registered: 09/20/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 12:33 PM   in response to: bevehoward
 
I wasn't completely clear. When I copy/paste from View Source in the .htm version saved from Word, I can then Search>Replace on the stuff that Word puts at each end of a paragraph. So I need that stuff there in some form to do the replace with the plain paragraph tags. I can do that now, but the View Source text is word wrapping. I suspect I really want those paragraphs all on one line?
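The Search>Replace approach can be scripted. This is a rough Python sketch using regular expressions, assuming Word marks body paragraphs as <p class=MsoNormal ...> and wraps text in <span> tags and empty <o:p> markers; check your own file's source, since other paragraph classes (headings, block quotes) would not be matched. Regular expressions are not an HTML parser, so a ">" inside an attribute value would confuse this. The names are illustrative only.

```python
import re

# Word's opening tag for body paragraphs, e.g. <p class=MsoNormal style='...'>
# (the MsoNormal class name is an assumption about your particular file).
WORD_P_OPEN = re.compile(r'<p\s+class="?MsoNormal"?[^>]*>', re.IGNORECASE)
# <span ...> and </span> wrappers that carry only Word's styling.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
# Word's empty Office-namespace paragraph markers.
O_P_TAG = re.compile(r"</?o:p>", re.IGNORECASE)


def simplify_word_markup(source):
    """Replace Word's paragraph openers with <p> and strip span/o:p tags."""
    source = WORD_P_OPEN.sub("<p>", source)
    source = SPAN_TAG.sub("", source)
    return O_P_TAG.sub("", source)
```

This only simplifies the tags; the line wrapping inside each paragraph is a separate step.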
bevehoward

Posts: 243
Registered: 09/22/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 12:50 PM   in response to: tomwood2
Helpful
I suspect I really want those paragraphs all on one line? <<

I'm probably not clear, but...

In general, HTML doesn't care about line breaks (carriage returns) in the source code. That can vary inside some formatting tags (such as "pre"), but not in body text.

The basic concept of a browser was that it (like the Kindle) would "reflow" content to fit the user's screen, no matter what form and format the user's screen was in.

As a result, you can freely format text with "hard" line breaks to keep it readable when viewing source, or you can use long, unbroken lines... imho, the former is preferable, but remember that source editors have a bad habit of taking them back out again.

The "br" tag (as well as "li" tags) can then be used where you want a line break when the page is viewed in a browser, and a "p" tag where you want a paragraph break.

Note, there has been at least one thread here recently where it appears that "li" tags created a problem in the kindle import.

Beverly Howard
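Because browsers collapse any run of spaces, tabs and newlines in body text into a single space, joining the wrapped lines of a paragraph changes only how the source looks. A minimal Python sketch that puts each <p>...</p> on one line, assuming paragraphs are explicitly closed and not nested; the function name is hypothetical.

```python
import re

# A <p> element (with any attributes) through its closing tag. DOTALL lets
# "." match the newlines that wrapping inserted; "\b" keeps <pre> from matching.
PARAGRAPH = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.IGNORECASE | re.DOTALL)


def unwrap_paragraphs(source):
    """Rewrite each <p>...</p> so its contents sit on a single line."""

    def join_lines(match):
        # Collapse whitespace the same way a browser renders body text.
        body = " ".join(match.group(2).split())
        return f"{match.group(1)}{body}{match.group(3)}"

    return PARAGRAPH.sub(join_lines, source)
```

Text outside <p> elements, including <pre> blocks where line breaks do matter, is left untouched.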
tomwood2

Posts: 20
Registered: 09/20/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 1:31 PM   in response to: bevehoward
 
We're close, but just to be even more clear...

These line breaks within the body of a paragraph are not being introduced with carriage returns or anything else.

[Screenshot of the HTML source: http://i18.photobucket.com/albums/b113/tomwood2/online/htmlsource.png]

They are word wraps that one of the programs introduces; I guess Firefox, but maybe Word during the HTML conversion.
bevehoward

Posts: 243
Registered: 09/22/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 2:05 PM   in response to: tomwood2
Correct
sigh... too many layers when trying to work from a distance

Really can't tell, as I don't know whether Firefox is wrapping those lines for display or whether they contain carriage returns at the end of each line.

I know you are using Word, but what are you using as your HTML editor, and are you also using a text editor? If so, which one? Until I know what you can use to get to the source, it's hard to give good answers.

Since you are at the stage of needing and wanting to attack the html source, if you don't have a good html and plain text editor, it's probably time to consider getting them.

www.kompozer.net is a good html editor and has a built in source text editor.

Also, take a look at www.textpad.com for a good text editor; it's been an indispensable tool for me for almost two decades, and it won't choke on large files.
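One way to settle the "display wrap or real carriage returns?" question is to check the file itself. A minimal Python sketch using the standard library's html.parser, whose getpos() method reports the current source line; it lists any <p> element whose start and end tags sit on different lines. An empty result means the wrapping was only in the viewer. Names are illustrative, and it assumes every paragraph has a closing </p>.

```python
from html.parser import HTMLParser


class MultiLineParagraphFinder(HTMLParser):
    """Record (start_line, end_line) for <p> elements spanning several lines."""

    def __init__(self):
        super().__init__()
        self.multi_line = []
        self._start = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._start = self.getpos()[0]  # line number of the <p> tag

    def handle_endtag(self, tag):
        if tag == "p" and self._start is not None:
            end = self.getpos()[0]  # line number of the </p> tag
            if end != self._start:
                self.multi_line.append((self._start, end))
            self._start = None


def find_hard_wrapped_paragraphs(source):
    finder = MultiLineParagraphFinder()
    finder.feed(source)
    finder.close()
    return finder.multi_line
```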
notjohn

Posts: 15,241
Registered: 01/06/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 2:10 PM   in response to: tomwood2
 
[i]I just need to get the body of the book into clean HTML..[/i]

I have found that the easiest way to get decent html from Word is through Google Docs or Gmail. You can then tweak it in any good text editor (not Word!).

Actually, I have found that the DTP (Digital Text Platform) is very lenient with respect to HTML coding. If your paragraph doesn't end with [b]</p>[/b] it doesn't seem to mind. (Nor do today's web browsers, in my experience. I almost never close out paragraphs in my websites.)
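HTML does allow a <p> element's end tag to be omitted; among other cases, the next <p> start tag closes it. If you process such files with your own scripts, they need to handle this. Here is a minimal Python sketch that treats a new <p>, a </body>, or the end of input as closing the current paragraph. It covers only those cases, not every rule in the HTML specification, and the class name is hypothetical.

```python
from html.parser import HTMLParser


class LenientParagraphs(HTMLParser):
    """Collect paragraph text even when some </p> end tags are missing."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means "not inside a paragraph"

    def _close_paragraph(self):
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_paragraph()  # a new <p> implies </p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("p", "body"):
            self._close_paragraph()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()  # flush any buffered text first
        self._close_paragraph()  # input ended inside a paragraph
```

Usage: create a LenientParagraphs(), call feed() with the file's contents, then close(); the texts are in its paragraphs list.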

The way to write code that displays on this forum is to type [b]& lt;[/b] in place of the opening angle bracket, but with no space between the ampersand and the "lt"; the forum then shows it as [b]<[/b].
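If you have a lot of markup to post, the escaping can be done for you. This short Python sketch uses the standard library's html.escape, which writes &, < and > as entities. Escaping ">" as well is harmless; notjohn's trick works because only the "<" is needed to stop the forum from reading a tag. The function name is made up for illustration.

```python
from html import escape


def forum_safe(markup):
    """Return markup with &, < and > written as entities, so a forum that
    renders HTML shows the tags as literal text."""
    return escape(markup, quote=False)


print(forum_safe("<p>text text text</p>"))
```

The printed line, &lt;p&gt;text text text&lt;/p&gt;, is what you would paste into the post.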
tomwood2

Posts: 20
Registered: 09/20/10
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 2:27 PM   in response to: bevehoward
 
I'm using WinSyntax as a code editor.

I asked the same question here:

http://www.kindleboards.com/index.php/topic,38132.0.html

I used the Tidy program recommended there:

http://infohound.net/tidy/

It still comes back as blocks of text wrapped at about 90 characters in the HTML, but when opened in a browser it reflows to the browser window. So I think it's working okay!
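To confirm that wrapping like Tidy's changes nothing a reader sees, compare the words before and after wrapping. This Python sketch uses the standard library's textwrap to imitate a 90-column wrap and checks that only whitespace differs. (If you would rather Tidy not wrap at all, its documented "wrap" option sets the right margin, and a value of 0 disables wrapping; whether the online front end exposes that option is not something I can confirm.) The function name is hypothetical.

```python
import textwrap


def same_rendered_text(a, b):
    """True if two pieces of body text differ only in whitespace, which is
    all a browser (or a Kindle) ignores when it reflows the text."""
    return a.split() == b.split()


paragraph = ("It still comes back as blocks of text wrapped at about 90 characters "
             "in the HTML, but when opened in a browser it reflows to the window.")
wrapped = textwrap.fill(paragraph, width=90)  # imitate a source-wrapping tool
print("\n" in wrapped, same_rendered_text(paragraph, wrapped))  # prints: True True
```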
cub06h

Posts: 2,990
Registered: 11/23/07
Re: Best way to generate clean HTML from Word?
Posted: Oct 1, 2010 4:35 PM   in response to: tomwood2
Helpful
[i]It still comes back as blocks of text wrapped at about 90 characters in the HTML, but when opened in a browser it reflows to the browser window. So I think it's working okay! [/i]

You can certainly have paragraphs made up of multiple lines. I enter a carriage return (Enter key) every so often so I can see the entire text on my page, rather than have it run off to the side. Breaking the line in a text editor ("non-document mode") doesn't create a new paragraph in an html file.
cebpubs

Posts: 654
Registered: 03/08/09
Re: Best way to generate clean HTML from Word?
Posted: Oct 3, 2010 2:49 PM   in response to: cub06h
 
Now there's an ID we haven't seen in a while.