I want to end up with just <ptag>text text text</ptag> on a single line for every paragraph. (The forum keeps interpreting tags as real tags here; how do you write HTML tags so they are displayed as plain text?)
I'll add the header and style sheet back in myself; I just need to get the body of the book into clean HTML.
I've tried saving the Word file as .htm and then using View Source to get at the HTML, but the source is word-wrapped. How can I get single-line paragraphs with <ptag> and </ptag> at each end, so I can copy them into a plain text editor and edit them as HTML?
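If the goal is simply one paragraph per line, the wrapped source can be unwrapped with a short script instead of by hand. Below is a minimal Python sketch. It assumes the saved .htm has been read in as text and that every paragraph is a closed <p ...>...</p> element. The function name `unwrap_paragraphs` is hypothetical, and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re

# One <p ...>...</p> element, attributes included, spanning any number of lines.
# The \b stops it from matching tags such as <pre> or <param>.
PARA_RE = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)

# HTML whitespace: space, tab, LF, FF, CR (a non-breaking space is left alone).
HTML_SPACE = re.compile(r"[ \t\n\f\r]+")

def unwrap_paragraphs(source: str) -> str:
    """Put each <p>...</p> element on a single line.

    Line breaks and runs of whitespace inside a paragraph are collapsed
    to one space, which is how a browser would display them anyway.
    Text between paragraphs is left untouched.
    """
    return PARA_RE.sub(lambda m: HTML_SPACE.sub(" ", m.group(0)), source)

if __name__ == "__main__":
    wrapped = "<p class=MsoNormal>text text\ntext text</p>\n<p>more\n  text</p>"
    print(unwrap_paragraphs(wrapped))
```

On the sample this prints `<p class=MsoNormal>text text text text</p>` and `<p>more text</p>` on two separate lines. The paragraph tags are kept, so a Search > Replace on them still works afterwards.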
I have found that the easiest way to get decent html is to use Google Docs or Gmail. In the latter case, mail your Word doc to your Gmail account, then go to the account and click on Read On Web (or something like that). Then View Source and copy to Clipboard (assuming you use Windows).
I use Word 2000. I believe that 2003 and later give you the option to Save As / Filtered / html, and that that gives a fairly good result.
Another option is to copy the contents from the Word editing window and paste them into a good html editor... the formatting is retained, but the machine-generated tags in the background are not.
Note: for large works the copy process will take a long time... wait for the hourglass to disappear before proceeding to "paste".
[i]just plain text[/i]
A slight modification of the above...
After copying, paste the contents into a good plain text editor that won't choke on large amounts of text.
After it pastes, select all, copy again, and paste into a good html editor... all of the Word formatting will have been removed.
The text editor's end-of-line settings will affect this... for example, if the text editor is set to insert its own end-of-line characters and each paragraph is a single line (one "Enter" in Word), you may lose your paragraph breaks.
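When the plain-text paste leaves each Word paragraph on its own line, the remaining step from the original question can be scripted: wrapping every paragraph in paragraph tags. Below is a minimal Python sketch. It assumes one paragraph per line, with no hard wrapping added by the editor. The function name `text_to_paragraphs` is hypothetical. As noted above, this route has already dropped Word formatting such as italics.

```python
import html

def text_to_paragraphs(text: str) -> str:
    """Wrap each non-blank line of plain text in <p>...</p>.

    Assumes one paragraph per line, as produced when Word paragraphs
    (one Enter each) are pasted into a text editor without hard wrapping.
    """
    out = []
    for line in text.splitlines():  # handles \n and \r\n line endings
        line = line.strip()
        if line:  # skip blank lines between paragraphs
            # Escape &, < and > so literal characters stay as text.
            out.append("<p>" + html.escape(line, quote=False) + "</p>")
    return "\n".join(out)

if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph & more.\n"
    print(text_to_paragraphs(sample))
```

If the editor did insert hard line ends, a single paragraph turns into several <p> elements. That is the paragraph-break problem described above seen from the other side.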
I wasn't completely clear. When I copy and paste from View Source in the .htm version saved from Word, I can then use Search > Replace on the markup that Word puts at each end of a paragraph. So I need that markup there in some form in order to replace it with plain paragraph tags. I can do that now, but the View Source text is word-wrapped. I suspect I really want those paragraphs all on one line?
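The Search > Replace step can also be done with a pattern instead of a fixed string, so every variant of Word's paragraph tag is caught at once. Below is a minimal Python sketch. It assumes Word's paragraphs open with tags like `<p class=MsoNormal ...>` and that no attribute value contains a ">" character. The function name `plain_paragraph_tags` is hypothetical.

```python
import re

# An opening paragraph tag that carries attributes, e.g. <p class=MsoNormal>.
# Requiring whitespace after "p" keeps <pre> and bare <p> from matching.
P_WITH_ATTRS = re.compile(r"<p\s[^>]*>", re.IGNORECASE)

def plain_paragraph_tags(source: str) -> str:
    """Replace every attribute-laden <p ...> tag with a bare <p>.

    Closing </p> tags are left alone, so each paragraph keeps one
    <p>...</p> pair.
    """
    return P_WITH_ATTRS.sub("<p>", source)

if __name__ == "__main__":
    word_html = "<p class=MsoNormal style='margin:0in'>Chapter one.</p>"
    print(plain_paragraph_tags(word_html))
```

The sample prints `<p>Chapter one.</p>`. Whether the paragraphs are also on one line is a separate question, because browsers ignore the line breaks either way.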
[i]I suspect I really want those paragraphs all on one line?[/i]
I'm probably not clear, but...
In general, html doesn't care about line breaks (carriage returns) in the source code; a browser treats them like ordinary spaces. That differs inside some formatting tags (such as "pre"), but not in body text.
The basic concept of a browser was that it (like the Kindle) would "reflow" content to fit the user's screen, whatever the size and shape of that screen.
As a result, you can freely format the source with "hard" line breaks to keep it easily readable when viewing source, or you can use long, unbroken lines... imho, the former is preferable, but remember that source editors have a bad habit of taking the breaks back out again.
The "br" tag (as well as the "li" tags) can then be used where you want a line break to appear when viewed in a browser, and a "p" tag where you want a paragraph break.
Note: there has been at least one thread here recently where "li" tags appear to have caused a problem in the Kindle import.
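To make the point about line breaks concrete, here is a minimal Python sketch of how a browser treats ordinary body text. It is a simplification of normal whitespace handling: "pre" content, the "br" tag, and spacing between inline elements are not modeled. The function name `rendered_text` is hypothetical.

```python
import re

# HTML whitespace: space, tab, line feed, form feed, carriage return.
# A non-breaking space (&nbsp;) is deliberately NOT included.
HTML_SPACE = re.compile(r"[ \t\n\f\r]+")

def rendered_text(source_text: str) -> str:
    """Approximate the displayed text: every run of whitespace,
    line breaks included, becomes a single space."""
    return HTML_SPACE.sub(" ", source_text).strip()

one_line = "text text text"
hard_wrapped = "text text\ntext"
print(rendered_text(one_line) == rendered_text(hard_wrapped))  # True
```

Both versions display identically. That is why hard-wrapped source is harmless in a browser or on the Kindle.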
sigh... too many layers when trying to work from a distance
I really can't tell, as I don't know whether Firefox is wrapping those lines for display or whether they contain carriage returns at the end of each line.
I know you are using Word, but what are you using as your html editor? Are you also using a text editor, and if so, which one? Until I know what you can use to get at the source, it's hard to give good answers.
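One way to settle whether the breaks are really in the file, rather than Firefox wrapping them for display, is to count the paragraphs whose source contains a line break. Below is a minimal Python sketch. The script name `check_wrap.py` is hypothetical, and so is reading the file as UTF-8. Word may save the .htm in another encoding, so undecodable bytes are replaced instead of raising an error. Line breaks are detected either way.

```python
from __future__ import annotations

import re
import sys

# Captures the contents of each <p ...>...</p> element, across lines.
PARA_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)

def count_wrapped_paragraphs(source: str) -> tuple[int, int]:
    """Return (paragraphs found, paragraphs containing a line break)."""
    bodies = PARA_RE.findall(source)
    wrapped = sum(1 for body in bodies if "\n" in body or "\r" in body)
    return len(bodies), wrapped

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_wrap.py book.htm
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        total, wrapped = count_wrapped_paragraphs(f.read())
    print(f"{wrapped} of {total} paragraphs contain hard line breaks")
```

A count of zero means the wrapping is only Firefox's display. A non-zero count means the breaks are in the file. Either way, the browser will render the paragraphs the same.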
Since you are at the stage of needing and wanting to attack the html source, if you don't have a good html and plain text editor, it's probably time to consider getting them.
www.kompozer.net is a good html editor and has a built-in source text editor.
Also, take a look at www.textpad.com for a good text editor; it's been an indispensable tool for me for almost two decades, and it won't choke on large files.
[i]I just need to get the body of the book into clean HTML..[/i]
I have found that the easiest way to get decent html from Word is through Google Docs or Gmail. You can then tweak it in any good text editor (not Word!).
Actually, I have found that the DTP is very lenient with respect to html coding. If your paragraph doesn't end with [b]</p>[/b] it doesn't seem to mind. (Nor do today's web browsers, in my experience. I almost never close out paragraphs in my websites.)
The way to write code that displays on this forum is to type [b]& lt;[/b] instead of the opening angle bracket, but with the ampersand and the "lt" closed up (no space between them); thus: [b]<[/b]
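If there is a lot of markup to post, the escaping can be done in bulk. Below is a minimal Python sketch using the standard library's `html.escape`, which converts & to &amp;, < to &lt; and > to &gt;. Escaping the closing bracket as well should be harmless, assuming the forum passes &gt; through the same way it passes &lt;.

```python
import html

snippet = "<p>text text text</p>"
# quote=False converts only &, < and >, leaving quotation marks as typed.
escaped = html.escape(snippet, quote=False)
print(escaped)  # &lt;p&gt;text text text&lt;/p&gt;
```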
[i]It still comes back as blocks of text wrapped at about 90 characters in the HTML, but when opened in a browser it reflows to the browser window. So I think it's working okay! [/i]
You can certainly have paragraphs made up of multiple lines. I enter a carriage return (Enter key) every so often so I can see the entire text on my page, rather than have it run off to the side. Breaking the line in a text editor ("non-document mode") doesn't create a new paragraph in an html file.