Damn Those Errors!

Today’s post comes from author Carmen Webster Buxton.  Carmen was born in Honolulu, and experienced a childhood on the move, as her father was in the US Navy. She has been a librarian, a teacher, a project manager, a wife, and a mother, although not in that order. She now lives in Maryland with her husband, her daughter, and an elderly beagle who has his own pet cat. She writes science fiction, mostly set in the far future, and the occasional fantasy. The Sixth Discipline is her first novel to be published as an ebook, and its sequel No Safe Haven was published shortly after it. A third (unrelated) novel called Tribes will follow soon.

If you would like to learn more about Carmen and her work, you can visit her at www.carmenspage.blogspot.com or follow her on Twitter at @CarmenWBuxton.


Why are there so damn many errors in ebooks?


(Please note that this post refers to errors in ebooks published by traditional print publishers, not self-published ebooks, which can have their own set of problems stemming from other causes.)

If you are a reader who prefers digital reading, you will have noticed that some ebooks have errors in them, either formatting errors or actual incorrect words. Some errors are minor—the occasional word with a hyphen where it shouldn’t be, or a paragraph that block indents when it shouldn’t. Some are more annoying because they make the story hard to read and understand, as when paragraphs of dialog that should be separate all run together. Every now and then you will see truly awful errors, like a paragraph of text that repeats, or letters with diacritical marks that convert into gibberish.

Print books might have a typo or two get past the proofer, but they don’t suffer from as many errors as ebooks. That is interesting, because print books are mostly to blame for ebook errors, or rather the workflow for print books is to blame.

Most authors write in MS Word or something like it, and send their manuscript to the editor in that format. Some editors even rely on the Word “Track Changes” feature to convey suggested edits. Once substantive edits are final, the Word document is converted to something that allows for easier and more sophisticated page layout, like InDesign. It could be other software, but I’m going to say InDesign just for simplicity’s sake. Note that this workflow is all driven by the need to get a manuscript into printed pages quickly. Print books have been the main event for so long, they are still very much at the core of what publishers do.

Once the document is in InDesign, the Word file is no longer edited, and thus last-minute changes and corrections are not in that file. On the other hand, the file in InDesign is often tweaked to make the pages look good. If the compositor wants a word to hyphenate, and InDesign isn’t breaking the word, or is breaking it in a bad place, the compositor will often type in a hard hyphen. If a line is breaking in a bad place, they might use a nonstandard character like a non-breaking space to control where the line breaks. Whatever it takes to make the pages look pretty, the composition staff will do it. Using software like InDesign, composition staff will produce a PDF that is then sent to the printer to produce the printed book.

That’s a total of three file formats for each book: MS Word, InDesign, and PDF. Now that ebooks are part of the picture, publishers have to decide which file to use to convert to ebook format. They can use the MS Word file; it converts easily, but doesn’t have the latest corrections. Also, it’s hard to make Word convert reliably in a way that identifies things like chapter breaks. They can use the InDesign file; InDesign even offers a conversion to ePub. But anything that has been done to the file to make the pages look good makes it harder for a conversion to give good results. The PDF file will have absolutely every correction, but PDF has the same page layout constraints. That’s how you get words like “hyphen- ation” in an ebook. Besides, PDF is notoriously difficult to convert to text in a way that yields reliable, readable paragraphs 100% of the time.
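For the technically inclined, here is a minimal sketch in Python of the kind of cleanup a conversion has to do on text pulled out of a print-oriented file. It is only an illustration, not how any particular publisher or tool works: the function name `clean_layout_artifacts` and the idea of checking against a word list (`vocabulary`) are assumptions made for the example.

```python
import re

NBSP = "\u00a0"         # non-breaking space, often typed in to control line breaks
SOFT_HYPHEN = "\u00ad"  # discretionary hyphen, invisible unless a line breaks there


def clean_layout_artifacts(text, vocabulary):
    """Undo some print-layout tweaks in extracted text.

    `vocabulary` is a hypothetical set of known lowercase words. A split
    like "hyphen- ation" is rejoined only if the joined word is in it, so
    legitimate phrases such as "pre- and post-war" are left alone.
    """
    # Turn layout-only characters back into ordinary text.
    text = text.replace(NBSP, " ").replace(SOFT_HYPHEN, "")

    def rejoin(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in vocabulary:
            return joined
        return match.group(0)  # not a known word, so leave it untouched

    # A word, a hyphen, a space, then another word: a likely line-break leftover.
    return re.sub(r"\b(\w+)- (\w+)\b", rejoin, text)


sample = "That\u00a0is how you get hyphen- ation in pre- and post-war ebooks."
print(clean_layout_artifacts(sample, {"hyphenation"}))
```

The word list is the weak point: a real conversion would need a much larger dictionary, and it would still miss names and invented words, which science fiction has plenty of.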

In addition to new books, a lot of publishers are now looking at converting their backlist books. If the book is old enough, there might not even be a file to convert. Instead, they have to scan the printed book (or possibly the “boards”) with OCR (optical character recognition) software. But OCR can make a lot of mistakes. “Freddie” could become “Freclclio.” Because backlist books are so often OCR’d, I always recommend getting the free sample before you buy the book, just to see if the publisher has corrected the formatting and other errors that creep in.
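As a rough illustration of how OCR mistakes might be caught before a backlist book goes on sale, the sketch below flags words that are not in a list of known words so a human proofreader can look at them. The function name `flag_suspect_words` and the tiny word list are made up for the example; it does not describe any real publisher's process.

```python
import re


def flag_suspect_words(text, known_words):
    """Return words not found in `known_words`, in order of first appearance.

    `known_words` is a hypothetical set of lowercase words, which would
    include character names taken from the original manuscript.
    """
    suspects = []
    for word in re.findall(r"[A-Za-z']+", text):
        if word.lower() not in known_words and word not in suspects:
            suspects.append(word)
    return suspects


known = {"freddie", "said", "hello", "to"}
print(flag_suspect_words("Freclclio said hello to Freddie", known))
```

A check like this only points at likely trouble spots. Someone still has to compare each flagged word against the printed page and decide what it should be.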

Right now publishers are scrambling to adjust to the digital world. They haven’t yet figured out that they can take advantage of the fact that ebooks can be corrected much more easily than print books. Books are now data, and publishers need to know that. In the long run, the most likely solution will be that book publishers will follow information providers’ lead and begin to convert their data to a neutral format like SGML (Standard Generalized Markup Language) or XML (Extensible Markup Language). Instead of marking up the text in a typesetting system according to how it should look (e.g., 14 point Bodoni bold, 11 point Helvetica italic), you mark it up to show what it is (e.g., chapter, chapter title, paragraph, etc.). Once you have data in this kind of structured format, you can more reliably convert it to whatever output format is needed: print, web, or ebook. You can impose rules to make the data valid (e.g., every chapter has to have a chapter title). If you make the SGML or XML text the source for all outputs, you can proof and correct it once, instead of chasing the same fix through each format separately. You can even load it into a database and control who edits it.
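To make the idea of structured markup concrete, here is a small sketch in Python using the standard library's XML parser. The element names (`book`, `chapter`, `title`, `para`) and the function names are invented for this example and do not come from any real publishing schema. It shows one validity rule (every chapter needs a title) and two outputs generated from the same source.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical markup: the tags say what each piece of text is, not how it looks.
BOOK_XML = """
<book>
  <chapter>
    <title>Chapter One</title>
    <para>It was a dark and stormy night.</para>
  </chapter>
  <chapter>
    <para>This chapter is missing its title.</para>
  </chapter>
</book>
"""


def chapters_missing_titles(book):
    """Return the 1-based positions of chapters with no non-empty <title>."""
    missing = []
    for position, chapter in enumerate(book.findall("chapter"), start=1):
        title = chapter.find("title")
        if title is None or not (title.text or "").strip():
            missing.append(position)
    return missing


def to_plain_text(book):
    """One possible output: plain text with titles in capitals."""
    lines = []
    for chapter in book.findall("chapter"):
        title = chapter.find("title")
        if title is not None and title.text:
            lines.append(title.text.strip().upper())
        for para in chapter.findall("para"):
            lines.append((para.text or "").strip())
    return "\n\n".join(lines)


def to_html(book):
    """Another output from the same source: simple HTML for web or ebook."""
    parts = []
    for chapter in book.findall("chapter"):
        title = chapter.find("title")
        if title is not None and title.text:
            parts.append("<h1>" + html.escape(title.text.strip()) + "</h1>")
        for para in chapter.findall("para"):
            parts.append("<p>" + html.escape((para.text or "").strip()) + "</p>")
    return "\n".join(parts)


book = ET.fromstring(BOOK_XML.strip())
print(chapters_missing_titles(book))  # the second chapter breaks the rule
print(to_html(book))
```

Because both outputs are generated from the same XML, fixing a typo in the source fixes it everywhere the next time the outputs are produced. A real workflow would more likely enforce its rules with a schema or DTD than with hand-written checks like this one.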

Basically, it’s getting better, but ebook workflow still has a ways to go.
