It took me 20 minutes of fruitless research and then a mere 5 minutes of basic XML editing to fix the document in question. Then I spent an hour writing (and proofreading and proofreading again) this guide and now it is 1 am...
Steps:
- Change your file extension from docx to zip *
- Extract word/document.xml *
- Using a programmer's text editor (I prefer Notepad++), replace >< with >\r\n< in Extended search mode. This puts each tag on its own line and makes the XML more readable. (The expression \r\n stands for a new line on Windows.) In other text editors you can use regexp mode or multi-line replace, or copy a newline/hidden paragraph symbol and paste it into the replace box, which works even in Word.
- Import the modified word/document.xml back into the archive *
- Rename it back from zip to docx *
- Attempt to open the document
- Note that this time the error message will be more useful, for example it will tell you that the error is at: line 13540, as opposed to line 2 column 0.
- Go to the line specified (13540 for me) and remove it, along with the tags around it if necessary. Make sure you keep the XML well-formed! (You can attempt to fix the erroneous line instead of removing it, but usually the indicated line is as useless as the other lines around it, so removal will not result in any loss of data. See the note** below for why this is.)
- Import document.xml and rename zip/docx as usual. *
- Repeat steps 6-9 until your document is recovered. (It took me 5 repeats, as more errors arose at lines 15870, 13595, 13222 and 10835.*** Not sure why it found errors in reverse order, but it did.)
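If you would rather not do the rename/extract/replace/reimport dance by hand, the first five steps can be sketched in a few lines of Python. This is only my rough equivalent of the Notepad++ replace, not something Word or Microsoft provides: the function names and file names are made up for illustration, and it assumes word/document.xml is stored in UTF-8 (or another ASCII-compatible encoding), which is what I have seen in practice.

```python
import zipfile


def split_tags(xml_bytes):
    """Put each tag on its own line by breaking between '>' and '<'."""
    return xml_bytes.replace(b"><", b">\r\n<")


def expand_document_xml(src_docx, dst_docx):
    """Copy src_docx to dst_docx with word/document.xml split into lines.

    Every other file in the archive is copied unchanged.
    Returns the number of lines in the rewritten document.xml.
    """
    line_count = 0
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = split_tags(data)
                line_count = data.count(b"\r\n") + 1
            dst.writestr(item.filename, data)
    return line_count
```

Calling something like expand_document_xml("broken.docx", "expanded.docx") (hypothetical file names) leaves the original untouched and gives you a copy to open in Word, so the error message points at a real line number. You still have to read that message and pick the line yourself.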
* These steps are unnecessary if you use Total Commander (or another advanced file manager). It can look inside docx files as if they were folders because it automatically recognizes them as compressed archives. It also lets you edit files more easily: just press F4. And it even detects file changes and asks if it should reimport your edited files into the archive when you close the editor.
** Note: To me it seems that none of the errors were real. While the source code was extremely bloated and wasteful, every indicated error was a line with completely well-formed XML. (The XML around it was good too.) For pages and pages, almost every line repeated the same statement:
*** There is no reason to pay attention to my line numbers; I only included them to show you how many lines my document had. Over ten thousand lines of XML just to describe an 8-page document... Crazy!
*** And of course the case of Word finding errors in reverse order is quite peculiar.
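Step 8 asks you to keep the XML well-formed after deleting a line, and that is easy to get wrong when you also remove the tags around it. Here is a small sketch of how one could delete lines by number and then check the result with Python's standard XML parser. The helper names are my own invention, and it assumes the file already uses \r\n line breaks as described in step 3. Keep in mind that this only tests well-formedness: as the note above says, the lines Word complained about were well-formed, so passing this check does not promise that Word will open the file.

```python
import xml.etree.ElementTree as ET


def remove_lines(xml_bytes, line_numbers):
    """Drop the given 1-based line numbers and return the remaining bytes."""
    lines = xml_bytes.split(b"\r\n")
    kept = [line for i, line in enumerate(lines, start=1)
            if i not in line_numbers]
    return b"\r\n".join(kept)


def is_well_formed(xml_bytes):
    """Return True if the bytes parse as a single well-formed XML document."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True
```

If is_well_formed returns False after a removal, you most likely deleted an opening tag without its closing tag (or the other way around), so undo and take the matching line out too.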
p.s.: Helpful people over at answers.microsoft.com can recover your document for you (I saw pages upon pages of recovered documents), and now that I figured it out I can attempt to help you here as well. I shall remain subscribed to this topic.
http://3ice.hu/
Edit reason: Removed unnecessary step: "Delete a random file from the archive and it will trigger text recovery in Word. And what's great is that I was able to recover all of my equations, not just the text! (I deleted the entire customXml folder because it looked suspicious/useless. But you can probably delete something smaller, like word\fontTable.xml or word\webSettings.xml.)"
 