Friday, March 17, 2017

Removing frames (text boxes) from a Word document after OCR or after saving a PDF document as RTF

Did you scan a document with OCR software such as ABBYY FineReader or OmniPage Pro, or save a PDF document as RTF, and end up with a Word document that contains multiple frames?

Frames make the document very hard to edit because all text is placed inside frames. We need to remove those frames if we want to edit the document.
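
Word treats frames and text boxes as two different kinds of object: frames live in the document's Frames collection, while text boxes are shapes of type msoTextBox in the Shapes collection. Before picking a method, it can help to check which of the two the document actually contains. The macro below is a minimal sketch for that; the name CountFramesAndTextBoxes is only an illustration, and it looks at the main body of the active document only (not headers or footers).

```vba
Sub CountFramesAndTextBoxes()
    Dim shp As Shape
    Dim nBoxes As Long

    ' Count only the shapes that are text boxes
    For Each shp In ActiveDocument.Shapes
        If shp.Type = msoTextBox Then nBoxes = nBoxes + 1
    Next shp

    ' Frames have their own collection, so their count is available directly
    MsgBox "Frames: " & ActiveDocument.Frames.Count & vbCrLf & _
           "Text boxes: " & nBoxes
End Sub
```

If the first number is non-zero, the frame methods apply; if the second is non-zero, the text box methods apply.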

How do we do that?

If you do not care about formatting, do this:

—Open the file which has frames in MS Word
—Save the file as a Plain text file.
—Open the new text file you have just saved in Notepad or WordPad or some other text editor.
—Select all the text by pressing Ctrl+A, copy it, and paste it into a new MS Word file. Then save it under any name you want. The frames are gone.
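
The plain-text route above can be approximated without leaving Word. The following is a rough sketch, not an exact equivalent of saving as a text file: it copies the main text of the active document and pastes it into a new document as unformatted text. The macro name is made up for illustration, it overwrites the clipboard, and text that sits inside text boxes (as opposed to frames) may not be carried over, so check the result.

```vba
Sub CopyToNewDocumentAsPlainText()
    Dim src As Document
    Dim dst As Document

    Set src = ActiveDocument

    ' Put the main text of the current document on the clipboard
    src.Content.Copy

    ' Create a blank document and paste as unformatted text,
    ' which drops frames along with all other formatting
    Set dst = Documents.Add
    dst.Content.PasteSpecial DataType:=wdPasteText
End Sub
```

Save the new document under any name once the text looks right.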

If you do care about formatting:

—Copy everything in the Word document, paste all the text into WordPad, copy all the text in the WordPad document, and paste it back into the Word document.

—Select the entire document by pressing Ctrl+A, and then press Ctrl+Q. This will set every paragraph back to its default condition and most likely remove the frames.
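
The Ctrl+A, Ctrl+Q step can also be run as a macro. This is a small sketch, assuming the frames are in the main body of the document; the macro name is only an illustration. It resets paragraph formatting for the whole document and then reports how many frames were present before and after, so you can see whether the reset was enough.

```vba
Sub ResetAllParagraphFormatting()
    Dim nBefore As Long

    nBefore = ActiveDocument.Frames.Count

    ' Same command that Ctrl+Q runs, applied to the whole main text
    ActiveDocument.Content.ParagraphFormat.Reset

    MsgBox "Frames before: " & nBefore & vbCrLf & _
           "Frames after: " & ActiveDocument.Frames.Count
End Sub
```

If frames remain afterwards, try one of the macros below.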


Use a macro to remove text boxes and delete text

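Sub DeleteTextBoxesAndText()
    Dim oShp As Word.Shape
    Dim i As Long

    ' Loop backwards so that deleting a shape does not shift the remaining indexes
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        Set oShp = ActiveDocument.Shapes(i)
        If oShp.Type = msoTextBox Then
            ' Deleting the shape also deletes the text inside it
            oShp.Delete
        End If
    Next i
End Sub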


Use a macro to remove text boxes but keep text

Sub RemoveTextBox2()
    Dim shp As Shape
    Dim oRngAnchor As Range
    Dim sString As String
    Dim i As Long

    ' loop backwards because shapes are deleted inside the loop
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        Set shp = ActiveDocument.Shapes(i)
        If shp.Type = msoTextBox Then
            ' copy text to string, without last paragraph mark
            sString = Left(shp.TextFrame.TextRange.Text, _
              shp.TextFrame.TextRange.Characters.Count - 1)
            If Len(sString) > 0 Then
                ' set the range to insert the text
                Set oRngAnchor = shp.Anchor.Paragraphs(1).Range
                ' insert the textbox text before the range object;
                ' the "Textbox start/end" markers show where it came from
                oRngAnchor.InsertBefore _
                  "Textbox start << " & sString & " >> Textbox end"
            End If
            ' remove the text box now that its text is in the document
            shp.Delete
        End If
    Next i
End Sub


Use a macro to remove frames

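Sub RemoveFrames()
    Dim aFrame As Frame
    Dim p As Paragraph
    Dim l As Single
    Dim i As Long

    ' loop backwards because frames are deleted inside the loop
    For i = ActiveDocument.Frames.Count To 1 Step -1
       Set aFrame = ActiveDocument.Frames(i)
       ' measure the frame's horizontal position from the page edge
       aFrame.RelativeHorizontalPosition = wdRelativeHorizontalPositionPage
       l = aFrame.HorizontalPosition
       ' indent the framed paragraphs so they stay roughly where they were
       For Each p In aFrame.Range.Paragraphs
          p.LeftIndent = l
       Next p
       ' delete the frame; its text stays in the document
       aFrame.Delete
    Next i
End Sub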


Use a macro to remove text boxes but keep text (commercial tool, free trial)

Quickly remove all text boxes and keep texts in Word

and a macro for frames by the same tool:
Quickly remove all frames and keep text from document in Word