Class HWPFDocumentCore

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable
    Direct Known Subclasses:
    HWPFDocument, HWPFOldDocument

    public abstract class HWPFDocumentCore
    extends POIDocument
    This class holds much of the core of a Word document, but without some of the table structure information. You will generally want to work with one of its concrete subclasses, HWPFDocument or HWPFOldDocument.
    • Field Detail

      • STREAM_OBJECT_POOL

        protected static final java.lang.String STREAM_OBJECT_POOL
        See Also:
        Constant Field Values
      • STREAM_WORD_DOCUMENT

        protected static final java.lang.String STREAM_WORD_DOCUMENT
        See Also:
        Constant Field Values
      • FIB_BASE_LEN

        protected static final int FIB_BASE_LEN
        Size of the unencrypted part of the FIB (File Information Block).
        See Also:
        Constant Field Values
      • RC4_REKEYING_INTERVAL

        protected static final int RC4_REKEYING_INTERVAL
        [MS-DOC] 2.2.6.2/3 Office Binary Document ... Encryption: "... The block number MUST be set to zero at the beginning of the stream and MUST be incremented at each 512 byte boundary. ..."
        See Also:
        Constant Field Values
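The rule quoted above implies that the block number used for re-keying is the byte offset within the stream divided by 512. The following is a minimal sketch of that arithmetic only. The class and method names are hypothetical, and the value 512 is taken from the quoted specification text rather than from the constant itself, whose value is not shown on this page.

```java
/** Illustrative only: not part of HWPFDocumentCore. */
final class Rc4BlockSketch {
    // Assumption: 512 comes from the [MS-DOC] text quoted above.
    static final int REKEYING_INTERVAL = 512;

    /** Block number for a byte offset: zero at the start, +1 at each 512-byte boundary. */
    static int blockNumber(int streamOffset) {
        requireNonNegative(streamOffset);
        return streamOffset / REKEYING_INTERVAL;
    }

    /** Number of bytes left in the current block before the block number is incremented. */
    static int bytesUntilNextBlock(int streamOffset) {
        requireNonNegative(streamOffset);
        return REKEYING_INTERVAL - (streamOffset % REKEYING_INTERVAL);
    }

    private static void requireNonNegative(int streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must not be negative: " + streamOffset);
        }
    }

    public static void main(String[] args) {
        System.out.println(blockNumber(0));            // 0
        System.out.println(blockNumber(511));          // 0
        System.out.println(blockNumber(512));          // 1
        System.out.println(bytesUntilNextBlock(1300)); // 236
    }
}
```

A decrypting reader would consult a value like `blockNumber` to decide when to derive a fresh key. The key derivation itself is outside the scope of this sketch.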
      • _objectPool

        protected ObjectPoolImpl _objectPool
        Holds OLE2 objects.
      • _ss

        protected StyleSheet _ss
        Holds styles for this document.
      • _cbt

        protected CHPBinTable _cbt
        Contains formatting properties for text.
      • _pbt

        protected PAPBinTable _pbt
        Contains formatting properties for paragraphs.
      • _st

        protected SectionTable _st
        Contains formatting properties for sections.
      • _ft

        protected FontTable _ft
        Holds fonts for this document.
      • _mainStream

        protected byte[] _mainStream
        Main document stream buffer.
    • Constructor Detail

      • HWPFDocumentCore

        protected HWPFDocumentCore()
      • HWPFDocumentCore

        public HWPFDocumentCore(java.io.InputStream istream)
                         throws java.io.IOException
        This constructor loads a Word document from an InputStream.
        Parameters:
        istream - The InputStream that contains the Word document.
        Throws:
        java.io.IOException - If there is an unexpected IOException from the passed-in InputStream.
      • HWPFDocumentCore

        public HWPFDocumentCore(POIFSFileSystem pfilesystem)
                         throws java.io.IOException
        This constructor loads a Word document from a POIFSFileSystem.
        Parameters:
        pfilesystem - The POIFSFileSystem that contains the Word document.
        Throws:
        java.io.IOException - If there is an unexpected IOException from the passed-in POIFSFileSystem.
      • HWPFDocumentCore

        public HWPFDocumentCore(DirectoryNode directory)
                         throws java.io.IOException
        This constructor loads a Word document from a specific point in a POIFSFileSystem, usually not the root directory. It is typically used to open embedded documents.
        Parameters:
        directory - The DirectoryNode that contains the Word document.
        Throws:
        java.io.IOException - If there is an unexpected IOException from the passed-in DirectoryNode.
    • Method Detail

      • verifyAndBuildPOIFS

        public static POIFSFileSystem verifyAndBuildPOIFS(java.io.InputStream istream)
                                                   throws java.io.IOException
        Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.
        Throws:
        java.io.IOException
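The "not RTF or PDF" check described above can be pictured as a peek at the first few bytes of the stream. The sketch below uses only the JDK and is not POI's implementation. The class name, method names, and exception choice are hypothetical. It relies on the well-known signatures `{\rtf` for RTF and `%PDF` for PDF, and it pushes the peeked bytes back so that a later parser still sees the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: not POI's implementation of verifyAndBuildPOIFS. */
final class HeaderSniffSketch {
    private static final int PEEK_LEN = 6;

    /** Classifies input by its leading bytes: "RTF", "PDF", or "OTHER". */
    static String classify(byte[] head, int len) {
        String start = new String(head, 0, len, StandardCharsets.US_ASCII);
        if (start.startsWith("{\\rtf")) return "RTF";
        if (start.startsWith("%PDF")) return "PDF";
        return "OTHER";
    }

    /** Peeks at the header, pushes it back, and rejects RTF and PDF input. */
    static InputStream rejectRtfAndPdf(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, PEEK_LEN);
        byte[] head = new byte[PEEK_LEN];
        int n = 0;
        while (n < PEEK_LEN) {               // read may return fewer bytes than requested
            int r = pin.read(head, n, PEEK_LEN - n);
            if (r < 0) break;                // end of stream
            n += r;
        }
        pin.unread(head, 0, n);              // restore the stream for the real parser
        String kind = classify(head, n);
        if (!kind.equals("OTHER")) {
            throw new IllegalArgumentException("The input is really a " + kind + " file");
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] rtf = "{\\rtf1\\ansi}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(rtf, rtf.length)); // RTF
        InputStream ok = rejectRtfAndPdf(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(ok.read());                 // 1
    }
}
```

In the real method, the verified stream is then handed to a POIFSFileSystem. That step is omitted here because it requires the POI library.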
      • getRange

        public abstract Range getRange()
        Returns the range which covers the whole of the document, but excludes any headers and footers.
      • getOverallRange

        public abstract Range getOverallRange()
        Returns the range that covers all text in the file, including main text, footnotes, headers and comments.
      • getDocumentText

        public java.lang.String getDocumentText()
        Returns the document text, i.e. the text from all text pieces, including OLE descriptions and field codes.
      • getText

        @Internal
        public abstract java.lang.StringBuilder getText()
        Internal method to access the document text.
      • getCharacterTable

        public CHPBinTable getCharacterTable()
      • getParagraphTable

        public PAPBinTable getParagraphTable()
      • getStyleSheet

        public StyleSheet getStyleSheet()
      • getListTables

        public ListTables getListTables()
      • getFontTable

        public FontTable getFontTable()
      • getMainStream

        @Internal
        public byte[] getMainStream()
      • getEncryptionInfo

        public EncryptionInfo getEncryptionInfo()
                                         throws java.io.IOException
        Overrides:
        getEncryptionInfo in class POIDocument
        Returns:
        the encryption info if the document is encrypted, otherwise null
        Throws:
        java.io.IOException - If retrieving the encryption information fails
      • updateEncryptionInfo

        protected void updateEncryptionInfo()
      • getDocumentEntryBytes

        protected byte[] getDocumentEntryBytes(java.lang.String name,
                                               int encryptionOffset,
                                               int len)
                                        throws java.io.IOException
        Reads an OLE stream into a byte array. If an EncryptionInfo is available, the bytes are decrypted starting at encryptionOffset. If encryptionOffset is -1, no decryption is attempted.
        Parameters:
        name - the name of the stream
        encryptionOffset - the offset from which to start decrypting; use -1 for no decryption
        len - the number of bytes to read; use Integer.MAX_VALUE to read all bytes
        Returns:
        the read bytes
        Throws:
        java.io.IOException - if the stream can't be found
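The two sentinel values documented above (-1 for "do not decrypt" and Integer.MAX_VALUE for "read everything") can be sketched as follows. This is a hedged illustration of the parameter conventions only, using a plain byte array in place of an OLE stream. All names are hypothetical, and how POI performs the actual read and decryption is not shown on this page.

```java
import java.util.Arrays;

/** Illustrative only: mirrors the documented parameter conventions, not POI's code. */
final class EntryReadSketch {

    /** Integer.MAX_VALUE (or any len beyond the stream size) means "read all bytes". */
    static int bytesToRead(int streamSize, int len) {
        if (len < 0) {
            throw new IllegalArgumentException("len must not be negative: " + len);
        }
        return Math.min(streamSize, len);
    }

    /** Decryption applies only if encryption info exists and the offset is not -1. */
    static boolean shouldDecrypt(int encryptionOffset, boolean hasEncryptionInfo) {
        return hasEncryptionInfo && encryptionOffset != -1;
    }

    /** Reads up to len bytes from an in-memory stand-in for the OLE stream. */
    static byte[] read(byte[] stream, int len) {
        return Arrays.copyOf(stream, bytesToRead(stream.length, len));
    }

    public static void main(String[] args) {
        byte[] stream = {10, 20, 30, 40};
        System.out.println(read(stream, Integer.MAX_VALUE).length); // 4
        System.out.println(read(stream, 2).length);                 // 2
        System.out.println(shouldDecrypt(-1, true));                // false
    }
}
```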