The VeryUtils Text Extraction Command Line utility extracts text from a wide range of file types.

Press release: VeryUtils Text Extraction Command Line


Publisher: VeryUtils.com Inc.

The VeryUtils Text Extraction Command Line utility extracts text from various types of files. The extracted text can be combined into one file, split into several files, or both. The resulting text files can then be reused for indexing or any other purpose.

The utility is controlled entirely through command line parameters. The syntax is "all2text.exe [options ...]", and all parameters must be separated by spaces. Options can appear in any order on the command line, as long as each option is paired with its related parameter. Run "all2text.exe -?" to get help on the command line syntax and parameters.

Supported formats for input files: AZW, AZW3, CHM, DjVu, DOC, DOCX, EML, EPUB, FB2, FB3, HTML, LIT, MD, MHT, MOBI, ODP, ODS, ODT, PDB, PDF, PPT, PPTX, PRC, RTF, TCR, TXT, WPD, WRI, XLS, XLSX. The IFilter interface is used for files with unknown extensions.

The utility works from the command line without displaying any user interface, which makes it suitable for integrating text processing into other applications.

Execution order of operations:
* Extract text from input file(s).
* Format text: remove spaces, line breaks, etc. (if options are specified).
* Combine files into one file (if option is specified).
* Split text (if options are specified).
* Apply rules for pronunciation correction (if option is specified).
* Save output file(s).
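The order of operations listed above matters because formatting is applied before combining, and combining before splitting. The following Python sketch illustrates that ordering on plain in-memory strings. It is not the utility's implementation: the function names, the character-based splitting, and the omission of the pronunciation-correction step are all assumptions made for illustration.

```python
from typing import List, Optional


def remove_empty_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


def combine(texts: List[str]) -> str:
    """Join several extracted texts into one, separated by newlines."""
    return "\n".join(texts)


def split_by_size(text: str, max_chars: int) -> List[str]:
    """Cut text into consecutive parts of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def process(
    extracted: List[str],
    strip_empty: bool = False,
    unite: bool = False,
    part_size: Optional[int] = None,
) -> List[str]:
    """Apply the optional steps in the documented order:
    format, then combine, then split. Each step runs only if requested."""
    texts = list(extracted)
    if strip_empty:
        texts = [remove_empty_lines(t) for t in texts]
    if unite:
        texts = [combine(texts)]
    if part_size is not None:
        texts = [part for t in texts for part in split_by_size(t, part_size)]
    return texts


if __name__ == "__main__":
    # Two hypothetical "extracted" texts stand in for real input files.
    for part in process(["a\n\nb", "c\n"], strip_empty=True, unite=True, part_size=3):
        print(repr(part))
```

Because splitting runs after combining, the parts are cut from the combined text rather than from each input file separately.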
VeryUtils Text Extraction Command Line Examples:

Extract text from "book.doc" and save it as "book.txt" in the output folder:
    all2text.exe -f "d:\Docs\book.doc" -v "d:\Text\"

Alternatively, when only one input file is specified, the output file can be given directly:
    all2text.exe -f "d:\Docs\book.doc" -out "d:\Text\book.txt"

Extract text from BOOK.DOC and save it as "New Book.txt":
    all2text.exe -f "d:\Docs\book.doc" -v "d:\Text\" -p "New Book"

Extract text from Microsoft Word and RTF documents, remove empty lines, and save the text files in UTF-8 encoding:
    all2text.exe -f "d:\Docs\*.doc" -f "d:\Docs\*.rtf" -v "d:\Text\" -e utf8 -rm

Extract text from all files in the specified folder, combine it, and save it as "Document.txt":
    all2text.exe -f "d:\Docs\*.*" -v "d:\Text\" -p "Document" -u

Extract text from 1.DOC, split it into parts of 100 KB each, and save them as text files "Document 20.txt", "Document 21.txt", etc.:
    all2text.exe -f "d:\Docs\1.doc" -v "d:\Text\" -p "Document" -a -n 20 -t 100000

Extract text from BOOK.FB2, split it into parts at the words "CHAPTER" and "CONTENTS", and save the parts as files named "Book 1.txt", "Book 2.txt", etc.:
    all2text.exe -f "d:\Book\book.fb2" -v "d:\Text\" -p "Book" -k "CHAPTER" -k "CONTENTS"

Extract text from BOOK.EPUB, split it into parts at each "###", remove "###" from the text, and save each part as a new file:
    all2text.exe -f "d:\Book\book.epub" -v "d:\Text\" -p "Book" -r "###"
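Since the utility has no user interface, another application would typically start it as a child process. The Python sketch below assembles an argument list for a few of the options shown in the examples above. The helper name `build_all2text_args` and its parameters are hypothetical, and the meanings assigned to `-f`, `-v`, `-p`, `-e`, and `-rm` are inferred from those examples rather than from the official option reference, so check them against "all2text.exe -?" before relying on them.

```python
import subprocess
from typing import List, Optional, Sequence


def build_all2text_args(
    inputs: Sequence[str],
    out_dir: str,
    name: Optional[str] = None,
    encoding: Optional[str] = None,
    remove_empty_lines: bool = False,
    exe: str = "all2text.exe",
) -> List[str]:
    """Build an argument list for all2text.exe (hypothetical helper).

    Flag meanings are assumptions inferred from the press release examples:
    -f input file or mask, -v output folder, -p output file name,
    -e output encoding, -rm remove empty lines.
    """
    args = [exe]
    for pattern in inputs:
        args += ["-f", pattern]      # one -f per input file or mask
    args += ["-v", out_dir]
    if name is not None:
        args += ["-p", name]
    if encoding is not None:
        args += ["-e", encoding]
    if remove_empty_lines:
        args.append("-rm")
    return args


def run_all2text(args: List[str]) -> int:
    """Run the utility and return its exit code (requires all2text.exe on PATH)."""
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    # Mirrors the Word/RTF example; only prints the command, does not run it.
    cmd = build_all2text_args(
        [r"d:\Docs\*.doc", r"d:\Docs\*.rtf"],
        "d:\\Text\\",
        encoding="utf8",
        remove_empty_lines=True,
    )
    print(cmd)
```

Passing the arguments as a list lets `subprocess` handle quoting of paths that contain spaces, such as an output name like "New Book".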

Source: https://veryutils.com/text-extraction-command-line