Juice Maker - Grab And Extract Everything From the Web
<P>Juice Maker grabs HTML pages from web sites and extracts what you want from them, for example emails, fax/phone numbers, or specific phrases, using PCRE (Perl Compatible Regular Expressions) patterns.
<P>You give JM (Juice Maker) a URL and define the parameters of the task: what you want to extract, the depth of the pages, which pages not to grab, the maximum number of connections, and so on. Then you start the task. After a while, everything it has extracted is saved on your computer, and you can export it to a text file.
<P>Multi-threading
<P>Juice Maker supports multi-threading: it grabs and extracts multiple pages from the web site over multiple connections at the same time, which makes it more efficient. You can define the number of connections yourself.
<P>Recursive Parsing
<P>Juice Maker analyzes the grabbed HTML pages and recursively follows the links within each page according to conditions that you define in the task. Starting from a single URL, Juice Maker can grab all the HTML pages of a web site along with its entire structure.
<P>Regular Expression
<P>Juice Maker supports PCRE (Perl Compatible Regular Expressions). You define regular expressions to tell Juice Maker what you are interested in on the pages, and it extracts the matching text from the web site. This means you can grab anything on the web that a pattern can describe, including emails, phone numbers, first/last names, addresses, etc. If you are a registered user, I will help you work out your regular expressions. Please email me (weffen@gmail.com) for support.
<P>Regular Expression Shortcut
<P>You can save commonly used regular expressions as shortcuts, so that you can pick them from the shortcut menu instead of typing them in manually.
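<P>As a rough illustration of the shortcut idea, the sketch below keeps a few named patterns in a table and applies the selected ones to a page. It uses Python's `re` module rather than PCRE itself (the patterns shown happen to be valid in both), and the names `SHORTCUTS` and `extract`, the patterns, and the sample page are all assumptions made for this example, not part of Juice Maker.

```python
import re

# Hypothetical shortcut table: a name mapped to a saved pattern.
# These patterns are illustrative, not the ones shipped with Juice Maker.
SHORTCUTS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\(?\d{3}\)?[ -]?\d{3}-\d{4}",
}


def extract(html, names):
    """Return {shortcut name: unique matches in order of first appearance}."""
    results = {}
    for name in names:
        pattern = re.compile(SHORTCUTS[name])
        seen = []
        for match in pattern.finditer(html):
            text = match.group(0)
            if text not in seen:  # drop duplicates such as mailto: plus link text
                seen.append(text)
        results[name] = seen
    return results


# Made-up page content used only to exercise the sketch.
page = """
<p>Sales: <a href="mailto:sales@example.com">sales@example.com</a></p>
<p>Fax: (555) 010-2345, support: help@example.org</p>
"""
found = extract(page, ["email", "phone"])
print(found)
```

Keeping the patterns in one named table means a task only has to list shortcut names, which mirrors picking entries from a menu instead of retyping each expression.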
<P>[Screenshots: Regular Expression Shortcut Menu; Regular Expression Shortcut Dialog]
<P>Depth Controlling
<P>You can define how deep into the page structure Juice Maker grabs; it stops when that depth is reached. You can also tell Juice Maker under what conditions it should not increase the current depth, which makes it easy to grab pages that use paging.
<P>Breakpoint Continue
<P>You can resume a previously unfinished task the next time you launch Juice Maker. Juice Maker remembers which pages it has already grabbed and which pages it still has to grab, so you need not grab the whole web site again.
<P>Search Engine Grabbing
<P>You can grab what you want from the results that a search engine, such as http://www.google.com or http://www.baidu.com, returns for your keywords. Juice Maker generates the grabbing task for you through the "New From Template" menu item in the "Task" menu.
<P>After you select the search engine and type the keywords, the task's parameters are generated automatically. All you have to do is enter the regular expression in the "What I Want To Extract From The Page Is" memo box. Juice Maker then grabs the search results from the specified search engine for the specified keywords.
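<P>Paged search results are where the depth rule from Depth Controlling matters most, so here is a minimal sketch of how such a rule might behave. It walks a made-up in-memory link map instead of fetching real pages, and `SITE`, `crawl`, and the `no_depth_increase` parameter are hypothetical names for this example; Juice Maker's actual implementation is not documented here.

```python
import re
from collections import deque

# Hypothetical in-memory "site": URL -> links found on that page.
SITE = {
    "/search?q=juice&page=1": ["/search?q=juice&page=2", "/result/a"],
    "/search?q=juice&page=2": ["/search?q=juice&page=3", "/result/b"],
    "/search?q=juice&page=3": ["/result/c"],
    "/result/a": ["/result/a/detail"],
    "/result/b": [],
    "/result/c": [],
    "/result/a/detail": [],
}


def crawl(start, max_depth, no_depth_increase=None):
    """Breadth-first walk; returns {url: depth} for every page visited."""
    keep_depth = re.compile(no_depth_increase) if no_depth_increase else None
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in SITE.get(url, []):
            if link in depths:
                continue  # already scheduled, so never grabbed twice
            # Links matching the pattern stay at the current depth.
            same_level = keep_depth is not None and keep_depth.search(link)
            depth = depths[url] if same_level else depths[url] + 1
            if depth > max_depth:
                continue  # beyond the configured depth limit
            depths[link] = depth
            queue.append(link)
    return depths


start = "/search?q=juice&page=1"
plain = crawl(start, max_depth=1)
paged = crawl(start, max_depth=1, no_depth_increase=r"[?&]page=\d+")
print(len(plain), len(paged))
```

With the same depth limit, the plain run stops after the first results page, while the run with the paging pattern follows every results page and still refuses to go deeper than one level below them.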