Modules:

Cleaner

ListMotor Cleaner is designed to extract e-mail addresses from raw data files and turn them into standard ListMotor lists (that is, sorted, one item per line, with no duplicates).

The ListMotor Cleaner can process several input files simultaneously. To select the files, press the “Input files” button. How to make a selection is described in the section “General notes”, subsection “Input files selection”.

To convert raw text files containing e-mail addresses into sorted, de-duplicated e-mail lists, it is enough to do the following:

Processing modes and options

The ListMotor Cleaner offers several processing modes which can be chosen/checked in the section “Output files and Options” of the Cleaner’s window. All processing modes can be activated at the same time; each will generate a separate output file.

Processing mode “E-mail addresses”

If this mode is chosen, the Cleaner extracts all syntactically valid e-mail addresses and also tries to “correct” addresses that contain illegal characters. The output file is sorted alphabetically, one item per line, without duplicates.

The “E-mail addresses” mode has several additional options:
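
As an illustration only, the extract, sort and de-duplicate steps can be sketched in Python. The pattern `EMAIL_RE` and the function `extract_emails` are hypothetical names, the pattern is a simplification of real address syntax, lowercasing before comparison is an assumption, and the Cleaner's “correction” of illegal characters is not modeled.

```python
import re

# Simplified address pattern (assumption); real address syntax is more permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return the unique addresses found in text, sorted alphabetically."""
    # Lowercasing makes "Bob@Example.com" and "bob@example.com" one item (assumption).
    found = {m.group(0).lower() for m in EMAIL_RE.finditer(text)}
    return sorted(found)

raw = "Write to Bob@Example.com or bob@example.com, also ann@site.org."
for address in extract_emails(raw):
    print(address)  # one item per line
```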

Only strip out addresses preceded by: check this option and enter one or more words; the output list will then contain only the e-mail addresses that follow this word (or these words) in the input file.

By default, the Cleaner places into the output file all syntactically valid e-mail addresses it finds. In some cases, however, you may need only those that follow a certain word (or words) in the input file.

A typical application of this feature is the creation of unsubscribe lists (in other words, remove lists). In this case you usually have a number of e-mail messages from people who are asking you not to mail them any more. Assemble these messages into one file and process this file with the Cleaner, with the option “Only strip out addresses preceded by” checked and the word “From:” entered in the adjacent field. The result is your remove list.
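
A rough Python sketch of this kind of filtering follows. How the Cleaner actually decides that an address “follows” a word is not documented here, so the sketch assumes the address must be the first one after the keyword on the same line. `EMAIL_RE` and `extract_after_keywords` are hypothetical names.

```python
import re

# Simplified address pattern (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_after_keywords(text, keywords):
    """Keep only the first address that follows a keyword on the same line."""
    kept = set()
    for line in text.splitlines():
        for word in keywords:
            pos = line.find(word)
            if pos == -1:
                continue
            # Search only the part of the line after the keyword.
            match = EMAIL_RE.search(line, pos + len(word))
            if match:
                kept.add(match.group(0).lower())
    return sorted(kept)

messages = (
    "From: Ann <ann@site.org>\n"
    "To: list@sender.com\n"
    "From: bob@example.com\n"
    "Please remove me.\n"
)
print(extract_after_keywords(messages, ["From:"]))
```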

Reject any addresses longer than: check this option and enter a maximum length (up to 80 characters); the Cleaner will reject e-mail addresses longer than this value. The default maximum length is 45 characters.

No duplicate domains: check this option if you need only one e-mail address from each domain present in the input file. For example, the input list is:

  • mary@company.com
  • alex@magazine.com
  • snail@yahoo.com
  • smith@market.com
  • jane@market.com
  • twiggy@yahoo.com
  • info@company.com
  • job@magazine.com
  • nicky@yahoo.com

If the option “No duplicate domains” is not checked, the Cleaner provides the following result:

  • alex@magazine.com
  • info@company.com
  • jane@market.com
  • job@magazine.com
  • mary@company.com
  • nicky@yahoo.com
  • smith@market.com
  • snail@yahoo.com
  • twiggy@yahoo.com

If the option “No duplicate domains” is checked (and the option “Except mail services” unchecked, see below), the result is as follows:

  • alex@magazine.com
  • mary@company.com
  • smith@market.com
  • snail@yahoo.com
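
The example above is consistent with keeping the first address seen for each domain and then sorting, which can be sketched in Python as below. The function name `one_per_domain` is hypothetical, and “first seen wins” is an assumption inferred from the sample output rather than a documented rule.

```python
def one_per_domain(addresses):
    """Keep the first address seen for each domain, then sort the result."""
    seen_domains = set()
    kept = []
    for address in addresses:
        domain = address.rsplit("@", 1)[-1].lower()
        if domain not in seen_domains:
            seen_domains.add(domain)
            kept.append(address)
    return sorted(kept)

sample = [
    "mary@company.com", "alex@magazine.com", "snail@yahoo.com",
    "smith@market.com", "jane@market.com", "twiggy@yahoo.com",
    "info@company.com", "job@magazine.com", "nicky@yahoo.com",
]
print(one_per_domain(sample))
```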

The option “No duplicate domains” is most useful when a list contains addresses in corporate domains: it keeps you from mailing the same message to the same company repeatedly. The drawback is that it also keeps only one address per web-based mail service (domains such as yahoo, hotmail, msn, etc.), although each address in such a domain belongs to a different person.

To solve this problem, the option “No duplicate domains” has a sub-option, “Except mail services”. Check “Except mail services” together with “No duplicate domains” to keep in your list all e-mail addresses that belong to the domains listed in the box “Mail Services” on the page “Options” of the ListMotor. Only one e-mail address is kept for any other domain.

For example, the input list is the same:

  • mary@company.com
  • alex@magazine.com
  • snail@yahoo.com
  • smith@market.com
  • jane@market.com
  • twiggy@yahoo.com
  • info@company.com
  • job@magazine.com
  • nicky@yahoo.com

If the options “No duplicate domains” and “Except mail services” are both checked (and the yahoo domain is listed in the box “Mail Services” on the page “Options”), the result is as follows:

  • alex@magazine.com
  • mary@company.com
  • nicky@yahoo.com
  • smith@market.com
  • snail@yahoo.com
  • twiggy@yahoo.com
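
A minimal Python sketch of the combined behaviour is shown below. The function `dedupe_domains` and its `mail_services` parameter are hypothetical, and the sketch assumes the mail services are given as full domain names such as `yahoo.com`.

```python
def dedupe_domains(addresses, mail_services=()):
    """Keep one address per domain, but every address in a mail-service domain."""
    services = {d.lower() for d in mail_services}
    seen_domains = set()
    kept = set()
    for address in addresses:
        domain = address.rsplit("@", 1)[-1].lower()
        if domain in services:
            kept.add(address)          # exempt domain: keep every address
        elif domain not in seen_domains:
            seen_domains.add(domain)   # other domains: first address only
            kept.add(address)
    return sorted(kept)

sample_list = [
    "mary@company.com", "alex@magazine.com", "snail@yahoo.com",
    "smith@market.com", "jane@market.com", "twiggy@yahoo.com",
    "info@company.com", "job@magazine.com", "nicky@yahoo.com",
]
print(dedupe_domains(sample_list, mail_services=["yahoo.com"]))
```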

Allow embedded spaces in AOL usernames: check this option to turn on extended processing of AOL e-mail addresses which include spaces, such as “john smith@aol.com” or “write me@aol.com”.

If the option is turned off, the Cleaner does not treat a space as a valid e-mail address character and therefore reads these addresses as “smith@aol.com” and “me@aol.com”.

If the option is turned on, the Cleaner accepts these addresses in full but removes the spaces before placing them into the output file. The result is “johnsmith@aol.com” and “writeme@aol.com”. These addresses are nevertheless valid according to AOL rules.

Save rejected e-mails into: check this option and specify a file name to get a list of the addresses that the Cleaner rejected (i.e. did not place into the output file) because of some defect, such as:
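
The difference between the two settings can be sketched in Python as follows. The function `extract_aol` is hypothetical, and the sketch assumes at most one embedded space per username; the Cleaner's real rules may differ. Note that any such rule is ambiguous in running text, where an ordinary word directly before an address would be absorbed into the username.

```python
import re

def extract_aol(text, allow_spaces=False):
    """Extract aol.com addresses, optionally accepting one embedded space."""
    if allow_spaces:
        # Username may be two words separated by a single space (assumption).
        pattern = r"\b[A-Za-z0-9]+(?: [A-Za-z0-9]+)?@aol\.com\b"
    else:
        pattern = r"\b[A-Za-z0-9]+@aol\.com\b"
    found = {
        m.group(0).replace(" ", "").lower()   # spaces are removed on output
        for m in re.finditer(pattern, text, re.IGNORECASE)
    }
    return sorted(found)

text = 'Contacts: "john smith@aol.com", "write me@aol.com"'
print(extract_aol(text, allow_spaces=False))
print(extract_aol(text, allow_spaces=True))
```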

Processing mode “IP addresses”

If this mode is chosen, the Cleaner places into the output file all syntactically valid IP addresses it finds in the input file. A valid IP address consists of four numbers separated by dots, each no greater than 255 (e.g. 230.121.1.0).

The output file is sorted in numeric order, one item per line, without duplicates.
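
The validity rule and the numeric ordering can be illustrated with a short Python sketch. `IP_RE` and `extract_ips` are hypothetical names; as a side effect of parsing the numbers, leading zeros are dropped in this sketch, which is an assumption about the output form.

```python
import re

# Four groups of 1-3 digits separated by dots, not glued to other digits.
IP_RE = re.compile(r"(?<!\d)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d)")

def extract_ips(text):
    """Return unique valid IP addresses, sorted numerically octet by octet."""
    found = set()
    for match in IP_RE.finditer(text):
        octets = tuple(int(part) for part in match.groups())
        if all(octet <= 255 for octet in octets):
            found.add(octets)
    # Sorting tuples of integers gives numeric order: 9.8.7.6 before 230.121.1.0.
    return [".".join(str(o) for o in octets) for octets in sorted(found)]

raw = "hosts 230.121.1.0, 10.0.0.256 and 9.8.7.6; again 230.121.1.0"
print(extract_ips(raw))
```

Numeric order differs from alphabetical order here: sorted as plain strings, “230.121.1.0” would come before “9.8.7.6”.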

Processing mode “IP addresses in ()”

If this mode is chosen, the Cleaner places into the output file only those syntactically valid IP addresses which are within brackets or parentheses in the input file.

Valid bracket/parentheses types are: () {} [] <>.

The output file is sorted in numeric order, one item per line, without duplicates.
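
A possible Python sketch of the bracket filter is given below. It assumes the opening and closing characters must be of the same type and that optional spaces may separate them from the address; neither detail is stated above. `PAIRS` and `extract_bracketed_ips` are hypothetical names.

```python
import re

# The four bracket types listed above.
PAIRS = [("(", ")"), ("{", "}"), ("[", "]"), ("<", ">")]

def extract_bracketed_ips(text):
    """Return unique valid IP addresses that are enclosed in a bracket pair."""
    found = set()
    for opener, closer in PAIRS:
        pattern = re.compile(
            re.escape(opener) + r"\s*(\d{1,3}(?:\.\d{1,3}){3})\s*" + re.escape(closer)
        )
        for match in pattern.finditer(text):
            octets = tuple(int(part) for part in match.group(1).split("."))
            if all(octet <= 255 for octet in octets):
                found.add(octets)
    return [".".join(str(o) for o in octets) for octets in sorted(found)]

header = "Received: from mail (192.168.0.7) by [10.1.2.3]; plain 8.8.8.8 <300.1.1.1>"
print(extract_bracketed_ips(header))
```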

Processing mode “Proxies”

If this mode is chosen, the Cleaner extracts from the input file all syntactically valid proxies it can find. A valid proxy consists of four numbers separated by dots (each no greater than 255), directly followed by a colon and a port number from 10 to 65535 (e.g. 230.121.1.0:80).

The output file is sorted in numeric order, one item per line, without duplicates. 
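
The proxy rule can be sketched in Python along the same lines. `PROXY_RE` and `extract_proxies` are hypothetical names, and sorting by address first and port second is an assumption about what “numeric order” means for proxies.

```python
import re

# IP address, a colon, then a port of 2-5 digits not glued to other digits.
PROXY_RE = re.compile(r"(?<!\d)(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})(?!\d)")

def extract_proxies(text):
    """Return unique valid ip:port items, sorted numerically."""
    found = set()
    for match in PROXY_RE.finditer(text):
        octets = tuple(int(part) for part in match.group(1).split("."))
        port = int(match.group(2))
        if all(octet <= 255 for octet in octets) and 10 <= port <= 65535:
            found.add((octets, port))
    return [
        ".".join(str(o) for o in octets) + ":" + str(port)
        for octets, port in sorted(found)
    ]

raw_proxies = (
    "230.121.1.0:80 and 10.0.0.1:8080, bad 10.0.0.1:5 "
    "and 10.0.0.1:70000, dup 230.121.1.0:80"
)
print(extract_proxies(raw_proxies))
```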

Processing mode “Proxies in ()”

If this mode is chosen, the Cleaner places into the output file only those syntactically valid proxies which are within brackets or parentheses in the input file.

Valid bracket/parentheses types are: () {} [] <>.

The output file is sorted in numeric order, one item per line, without duplicates. 

Processing mode “Phone numbers”

If this mode is chosen, the Cleaner will extract from the input file phone numbers in North American format: area code (3 digits), exchange (3 digits), local number (4 digits).

The allowed separators between the parts of the number are spaces, parentheses and hyphens. Examples: 305 120 5067; (903) 701-3018.

The output file is sorted in numeric order, one item per line, without duplicates.
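
As a rough illustration, the format can be matched in Python as below. `PHONE_RE` and `extract_phones` are hypothetical names. The sketch treats separators as optional and writes each number as `AAA-EEE-NNNN`; both are assumptions, since the output format is not specified above.

```python
import re

# Area code (optionally in parentheses), exchange, local number.
PHONE_RE = re.compile(r"(?<!\d)\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})(?!\d)")

def extract_phones(text):
    """Return unique phone numbers in a normalized form, sorted numerically."""
    found = {match.groups() for match in PHONE_RE.finditer(text)}
    # All parts have fixed width, so sorting the digit strings is numeric order.
    return ["-".join(parts) for parts in sorted(found)]

raw_phones = "Fax: (903) 701-3018, tel 305 120 5067, again 903-701-3018"
print(extract_phones(raw_phones))
```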

The option “Only strip out addresses preceded by” is available for this processing mode. It works the same way as in the “E-mail addresses” mode (see above).

A typical application of this feature is extracting fax or mobile phone numbers from e-mail messages. Export the messages into one file and process this file with the Cleaner, with the option “Only strip out addresses preceded by” checked and the word “Fax” or “Mobile” entered in the adjacent field. The result is a list of fax or mobile numbers.

Processing features

There are some processing features provided by the Cleaner.

A small tip: you can use the Cleaner to single out the invalid e-mail addresses that produced “undeliverable” messages. Export these messages into a single text file and process this file with the Cleaner. The result is a sorted, de-duplicated list of invalid addresses, which can then be processed with the ListMotor Remover to get rid of the useless addresses.

Processing results

You should specify a separate output file for each Cleaner processing mode you are going to use. There is a field for the output file name next to each processing mode’s option. How to indicate an output file is described in the section “General notes”, subsection “Output files selection”.

After processing your raw file and extracting the items you need (e-mails, IPs, phones, etc.), the Cleaner places the results into the specified files as sorted lists (in alphabetical or numeric order, depending on the item type), one item per line, with no duplicates.

When specifying output file names, note that if no file with the specified name exists in the specified path, it is created. If a file with that name already exists in the specified folder, it is overwritten. In addition, depending on the application options, a backup copy of the file may be created (the same applies to any other output file).

The original input file remains unchanged.

Important note

Always process your files with the Cleaner before you process them with any other ListMotor task. You will then have your lists of e-mail addresses, IP addresses and phone numbers not as unpredictably messy text but in perfectly ordered form: one item per line, sorted, without duplicates. This form is suitable for processing by the Merger, Remover, Keeper and other modules.

Skipping the Cleaner may lead to error messages and to unwanted, useless results.