Monday, November 23, 2009

Using Perl to extract @aol.com, @hotmail.com, @yahoo.com email addresses

SkyHi @ Monday, November 23, 2009
I have an ASCII list of about 13,000 opt-in email addresses.

Each email address is on a separate line.

I need a browser-based Perl script that will help me extract the AOL, Hotmail and Yahoo emails from that text file and separate the whole list into 4 text files residing on the server for later downloading.

The emails are in completely random order and look like the following:

rowby@aol.com
rowby@earthlink.com
rowby@rowby.com
rowby@hotmail.com
rowby@yahoo.com
etc
etc
etc


I want to be able to upload the text file to the server, and then have the Perl script process it. It will create an AOL list, a Yahoo list, a Hotmail list -- and an "Other" list.
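A minimal sketch of the four-way split described above, for illustration only. The helper name `classify_email` and the `%special` lookup are hypothetical, not part of the request, and the sketch assumes that a line with no usable domain should be skipped rather than placed in the "Other" list.

```perl
use strict;
use warnings;

# Hypothetical lookup table of the three named providers.
my %special = map { $_ => 1 } qw(aol hotmail yahoo);

# Hypothetical helper: return 'aol', 'hotmail', 'yahoo' or 'others',
# or undef when the line has no usable domain (assumption: skip those).
sub classify_email {
    my ($email) = @_;
    return undef unless defined $email && $email =~ /\@([^@\s]+)\s*$/;
    my ($label) = split /\./, lc $1;    # first dot-separated label of the domain
    return undef unless defined $label && length $label;
    return $special{$label} ? $label : 'others';
}

for my $address (qw(rowby@aol.com rowby@earthlink.com rowby@Yahoo.com)) {
    print "$address -> ", classify_email($address), "\n";
}
```

The comparison is an exact match on the first label after the `@`, so a domain such as `yahoomail.com` would fall into the "Other" list in this sketch.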

I can use FTP to download those text files -- unless you have something more elegant.

Just to add some more "spice" to this request, I would like the list to be created in the following format:

A|john@sampledomain.com|-|-|-|01|01|2001
A|mary@otherdomain.com|-|-|-|01|01|2001
etc

The ONLY thing that will change on each line is the email address.
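As a small illustration of that line format, here is a sketch of a formatter. The name `format_record` is hypothetical, and the fixed fields are simply copied from the two sample lines above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one address in the fixed pipe-delimited record.
# Only the email field varies; every other field is taken from the samples.
sub format_record {
    my ($email) = @_;
    return join '|', 'A', $email, '-', '-', '-', '01', '01', '2001';
}

print format_record('john@sampledomain.com'), "\n";
# prints: A|john@sampledomain.com|-|-|-|01|01|2001
```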

Thanks

Rowby



Solution:
rowby:
Hi all,

Maneshr worked all Saturday morning on the server and got the script working, and I'm awarding him the points.

However, you have all contributed to this educational process and my client has authorized 2 free day car rentals to all who have submitted suggestions and scripts to this question (certain restrictions apply).

The client has offices in California only at this time, and you can visit the site at www.foxrentacar.com. If you are in California please email me at rowby@foxrentacar.com and I'll set it up.

By the way, this script is for a newsletter that has a New Year's Day theme, so I appreciated all of your quick responses -- I will be spending the rest of the weekend using the automated program that this script will work with, carefully sending out (slowly but surely) the newsletters.


Now here is the final script:

#!/usr/bin/perl

$|++; ## Disable output buffering

## Print the MIME header
print "Content-type: text/html\n\n";

## Print the HTML title.
print "<TITLE>Rowby's Email splitting page</TITLE>\n";

## Open the data file for reading.
open(IN,"/home/sites/site4/web/cgi-bin/row.txt") || die $!;
@lines=<IN>; ## Read each line of the file as an element of the array
close(IN);

## Remove the newline character from the end of each element.
chomp(@lines);

## Process each element of the array.
foreach $email (@lines){
## Extract just the domain name from the email id.
next unless $email=~ /\@(.*)/; ## Get the ENTIRE domain name. Skip lines with no @, so a failed match never reuses a leftover $1.
($domain)=split(/\./,$1); ## Extract ONLY the domain name.
$domain=lc($domain); ## Convert that extracted domain to lowercase.

next if (!($domain) || $domain=~ /^\s+$/); ## Ignore empty domains

## Change the if statement below to add another domain name.
if ($domain!~ /^(aol|hotmail|yahoo)$/){ ## Domain name is not EXACTLY one of our special ones (so e.g. "yahoomail" counts as Others!!)
push(@others,'A|'.$email.'|-|-|-|01|01|2001'); ## Store this email id in a common array.
$files{'others'}++; ## Increment the count of these common email domains.
}else{ ## This domain name is a special one.
push(@$domain,'A|'.$email.'|-|-|-|01|01|2001'); ## Store it separately, in its unique array. E.g. all hotmail.com ids go in @hotmail array.
$files{$domain}++; ## Increment the count of these special email domains.
}
}

## Process each type of email domain (viz. special ones, like hotmail, aol etc.. & common ones i.e. others)
foreach (sort keys %files){
$total+=scalar(@$_);
print "Email ids for domain $_ = ",scalar(@$_),"<BR>\n";

## Create an output file with the same name as the domain name.
## E.g. all hotmail.com ids will be stored in hotmail.txt
$outfile='/home/sites/site4/web/cgi-bin/'.$_.'.txt';
open(OUT,">$outfile") || die $!;
## Now that all sorting has been done, write the data to the proper output files.
print OUT join("\n",@$_)."\n";
close (OUT);
}

print "<P>Grand total of email ids = $total<BR>\n";
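
The script above stores each list in an array named after the domain (`@$domain`), which only works with symbolic references and so cannot run under `use strict`. As a hedged alternative, not Maneshr's script, the same grouping can be sketched with a hash of arrays. The names `bucket_emails` and `write_buckets` and the directory argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical lookup table of the three named providers.
my %special = map { $_ => 1 } qw(aol hotmail yahoo);

# Group formatted records by list name in a hash of arrays,
# instead of one symbolically named array per domain.
sub bucket_emails {
    my (@emails) = @_;
    my %bucket;
    for my $email (@emails) {
        next unless $email =~ /\@([^.\s]+)/;    # first label after the @
        my $label = lc $1;
        my $list  = $special{$label} ? $label : 'others';
        push @{ $bucket{$list} }, "A|$email|-|-|-|01|01|2001";
    }
    return \%bucket;
}

# Write each list to <dir>/<list>.txt using a lexical filehandle
# and the three-argument form of open.
sub write_buckets {
    my ($bucket, $dir) = @_;
    for my $list (sort keys %$bucket) {
        my $path = "$dir/$list.txt";
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} map { "$_\n" } @{ $bucket->{$list} };
        close $out or die "Cannot close $path: $!";
    }
    return;
}

my $bucket = bucket_emails(qw(rowby@aol.com rowby@rowby.com rowby@hotmail.com));
printf "%s: %d\n", $_, scalar @{ $bucket->{$_} } for sort keys %$bucket;
```

Because the list name comes from a fixed set of four keys, the output file names never depend on what appears in the uploaded data.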