Parse data

(OP)
Many years ago we had a Perl script created that parsed an AP news feed, grabbed just the headlines, and wrote a clean newswire.html file. It worked for over 10 years. The way we receive the feed has changed, and our script no longer works. I am a newbie, so any help is appreciated. Below is the Perl script, followed by the text file it works on.

SCRIPT:
use Cwd;
$curr= cwd();
# Test
# print "Current working directory is ", $curr, "\n\n";

$PresentTime = time;

#Begin WHYY content
print "WHYY Content";

chdir "d:\\signfiles" ;

$now = localtime;
$tday = substr($now,0,3) ;
$tmo = substr($now,4,3) ;
$tdate = substr($now,8,2) ;
$ttime = substr($now,11,5) ;
$contentfile = $tday. "_". $tmo. "_". $tdate. ".txt" ;
$apchk = "start" ;
$spchk = "specs" ;
#open(CONFILELIST, $contentfile) || die "cannot opendir. $!";
open(CONFILELIST, $contentfile) or open (CONFILELIST, "whyydef.txt");
print "In whyy content";
$timeskip = "no" ;
$skip = "yes" ;
$spec = "no" ;
$goodtogo = "no" ;
while (<CONFILELIST>) {
$a = $_ ;
$headline = substr($a,0,5) ;
$headline =~ tr/A-Z/a-z/ ;
if ($apchk eq $headline) {
$timeskip = "yes" ;
$chktime = substr($a,6,5) ;
if ($ttime ge $chktime) {
$skip = "no" ;
$spec = "no" ;
@pcontent = "" ;
$ct = 0 ;
} else {
$skip = "yes" ;
}
}elsif ($spchk eq $headline) {
$timeskip = "yes" ;
$chktime = substr($a,6,5) ;
if ($ttime ge $chktime) {
$skip = "no" ;
$spec = "yes" ;
@pcontent = "" ;
$ct = 0 ;
} else {
$skip = "yes" ;
}
}
if ($skip eq "no") {
if ($timeskip eq "no") {
$pcontent[$ct] = $a ;
$ct = $ct + 1
}
}
$timeskip = "no" ;
$oldchktime = $chktime
}

close CONFILELIST;

chdir "C:\\aperl\\bin" ;

open(FILELIST, "newswire.txt") or $spec = "yes" ;
#print filelist ;
print " :$spec: ";
if ($spec eq "yes") {
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Can not create newswire.html. $!";
print" Once more in content";
goto FINISHLINE ;
}
# End WHYY content
print "End WHYY Content";

#open(FILELIST, "newswire.txt") || die "cannot opendir. $!";

$chk = "^AP Top Headlines" ;
$chkus = "^AP Top U.S. News" ;
$chkyes = "no" ;
$apchk = "AP" ;
$brkchk = "\^" ;
#print $chk ;
$fg = "not" ;
$skip = "no" ;
$newsct = 0 ;
while (<FILELIST>) {
#check for news content
$goodtogo = "yes" ;
$headline = "" ;
$a = $_ ;
#$a =~ s/\W// ; #This gets rid of the ^ on other lines
$headline = substr($a,11,2) ;
#print $headline ;
#$wait = <STDIN> ;
if ($apchk eq $headline) {
$fg = "not" ;
#print $fg ;
}
$grab = substr($a,0,17) ;
if ($grab eq $chk) {
$chkyes = "yes" ;
$grab =~ s/\W// ; #Delete First Character of line
#print HTML "... " ;
#print HTML $grab ;
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Can not create newswire.html. $!";
print HTML "The Top Headlines From WHYY" ; #add at " to include the time
@timeparts = localtime(time) ;
#print HTML $timeparts[2], ":", $timeparts[1] ;
$fg = "yes" ;
$skip = "yes" ;
} elsif ($grab eq $chkus) {
if ($chkyes eq "no") {
$grab =~ s/\W// ; #Delete First Character of line
print HTML "... " ;
print HTML $grab ;
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Can not create newswire.html. $!";
print HTML "The Top Headlines From WHYY" ; #add at " to include the time
@timeparts = localtime(time) ;
#print HTML $timeparts[2], ":", $timeparts[1] ;
$fg = "yes" ;
$skip = "yes" ;
}
}
if ($grab ne $chk) {
if ($fg eq "yes") {
$brk = substr($a,0,1) ;
if ($skip eq "no") {
if ($brk eq $brkchk) {
print HTML "... " ;
}
#} else {
#print HTML " " ;
#}

if ($brk eq $brkchk) {
$a =~ s/\W// ; #Delete First Character of line
}
print HTML $a ;
}
} elsif ($grab ne $chkus) {
if ($chkyes eq "no") {
if ($fg eq "yes") {
$brk = substr($a,0,1) ;
if ($skip eq "no") {
if ($brk eq $brkchk) {
print HTML "... " ;
}
#} else {
#print HTML " " ;
#}

if ($brk eq $brkchk) {
$a =~ s/\W// ; #Delete First Character of line
}
print HTML $a ;
}
$skip = "no" ;
}

}

}
$skip = "no" ;
}

}

print HTML "... " ;

close FILELIST;

#close HTML;



FINISHLINE:
if ($goodtogo eq "no") {
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Can not create newswire.html. $!";
}
$nct = 0 ;
for ($nct =0; $nct <= $ct; $nct++) {
print HTML $pcontent[$nct] ;
print HTML " " ;
}

close HTML;


Example of NEWSWIRE TEXT File:

Niall Horan (One Direction) is 20. Actor Mitch Holleman (``Reba'') is 18.
^
'' _ Charlotte Bronte (BRAWN'-tee), English author (1816-1855).
^
(Above Advance for Use Friday, Sept. 13)
^
Copyright 2013, The Associated Press. All rights reserved.
^

AP-WF-09-13-13 0401GMT<
0403-----
r a BC-US--People-Kidman 09-13 0501
^*1402< ^AP-US-People-Kidman,118<
^Kidman says she's OK but shaken after collision<
^AP Photo NYET105<
^Eds: APNewsNow. With AP Photos.<

Calvin Klein event, she said she was ok.

Kidman added: ``I'm up, I'm walking around, but I was shaken.''

AP-WF-09-13-13 0411GMT<


0401-----
r a BC-US-TEC--Twitter-IPO-T 09-13 0916
^BC-US-TEC--Twitter-IPO-Tweet Facts,492<
^Tweetable facts about Twitter's IPO<
^AP Photo NYBZ147<
^Eds: With AP Photos.<
^By SCOTT MAYEROWITZ=
^AP Business Writer=

a Tweet.

e limit of tweets.

a planned IPO.


announcement tweet, 7,872 people retweeted the message.


_ The public offering comes at a time of heightened investor interest
in the IPO market _ 131 IPOs have priced so far this year.

_ Is (at)Twitter trying to avoid (at)Facebook's May 2012 IPO (hash)fa
il? Well, company is keeping details secret for now. (hash)TwitterIPO

_ The company hasn't said if it makes a profit or how much revenue it
takes in. (hash)FadOrFuture? Wonder if (at)WarrenBuffett will buy stock.

_ Most of Twitter's revenue comes from advertising. (at)eMarketer est
imates $582.8 million this year, up from $288.3 million in 2012.

_ Compare: In latest quarter, Facebook had $1.6 billion in ad revenue
. By 2015, Twitter's annual ad revenue is expected to hit $1.3 billion.

_ 2013 (hash)Superbowl performance by (at)Beyonce had 268 million twe
ets per minute, more than any other event in past two years.

_ Not everybody on (at)Twitter is who they claim to be. (at)United Ai
rlines CEO Jeff Smisek has to put up with (at)FakeUnitedJeff

_ Sometimes even missing zoo animals get their own Twitter accounts.
And they can be funny. Just read (at)BronxZoosCobra



.''


AP-WF-09-13-13 0411GMT<


0407-----
r a BC-US--NuclearSpending 09-13 0578
^*1110< ^AP-US-Nuclear-Spending,130<
^Nation's bloated nuclear spending comes under fire<
^AP Photo LA104, LA103<
^Eds: APNewsNow. Will be expanded. With AP Photos.<
^By JERI CLAUSING and MATTHEW DALY=
^Associated Press=


sitive nuclear bomb-making facilities doesn't work.

ms that include a redesign to raise the roof so equipment can fit inside.

tic budget increases for nuclear contractors.

uld be overhauled.

