morphological and syntactic analysis on a text

(OP)
Hi all,
I'm a newbie.
I'm trying to do an exercise, but something isn't working yet.
I have a text that I would like to analyse. The text is in French:

"viens demain de bon matin"

I would like to analyse the text by creating some arrays of word endings. Some endings are shared between classes (for example, "ain" is both an adjectival and a nominal ending, so "demain" should be displayed as "demain NOM ADJ").

How can I do this?

Here is my code:

CODE --> perl

#!/usr/bin/perl
use warnings;
use diagnostics;
use Data::Dumper;
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;
 
 #scalars and arrays
 
 my ($m, $file, $suf_adv, $line);
 my (@suf_nom, @suf_adj, @suf_verb);
 my %punt;


@suf_nom = qw(ard ain);
@suf_adj = qw(eux ain);
@suf_verb = qw(est iser ifier eter iller ouiller);
$suf_adv = "ment" ;
%punt = (
	' '   => 1,
	''    => 1,
	','   => 1,
	"'"   => 1,
	'.'   => 1,
	'?'   => 1,
	';'   => 1,
	'!'   => 1,
	'-'   => 1,
	':'   => 1,
	'?'   => 1
);

$line = "viens me trouver demain de bon matin, ... \n";

@words = split (/(\pP|\pS|\s)/, $line);
	
foreach $w(@words) {
	if ($w =~ m/$suf_nom[$_]/ and length($w) >= 6 and !exists $punt{$w}) {
		print "$w NOM\n";
	}
	elsif ($w =~ m/$suf_adj[$_]/ and length($w) >= 6 and !exists $punt{$w}) {
		print "$w ADJ\n";
	}
	elsif ($w =~ m/$suf_verb[$_]/ and length($w) >= 6 and !exists $punt{$w}) {
		print "$w V\n";
	}
	else {
		print "NC\n";
	}
} 



I'd like to get this output:
viens
me
trouver
demain ADJ NOM
de
bon
matin

RE: morphological and syntactic analysis on a text

I tried this:

Jurafsky.pl

CODE

#!/usr/bin/perl
use strict;
use warnings;

# define arrays
my @suf_nom = qw(ard ain);
my @suf_adj = qw(eux ain);
my @suf_verb = qw(est iser ifier eter iller ouiller);
my @suf_adv = qw(ment);
my @punt = split ("", ",'.?;!-:?");

my $line = "viens me trouver demain de bon matin, ... \n";

# find punctuation characters and replace them with spaces
my @punt_found = ();
foreach my $p (@punt) {
  if ($line =~ /[$p]/) {
    # add to array
    push(@punt_found, $p); 
    # replace punctuation with space
    $line =~ s/[$p]/ /g;
  }
}

# split line to array by space
my @words = split (/\s+/, $line);

foreach my $word (@words) {
  my @word_class = ();
  my $pattern = "";

  push(@word_class, $word);

  # check for NOM
  foreach $pattern (@suf_nom) {
    if ($word =~ /$pattern/) {
      push(@word_class, "NOM"); 
    }
  }

  # check for ADJ
  foreach $pattern (@suf_adj) {
    if ($word =~ /$pattern/) {
      push(@word_class, "ADJ"); 
    }
  }

  # check for VERB
  foreach $pattern (@suf_verb) {
    if ($word =~ /$pattern/) {
      push(@word_class, "VERB"); 
    }
  }

  # check for ADV
  foreach $pattern (@suf_adv) {
    if ($word =~ /$pattern/) {
      push(@word_class, "ADV"); 
    }
  }

  # print result
  printf "%s\n", join(" ", @word_class);
}

print "\nPunctuation characters found: ";
printf "%s\n", join(" ", @punt_found); 

Output:

CODE

C:\Work>perl Jurafsky.pl
viens
me
trouver
demain NOM ADJ
de
bon
matin

Punctuation characters found: , . 
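Note that /$pattern/ matches the ending anywhere in the word, so "ain" would also match a word like "saint". Below is a minimal sketch, assuming you only want to match true endings: each suffix is anchored with `$` and quoted with `\Q...\E`. The `@classes` table and the `classify` sub are hypothetical names used only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: each entry is [ tag, [ suffixes ] ], kept in a fixed order
my @classes = (
  [ NOM  => [qw(ard ain)] ],
  [ ADJ  => [qw(eux ain)] ],
  [ VERB => [qw(est iser ifier eter iller ouiller)] ],
  [ ADV  => [qw(ment)] ],
);

# Return the word followed by every tag that has a suffix ending the word
sub classify {
  my ($word) = @_;
  my @tags;
  foreach my $class (@classes) {
    my ($tag, $suffixes) = @$class;
    # \Q...\E escapes regex metacharacters; $ anchors the match at the end of the word
    push(@tags, $tag) if grep { $word =~ /\Q$_\E$/ } @$suffixes;
  }
  return join(" ", $word, @tags);
}

my @results = map { classify($_) } qw(viens me trouver demain de bon matin);
print "$_\n" for @results;
```

Keeping the classes in an ordered list rather than a hash keeps the tags in a predictable order, so "demain" always comes out as "demain NOM ADJ".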

RE: morphological and syntactic analysis on a text

I just noticed that the character '?' appears twice in the array @punt. You can delete one of them.
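If you prefer not to edit the list by hand, a common Perl idiom removes duplicates while keeping the original order. A short sketch; `@punt_unique` and `%seen` are illustrative names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same list as above, where '?' appears twice
my @punt = split("", ",'.?;!-:?");

# %seen counts occurrences; grep keeps a character only the first time it is seen
my %seen;
my @punt_unique = grep { !$seen{$_}++ } @punt;

print join(" ", @punt_unique), "\n";
```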

RE: morphological and syntactic analysis on a text

(OP)
It's perfect! I used the hash %punt for the punctuation. Is there a way to print the result apart from the first foreach loop?

RE: morphological and syntactic analysis on a text

In my case @punt isn't a hash; it's just an array containing the punctuation characters, so I don't need a hash.

Quote (Jurafsky)


Is there a way to print the result apart from the first foreach loop?
What result? I don't understand what you mean.
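That said, a hash like %punt is handy when you need to test whether a given token is punctuation, because `exists` gives a direct lookup instead of a loop over the array. A minimal sketch, assuming the same punctuation characters (duplicate '?' removed); `%is_punt` and `@found` are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @punt = split("", ",'.?;!-:");

# Build a lookup hash from the array: each punctuation character maps to 1
my %is_punt = map { $_ => 1 } @punt;

# Split the line into single characters and keep those that are punctuation
my $line = "viens me trouver demain de bon matin, ... \n";
my @found = grep { exists $is_punt{$_} } split(//, $line);

print join(" ", @found), "\n";
```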

RE: morphological and syntactic analysis on a text

You probably mean something like this:

CODE

#!/usr/bin/perl
use strict;
use warnings;

# define arrays
my @suf_nom = qw(ard ain);
my @suf_adj = qw(eux ain);
my @suf_verb = qw(est iser ifier eter iller ouiller);
my @suf_adv = qw(ment);
my @punt = split ("", ",'.?;!-:?");

my $line = "viens me trouver demain de bon matin, ... \n";

# find punctuation characters and replace them with spaces
my @punt_found = ();
foreach my $p (@punt) {
  if ($line =~ /[$p]/) {
    # add to array
    push(@punt_found, $p); 
    # replace punctuation with space
    $line =~ s/[$p]/ /g;
  }
}

# split line to array by space
my @words = split (/\s+/, $line);

my @all_word_classes = ();
foreach my $word (@words) {
  my @word_class = ();
  my $pattern = "";

  push(@word_class, $word);

  # check for NOM
  foreach $pattern (@suf_nom) {
    if ($word =~ /$pattern/) {
      push(@word_class, "NOM"); 
    }
  }

  # check for ADJ
  foreach $pattern (@suf_adj) {
    if ($word =~ /$pattern/) {
      push(@word_class, "ADJ"); 
    }
  }

  # check for VERB
  foreach $pattern (@suf_verb) {
    if ($word =~ /$pattern/) {
      push(@word_class, "VERB"); 
    }
  }

  # check for ADV
  foreach $pattern (@suf_adv) {
    if ($word =~ /$pattern/) {
      push(@word_class, "ADV"); 
    }
  }

  # add reference to array
  push(@all_word_classes, \@word_class);
}

# print results
print "Results:\n";
print "--------\n";
foreach my $word_class (@all_word_classes) {
  # dereference and print
  printf "%s\n", join(" ", @{$word_class});
}
print "\nPunctuation characters found: ";
printf "%s\n", join(" ", @punt_found); 

As you can see, in the main foreach loop I only store the array reference \@word_class in the array @all_word_classes. So after the loop, @all_word_classes contains references to all the individual @word_class arrays (one for every word processed). Elsewhere in the program I can then dereference them and print their elements.

Output:

CODE

C:\Work>perl Jurafsky.pl
Results:
--------
viens
me
trouver
demain NOM ADJ
de
bon
matin

Punctuation characters found: , . 
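This approach relies on `my @word_class` being declared inside the loop, so each iteration creates a new array. The sketch below (three example words, hypothetical variable names and a hypothetical `show` helper) contrasts that with reusing one array declared outside the loop, and shows the anonymous-array constructor `[ ... ]` as an alternative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(viens demain matin);

# Correct: "my @word_class" inside the loop creates a new array each time,
# so every stored reference points to a different array.
my @fresh_refs;
foreach my $word (@words) {
  my @word_class = ($word);
  push(@fresh_refs, \@word_class);
}

# Pitfall: a single array declared before the loop is reused, so every
# stored reference points to the same array (holding the last word).
my @shared_class;
my @shared_refs;
foreach my $word (@words) {
  @shared_class = ($word);
  push(@shared_refs, \@shared_class);
}

# Alternative: [ ... ] builds a new anonymous array and returns a reference to it
my @anon_refs = map { [ $_ ] } @words;

# Dereference each array reference and join its contents, for display
sub show { return join(", ", map { join(" ", @$_) } @_) }

print "fresh:  ", show(@fresh_refs),  "\n";   # fresh:  viens, demain, matin
print "shared: ", show(@shared_refs), "\n";   # shared: matin, matin, matin
print "anon:   ", show(@anon_refs),   "\n";   # anon:   viens, demain, matin
```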

RE: morphological and syntactic analysis on a text

(OP)
That's great! Thank you!

RE: morphological and syntactic analysis on a text

If you want the output for your line

CODE

viens me trouver demain de bon matin, ... 
to be

CODE

viens
me
trouver
demain NOM ADJ
de
bon
matin
, PUNT
. PUNT
. PUNT
. PUNT 
then first surround every punctuation character with spaces and then split the string into an array.
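A minimal sketch of that idea, assuming the same punctuation list (duplicate '?' removed). The names `%is_punt`, `$class`, `@tokens` and `@tagged` are hypothetical, and only the punctuation tagging is shown; the word endings would still be checked as in the scripts above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @punt = split("", ",'.?;!-:");
my %is_punt = map { $_ => 1 } @punt;

my $line = "viens me trouver demain de bon matin, ... \n";

# Build a character class from the punctuation list; quotemeta escapes
# characters such as '-' and '.' so they are taken literally
my $class = join("", map { quotemeta($_) } @punt);

# Surround every punctuation character with spaces
$line =~ s/([$class])/ $1 /g;

# split ' ' splits on runs of whitespace and ignores leading whitespace
my @tokens = split(' ', $line);

# Tag punctuation tokens; other tokens are left for the word-ending checks
my @tagged = map { exists $is_punt{$_} ? "$_ PUNT" : $_ } @tokens;
print "$_\n" for @tagged;
```

Because each '.' gets its own spaces, the "..." at the end of the line becomes three separate ". PUNT" tokens.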
