Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing punctuation from a string 1

Status
Not open for further replies.

tmcneil

Technical User
Joined
Nov 17, 2000
Messages
294
Location
US
All,

I'm writing a stored procedure to parse words from a sentence and store them into an array. I am accounting for the spaces in between each word, but what I would like to do, is for each word, send it to a function to remove any punctuation like a period or a hyphen or anything that is not an upper or lower case letter. Then send the word back to the stored procedure to store the word into the word array. The goal here is to compare each word to a table of words for a profanity filter.

The sentence content is coming from a web page where a user creates the text content, clicks on preview and the gui creates the text to speech wav file. If there is any content that could be considered profanity, this is where my stored procedure comes in and finds those words that fit this category.

Here's is the code to my stored procedure so far and some output.
Code:
PROCEDURE RestrictedContent(InMsgID    IN  MSG_DETAIL.MSG_ID%TYPE,
				            InMessage  IN  VARCHAR2,
				            ErrorMsg   OUT VARCHAR2,
						    ErrorCode  OUT NUMBER)
AS
    warray     WORD_ARRAY;
    word       VARCHAR2(100);
    pos        NUMBER;
    windex     NUMBER;
    Content    VARCHAR2(512);  -- the size of InMessage is no larger than 512
  
BEGIN
  -- Initialize ErrorCode and ErrorMsg
  ErrorCode        := 0;
  ErrorMsg         := 'Success';
  
  IF InMessage IS NULL THEN
      RETURN;
  END IF;
  
  UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: InMsgID: ' || InMsgID);
  UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: InMessage: ' || InMessage);

  windex  := 0;
  Content := RTRIM(InMessage);
  pos     := INSTR(Content, ' ');

  --
  --  Build array of words from message content...
  --
  WHILE pos > 0 LOOP
    -- Fetch first word from content...
	word := RTRIM(SUBSTR(Content, 1, pos-1));
	-- create a function call to parse out punctuation, ie, ".", "-", ";", etc.
	--word := ParseContent(word);   -- I would like the call the function here to remove anything that is not a upper or lower case letter. 
    windex := windex + 1;

	IF LENGTH(word) > 0 THEN
	  warray(windex) := TO_CHAR(word);
    --ELSE
	  --warray(windex) := VLIST(' ');
	END IF;

	Content := SUBSTR(Content, pos + 1, LENGTH(Content) - pos + 1);
	pos     := INSTR(Content, ' ');
  END LOOP;
  
  -- Handle trailing word from Content...
  IF LENGTH(Content) > 0 THEN
    warray(windex + 1) := TO_CHAR(Content);
  END IF;
  
  -- Print out warray to testing the contents  
  FOR i in 1..warray.COUNT LOOP
    UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: windex: ' || i);
    UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: warray: ' || warray(i));
  END LOOP;

	EXCEPTION
    WHEN OTHERS
    THEN
        ErrorMsg  := 'Restricted Words failed!';
        ErrorCode := -20000;
        --RAISE_APPLICATION_ERROR(-20000, 'Restricted Words timeout!');

END RestrictedContent;

Output:
Code:
RestrictedContent: InMsgID: 50000
RestrictedContent: InMessage: Attention customers on the Red Line, we are experiencing delays due to a power problem.
RestrictedContent: windex: 1
RestrictedContent: warray: Attention
RestrictedContent: windex: 2
RestrictedContent: warray: customers
RestrictedContent: windex: 3
RestrictedContent: warray: on
RestrictedContent: windex: 4
RestrictedContent: warray: the
RestrictedContent: windex: 5
RestrictedContent: warray: Red
RestrictedContent: windex: 6
RestrictedContent: warray: Line,
RestrictedContent: windex: 7
RestrictedContent: warray: we
RestrictedContent: windex: 8
RestrictedContent: warray: are
RestrictedContent: windex: 9
RestrictedContent: warray: experiencing
RestrictedContent: windex: 10
RestrictedContent: warray: delays
RestrictedContent: windex: 11
RestrictedContent: warray: due
RestrictedContent: windex: 12
RestrictedContent: warray: to
RestrictedContent: windex: 13
RestrictedContent: warray: a
RestrictedContent: windex: 14
RestrictedContent: warray: power
RestrictedContent: windex: 15
RestrictedContent: warray: problem.

So as you can see, I have to remove the comma after the word Line and the period after the last word, problem. I'll have other punctuation to remove which I can deal with that. what I am looking for is a fancy INSTR function that will remove these characters, but do not want to have to write statements looking for each kind of character. Possibly going through character by character and determining if it is an upper or lowercase, if it isn't, remove that character.

Thanks,
Todd
 

To remove characters from a string use TRANSLATE() function:
Code:
SELECT translate('Attention customers on the Red Line, we are experiencing delays due to a power problem.',' -.,',' ') from dual;

The above removes the dash, dot and comma from the text, just add other characters you wish to remove. [3eyes]

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
LKBrwnDBA,

Thanks for the help. I've got the word array working now and I'll post the code if you or anyone else that responds to this thread needs it. Also, I'm trying to write a select statement in my RestrictedContent stored procedure to based on each word in the array.
Code:
SELECT RESTRICTED_WORD INTO rword FROM RESTRICTED_WORDS
    WHERE LOWER(RESTRICTED_WORD) = LOWER(warray(i));

How do I write the syntax so that I have single quotes around warray(i) like the following example?
Code:
SELECT RESTRICTED_WORD INTO rword FROM RESTRICTED_WORDS
    WHERE LOWER(RESTRICTED_WORD) = LOWER('test');

Thanks,
Todd
 

You may need to use EXECUTE IMMEDIATE:
Code:
SQL:='SELECT RESTRICTED_WORD INTO rword';
SQL:=SQL||'  FROM RESTRICTED_WORDS';
SQL:=SQL||' WHERE LOWER RESTRICTED_WORD) = LOWER('''||warray(i)||''')';
EXECUTE IMMEDIATE SQL INTO ...;
[thumbsup2]



----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top