All,
I'm writing a stored procedure to parse words from a sentence and store them into an array. I am accounting for the spaces in between each word, but what I would like to do, is for each word, send it to a function to remove any punctuation like a period or a hyphen or anything that is not an upper or lower case letter. Then send the word back to the stored procedure to store the word into the word array. The goal here is to compare each word to a table of words for a profanity filter.
The sentence content is coming from a web page where a user creates the text content, clicks on preview and the gui creates the text to speech wav file. If there is any content that could be considered profanity, this is where my stored procedure comes in and finds those words that fit this category.
Here's is the code to my stored procedure so far and some output.
Output:
So as you can see, I have to remove the comma after the word Line and the period after the last word, problem. I'll have other punctuation to remove which I can deal with that. what I am looking for is a fancy INSTR function that will remove these characters, but do not want to have to write statements looking for each kind of character. Possibly going through character by character and determining if it is an upper or lowercase, if it isn't, remove that character.
Thanks,
Todd
I'm writing a stored procedure to parse words from a sentence and store them into an array. I am accounting for the spaces in between each word, but what I would like to do, is for each word, send it to a function to remove any punctuation like a period or a hyphen or anything that is not an upper or lower case letter. Then send the word back to the stored procedure to store the word into the word array. The goal here is to compare each word to a table of words for a profanity filter.
The sentence content is coming from a web page where a user creates the text content, clicks on preview and the gui creates the text to speech wav file. If there is any content that could be considered profanity, this is where my stored procedure comes in and finds those words that fit this category.
Here's is the code to my stored procedure so far and some output.
Code:
PROCEDURE RestrictedContent(InMsgID IN MSG_DETAIL.MSG_ID%TYPE,
InMessage IN VARCHAR2,
ErrorMsg OUT VARCHAR2,
ErrorCode OUT NUMBER)
AS
warray WORD_ARRAY;
word VARCHAR2(100);
pos NUMBER;
windex NUMBER;
Content VARCHAR2(512); -- the size of InMessage is no larger than 512
BEGIN
-- Initialize ErrorCode and ErrorMsg
ErrorCode := 0;
ErrorMsg := 'Success';
IF InMessage IS NULL THEN
RETURN;
END IF;
UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: InMsgID: ' || InMsgID);
UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: InMessage: ' || InMessage);
windex := 0;
Content := RTRIM(InMessage);
pos := INSTR(Content, ' ');
--
-- Build array of words from message content...
--
WHILE pos > 0 LOOP
-- Fetch first word from content...
word := RTRIM(SUBSTR(Content, 1, pos-1));
-- create a function call to parse out punctuation, ie, ".", "-", ";", etc.
--word := ParseContent(word); -- I would like the call the function here to remove anything that is not a upper or lower case letter.
windex := windex + 1;
IF LENGTH(word) > 0 THEN
warray(windex) := TO_CHAR(word);
--ELSE
--warray(windex) := VLIST(' ');
END IF;
Content := SUBSTR(Content, pos + 1, LENGTH(Content) - pos + 1);
pos := INSTR(Content, ' ');
END LOOP;
-- Handle trailing word from Content...
IF LENGTH(Content) > 0 THEN
warray(windex + 1) := TO_CHAR(Content);
END IF;
-- Print out warray to testing the contents
FOR i in 1..warray.COUNT LOOP
UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: windex: ' || i);
UTILITIES_PKG.fprintf('debug.txt', 'RestrictedContent: warray: ' || warray(i));
END LOOP;
EXCEPTION
WHEN OTHERS
THEN
ErrorMsg := 'Restricted Words failed!';
ErrorCode := -20000;
--RAISE_APPLICATION_ERROR(-20000, 'Restricted Words timeout!');
END RestrictedContent;
Output:
Code:
RestrictedContent: InMsgID: 50000
RestrictedContent: InMessage: Attention customers on the Red Line, we are experiencing delays due to a power problem.
RestrictedContent: windex: 1
RestrictedContent: warray: Attention
RestrictedContent: windex: 2
RestrictedContent: warray: customers
RestrictedContent: windex: 3
RestrictedContent: warray: on
RestrictedContent: windex: 4
RestrictedContent: warray: the
RestrictedContent: windex: 5
RestrictedContent: warray: Red
RestrictedContent: windex: 6
RestrictedContent: warray: Line,
RestrictedContent: windex: 7
RestrictedContent: warray: we
RestrictedContent: windex: 8
RestrictedContent: warray: are
RestrictedContent: windex: 9
RestrictedContent: warray: experiencing
RestrictedContent: windex: 10
RestrictedContent: warray: delays
RestrictedContent: windex: 11
RestrictedContent: warray: due
RestrictedContent: windex: 12
RestrictedContent: warray: to
RestrictedContent: windex: 13
RestrictedContent: warray: a
RestrictedContent: windex: 14
RestrictedContent: warray: power
RestrictedContent: windex: 15
RestrictedContent: warray: problem.
So as you can see, I have to remove the comma after the word Line and the period after the last word, problem. I'll have other punctuation to remove which I can deal with that. what I am looking for is a fancy INSTR function that will remove these characters, but do not want to have to write statements looking for each kind of character. Possibly going through character by character and determining if it is an upper or lowercase, if it isn't, remove that character.
Thanks,
Todd