Removing non-ascii from large .txt file

(OP)
I'm an awk novice; however, a system we have in place already uses this program with other scripts. I need a command to remove all non-ASCII characters from a 180 MB .txt file.

One of the commands is this:

gawk -f scpfiles\Test.scp TEST.TXT > TEST.list

Which points to this script file:

{
plate=substr($0,1,8)
rest=substr($0,9,length($0)-8)
gsub(/ */," ",rest)
gsub(/ ,/,",",rest)
rest=substr(rest,1,91)
printf"%s%s\n", plate,rest
}


I'd like a simple command if possible, but either way I'd appreciate any assistance. Thanks!


EDIT:

Okay, I just realized gawk.exe and the above script are used to remove certain information from the file. I still need something using awk.exe, or if possible gawk.exe, to remove non-ASCII characters. Sorry in advance for being a dumbass.

RE: Removing non-ascii from large .txt file

(OP)
I created a .scp file and ran it. It appears to run the script, but the output file is empty. Here's what I've got:

gawk -f scpfiles\nonascii.scp file1.txt > file2.txt

In the .scp file I've got:

{
gsub(/[^ -~]/,"",$0)
}


The txt file I'm working with has 2.3 million lines.
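A likely reason for the empty output: an awk action only writes what it is told to print, and the script above changes $0 without ever printing it. Here is a minimal sketch of the same gsub with a print added, assuming a POSIX shell and a hypothetical sample.txt (on Windows the part between the single quotes would go in the .scp file and be run with gawk -f as above). Note that the range space-to-tilde does not include tab, so tabs get stripped as well.

```shell
# Hypothetical sample input: one line with a non-ASCII byte (octal 351) after "caf"
printf 'caf\351 123\n' > sample.txt

# Same gsub as in the post, plus an explicit print so each cleaned line is written.
# LC_ALL=C makes awk treat the input as plain bytes, so the range " -~" is exactly
# the printable ASCII characters (space through tilde).
LC_ALL=C awk '{ gsub(/[^ -~]/, ""); print }' sample.txt > cleaned.txt

cat cleaned.txt
```

With this sample the cleaned line should keep its letters, space and digits and lose only the one stray byte.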

RE: Removing non-ascii from large .txt file

(OP)
Thanks again for replying. When I run the above script the output file is this:

#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , / ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,

My data is missing (numbers, letters, etc.). I just wanted to remove the non-ASCII characters and leave the numbers, letters, and so on.

RE: Removing non-ascii from large .txt file

(OP)
I finally figured it out. This may help someone in the future, so I'm posting what I came up with.

tr -cd '\11\12\15\40-\176' <Filebefore.txt > Fileafter.txt
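For anyone reading the command above: -c complements the set and -d deletes, so tr removes every byte that is not in the list. The octal codes are \11 (tab), \12 (line feed), \15 (carriage return) and \40-\176 (space through tilde). Here is a small check of that behaviour, assuming a POSIX shell with tr and od available; the file names and the input string are made up for illustration.

```shell
# Made-up input: "AB", tab, "C", two non-ASCII bytes (octal 351 and 200), "D", CR, LF
printf 'AB\tC\351\200D\r\n' > before_sample.txt

# Keep only tab, LF, CR and printable ASCII; every other byte is deleted
LC_ALL=C tr -cd '\11\12\15\40-\176' < before_sample.txt > after_sample.txt

# Dump the surviving bytes one by one to confirm what was kept
od -c after_sample.txt
```

Unlike the awk character class, this keeps tabs and carriage returns, which matters if the file is tab-delimited or has Windows line endings.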
