Remove empty lines with SED


Raw data is often delivered in Excel-sheets with a lot of noise and formating around. For analysis in R or other packages the real raw data is required. Scripting the “deformating” in plain text / csv files using shell tools like SED, AWK or Pearl to remove excess text in the datasheets makes it possible to rerun the procedure or track systematic errors.

Removing empty lines from a file containing code in plain text (like .csv, .html, .php, etc…) is very easy with SED in a UNIX/ MAC OS shell and even possible in the Windows CMD (after installing SED). The following is a blockquote from ZoneO-tips for Mandriva Linux which I found really useful and well written:

So, open up a konsole and move into the directory where your file resides (cd MyDirectory). And here we go with the two lines that’ll do the job

sed '/^$/d' myFile > tt
mv tt myFile

Here is what happens:

sed '/^$/d' myFile removes all empty lines from the file myFile and outputs the result in the console, > tt redirects the output into a temporary file called tt,
mv tt myFile moves the temporary file tt to myFile.

Now, you may have 100 html files to correct at the same time. That’s where foreach comes in… Let’s say you want to correct all files ending with .html, here is what you should do:

Open up a konsole, move into the directory where your html files reside, type the following commands:

foreach file (*html)
sed '/^$/d' $file > tt
mv tt $file
end

Finished!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s