Saturday, September 5, 2015

How to stop Excel converting your SEPT1 gene to September1

Several scientists around the world use Microsoft Excel to visualize or process their data. Excel can be fun to use when you know the commands well. However, when it comes to sorting gene symbols or creating protein databases using Excel, it may cause several problems. Some of the gene symbols are always converted into dates. 

For example:

SEPT1 becomes Sep-01

MARC1 becomes Mar-01

MARCH1 becomes Mar-01

This is true for the SEPT, MARC1 and MARCH1 gene families. 

Most often, you don't know that this has happened and you are forced to erroneously conclude that you haven't identified these genes in your data.

You can manually correct these by adding a single quote (') symbol before the gene name. This is very tedious and troublesome for downstream data processing. To make it worse, I found out that Excel does not have an option to turn off this annoying transmogrification.

By trial and error, I found out  a perfect way to overcome this problem in Excel.

STEP1: Save your database file in CSV (Comma separated) or TSV (Tab separated) formats

STEP 2: Start the "Import Text File" wizard by going to the "Data" Tab and clicking "From Text"

STEP 3: Select your CSV or TSV file and click "Import".

STEP 4: Choose the "delimited" option and click "Next"

STEP 5: Select "Comma" (if you used CSV file) or "Tab" (if you used TSV file) and click "Next"

STEP 6: Select the "Gene Symbol" column in the "Data Preview" and select the option "Text" in the "Column data format"

STEP 7: Click "Finish" and then click "OK".

Now scroll back and check that SEPT1 has not been touched by Excel.

If you want a video on it, use this link


No comments:

Post a Comment

Please provide your inputs