Reading CSV files

This post is a Python solution to a programming challenge from Programming Praxis. The challenge is to read a CSV file and convert it into an HTML table.

On first inspection, this seems quite a simple task. Read in the file, then replace the commas and line breaks with the appropriate td and tr tags:

output = ''.join(['<html><body><table><tr><td>', 
                  sample_csv.replace(',', '</td><td>').replace('\n', '</td></tr><tr><td>'), 
                  '</td></tr></table></body></html>'])

The above code works well on simple CSVs, but doesn’t take into account the variety of CSV formats. Firstly, there is a range of delimiting characters that can be used: tabs and pipes are common alternatives to commas. Secondly, the first line of data is often a row of column headings, which would be better wrapped in th tags than in plain td tags. Finally, fields such as addresses, which are likely to contain commas and so cause problems with delimiting, are usually supplied surrounded by quotes.
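To make the quoting problem concrete, here is a minimal Python 3 sketch. The sample data is made up for illustration, and splitting each line on every comma stands in for what the replace-based snippet effectively does.

```python
import csv
import io

# Hypothetical sample: the address contains a comma, so it is quoted.
sample_csv = 'name,address\nAlice,"1 High Street, Leeds"'

# Naive approach: treat every comma as a field boundary.
naive_rows = [line.split(',') for line in sample_csv.split('\n')]

# csv.reader understands the quoting and keeps the address in one field.
parsed_rows = list(csv.reader(io.StringIO(sample_csv)))

print(len(naive_rows[1]))   # 3: the address was split in two
print(len(parsed_rows[1]))  # 2: name and address
print(parsed_rows[1][1])    # 1 High Street, Leeds
```

The naive version would produce a data row with three cells under a two-cell header row, and the stray quote characters would end up in the table.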

Fortunately, Python already has a library built to handle such quirks, aptly named csv. The module includes a useful Sniffer class, which uses a number of heuristics to work out the format of a CSV file and to guess whether its first row is a header. The module’s reader also handles quoted fields correctly.
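As a rough illustration of the Sniffer, the Python 3 sketch below feeds it a small pipe-delimited sample. The sample data is invented for this example, and because the Sniffer relies on heuristics, its guesses on other files may differ.

```python
import csv
import io

# Hypothetical pipe-delimited sample with a header row.
sample = 'name|age|city\nAlice|30|Leeds\nBob|25|York\nCarol|41|Bath\n'

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # guess the delimiter and quoting
has_header = sniffer.has_header(sample)  # guess whether row one is a header

# Parse the sample using the guessed dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))  # '|'
print(has_header)               # True
print(rows[1])                  # ['Alice', '30', 'Leeds']
```

Here has_header comes out True because the age column is numeric in every data row while its heading is not. On a file where every column is free text of varying length, the heuristic has much less to go on.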

Final Code:


import csv
import html


def html_page_generation(f):
    """Wrap the markup returned by f in a page and write it to output.html."""
    def generate_page(*args, **kwargs):
        # build the body first so a failed read leaves output.html untouched
        body = f(*args, **kwargs)
        with open('output.html', 'w') as html_out:
            html_out.write(''.join(['<!DOCTYPE html><html><body>',
                                    body,
                                    '</body></html>']))

    return generate_page


@html_page_generation
def read_csv(file_name):
    mark_up = ['<table>']
    # newline='' lets the csv module deal with line endings itself
    with open(file_name, newline='') as csvfile:
        sample = csvfile.read()
        csvfile.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample)
            has_header = csv.Sniffer().has_header(sample)
            simple_reader = csv.reader(csvfile, dialect,
                                       skipinitialspace=True)
        except csv.Error:
            has_header = False
            # use the default csv settings if none can be sniffed
            simple_reader = csv.reader(csvfile, skipinitialspace=True)

        if has_header:
            # escape the fields so characters like < and & display correctly
            header = [html.escape(field) for field in next(simple_reader)]
            mark_up.append('<tr><th>')
            mark_up.append('</th><th>'.join(header))
            mark_up.append('</th></tr>')

        for row in simple_reader:
            cells = [html.escape(field) for field in row]
            mark_up.append('<tr><td>')
            mark_up.append('</td><td>'.join(cells))
            mark_up.append('</td></tr>')

    mark_up.append('</table>')
    return ''.join(mark_up)


while True:
    file_name = input('Please enter the name of the csv file to be '
                      'converted, or q to quit.\n')
    if file_name == 'q':
        break

    try:
        read_csv(file_name)
    except IOError:
        print("File %s not found. Please try again." % file_name)

The full source code, including sample data, can be found on Bitbucket.
