
Python Choose Sheet Name After File Upload

Image by Andrian Valeanu from Pixabay

One of the fascinating things about programming is that with a few lines of code you can get your computer to carry out a task that would otherwise have been mundane and annoying to do on your own. One of those mundane tasks is extracting information from a large Excel sheet. The Python programming language is very robust, and one of the areas where it shines is helping us automate tedious and repetitive Excel tasks.

In this blog post, we will be embarking on a step-by-step process to extract some valuable information from an Excel sheet. The Excel sheet we will be using contains all the fruit sales of a supermarket for a month. Each row contains an individual record of fruit purchased by a customer: the fruit, the cost per pound of the purchased fruit, the pounds sold, and the total cost of the purchase. The Excel sheet has 23758 rows and four columns. You can download the Excel sheet here.

Our goal is to find out and document the total pounds sold, the total sales, and the total number of purchase instances for each fruit in that month. You can imagine the frustration of having to go through 23758 rows to extract data about each fruit. Luckily, Python will help us complete this task in no time. The steps below give an in-depth, practical explanation of how you can use Python to complete this task.

Before we get on to this task, I want to assume that you have a basic knowledge of writing code in Python and that you have Python installed on your computer.

Install the openpyxl Module
The Python module we will be working with is OpenPyXL. OpenPyXL is a library that allows you to use Python to read and write Excel files, that is, files with the .xlsx, .xlsm, .xltx, or .xltm extension. If you don't have it installed, you can install it using

          pip install openpyxl        

To test whether you successfully installed it, import it using

          import openpyxl

If no error is returned, you have installed the OpenPyXL module and are ready to work on some Excel documents.

Read in and open the Excel document with OpenPyXL
The next port of call is to read the Excel sheet into our Python environment. Make sure the Excel file you will be working with is in your current working directory (CWD). You can access your CWD using:

          import os
          os.getcwd()  # returns the current working directory
          # os.chdir('path/to/folder')  # changes the CWD if the Excel sheet is elsewhere; the path is a placeholder

What if the Excel sheet is not in your CWD? You can copy the file and paste it into your CWD, and then access it from there. Once we are sure we have our Excel document in our CWD, we can read it in.
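As a small sketch of the check described above, the snippet below looks for the workbook in a folder (the CWD by default) before any attempt to open it. The helper name find_workbook and its folder parameter are illustrative assumptions, not part of the original post.

```python
from pathlib import Path


def find_workbook(filename, folder=None):
    """Return the absolute path to filename, or None if it is not in folder.

    find_workbook is a hypothetical helper; folder defaults to the CWD.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = base / filename
    return candidate.resolve() if candidate.is_file() else None


path = find_workbook('produceSales.xlsx')
if path is None:
    print('produceSales.xlsx is not in', Path.cwd())
else:
    print('Found workbook at', path)
```

Passing the returned absolute path to the reading functions is one way to avoid changing the CWD at all.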
After reading in the Excel document, we can access it to obtain various information about the Excel sheet.

          import pandas as pd
          file = 'produceSales.xlsx'
          data = pd.ExcelFile(file)
          print(data.sheet_names)  # this returns all the sheets in the Excel file
          ['Sheet1']

Next, we parse the sheet we will be working with into a data frame. This lets us check that our Excel sheet was read in correctly.

          df = data.parse('Sheet1')
          df.info()
          df.head(10)

This image shows the first ten rows of our sheet.

Read in the spreadsheet information
The next step is to read in data from the spreadsheet [Sheet1].

          ps = openpyxl.load_workbook('produceSales.xlsx')
          sheet = ps['Sheet1']
          sheet.max_row
          # returns the total number of rows in the sheet
          23758

Next, we use a for loop to iterate over all the rows in the sheet.

          for row in range(2, sheet.max_row + 1):
              # Each row in the spreadsheet represents data for a particular purchase.
              produce = sheet['B' + str(row)].value
              cost_per_pound = sheet['C' + str(row)].value
              pounds_sold = sheet['D' + str(row)].value
              total_sales = sheet['E' + str(row)].value
              # The first column is B, followed by C, and so on.
              # Each cell is addressed by a column letter and a row number, so the
              # first element in the sheet is B1, the next column is C1, and so on.
              # This enables us to iterate over all the cells.
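As an alternative to building cell addresses from strings, openpyxl worksheets also offer iter_rows(), which can yield plain cell values row by row. The sketch below builds a tiny in-memory workbook with made-up rows so that it runs without produceSales.xlsx; the sample values and the ID column in column A are assumptions chosen to mirror the B-to-E layout used in the loop above.

```python
from openpyxl import Workbook

# Build a small stand-in workbook; the rows are made up for illustration.
wb = Workbook()
ws = wb.active
ws.title = 'Sheet1'
ws.append(['ID', 'PRODUCE', 'COST PER POUND', 'POUNDS SOLD', 'TOTAL'])
ws.append([0, 'Apples', 1.5, 10, 15.0])
ws.append([1, 'Grapes', 2.0, 4, 8.0])

purchases = []
# min_row=2 skips the header; columns 2 to 5 correspond to B through E.
for produce, cost_per_pound, pounds_sold, total_sales in ws.iter_rows(
        min_row=2, min_col=2, max_col=5, values_only=True):
    purchases.append((produce, cost_per_pound, pounds_sold, total_sales))

print(ws.max_row)
print(purchases)
```

With values_only=True each row arrives as a tuple of values, so there is no need to read .value from each cell.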

Create an empty dictionary to hold all the data on each fruit. We then use the setdefault() method to add an initial set of values to the dictionary. setdefault() takes a key as its first argument: if that key is not yet in the dictionary, it is inserted with the second argument as its value, and if the key already exists, the dictionary is left unchanged. That way, each fruit starts out with the values given in the second argument of setdefault().

          # TotalInfo = {} is created once, before the loop; the call below runs inside the loop.
          TotalInfo.setdefault(produce, {'Total_cost_per_pound': 0,
                                         'Total_pounds_sold': 0,
                                         'Total_sales': 0,
                                         'Total_Purchase_Instances': 0})
          # With setdefault, every metric we want to collect starts at zero.
          # While iterating, we start from zero and add each new row's values.
          # The dictionary key is the fruit, which maps to its various metrics.
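To make the behaviour of setdefault() concrete, here is a minimal sketch using a one-metric dictionary. The name totals and the sample numbers are made up for illustration.

```python
totals = {}

# First call: 'Apples' is missing, so the default is inserted and returned.
first = totals.setdefault('Apples', {'Total_pounds_sold': 0})
first['Total_pounds_sold'] += 5

# Second call: 'Apples' already exists, so the new default is ignored
# and the existing entry is returned instead.
second = totals.setdefault('Apples', {'Total_pounds_sold': 0})
second['Total_pounds_sold'] += 3

print(totals)  # {'Apples': {'Total_pounds_sold': 8}}
```

Because the second call returns the existing entry, running totals are never reset when a fruit shows up again in a later row.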

Finally, we populate the dictionary. For each row, we look up the produce in that row and increase each of its metrics by the corresponding value in the row.

          # Each row represents a purchase of a fruit, so increment by the row's values.
          # Note that int() drops any fractional part of pounds_sold and total_sales.
          TotalInfo[produce]['Total_cost_per_pound'] += float(cost_per_pound)
          TotalInfo[produce]['Total_pounds_sold'] += int(pounds_sold)
          TotalInfo[produce]['Total_sales'] += int(total_sales)
          # Each row represents one purchase, so increment by one.
          TotalInfo[produce]['Total_Purchase_Instances'] += 1

After running this code block, we will have populated the TotalInfo dictionary with all the various metrics for each fruit for the month. The populated dictionary looks like this:

          'Apples': {'Total_Purchase_Instances': 627,
                     'Total_cost_per_pound': 1178.7600000000068,
                     'Total_pounds_sold': 12119,
                     'Total_sales': 22999},
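The loop, the setdefault() call, and the increments can be put together in one place. The sketch below does that with a hypothetical summarize() function that takes any iterable of (produce, cost per pound, pounds sold, total sales) tuples, so it can be tried on a few made-up rows without the workbook.

```python
def summarize(rows):
    """Aggregate (produce, cost_per_pound, pounds_sold, total_sales) rows.

    summarize is a hypothetical helper that mirrors the steps in this post.
    """
    total_info = {}
    for produce, cost_per_pound, pounds_sold, total_sales in rows:
        entry = total_info.setdefault(produce, {
            'Total_cost_per_pound': 0,
            'Total_pounds_sold': 0,
            'Total_sales': 0,
            'Total_Purchase_Instances': 0,
        })
        entry['Total_cost_per_pound'] += float(cost_per_pound)
        entry['Total_pounds_sold'] += int(pounds_sold)
        entry['Total_sales'] += int(total_sales)
        entry['Total_Purchase_Instances'] += 1
    return total_info


# Made-up sample rows, not taken from produceSales.xlsx.
sample_rows = [
    ('Apples', 1.5, 10, 15),
    ('Grapes', 2.0, 4, 8),
    ('Apples', 1.5, 6, 9),
]
summary = summarize(sample_rows)
print(summary['Apples'])
```

The same function could be fed rows read from the sheet, as long as each row is unpacked in the same order.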

Write the Results to a File
After populating the TotalInfo dictionary, we can write it to any file of our choice, be it a .csv, .txt, .py, and so on. We will use the pprint.pformat() function to pretty-print the dictionary's values, and open the file in Python's write mode to write them out. The code snippet below gives an illustration:

          import pprint

          resultFile = open('Total_info.txt', 'w')
          resultFile.write(pprint.pformat(TotalInfo))
          resultFile.close()
          print('Done.')
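A variation on the snippet above, offered as a sketch: a with block closes the file automatically, and because pprint.pformat() produces a valid Python literal for a dictionary of numbers like this one, ast.literal_eval() can read it back. The example writes into a temporary folder rather than the CWD, and the dictionary holds only the Apples entry shown earlier; both choices are assumptions made to keep the example self-contained.

```python
import ast
import pprint
import tempfile
from pathlib import Path

# Only the Apples entry from the output above, used as sample data.
TotalInfo = {
    'Apples': {
        'Total_Purchase_Instances': 627,
        'Total_cost_per_pound': 1178.7600000000068,
        'Total_pounds_sold': 12119,
        'Total_sales': 22999,
    },
}

with tempfile.TemporaryDirectory() as folder:
    out_path = Path(folder) / 'Total_info.txt'
    # The with block closes the file even if write() raises an error.
    with open(out_path, 'w') as result_file:
        result_file.write(pprint.pformat(TotalInfo))
    # Read the text back and turn it into a dictionary again.
    with open(out_path) as result_file:
        restored = ast.literal_eval(result_file.read())

print(restored == TotalInfo)
```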

The Total_info.txt file will be found in your CWD.
You can always change the output file by changing the .txt extension to whatever extension you want. Keep in mind that this only renames the file: its contents are still the pretty-printed dictionary.
The code snippet below shows how you can switch to a .csv file name.

          resultFile = open('Total_info.csv', 'w')
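If the goal is a file with real comma-separated rows rather than the pretty-printed dictionary under a .csv name, Python's built-in csv module can write one row per fruit. The sketch below writes to an in-memory buffer so it leaves no file behind; the function name write_summary_csv and the Produce column header are assumptions for illustration.

```python
import csv
import io

# Only the Apples entry from the output above, used as sample data.
TotalInfo = {
    'Apples': {
        'Total_Purchase_Instances': 627,
        'Total_cost_per_pound': 1178.7600000000068,
        'Total_pounds_sold': 12119,
        'Total_sales': 22999,
    },
}

FIELDNAMES = ['Produce', 'Total_cost_per_pound', 'Total_pounds_sold',
              'Total_sales', 'Total_Purchase_Instances']


def write_summary_csv(total_info, stream):
    """Write one CSV row per fruit to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for produce, metrics in sorted(total_info.items()):
        writer.writerow({'Produce': produce, **metrics})


buffer = io.StringIO()
write_summary_csv(TotalInfo, buffer)
print(buffer.getvalue())
```

To write an actual file, the same function can be given a file opened with open('Total_info.csv', 'w', newline='').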

Conclusion
In this blog post, we demonstrated how we could use Python to extract information from an Excel sheet. Knowing how to obtain information from an Excel sheet is always a welcome addition to your toolbox, as it saves you a lot of time on repetitive tasks. Feel free to modify the code in the article to suit your needs; you can access the notebook that contains the end-to-end code used in this blog post here.

Happy Pythoning.


Source: https://medium.com/analytics-vidhya/how-to-extract-information-from-your-excel-sheet-using-python-5f4f518aec49