Reading Spreadsheets with OpenPyXL and Python

There are a couple of fundamental actions that you will do with Microsoft Excel documents. One of the most basic is the act of reading data from an Excel file. You will be learning how to get data from your Excel spreadsheets.

Editor's note: This article is based on a chapter from the book Automating Excel with Python. You can order a copy on Gumroad or Kickstarter.

Before you dive into automating Excel with Python, you should understand some of the common terminology:

  • Spreadsheet or Workbook – The file itself (.xls or .xlsx).
  • Worksheet or Sheet – A single sheet of content within a Workbook. Spreadsheets can contain multiple Worksheets.
  • Column – A vertical line of data labeled with letters, starting with "A".
  • Row – A horizontal line of data labeled with numbers, starting with 1.
  • Cell – A combination of Column and Row, like "A1".

Now that you have some basic understanding of the vocabulary, you can move on.

In this chapter, you will learn how to do the following tasks:

  • Open a spreadsheet
  • Read specific cells
  • Read cells from a specific row
  • Read cells from a specific column
  • Read cells from multiple rows or columns
  • Read cells from a range
  • Read all cells in all sheets

You can get started by learning how to open a workbook in the next section!

Open a Spreadsheet

The first item that you need is a Microsoft Excel file. You can use the file that is in this GitHub code repository. There is a file in the chapter 2 folder called books.xlsx that you will use here.

It has two sheets in it. Here is a screenshot of the first sheet:

Book Worksheet

For completeness, here is a screenshot of the second sheet:

Sales Worksheet

Note: The data in these sheets is inaccurate, but it helps you learn how to use OpenPyXL.

Now you're ready to start coding! Open up your favorite Python editor and create a new file named open_workbook.py. Then add the following code to your file:

    # open_workbook.py

    from openpyxl import load_workbook


    def open_workbook(path):
        workbook = load_workbook(filename=path)
        print(f"Worksheet names: {workbook.sheetnames}")
        sheet = workbook.active
        print(sheet)
        print(f"The title of the Worksheet is: {sheet.title}")


    if __name__ == "__main__":
        open_workbook("books.xlsx")

The first step in this code is to import load_workbook() from the openpyxl package. The load_workbook() function will load up your Excel file and return it as a Python object. You can then interact with that Python object like you would any other object in Python.

You can get a list of the worksheets in the Excel file by accessing the sheetnames attribute. This list contains the titles of the worksheets from left to right in your Excel file. Your code will print out this list.

Next, you grab the currently active sheet. If your workbook only has one worksheet, then that sheet will be the active one. If your workbook has multiple worksheets, as this one does, then the active sheet is the one that was selected when the file was last saved; in books.xlsx, that is the last worksheet.

The last two lines of your function print out the Worksheet object and the title of the active worksheet.
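If you don't have books.xlsx handy, you can try sheetnames and active on a workbook built in memory. The following is only a minimal sketch: the helper describe_workbook() and the sheet titles are made up for illustration, and the workbook is created with openpyxl's Workbook() rather than loaded from disk. It also shows that the active sheet is a setting on the workbook that you can change.

```python
from openpyxl import Workbook


def describe_workbook(workbook):
    """Return the sheet titles and the title of the active sheet."""
    return workbook.sheetnames, workbook.active.title


# Build a throwaway two-sheet workbook in memory (hypothetical titles).
demo = Workbook()                 # starts with one sheet, which is active
demo.active.title = "Books"
sales = demo.create_sheet("Sales")

print(describe_workbook(demo))    # the first sheet is still the active one

demo.active = sales               # make "Sales" the active sheet instead
print(describe_workbook(demo))
```

Assigning a worksheet to workbook.active only changes the in-memory object; nothing is written to disk unless you save the workbook.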

What if you want to select a specific worksheet to work on, though? To learn how to accomplish that, create a new file and name it read_specific_sheet.py.

Then enter the following code:

    # read_specific_sheet.py

    from openpyxl import load_workbook


    def open_workbook(path, sheet_name):
        workbook = load_workbook(filename=path)
        if sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]
            print(f"The title of the Worksheet is: {sheet.title}")
            print(f"Cells that contain data: {sheet.calculate_dimension()}")


    if __name__ == "__main__":
        open_workbook("books.xlsx", sheet_name="Sales")

Your function, open_workbook(), now accepts a sheet_name. sheet_name is a string that matches the title of the worksheet that you want to read. You check to see if the sheet_name is in workbook.sheetnames in your code. If it is, you select that sheet by accessing it using workbook[sheet_name].

Then you print out the sheet's title to verify that you have the right sheet. You also call something new: calculate_dimension(). That method returns the range of cells that contains data in the worksheet. In this case, it will print "A1:D4".

Now you are ready to move on and learn how to read data from the cells themselves.
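To see where a result like "A1:D4" comes from, here is a small sketch that fills a 4 x 4 block in an in-memory worksheet and asks for its dimensions. The numbers are made up, and the sheet is created with Workbook() instead of being read from books.xlsx.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.title = "Sales"

# Four rows of four values each fill the block A1:D4 (made-up numbers).
for row in range(4):
    sheet.append([row * 10 + col for col in range(4)])

dimension = sheet.calculate_dimension()
print(dimension)
print(sheet.max_row, sheet.max_column)
```

The max_row and max_column attributes give you the same bottom-right corner as plain integers, which can be handy when you want to loop over the used area yourself.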

Read Specific Cells

There are a lot of different ways to read cells using OpenPyXL. To start things off, you will learn how to read the contents of specific cells.

Create a new file in your Python editor and name it reading_specific_cells.py. Then enter the following code:

    # reading_specific_cells.py

    from openpyxl import load_workbook


    def get_cell_info(path):
        workbook = load_workbook(filename=path)
        sheet = workbook.active
        print(sheet)
        print(f'The title of the Worksheet is: {sheet.title}')
        print(f'The value of A2 is {sheet["A2"].value}')
        print(f'The value of A3 is {sheet["A3"].value}')
        cell = sheet['B3']
        print(f'The variable "cell" is {cell.value}')


    if __name__ == '__main__':
        get_cell_info('books.xlsx')

In this example, there are three hard-coded cells: A2, A3 and B3. You can access their values by using dictionary-like access: sheet["A2"].value. Alternatively, you can assign sheet["B3"] to a variable and then do something like cell.value to get the cell's value.

You can see both of these methods demonstrated in your code above.

When you run this code, you should see the following output:

    <Worksheet "Sales">
    The title of the Worksheet is: Sales
    The value of A2 is 'Python 101'
    The value of A3 is 'wxPython Recipes'
    The variable "cell" is 5

This output shows how you can easily extract specific cell values from Excel using Python.
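The article uses names such as "A2" to address cells. openpyxl worksheets also have a cell() method that takes a numeric row and column, which can be easier when the position is computed in code. The sketch below uses an in-memory sheet with made-up values to show that both styles reach the same cell.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["A2"] = "Python 101"      # made-up sample values
sheet["B3"] = 5

by_name = sheet["A2"].value
by_index = sheet.cell(row=2, column=1).value   # column 1 is "A"
cell = sheet.cell(row=3, column=2)             # same cell as sheet["B3"]
print(by_name, by_index, cell.coordinate, cell.value)
```

Both row and column are 1-based here, matching how Excel numbers its rows.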

Now you're ready to learn how you can read the information from a specific row of cells!

Read Cells From Specific Row

In most cases, you will want to read more than a single cell in a worksheet at a time. OpenPyXL provides a way to get an entire row at once, too.

Go ahead and create a new file. You can name it reading_row_cells.py. Then add the following code to your program:

    # reading_row_cells.py

    from openpyxl import load_workbook


    def iterating_row(path, sheet_name, row):
        workbook = load_workbook(filename=path)
        if sheet_name not in workbook.sheetnames:
            print(f"'{sheet_name}' not found. Quitting.")
            return

        sheet = workbook[sheet_name]
        for cell in sheet[row]:
            print(f"{cell.column_letter}{cell.row} = {cell.value}")


    if __name__ == "__main__":
        iterating_row("books.xlsx", sheet_name="Sheet 1 - Books",
                      row=2)

In this example, you pass in the row number 2. You can iterate over the values in the row like this:

    for cell in sheet[row]:
        ...

That makes grabbing the values from a row pretty straightforward. When you run this code, you'll get the following output:

    A2 = Title
    B2 = Author
    C2 = Publisher
    D2 = Publishing Date
    E2 = ISBN
    F2 = None
    G2 = None

Those last two values are both None. If you don't want to get values that are None, you should add some extra processing to check if the value is None before printing it out. You can try to figure that out yourself as an exercise.

You are now ready to learn how to get cells from a specific column!
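If you want to compare notes after trying the exercise, here is one possible approach, sketched on an in-memory sheet rather than on books.xlsx. The helper name non_empty_cells() is hypothetical, and the sample row only imitates the header row shown above.

```python
from openpyxl import Workbook


def non_empty_cells(sheet, row):
    """Return "A2 = value" strings for the cells in `row` that hold data."""
    return [
        f"{cell.column_letter}{cell.row} = {cell.value}"
        for cell in sheet[row]
        if cell.value is not None
    ]


workbook = Workbook()
sheet = workbook.active
sheet.append(["Books"])
sheet.append(["Title", "Author", "Publisher", None, None])

for line in non_empty_cells(sheet, 2):
    print(line)
```

The check uses "is not None" rather than plain truthiness so that legitimate values such as 0 or an empty string are not dropped by accident.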

Read Cells From Specific Column

Reading the data from a specific column is also a frequent use case that you should know how to accomplish. For example, you might have a column that contains only totals, and you need to extract only that specific column.

To see how you can do that, create a new file and name it reading_column_cells.py. Then enter this code:

    # reading_column_cells.py

    from openpyxl import load_workbook


    def iterating_column(path, sheet_name, col):
        workbook = load_workbook(filename=path)
        if sheet_name not in workbook.sheetnames:
            print(f"'{sheet_name}' not found. Quitting.")
            return

        sheet = workbook[sheet_name]
        for cell in sheet[col]:
            print(f"{cell.column_letter}{cell.row} = {cell.value}")


    if __name__ == "__main__":
        iterating_column("books.xlsx", sheet_name="Sheet 1 - Books",
                         col="A")

This code is very similar to the code in the previous section. The difference here is that you are replacing sheet[row] with sheet[col] and iterating on that instead.

In this example, you set the column to "A". When you run this code, you will get the following output:

    A1 = Books
    A2 = Title
    A3 = Python 101
    A4 = wxPython Recipes
    A5 = Python Interviews
    A6 = None
    A7 = None
    A8 = None
    A9 = None
    A10 = None
    A11 = None
    A12 = None
    A13 = None
    A14 = None
    A15 = None
    A16 = None
    A17 = None
    A18 = None
    A19 = None
    A20 = None
    A21 = None
    A22 = None
    A23 = None

Once again, some cells have no data (i.e., "None"). You can edit this code to ignore empty cells and only process cells that have contents.

Now let's discover how to iterate over multiple columns or rows!
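The totals use case mentioned earlier is a good fit for this kind of filtering. The sketch below sums the numeric cells in one column and ignores headers and empty cells. The function column_total() and the "Copies sold" figures are invented for illustration; they are not taken from books.xlsx.

```python
from openpyxl import Workbook


def column_total(sheet, col):
    """Sum the numeric cells in column `col`, ignoring text and empty cells."""
    return sum(
        cell.value
        for cell in sheet[col]
        if isinstance(cell.value, (int, float))
        and not isinstance(cell.value, bool)
    )


workbook = Workbook()
sheet = workbook.active
sheet.append(["Title", "Copies sold"])    # header row (text, skipped)
sheet.append(["Python 101", 5])
sheet.append(["wxPython Recipes", None])  # empty cell, skipped
sheet.append(["Python Interviews", 7])

print(column_total(sheet, "B"))
```

Booleans are excluded explicitly because Python treats True and False as integers, which would otherwise sneak into the sum.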

Read Cells from Multiple Rows or Columns

There are two methods that OpenPyXL's worksheet objects give you for iterating over rows and columns. These are the two methods:

  • iter_rows()
  • iter_cols()

These methods are documented fairly well in OpenPyXL's documentation. Both methods take the following parameters:

  • min_col (int) – smallest column index (1-based index)
  • min_row (int) – smallest row index (1-based index)
  • max_col (int) – largest column index (1-based index)
  • max_row (int) – largest row index (1-based index)
  • values_only (bool) – whether only cell values should be returned

You use the min and max row and column parameters to tell OpenPyXL which rows and columns to iterate over. You can have OpenPyXL return the data from the cells by setting values_only to True. If you set it to False, iter_rows() and iter_cols() will return cell objects instead.

It's always good to see how this works with actual code. With that in mind, create a new file named iterating_over_cells_in_rows.py and add this code to it:

    # iterating_over_cells_in_rows.py

    from openpyxl import load_workbook


    def iterating_over_values(path, sheet_name):
        workbook = load_workbook(filename=path)
        if sheet_name not in workbook.sheetnames:
            print(f"'{sheet_name}' not found. Quitting.")
            return

        sheet = workbook[sheet_name]
        for value in sheet.iter_rows(
            min_row=1, max_row=3, min_col=1, max_col=3,
            values_only=True):
            print(value)


    if __name__ == "__main__":
        iterating_over_values("books.xlsx", sheet_name="Sheet 1 - Books")

Here you load up the workbook as you have in the previous examples. You get the sheet that you want to extract data from and then use iter_rows() to get the rows of data. In this example, you set the minimum row to 1 and the maximum row to 3. That means that you will grab the first three rows in the Excel sheet you have specified.

Then you also set the columns to be 1 (minimum) to 3 (maximum). Finally, you set values_only to True.

When you run this code, you will get the following output:

    ('Books', None, None)
    ('Title', 'Author', 'Publisher')
    ('Python 101', 'Mike Driscoll', 'Mouse vs Python')

Your program will print out the first three columns of the first three rows in your Excel spreadsheet. Your program prints the rows as tuples with three items in them. You are using iter_rows() as a quick way to iterate over rows and columns in an Excel spreadsheet using Python.

Now you're ready to learn how to read cells in a specific range.
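The example above only exercises iter_rows() with values_only=True. As a complementary sketch, the code below runs iter_cols() over a similar block and then calls iter_rows() with values_only left at its default, so you can see cell objects come back instead of plain values. The sheet is built in memory with the three rows from the output above; unlike the real books.xlsx, it has no merged cells.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.append(["Books"])
sheet.append(["Title", "Author", "Publisher"])
sheet.append(["Python 101", "Mike Driscoll", "Mouse vs Python"])

# values_only=True: one tuple of plain values per column.
columns = list(sheet.iter_cols(min_row=1, max_row=3, min_col=1, max_col=2,
                               values_only=True))
for column in columns:
    print(column)

# values_only left at its default (False): tuples of cell objects instead.
second_row = next(sheet.iter_rows(min_row=2, max_row=2, max_col=3))
coordinates = [cell.coordinate for cell in second_row]
print(coordinates)
```

Cell objects are the better choice when you need more than the value, such as the coordinate used here.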

Read Cells from a Range

Excel lets you specify a range of cells using the following format: (col)(row):(col)(row). In other words, you can say that you want to start in column A, row 1, using A1. If you wanted to specify a range, you would use something like this: A1:B6. That tells Excel that you are selecting the cells starting at A1 and going to B6.
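To make the range notation concrete, the sketch below uses two helpers from openpyxl.utils, range_boundaries() and get_column_letter(), to translate "A1:B6" into numeric bounds and back. It then indexes an empty in-memory sheet with the same range to show the shape of what you get: a tuple of rows, each of which is a tuple of cells.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, range_boundaries

# range_boundaries() returns (min_col, min_row, max_col, max_row), 1-based.
min_col, min_row, max_col, max_row = range_boundaries("A1:B6")
print(min_col, min_row, max_col, max_row)
print(get_column_letter(max_col))

# Indexing a sheet with a range gives a tuple of rows, each a tuple of cells.
workbook = Workbook()
sheet = workbook.active
block = sheet["A1:B6"]
print(len(block), len(block[0]))
```

Because the outer tuple holds rows, a nested loop over a range visits the cells row by row, from left to right within each row.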

Go ahead and create a new file named read_cells_from_range.py. Then add this code to it:

    # read_cells_from_range.py

    import openpyxl
    from openpyxl import load_workbook


    def iterating_over_values(path, sheet_name, cell_range):
        workbook = load_workbook(filename=path)
        if sheet_name not in workbook.sheetnames:
            print(f"'{sheet_name}' not found. Quitting.")
            return

        sheet = workbook[sheet_name]
        for column in sheet[cell_range]:
            for cell in column:
                if isinstance(cell, openpyxl.cell.cell.MergedCell):
                    # Skip this cell
                    continue
                print(f"{cell.column_letter}{cell.row} = {cell.value}")


    if __name__ == "__main__":
        iterating_over_values("books.xlsx", sheet_name="Sheet 1 - Books",
                              cell_range="A1:B6")

Here you pass in your cell_range and iterate over that range using the following nested for loop:

    for column in sheet[cell_range]:
        for cell in column:

You check to see if the cell that you are extracting is a MergedCell. If it is, you skip it. Otherwise, you print out the cell name and its value.

When you run this code, you should see the following output:

    A1 = Books
    A2 = Title
    B2 = Author
    A3 = Python 101
    B3 = Mike Driscoll
    A4 = wxPython Recipes
    B4 = Mike Driscoll
    A5 = Python Interviews
    B5 = Mike Driscoll
    A6 = None
    B6 = None

That worked quite well. You should take a moment and try out a few other range variations to see how it changes the output.

Note: while the image of "Sheet 1 - Books" looks like cell A1 is distinct from the merged cell B1-G1, A1 is actually part of that merged cell.

The last code example that you'll create will read all the data in your Excel document!

Read All Cells in All Sheets

Microsoft Excel isn't as simple to read as a CSV file or a regular text file. That is because Excel needs to store each cell's data, which includes its location, formatting, and value, and that value could be a number, a date, an image, a link, etc. Consequently, reading an Excel file is a lot more work! openpyxl does all that hard work for us, though.

The natural way to iterate through an Excel file is to read the sheets from left to right, and within each sheet, you would read it row by row, from top to bottom. That is what you will learn how to do in this section.

You will take what you have learned in the previous sections and apply it here. Create a new file and name it read_all_data.py. Then enter the following code:

    # read_all_data.py

    import openpyxl
    from openpyxl import load_workbook


    def read_all_data(path):
        workbook = load_workbook(filename=path)
        for sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]
            print(f"Title = {sheet.title}")
            for row in sheet.rows:
                for cell in row:
                    if isinstance(cell, openpyxl.cell.cell.MergedCell):
                        # Skip this cell
                        continue

                    print(f"{cell.column_letter}{cell.row} = {cell.value}")


    if __name__ == "__main__":
        read_all_data("books.xlsx")

Here you load up the workbook as before, but this time you loop over the sheetnames. You print out each sheet name as you select it. You use a nested for loop to loop over the rows and cells to extract the data from your spreadsheet.

Once again, you skip MergedCells because their value is None; the actual value is in the normal cell that the MergedCell is merged with. If you run this code, you will see that it prints out all the data from the two worksheets.

You can simplify this code a bit by using iter_rows(). Open a new file and name it read_all_data_values.py. Then enter the following:
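You can watch this behavior in isolation with a sketch like the one below. It merges A1:C1 in an in-memory sheet (a made-up merge that only resembles the title row of books.xlsx) and reports which cells in the row are MergedCell placeholders. It assumes a version of openpyxl that provides the MergedCell class, as the code above does.

```python
from openpyxl import Workbook
from openpyxl.cell.cell import MergedCell

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Books"
sheet.merge_cells("A1:C1")   # hypothetical merge across three columns

for cell in sheet[1]:
    kind = "merged placeholder" if isinstance(cell, MergedCell) else "normal cell"
    print(cell.coordinate, kind, cell.value)

# The worksheet also keeps a record of every merged range.
merged_ranges = [rng.coord for rng in sheet.merged_cells.ranges]
print(merged_ranges)
```

Only the top-left cell of the merged range stays a normal cell and keeps the value; the rest are placeholders whose value is None.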

    # read_all_data_values.py

    import openpyxl
    from openpyxl import load_workbook


    def read_all_data(path):
        workbook = load_workbook(filename=path)
        for sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]
            print(f"Title = {sheet.title}")
            for value in sheet.iter_rows(values_only=True):
                print(value)


    if __name__ == "__main__":
        read_all_data("books.xlsx")

In this code, you once again loop over the sheet names in the Excel document. However, rather than looping over the rows and columns, you use iter_rows() to loop over only the rows. You set values_only to True, which will return a tuple of values for each row. You also do not set the minimum and maximum rows or columns for iter_rows() because you want to get all the data.

When you run this code, you will see it print out the name of each sheet, then all the data in that sheet, row-by-row. Give it a try on your own Excel worksheets and see what this code can do!

Wrapping Up
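Printing is fine for exploring, but you will often want the data in a Python structure instead. As a rough sketch of that idea, the hypothetical helper workbook_to_dict() below collects every sheet into a dictionary keyed by sheet title. The two small sheets are made up and built in memory; with a real file you would pass in the result of load_workbook() instead.

```python
from openpyxl import Workbook


def workbook_to_dict(workbook):
    """Map each sheet title to a list of its rows (tuples of values)."""
    return {
        sheet.title: list(sheet.iter_rows(values_only=True))
        for sheet in workbook.worksheets
    }


# Stand-in for load_workbook("books.xlsx"): two small made-up sheets.
workbook = Workbook()
books = workbook.active
books.title = "Books"
books.append(["Title", "Author"])
books.append(["Python 101", "Mike Driscoll"])
sales = workbook.create_sheet("Sales")
sales.append(["Title", "Copies"])
sales.append(["Python 101", 5])

data = workbook_to_dict(workbook)
print(data["Sales"])
```

This reads every sheet fully into memory, which is fine for small files like the one in this chapter but may not suit very large workbooks.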

OpenPyXL lets you read an Excel Worksheet and its data in many different ways. You can extract values from your spreadsheets quickly with a minimal amount of code.

In this chapter, you learned how to do the following:

  • Open a spreadsheet
  • Read specific cells
  • Read cells from a specific row
  • Read cells from a specific column
  • Read cells from multiple rows or columns
  • Read cells from a range
  • Read all cells in all sheets

Now you are ready to learn how to create an Excel spreadsheet using OpenPyXL. That is the subject of the next article in this series!

Source: https://www.blog.pythonlibrary.org/2021/07/20/reading-spreadsheets-with-openpyxl-and-python/
