Skip to Main Content

Tools for Organizing and Cleaning Data Workshop

An introductory guide to methods and tools for cleaning, organizing, and exporting data.

Converting Addresses to Latitude and Longitude Coordinates

Sandbox Geocoder (UCLA)
Simply copy and paste your location in the Geocoder to output coordinate data. .

 

Geocode by Awesome Table
Prepping location data for analysis, visualization, and/or mapping may involve converting street addresses into coordinates.
NOTE: There's a 1,000 daily limit from each Google account.

  1. Start with a spreadsheet with street addresses. Open it in Google Sheets.
  2. In the menu, select "Add-ons" and then "Get add-ons."
  3. Search for Geocode and then add it.
  4. Then, select "Add-ons" again and select "Geocode by Awesome Table."
  5. If your addresses are separated into multiple columns, click that link and identify which columns.
  6. Click "Geocode!"

Do you need to combine latitude and longitude into one box in your spreadsheet?

=first column&" "&second column)
Note: This will add a space between your data. If you want a comma and a space do this: ", "
For example: =A2&", "&B2  

Creating New Columns Based on Existing Data

There are multiple methods of creating a new column from data that is already in your spreadsheet.

 
Text to Column

Many spreadsheets contain "compound" fields with lists of data. Unfortunately, these compound lists are difficult to use tools such as "Sort and Filter" on, and they can also hide mistakes.

By using Text to Column to separate "compound fields", you can separate data from fields like the Source field so that the number of pages, size of the original, and other description information are in different fields. You can also use this technique on the Cuisine field to see a complete list of cuisines assigned to these restaurants.

 
Pivot Table

Pivot tables can examine fields for added-value assessment. Copy the cuisine data into one field and use "Insert Recommended Pivot Tables" to assess the frequency with which these cuisines were assigned.

 

Flash Fill

Flash Fill is another option for creating or editing fields. It is a way to see "machine learning" in action. By demonstrating to the computer what transformation you would like to see in 2-5 examples within a field, you can "teach" the computer what transformation to do to an entire column.

For example, create a new column. In the first row of this column, create a header titled "NewFileName". In this second row of this column, copy the title of the menu, followed by an underscore, the content of the edited date field, and .pdf. When date fields are left empty, use "uuuu" to note an unknown date. In this case, W2 would read "Antoine's menu_1963.pdf". Continue following this pattern through W6 or until Flash Fill results begin to fill in the entire column.

If necessary, you can also select "Data">"Flash Fill" to force the program to attempt to "learn" from your existing data and guess at the appropriate content. Always review this content. For example, revisit the dates with the format YYYY. Keep in mind that any refinements made to this field will cause Excel to re-examine and possibly edit the Flash Fill results.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.