Construct and apply Validation formulas

team-member-1

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:

View/download Topic Example Workbook

Instructions: You can type values or formulas into cells above but to access Excel functionality, or if it is easier to work directly in Excel, open the file by clicking on the Excel Icon at the bottom.

If the file does not display correctly try right clicking in the area above and selecting ‘Refresh’. If you are still unable to view the file properly, or it contains unsupported features, click here to download a copy.

Overview

Sometimes the validation you want to apply is more complex than just restricting cells to certain values. For example you might want to restrict the values in one column (or at least highlight exceptions) depending on what the value is another column. So in this case we need to use conditional logic formulas to build up our rules.

In the Topic example Excel workbook, say we want to apply validation such that if the Country is either UK or USA then the Salary must be greater than 50,000.

The creating logic formulas topic explains how to build up such a validation formula using IF, AND, OR – if you are unsure of how to do this, review that topic before this one. This topic in particular looks at three ways of utilising our validation formula:

  • Putting the validation formula in a column on the data entry sheet. This can highlight where an invalid row of data has been entered e.g. by showing “Error” in the validation column. Whilst this makes the validation checks most transparent, it requires adding columns and formulas to the data sheet, and doesn’t actually prevent people from entering invalid data.
  • Using the validation formula as part of a conditional formatting rule. This could perhaps highlight invalid entries/rows in a different colour making errors more apparent to the person reviewing the spreadsheet. This is somewhat less intrusive than having the validation formula(s) on the sheet (the checking is all done in the background), though the validation formula needs to be tweaked to allow it to be used as a rule and, like the above, also doesn’t actually prevent people from entering invalid data.
  • Using the validation formula as part of a data validation rule. Provided it is applied separately to all columns which are related, this can prevent people from entering invalid data. However, the validation logic used should be made very transparent and used sparingly or else it will become very frustrating for the users filling out the spreadsheet!

We will look at all three methods in this topic as each may be appropriate in different situations, or indeed a combination of the methods can be used on any given data entry sheet.

Validation formulas on the data entry sheet

On the ‘Validation on Sheet’ worksheet of the Topic Example workbook, we want to have a validation check column to the right of our data entry columns that checks whether the data entered in each row is valid – if the country is UK or USA then checking that the salary 50,000 or more. We can return “Error” if invalid data has been entered on the row and “OK” for valid data.

This validation formula has to exist on each row in which data is or can be entered on, so the validation formula has to be extended down to a reasonable range (the safest way is to extend this all the way down but with over 1 million rows in Excel 2007-2010 this is probably a bit excessive and will lead to the spreadsheet becoming very large, so perhaps extend to something sensible like 10,000 rows instead). If we don’t want to have values in this column extending all the way down the spreadsheet, we can prefix our validation formula with something like =IF(COUNTA(A2:D2)=0,””, i.e. if there is no data entered in the row then just return an empty cell value (double speech marks), otherwise return either “OK” or “Error”.

Taking this into account, our validation formula for row 2 could be something like =IF(COUNTA(A2:D2)=0,””,IF(AND(OR(C2=”UK”,C2=”USA”),D2<50000),"Error","OK")). So if there are no values in cells A2:D2 then the formula returns nothing, else it checks whether the country is UK or USA; if it is and the salary is less than 50,000 it returns an “Error” otherwise it returns “OK”. If there is data but the country isn’t UK or USA then it also returns “OK”.

We can then use conditional formatting to highlight any “Error” values in red for example and any “OK” values in green, and perhaps putting an [input message] for the formulas which explain the validation formula in plain English.

We probably also want to protect the sheet and ensure that the cells in our validation columns are locked so that users can’t manipulate the validation checks.

Using validation formulas in Conditional Formatting

Either as an alternative or addition to having the validation formulas in the sheet itself, you can embed it within conditional formatting so that it perhaps highlights the invalid rows in a different colour.

To do this, click anywhere on the sheet, select Conditional Formatting from the Home tab and then ‘New rule’ and ‘add’. Then select the type ‘Use a formula to determine…’. and enter the validation formula tweaked as per the instructions below . Once the formula and corresponding formatting has been set you will need to set the range to apply.

The validation formula we applied above of =IF(COUNTA(A2:D2)=0,””, IF(AND(OR(C2=”UK”, C2=”USA”), D2<50000), "Error", "OK")) as a conditional formatting formula as:

=AND(OR($C2=”UK”,$C2=”USA”),$D2<50000)

With an applied to range of say =$D$2:$D$10000.

There are a few important differences between the formulas:

  • The Conditional Formatting formula doesn’t need to return a value such as “OK” or “Error”. Instead the AND function here returns either TRUE (if both conditions are met) or FALSE (if either condition is not met), and the conditional formatting will be applied to any rows which evaluate to TRUE. If you are using an IF function within the formula, you can indicate that you want the formula to return a TRUE or FALSE value by writing these as functions themselves e.g. something like =IF(LEN(A2)>0,TRUE). Bear in mind that because formulas that evaluate to TRUE will have the conditional formatting applied, if you want to highlight invalid data ensure that the formula returns TRUE when the row is invalid (as in this case – both the country being UK/USA and the salary being less than 50,000)
  • We don’t need the beginning part of the formula to return an empty value if there is no data in the row because the conditional formatting formula isn’t returning a value anyway – rows with no data won’t be highlighted in this case because the first part of the AND condition which is checking for a value of UK/USA in Column C of the row will be FALSE.
  • If you want the formula to extend across rows, it should be entered with [dollar signs] on the column but not the row (e.g. use $C2 instead of $C$2)-this will ensure that the formatting applies across all rows selected in the ‘applies to range’. Since we have put a dollar sign in the column, it will always reference the value in Column C for the country and Column D for the Salary.
  • The formula you actually enter for conditional formatting only needs to reference the first row you want to apply conditional formatting to. in the example file this would be row 2 so our formula references $C2
  • The ‘Applies to range’ should be the entire range you want to potentially highlight invalid rows for, starting with the first row referenced in the formula. So of you want to apply it to the entire row for all rows starting from Row 2 up to 10 000 the applies to range is $2:$10000; if you only want to highlight the values in Column C it would be $C$2:$C$10000 (include both dollar signs)

Before putting this formula into a conditional formatting rule, type it into a column in the data sheet first and drag it down to your data range – it should return a TRUE or FALSE value for each row. Once you’re confident that all the rows with TRUE are invalid (and so will be highlighted), then you can copy and paste it as a conditional formatting formula.

Validation Formulas as part of Data Validation rules

With a slight change, this same formula validation can also be used as part of Data Validation

Start by highlighting the range you want to apply this to, again starting with the first cell referenced in the validation formula. In this case we need to highlight applicable rows in both columns C and D. The reason for this is because we want to restrict the values that can be entered in both these columns – data validation is enforced at the point of entry, so applying data validation to only one column would mean that an invalid entry would be accepted provided the other column was populated first.

Once you have highlighted the range, select Data -> Data Validation, choose type ‘Custom’ in settings and put in your validation formula as per the screen shot below.

The formula to put in here, corresponding to the same as above is =NOT( AND( OR( $C2=”UK”, $C2=”USA”), $D2<50000)).

This is exactly the same as the conditional formatting formula though is preceded with NOT – this function converts all TRUE results into FALSE and vice-versa. The reason for this is because the Data Validation formula determines what is allowable (as opposed to what is invalid)-so formulas which return TRUE are allowed whereas formulas which return FALSE aren’t. (This is in contrast to the conditional formatting formula where formulas which return TRUE are highlighted/formatted whereas formulas that return FALSE aren’t).

Remember that the row reference must be entered without the dollar sign, and to enter the formula referring to the first row that is highlighted (if you select the entire column then refer to $C1 and $D1 instead).

This data validation rule will then restrict or warn people when entering values which don’t conform to the formula. So when entering a country, if the salary is already populated with a value less than 50,000 then the values of UK or USA won’t be accepted as a valid country; if the country is already populated then if the user attempts to put in a salary figure less than 50,000 this will only be allowed if the value in column C isn’t UK or USA.

If you are using this option, it is very useful to put a description of the validation rule as an input message (and/or as an error/warning message) so that it is more transparent to the users. If only a warning message is applied (which will still allow users to enter invalid data) then this is less restrictive on the users but also less transparent to the person reviewing the spreadsheet where the invalid data lies. Therefore in this case, either combine this with conditional formatting or have the reviewer see the invalid entries by using the ‘circle invalid data’ feature under ‘Data validation’ on the Data tab of the ribbon.

Try it out

In the Topic Example Workbook, apply the appropriate validation formulas in each format as described above.

A demonstration of each type of Validation Formula is shown in the video clips below. Resist the temptation to view the answers file below until you have tried this!

Validation Formula on Worksheet

Validation formula as part of Conditional Formatting

Validation Formula as part of Data Validation Rule

Answers

Once you have completed the ‘Try it Out’ exercise, the completed Excel file should look like this:

If the file does not display correctly try right clicking in the area above and selecting ‘Refresh’. If you are still unable to view the file properly, or it contains unsupported features, click here to download a copy.