Winning with complexity - naming variables


Standards - annoying order or useful tool? Poorly prepared standards that add nothing to the software development process can be frustrating. A programmer, instead of focusing on creative programming work, makes sure that his code meets the Holy Standards, which are often written on a few or even several pages and can be meticulous. You can also meet poor people who sweat their brows and count the spaces, the number of which was strictly defined in the current convention. Too meticulous standards, even if they improve the clarity and readability of the code, bring more harm than good. Therefore, when preparing standards, they should be constrained only to things that significantly improve the quality of the code. Contrary to appearances, the advantages of using naming standards are not limited to improving the clarity and readability of the code. They can bring us many more benefits.

In variable names, thanks to the use of prefixes and suffixes, we can encode huge amounts of key information for programming. Such information is available immediately, without searching the code - it facilitates the understanding and analysis of procedures and minimizes making many simple and yet frequent mistakes. The immediate knowledge of the main attributes of a variable speeds up programming and makes it easier to understand the entire process.


The key variable attributes that are important in the programming process are:
1. function in a procedure
2. its range
3. type: whether it is input, output, or constant
4. data type

The naming convention should allow this information to be easily encoded in a variable name.

The responsibility of a variable in a procedure
Correct naming of variables is an absolute must if we want our code to be readable. It seems very simple, but it turns out to be extremely difficult in practice. Naming variables to accurately reflect the role of that variable in the process is not easy. More than once you can come across code stuffed with variables like x_data, data_tmp, y_data, cur_data. The names of these variables do not inform about the responsibility of the variable in the process and make it difficult to analyze the procedure. The code should be readable and understandable. At higher decomposition levels, it should even be read like a book.
 Begin  
  zm1 := prc + prc_r * r;  
 end;  
Does the above code tell someone something? Let's try again:
 begin  
  new_employee_sal := employee_sal + (company_income* sal_raise_percnt);  
 end;  
Correct, thoughtful names of variables are the first and most important element necessary to improve the quality of our system.

The scope of the variable
The ability to determine at a glance what scope a variable has is very important, especially in complex processes, where we use many variables of different types: local, global, parameters.

Mistakes involving value substitutions or using a variable of the wrong type are very hard to find! In one system we looked for such an error for a good few days, before the code was traced line by line under the debugger and the error was caught. It turned out that in one place the global variable was taken for the calculation instead of the local variable. Both variables had very similar names, and nothing indicated what scope the variable was. Using the wrong variable resulted in a very big error in the calculation. Being able to distinguish the scope of a variable will completely eliminate such mistakes.

The scope of variables can be distinguished by using prefixes in variable names::
g<name> - global variables  np. gCurrentDate
p<name> – parameters np. pCurrentDate

What about local variables? Local variables can be left without prefixes. We use local variables most often, so why bother and write an extra letter each time? If the variable does not have the g or p prefix, it means that we are dealing with a local variable. It's always one more letter to use in a variable name to accurately reflect its purpose.

Variable in or out?
Parameters can be input or output, or both. Their use and behavior vary considerably. Knowledge about the type of such a variable can significantly facilitate orientation in the code and the development itself.

CONSTANT variables as well as input variables - cannot be changed in the procedure. For the output variables, however, you need to prepare a variable in the parent procedure to capture the value of the output parameter. Distinguishing the type of the variable just by the name will make the work easier. You will know immediately how to deal with which variable.

When calling a procedure from a parent procedure, it is important to distinguish between input and output parameters. Of course, you can check the type of a variable in the specification of a given procedure. But wouldn't it be faster to "decode" this information from the name itself? This will reduce the frustration of jumping in code from the snippet we are working on to specification and back again.

Since we already used the prefixes, now we use suffixes:
p<name>_io – parameter IN/OUT
p<name>_o – parameter OUT
<BIG_LETTERS_NAME> or c<name>- constant variable

Hmm, and again, one type of parameter is missing, the IN parameter. But on the other hand, if after the prefix "p" we see that the variable is a parameter, and has no suffix neither _io nor _o - we have no other option, it must be the IN input parameter, the most popular. So why write an "i" suffix each time if that suffix adds nothing? As they say, DRY - Don't Repeat Yourself: P

So instead:
 Begin  
  zm1 := prc + prc_r * r;  
 end;  

we can do that:
 begin  
  new_employee_sal_o :=p_employee_sal + (company_income * g_SAL_RISE_PERCENT);  
 end;  

At first glance, you can see that the new_employee_sal_o variable is an output variable, it will be evaluated and returned by the procedure. The p_employee_sal variable is an input parameter. Company_income is a local variable and g_SAL_RISE_PERCENT is a global constant.

When calling a procedure, using suffixes has even greater benefits.

Let's have a look:
 Begin  
   calc_salary(Employee, NewSalary);  
 end;  

Do we know anything from the code about this procedure? Not much right. To find out more, we need to either look at the calc_salary code or/and check the types of Employee and newSalary variables.

But as we add prefixes and suffixes:
 Begin  
   calc_salary(pnEmployee=> nEmployee, pnNewSalary_o => nNewSalary);  
 end;  

We can see that both p-parameters are n- numeric, the pnEmployee parameter is the input parameter, and the pnNewSalary_o → parameter is the output parameter, so we can't put the value of either a constant or an input parameter here, we just need to define the nNewSalary numeric variable to capture the parameter value from the procedure.

It should be emphasized immediately that the benefits in this situation can only be gained when the procedure is called according to the naming notation and not the positional notation (parameters to procedures are passed after the parameter name as above, and not according to the order of parameters as in the previous example. It has some other advantages but maybe more on that another time).

List of prefixes specifying the type of the parameter/variable

+-----------------+---------------+---------------+
|      scope      |    PREFIX     |    EXAMPLE    |
+-----------------+---------------+---------------+
| GLOBAL          | g<name>       | gdCurrentDate |
| LOCAL           | No prefix     | vUserName     |
| CONSTANT        | (big letters) | gnUSER_ID     |
| PARAMETR IN     | P<name>       | pnUserId      |
| PARAMETR OUT    | P<name>_O     | pvUserName_o  |
| PARAMETR IN OUT | P<name>_io    | pvFeeAmt_io   |
+-----------------+---------------+---------------+


The data type of the variable:
Specifying the data type of a variable name is perhaps the least common convention. The convention is not difficult to apply - it is enough to add the first letter of the data type in the prefix of variable names, e.g. v-varchar2, n-number. This information will facilitate programming and prevent unpleasant and easy-to-avoid errors.

Thanks to the additional prefix in the variable name, this especially applies to local variables, which we have left without any prefixes so that will allow us to distinguish variable names from column names in SQL queries. Why do we need it? Consider the example below:
 emp_id varchar2(100);  
 begin  
  select salary into nsal from employee  
  where emp_id = emp_id;  
 end;  
Oracle sees nothing suspicious about this query when compiling the routine. If we don't find anything unusual, the query will appear in the application. And the consequences can be considerable. In the WHERE condition, we compare here the emp_id column from the table with ... the emp_id column from the table. Although we have a variable with this name in the code - Oracle will not substitute this variable, it will only match the column to itself. And instead of a record for an employee - we will get all records from the table. Probably this error will not be detected on unit tests. After all, one row in the employee table is enough for tests ...

Let's try to fix it:
 begin  
  select salary into sal from employee  
  where emp_id = l_emp_id;  
 end;  

Well, it should work now. But are you sure?
It depends WHAT KIND OF DATA is l_emp_id! If it's the same type as emp_id, we're in luck. If the l_emp_id variable is a different type than the emp_id column in the table, and there is an index on that column, Oracle will implicitly convert the column data to match the l_emp_id parameter variable. As a result, instead of a quick index query - we can end up with a full scan.

And to detect this type of error, you need experience, luck, or time-consuming debug. When we try to run explain plan for a query to check its cost, either we'll use a bound variable:
  select salary into sal from employee  
  where emp_id = :l_emp_id;  

Then Oracle will calculate the query cost assuming the correct variable type, i.e. not the one we have in the process! And it tells us that the query is great and fast and uses the index.

We can also substitute the correct value in the query instead of l_emp_id. But since we don't know there is a bug in the procedure - we'll probably substitute a valid value. And Oracle tells us again that the query is great and will use the index.

And in the process, the inquiry as to anger goes slowly….

Can we do something about it?
 begin  
  select salary into sal from employee  
  where emp_id = v_emp_id;  
 end;  

In this situation, we can see at first glance that the variable v_emp_id is of the varchar2 type, which makes it much easier to spot the problem. As a rule, basic indexed fields are well known to programmers and the view of the primary key compared to a variable of type v - varchar2 - may raise reasonable doubts. Thanks to the information about the data type, we can, unfortunately, avoid frequent mistakes in data types and save Oracle implicit conversions, which are usually harmless, sometimes can have significant performance consequences.

+--------------+------------------------+---------------+
| TYP ZMIENNEJ |         PREFIX         |   PRZYKŁAD    |
+--------------+------------------------+---------------+
| VARCHAR      | v<name>                | vUserName     |
| NUMBER       | n<name>                | nUserId       |
| BOOLEAN      | b<name>                | bIfUserActive |
| ROWTYPE      | rec<name> or row<name> | recUSerData   |
| TABLE        | t<name> or tab<name>   | tUsers        |
| CURSOR       | cur<name>              | curUserTxn    |
| INTEGER      | i<name>                | iTxnCnt       |
| DATE         | d<name>                | dCurrentDate  |
+--------------+------------------------+---------------+

We combine prefixes and suffixes depending on the scope, type and data type of the variable. If we have the IN / OUT parameter of the number type, we should include all this information in the name, e.g. pnEmpSal_io.

Do I always use prefixes and suffixes everywhere?
No, as always, there are exceptions to the rule. The exceptions are typical variables, used and understood by programmers reflexively, i.e. all kinds of iterators. It has been assumed that we use variables with short, often one-letter names such as i, j, z, x, y for iterators. Any programmer seeing something like this
 for i in 1..100 loop  
knows exactly what's going on.

Record:
 For nEmployeeId in 1..100 loop  

it already introduces a bit of anxiety, slightly confuses the image, is long and iterators are often used in arrays or collections, and writing a long iterator name can be frustrating.
 For nEmployeeId in 1..100 loop  
    tabEmplyees(nEmployeeId).employee_id := nEmployeeId;  
 end loop;  

I prefer to stick to the good old i, j:
 For i in 1..100 loop  
    tabEmplyees(i).employee_id := i;  
 end loop;  

Better now :)

CamelCase or underscores
Yes, the topic was especially omitted, put to the end. Because in fact, whether CamelCase or underscores don't really matter. The size of the letters will not contribute much to the readability of the code or facilitate programming. As long as the names of the variables are well matched and the convention allows you to quickly distinguish between types and ranges of variables and parameters - this is what we want to achieve with standards. Beating foam or a better CamelCase or underscores misses the point. some prefer one, some prefer the other. Prefixes and suffixes that carry useful information - should be used because it's worth it. All other standards that do nothing you can let go of. Conventions that describe the use of upper / lower case letters on a few or several pages force the counting of spaces in the indentation - they give nothing but frustration. Instead of thinking creatively, the programmer is concerned with whether his code has broken a rule. Are you sure that this is what you want?


Preparation of standards:
Well-thought-out naming standards can make the work of programmers much easier and protect the code from errors that are easy to commit and difficult to fix. You can apply the proposed standards, but you can also modify them or prepare your own. It is important that they fulfill the basic tasks that will make the developer's work easier and more enjoyable

1. The variable name should describe its task/function in the process
2. The variable name should define its scope
3. The variable name should indicate its type e.g. in, out, constant
4. The variable name should specify the data type of the variable

Examples:


+-----------------+------------------------------------+---------------+
|      scope      |               PREFIX               |    example    |
+-----------------+------------------------------------+---------------+
| GLOBAL          | g<name>                            | gdCurrentDate |
| LOCAL           | no prefix, just data type prefix   | vUserName     |
| CONSTANT        | <scope prefix ><type prefix><NAME> | gnUSER_ID     |
| PARAMETER IN     | p<type prefix><name>               | pnUserId      |
| PARAMETER OUT    | p<type prefix ><name>_o            | pvUserName_o  |
| PARAMETER IN OUT | p<type prefix><name>_io            | pvFeeAmt_io   |
+-----------------+------------------------------------+---------------+

+---------+------------------------+---------------+
|  type   |         PREFIX         |    example    |
+---------+------------------------+---------------+
| VARCHAR | v<name>                | vUserName     |
| NUMBER  | n<name>                | nUserId       |
| BOOLEAN | b<name>                | bIfUserActive |
| ROWTYPE | rec<name> or row<name> | recUSerData   |
| TABLE   | t<name> or tab<name>   | tUsers        |
| CURSOR  | cur<name>              | curUserTxn    |
| INTEGER | i<name>                | iTxnCnt       |
| DATE    | d<name>                | dCurrentDate  |
+---------+------------------------+---------------+


Komentarze