#
All
about dataflow expressions - Primer

Dataflow supports the following datatypes for column values

1.
String – Unicode set of characters

2.
Boolean – if it is true or false value

3.
Binary – byte array

4.
Date – Date with month/day/year

5.
Timestamp – Date which includes
hour/minute/second/millisecond precision

6.
Integral type – Any non-fractional type or whole
number

a.
Byte – 8 bit

b.
Short – 16 bit

c.
Integer – 32 bit

d.
Long – 64 bit

a.
Float – Single precision with 32 bits

b.
Double – Double precision with 64 bits

c.
Decimal – High precision fractional with 38 bits
of precision or scale. If you have financial math that requires high degree of precision
which cannot have rounding errors use Decimal. Decimal math can be a shade
slower that float or decimal. More on this after you master the sword.

Dataflow also has

__array__type,__structures__to represent hierarchical content,__maps__for mapping one type to another and combination of all these are also supported. That’s for a different blog after you get the basics.
Ok this means I have my favorite
GUID datatype in SQL and it has no representation in dataflow? Dataflow can
support newer datatypes with behavior in the future but until then it will be
the closest thing this can map to (a string or a binary type). Don’t mistake
this with datatypes your source system provides. Dataflow will do the mapping
under the scenes to make sure you have

__no__data loss when you__read__or__write__from that source system.
Back to the difference between
dataflow metadata and any other expression system. It has to do with the

__grammar__. Just like natural language grammar you can say the same words and can be interpreted in different countries differently. Let’s take a simple example.**col1**

**+**

**col2**

That’s easy. Adding two numbers.
Not so fast!!

**col1**and

**col2**were strings it means concatenation of strings.

**col1**was timestamp and

**col2**was a number it means adding as many milliseconds to the timestamp.

**col1**was high precision decimal(Account balance) and

**col2**was a integer, the result is a high precision decimal

**col1**was a string and

**col2**was a number it is an error!!!

Now you start to see why expressions
looks similar but can mean different things? Don’t go running to paste your U-SQL
or Excel expressions without careful inspection.

All expressions are made of three
things. Operands, operators and functions.

If this is all sounding like the

__compiler__class you skipped in school, dataflowninja will help you cover for it J.**Operand**is a value you are acting on. In the simplest form it is a column.

**Function**is a routine you are calling on one or more operands.

**Operator**is a short-hand notation for a function.

Let’s look at a small snippet of
the dataflow metadata.

*Movies*derive(**upperTitle :=**

__upper(title)__

**fullName =**

__:upperTitle + concat(initCap(firstName) , lastName)__

**balance =**

__amount * floor($discount)%100__

**partitionBy(‘key’,**

__trim(country)__**) ~>**

*DeriveMovies*
The

__underlined__text represent expressions and color coded components of an expression.**Operand**

There are 6 types of operands you can use

1.

**Column**– reference to a column in the input stream. If the name is alphanumeric (alphabets followed by numbers) you can directly use it in the expression. If not, you should enclose it with {}.**dumboCount**

**{count of #dumbos}**references a column named

**count of #dumbos**

2.

**Constant**– a constant value. These can be
a.

__String constant__– set of characters enclosed in single quote. Use a \ character to escape single quote in the value. Everything within single quotes is the actual value.**‘dumbo’**equates to

**dumbo**

**‘dumbo\’o big’**equates to

**dumbo’obig**

b.

__Integer constant__– a whole number that can be held in an integer datatype**123**

c.

__Long constant__– a whole number that can be held in a long datatype. Notice it is suffixed by ‘L’**123123123123L**

d.

__Double constant__– Fractional number with having a .(dot)**123.456**

e.

__Float constant__– Fractional number that can be held in a float datatype suffixed by ‘f’**123.456f**

f.

__Boolean constant__– true or false. You can also use true() or false() functions if your column names are named true/false!**true**

g.

__Regex constant__– This is a special version of the string constant where you don’t need regex escaping rules. It has a back quote ` start and end**`(\\d+) and (\\d+)`**is the same as

**‘(\\\\d+) and (\\\\d+)’**

3.

**Parameter**– If you are passing parameters into your dataflow, they can be referenced in expression with the $ syntax. Parameters also have datatypes like any operand**$discountCode**

4.

**Transient**– Transients allow you to store repeated computations in a temporary column. They cannot be used outside the transformation they are defined in. You can reference a transient prefixed with : symbol**:orderTempTotal**

5.

**Function outputs**– A output of a function can be used as an operand.**concat(‘dum’, upper(‘bo’))**

6.

**Meta operands**– These are special inputs that result out of pattern matching functions. We will cover these in a schema drift for transformation blog.**$1**

**Functions**

Functions take operands as input
and return an output. The output is again of a certain datatype. Functions in
dataflow have either fixed or variable number of inputs.

**upper(title)**

**concat(‘d’, ‘u’, ‘m’, ‘b’, ‘o’)**

Functions also have optional
inputs. You can choose to pass a value or omit it.

**trim(title)**

**trim(title, ‘-‘)**

**Operators**

Operators are shorthand notation
for some functions. It is highly encouraged to use operators for brevity instead
of function equivalents.

**orderTotal * {discount%} + 100**

**add(multiply(orderTotal, {discount%}), 100)**

Operators
supported are

1.
- unary
minus to negate a numeric value

2.
! negate a
boolean value

3.
* multiply

4.
/ divide

5.
% modulus

6.
+ add

7.
– substract

8.
== equals

9.
=== equal with null (we will learn about NULL in
a different blog)

10. !=
not equal

11. <=>
equalsIgnoreCase for strings

12. >
greater

13. <
lesser

14. >=
greaterOrEqual

15. <=
lesserOrEqual

Operators have precedence in that
some apply first before the other. The operators above are given in the precedence
order.

**100**

**+**

**orderTotal * {discount%}**is the same as

**orderTotal * {discount%} + 100**as

*****precedes

**+**.

**TIP:**Did you know dataflow metadata has

**reserved words! You can name your columns**

__no__**Select**or

**Table**or

**{I love Dumbeldore!@!#^&}**

Now that you are armed with the
basics of how expressions work, we will explore tips and tricks and advanced
expression techniques in subsequent blogs.