Recently my car was blinking the maintenance minder B167 code.
Turns out each character has a specific meaning.
| Code | What it Means |
| --- | --- |
| A | Change oil and filter. |
| B | Replace the engine oil and oil filter. Perform front and rear brake cleaning and lubrication. Adjust parking brake. Perform in-depth inspection of the brake components. |

| Sub-Code | What it Means |
| --- | --- |
| 0 | Perform multi-point inspection. |
| 1 | Tires need a rotation service. |
| 2 | Replace the cabin and engine air filters. |
| 3 | Replace the transmission fluid. |
| 4 | Replace spark plugs, timing belt (if equipped) and adjust valves. |
| 5 | The engine coolant needs replacing. |
| 6 | If equipped, rear differential fluid needs replacing. |
| 7 | Replace brake fluid. |
A quick post on setting up a connection to Athena with OKTA using JDBC. The SQL client I am using is DBeaver, but any other tool could be used.
jdbc:awsathena://AwsRegion=us-west-2;S3OutputLocation=s3://my-athena-results-bucket/results-folder/;AwsCredentialsProviderClass=com.simba.athena.iamsupport.plugin.OktaCredentialsProvider;user={user};password={password};idp_host=mycompany.okta.com;App_ID=0aosd2asadsdsdsd/123;SSL_Insecure=true;
This is a quick post on RLS behavior in a Power BI dataset where you don't define any table filter expressions.
Basically, when you set up a role and don't define any table filter DAX expressions, the role simply controls who has access to your dataset. This is useful when you have multiple datasets in a workspace and you need to restrict access to a particular group.
Here is an example:
Have you ever wondered how to use the "Create a report" >> "From a template" option on a dataset in the Power BI Service?
If you get the error "/bin/bash^M: bad interpreter: No such file or directory" when running a Bash script, you are likely trying to run a shell script (.sh) from your Windows drive. Most likely, the issue is that your line endings are CR+LF instead of the Unix-style LF.
If you need to fix this, the simplest way is to open the file in Notepad++ and change the line endings (shown at the bottom right of the status bar).
Another simple way is to run the following sed command, which uses a regex to replace the CR+LF endings with just LF:
sed -i -e 's/\r$//' myfolder/*.*
Another option, if you are using Git, is to add a .gitattributes file with line endings set to LF:
* text eol=lf
Please note that if you do this, you may have to first delete all the files in your repo (except the .git folder and .gitattributes) and then run git reset to get the latest files again.
Finally, another option is to not download the files from your Windows command prompt or Windows Explorer. Instead, use your WSL terminal to download the files and they should come down with the correct line endings.
A Unix timestamp represents the number of seconds since 1970-01-01 00:00:00 UTC. To convert it to a DateTime in Power Query, use the following code:
= Table.AddColumn(Source, "unixTimeStamp to DateTime", each #datetimezone(1970,1,1,0,0,0,0,0) + #duration(0,0,0,[unixtimestamp]))
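If you want to sanity check the conversion outside of Power Query, the same arithmetic can be done in Python (a quick sketch; the sample timestamp is just an illustrative value):

```python
from datetime import datetime, timedelta, timezone

# A Unix timestamp is the number of seconds since 1970-01-01 00:00:00 UTC.
unix_timestamp = 1654300800  # sample value

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
converted = epoch + timedelta(seconds=unix_timestamp)

print(converted)  # 2022-06-04 00:00:00+00:00
```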
When you need to filter a Power BI report by multiple values and your users want to copy and paste the list of values to filter by, there isn't an out-of-the-box way to do it in Power BI.
I recently stumbled upon an AppSource visual called "Mass Filter" and it does exactly that!
And here is a quick video that shows you how to use it: Power Bi - Filter by Multiple Comma Separated Values - YouTube
Incremental refresh involves partitioning the data, and this concept isn't dwelled upon a lot in the documentation. Because the field you use for partitioning can make or break incremental refresh, not understanding the underlying concepts can cause weird behavior later on. Most often the issue is duplicate records or ghost records. These issues typically surface months after incremental refresh is implemented, since the archived and incremental refresh periods are measured in multiple months.
Here is some typical account data that one might encounter in a CRM system. I will be using this data to illustrate how Incremental Refresh works.
| Id | Name | Created Date | Last Modified Date |
| --- | --- | --- | --- |
| 1 | Microsoft | 2022-01-01 | 2022-01-02 |
| 2 | Sales Force | 2022-02-01 | 2022-02-02 |
| 3 | Hitachi | 2022-03-01 | 2022-03-01 |
| 4 | NCM | 2022-04-01 | 2022-04-05 |
| 5 | Tesla | 2022-05-01 | 2022-06-03 |
Picking the partitioning field
Incremental refresh works by storing the data in separate partitions. If you choose "months", then a partition is created for each month. These partitions matter because when data needs to be refreshed, the entire partition's data is reloaded. When a partition is older than the number of months you wish to archive, that partition is dropped. The partitions are always created using a date column.
The first thing to remember is that one needs to pick a date-time column that does not change (invariant). This means that one cannot use the LastModifiedDate column, which many systems capture, as that will change all the time. Typically, one will end up using the CreatedDate field, as that should never change once the record has been created.
Next, you need to create 2 parameters named "RangeStart" and "RangeEnd". These need to be of type Date/Time. You will use these 2 parameters to filter the above table by its CreatedDate. You can use any values you like for RangeStart and RangeEnd on your computer while developing your dataset; they will be reset upon the first refresh performed in PowerBI.com.
= Table.SelectRows(#"Reordered Columns", each [createddate] >= RangeStart and [createddate] < RangeEnd)
The next step is to set up the incremental refresh.
For this example, I am using "Archive data" set to 6 months and "Incrementally refresh data" set to 3 months before the refresh date. This will make it easier for me to illustrate the way incremental refresh works; you will likely use very different values in your production datasets.

Take note of the fact that I have enabled the "Detect data changes" option (a setting I think you should definitely use). It allows you to fine-tune the incremental refresh even further: instead of refreshing all the partitions in the incremental refresh period, only those partitions that have changes are refreshed.

But how does Power BI know which partitions changed? You point it at a field that is updated any time a record is updated (an audit date column); typically in most systems this is a field called LastModifiedDate. Power BI keeps track of the max value of this field for each incremental refresh partition. The next time the dataset is refreshed, only those partitions are reloaded where the source data returns a higher LastModifiedDate than the one Power BI captured the last time it ran.

Salesforce note: you should use the SystemModStamp column instead of the LastModifiedDate column, as even backend process changes update SystemModStamp, whereas LastModifiedDate is typically only updated by user-driven changes.
With the settings defined above (3 months of incremental refresh and another 3 months of archived data, for a total of 6 months of data), on the very first refresh in PowerBI.com we will see 3 + (3 * 2) = 9 queries fired off: 3 for the archived months and 6 for the incremental refresh months. The reason you see the extra set of queries for the incremental refresh months is that Power BI sends 2 queries per incremental refresh partition (in this case, one partition per month). These queries will look like this:
select max("systemmodstamp") as "C1" from ( select "systemmodstamp" from account where "createddate" >= timestamp '2022-01-01 00:00:00' and "createddate" < timestamp '2022-02-01 00:00:00' ) as "ITBL"
select "id", "name", "isdeleted", "createddate", "systemmodstamp" from account where "createddate" >= timestamp '2022-01-01 00:00:00' and "createddate" < timestamp '2022-02-01 00:00:00'
The first query is on the "detect data changes" column (SystemModStamp) and gets the max value for that partition. The second query gets the data for that partition. After this first refresh in PowerBI.com, all 6 months of data are loaded into the dataset.
Now, on every subsequent refresh, Power BI will first send a query like the following for each partition that is part of the incremental refresh period. In our case it will run these queries for the months of April, May and June.
select max("systemmodstamp") as "C1" from ( select "systemmodstamp" from account where "createddate" >= timestamp '2022-04-01 00:00:00' and "createddate" < timestamp '2022-05-01 00:00:00' ) as "ITBL"
Power BI then compares these max values to the previous max values it retrieved, and only refreshes those partitions for which the new max value of the detect-data-changes column differs. By doing this, Power BI can be even more efficient about which partitions it reloads. If you had not used the detect data changes option, every subsequent refresh would reload all 3 incremental refresh partitions.
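To make the detect-data-changes bookkeeping concrete, here is a toy sketch in Python of the comparison logic. This is not how Power BI is implemented internally; it just models the idea of comparing the stored max audit value per partition against what the source returns now (the sample values mirror the example data below):

```python
# Toy model of "detect data changes": reload a partition only when the source
# returns a newer max audit value (e.g. SystemModStamp) than the one captured
# on the previous refresh.

previous_max = {            # captured on the last refresh
    "2022-04": "2022-04-05",
    "2022-05": "2022-06-03",
    "2022-06": None,        # partition did not exist yet
}

current_max = {             # returned by the max(...) queries this refresh
    "2022-04": "2022-06-04",
    "2022-05": "2022-06-04",
    "2022-06": "2022-06-04",
}

for partition, new_max in current_max.items():
    prev = previous_max.get(partition)
    if prev is None or new_max > prev:            # ISO date strings compare chronologically
        print(f"reload partition {partition}")    # re-run the data query for this partition
    else:
        print(f"skip partition {partition}")      # nothing changed, keep existing data
```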
Now let's get back to the data we were loading. Initially the data looked like this (I have added a column showing the partition each record would be loaded into). Let's assume this load occurred on 6/3/2022.
| Id | Name | Created Date | Last Modified Date | Partition |
| --- | --- | --- | --- | --- |
| 1 | Microsoft | 2022-01-01 | 2022-01-02 | January (archived) |
| 2 | Sales Force | 2022-02-01 | 2022-02-02 | February (archived) |
| 3 | Hitachi | 2022-03-01 | 2022-03-01 | March (archived) |
| 4 | NCM | 2022-04-01 | 2022-04-05 | April (Incr. Refresh) |
| 5 | Tesla | 2022-05-01 | 2022-06-03 | May (Incr. Refresh) |
Let's assume the following changes happened on the next day (6/4/2022):
| Id | Name | Created Date | Last Modified Date | Partition |
| --- | --- | --- | --- | --- |
| 1 | Microsoft (deleted) | 2022-01-01 | 2022-06-04 | January (archived) |
| 2 | SalesForce (SFDC) | 2022-02-01 | 2022-02-02 | February (archived) |
| 3 | Hitachi | 2022-03-01 | 2022-03-01 | March (archived) |
| 4 | National CineMedia | 2022-04-01 | 2022-06-04 | April (Incr. Refresh) |
| 5 | Tesla/Twitter | 2022-05-01 | 2022-06-04 | May (Incr. Refresh) |
| 6 | Disney | 2022-06-04 | 2022-06-04 | June (Incr. Refresh) |
In the above case, because rows 1 and 2 are in archived partitions, they will not be reloaded. This can be a problem if you support soft deletes, as your report now potentially has one more record than the source (row 1 was soft-deleted in the source system). The name change on row 2 will also not show up in your report. The only changes that will be loaded are those to rows 4, 5 and 6.
Finally, when July rolls around, the January partition will be dropped entirely from the dataset and a July Partition will be added.
| Id | Name | Created Date | Last Modified Date | Partition |
| --- | --- | --- | --- | --- |
|  |  |  |  | Partition is dropped |
| 2 | SalesForce (SFDC) | 2022-02-01 | 2022-02-02 | February (archived) |
| 3 | Hitachi | 2022-03-01 | 2022-03-01 | March (archived) |
| 4 | National CineMedia | 2022-04-01 | 2022-06-04 | April (archived) - partition type changed |
| 5 | Tesla/Twitter | 2022-05-01 | 2022-06-04 | May (Incr. Refresh) |
| 6 | Disney | 2022-06-04 | 2022-06-04 | June (Incr. Refresh) |
| 7 | Paramount Studios | 2022-07-04 | 2022-07-04 | July (Incr. Refresh) - partition is added |
Incremental Refresh and CRM data:
When you use "CreatedDate" as your partition field, one thing you have to think about is what period of data you need refreshed. Imagine if you set up your Incremental Refresh to be 12 months and Archive period to be 24 months. On an entity like "Account", sometimes accounts may get created and may not get sold to or touched for a long time. So, if the account was created more than 12 months ago, then when it gets finally used and if any changes were to occur on that entity, you would not see those changes. In the above example, Microsoft is no longer even in the dataset, as its created date is outside the archival range. If Microsoft had an order on 2022-06-04, you would see the order's details, but the order would not have the account details, as its no longer being loaded.
For this reason, one needs to know the maximum time period between an entity's created date and when it is last modified. You would then have to set up your refresh settings based on the usage statistics of the entities at your company. This is also one of the biggest reasons it is very hard to set up incremental refresh with CRM data.
When it comes to Order/Opportunity data, you have a little more leeway. Typically one should use a field like Order Date and not something like Shipped Date, as the latter could change. Also, you could institute a rule that orders/opportunities older than x months must be closed and a new one created.
What would happen if you use a partition column that might change?
Let's take the following data as an example:
| Id | Order Name | Order Date | Last Modified Date | Order Value | Partition |
| --- | --- | --- | --- | --- | --- |
| 1 | My Big Order | 2021-01-01 | 2022-01-02 | $1 million | January |
Now suppose the order's Order Date is later changed:

| Id | Order Name | Order Date | Last Modified Date | Order Value | Partition |
| --- | --- | --- | --- | --- | --- |
| 1 | My Big Order | 2022-06-01 | 2022-06-01 | $1.5 million | January |

The copy of the record that was loaded into the January partition is never requeried once that partition is archived, so it keeps showing the old $1 million value. Meanwhile, the June partition's query now also returns this order (its Order Date falls in June), so the record can get loaded a second time with the new $1.5 million value. You can end up with duplicate versions of the same order in the dataset, which is exactly the duplicate/ghost record problem mentioned at the start.
Important things to remember:
Some Tips and Notes:
PID, or Proportional-Integral-Derivative, is often used when you need an automated system to self-correct. Think, for example, of a robot that has to drive straight, follow a line, or self-balance.
Driving or Flying Straight: In this case the robot needs to keep to a certain heading or angle. Various forces might make it deviate from that heading (wind, wheel slip, friction, etc.). The difference between the target heading and the actual heading is the error, and PID can be used to correct that heading.
Following a line: In line following, the robot attempts to follow a dark line to its target. Typically a sensor that senses reflected light is used. The robot attempts to keep the reflected light within a certain range and corrects its direction to keep tracking the line. The difference between the amount of reflected light sensed and the target reflected light is the error value, and it is used to self-correct the direction the robot is travelling.
Self-Balancing Robot: Think of a Segway-like robot that attempts to self-balance using 2 motors. In this case the robot's target orientation is upright, and the error is the angle from the vertical plane. The robot then attempts to self-correct by applying a force towards the vertical plane. PID again provides the amount of correction needed.
PID is most important in the self-balancing example, as one cannot use a constant amount of correction; the robot would likely topple. This is because while the robot is attempting to balance, the amount of deviation from the vertical plane varies unpredictably.
Quadcopter: This is probably the most intuitive application to work with. Think of a quadcopter on the ground. You want it to take off and hover at a certain height (e.g. 10 feet). When the quadcopter first takes off, its altitude sensor starts at 0 feet, so the error is 10 feet. The quadcopter increases its propeller speed and starts its journey to 10 feet. It will invariably overshoot the target altitude (say, to 11 feet), so it will have to slow its propellers to return to its target of 10 feet. Wind and other factors keep moving the copter away from its target altitude, so as the sensor returns its altitude readings, the quadcopter has to continuously change the propeller speed to increase or decrease its altitude.
In a PID-based system, the math tries to reduce the error at every calculation step (or loop) and bring the system to its target value. The system typically never quite reaches its target value and instead continuously fluctuates around it.
Even though PID uses fancy words like integral and derivative, the math itself is very simple.
Let's go through the PID math component by component.
Proportional
One can use only the proportional part of PID (with I = 0 and D = 0) in many simple systems (e.g. a line-following robot). The job of this component of the PID math is to provide a correction towards the target value.
Kp: constant of proportion (usually called the "proportional gain constant")
SP: Set Point (or the target value we want on the sensor)
PV(t): The process value at time t (or the current sensor value)
E(t) = SP - PV(t): This is the error at any given time t (the error shown in the graph above).
The proportional value P(t) is calculated as Kp * E(t) or Kp * (SP - PV(t)).
P(t) = Kp * E(t)
P(t) = Kp * (SP - PV(t))
In other words, it's the error multiplied by a tuning factor (Kp). Kp can be used to magnify or dampen the effect of the error on the driving components.
In the diagram below, it's Kp * 9 ft (where Kp can be any value you pick, other than 0!). If Kp = 1, then the value would be 9.
Integral
The job of the integral component is to make larger corrections if the proportional component is not providing enough of a correction. It accumulates the errors over time and applies the sum to the correction. Again, an integral gain constant is used to dampen or magnify the effect of this component.
Ki: integral gain constant
I(t) = Ki * E(t) + I(t-1)
Where I(t-1) is the I(t) that was calculated in the previous cycle. If the terms (t-1) and (t) are confusing, you can also write this as:
I = Ki * E + Iprev
at time = 1 (with Ki = 1 and E = 9):
I = 1 * 9 + 0 = 9 (the + 0 is because I starts at 0)
at time = 2 (E = 7, and the previous I is now 9):
I = 1 * 7 + 9 = 16
Derivative
The job of the derivative component is to slow down the effect of the proportional and integral components. It attempts to minimize the overshoot that can happen as the previous two components try to bring the system to the target value. From the graph above, it's the difference between the two error values shown at (t-1) and t.
Kd: derivative gain constant
D(t) = Kd * (E(t) - E(t-1))
or alternatively:
D = Kd * (E - Eprev)
at time = 1 (with Kd = 1, E = 9 and Eprev starting at 0):
D = 1 * (9 - 0) = 9
at time = 2, E = 7 and E(t-1) or Eprev = 9:
D = 1 * (7 - 9) = 1 * -2 = -2
Putting it all together:
C(t) = P(t) + I(t) + D(t) where C is the correction.
C(t) = Kp*E(t) + (Ki*E(t) + I(t-1)) + Kd*(E(t) - E(t-1))
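Using the numbers from the worked examples above (with Kp = Ki = Kd = 1): at time = 2 the error E is 7, so P = 7, I = 16 and D = -2, giving a total correction of C = 7 + 16 - 2 = 21.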
Going back to our quad-copter example, this is what its flight might have looked like
In fancy math terms this is the PID equation:
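$$C(t) = K_p\,E(t) + K_i \int_0^{t} E(\tau)\,d\tau + K_d\,\frac{dE(t)}{dt}$$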
The first term is the proportional component, the second is the integral component, and the third is the derivative component.
And as an Algorithm:
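Below is a minimal sketch in Python. The gain values, the target altitude and the one-line toy "physics" are illustrative assumptions so the loop is runnable; in a real robot the correction would drive a motor or propeller speed instead.

```python
# Minimal PID loop sketch, using the quad-copter example (target altitude 10 ft).
# The gains and the toy physics line are illustrative only.

Kp, Ki, Kd = 0.5, 0.05, 0.3     # tuning gains (pick and tune per system)
set_point = 10.0                 # SP: the target value (10 ft altitude)

integral = 0.0                   # running sum for the I term, starts at 0
prev_error = 0.0                 # E(t-1), used by the D term
altitude = 0.0                   # PV: the copter starts on the ground

for step in range(50):
    error = set_point - altitude                  # E(t) = SP - PV(t)

    proportional = Kp * error                     # P(t) = Kp * E(t)
    integral = integral + Ki * error              # I(t) = Ki * E(t) + I(t-1)
    derivative = Kd * (error - prev_error)        # D(t) = Kd * (E(t) - E(t-1))

    correction = proportional + integral + derivative   # C(t) = P(t) + I(t) + D(t)

    altitude += correction       # toy stand-in for applying the correction to the copter
    prev_error = error

    print(f"step {step:2d}  altitude {altitude:6.2f}  correction {correction:+6.2f}")
```

Running it prints the altitude at each step, so you can watch the overshoot and the subsequent corrections happen.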