Analytics certified… finally!

July has been really exciting so far as I finally managed to complete my Einstein Analytics and Discovery Consultant certification! I was provided with a free voucher that was only valid until July 31, so I really had to give it a shot and see how I did. Luckily, I knew enough to pass – and here’s how I did it.

Preparing for the Einstein Analytics and Discovery Consultant Certification

Like every other Salesforce consultant certification, the “Einstein Analytics and Discovery” certification is a mixture of scenarios that you have to solve, some pseudo-“debugging”, and a good number of questions that simply test your knowledge. Read the exam guide and make sure that you complete the Trailhead Superbadges for Data Preparation and Analytics & Discovery Insights. Kelsey Shannon has blogged very comprehensively on her certification journey and Charlie Prinsloo has written the definitive preparation guide (some links require partner community access, though).

Get an EA org (either trial or developer)

If you don’t have experience with Einstein Analytics, your starting point is getting an org. There is a free (and more or less perpetual) developer org available, and if you want to have a look at the fully configured “real thing”, Salesforce now offers a fully functional 30-day trial packed with sample data, sample dashboards and apps.

Watch the academy video training

If you can’t attend an in-person “Einstein Analytics Academy” class, the EA team has a great alternative for you: Ziad Fayed has recorded a full training as a series of free webinars. It is your number one resource if you want to pass the Analytics certification and I recommend watching *and building* the solutions Ziad is presenting.

Use the Templated Apps

It might sound strange, but it’s highly recommended to use your developer org to create at least two essential apps from the App Templates Salesforce provides:

  • The Learning App has examples for essential techniques such as bindings.
  • The Sales Analytics App has functional examples of a sync scenario, complex dataflows, and dashboards designed according to best practices.

Speaking of templates: You can score some easy points in the exam if you know which App Templates Salesforce offers and what each of them is for.

Know Dataflows & Recipes inside out

Though it’s not a universal truth, for the sake of the exam stick to the best practice: Sync, Uploads and Dataflows ingest all data into Analytics, and recipes work off datasets. In reality, once you set up synced sources, you can use a synced source straightaway in a recipe to prepare a dataset; yet this is not what the certification exam is about.

Know the limitations of synced external sources (such as an Azure DB) as compared to synced Salesforce objects (it’s a good idea to know the limits in this area: How many Salesforce objects can you sync? How many dataflows can you have?)

For dataflows (aka “the purple nodes”), you should know each node type and understand what you use it for:

  • Dataset Builder (aka “the blue nodes”) is exclusively for Salesforce objects. It helps you find the finest grain and all related objects, and allows you to select fields and relations.
  • sfdcDigest reads from a synced Salesforce object
  • edgemart reads from a dataset in Analytics (read: re-use an existing dataset)
  • sfdcRegister saves a dataset
  • append works like the “union” command – it adds the rows of a second dataset to the existing dataset.
  • augment “joins” one dataset to another by adding fields to existing rows, not new rows. In simple words: you choose the key on the “left hand side” (the data you already have), choose the dataset you want to join, select the field that matches the key (also: decide if you expect single or multiple matches), and pick the fields you want to add. The outcome will be the same number of rows with more columns/fields.
  • computeExpression lets you create new fields or recalculate values based on the fields in the same row. If row 10 has a “Quantity” of 10 and a “ProductName” of “Cherry Cake”, you can create a formula for a new field “Line Item Label”, such as 'Quantity' + " " + 'ProductName' + "s", which builds “10 Cherry Cakes”.
  • computeRelative allows you to compare or summarize a row with previous ones (row over row, or based on a grouping that is used as a “partition”).
  • dim2mea is a handy tool to convert a dimension to a measure if you need to do that. Unfortunately, there’s no mea2dim (for when you accidentally read a numeric product number as a measure). If you need this, you’ll have to use computeExpression to generate a new text field and convert the numeric value to a string.
  • flatten converts a hierarchy into a directory- or path-like representation for each row. You can decide whether or not to include what Analytics calls the self_id. The difference it makes is team visibility: should a team always share their records, you’d need to set “include_self_id” to false. Imagine two records that include self_ids: one has “me/myteam/mymanager/ourboss”, the other one has “mycolleague/myteam/mymanager/ourboss”. The paths differ, so the teammates won’t be able to see each other’s records. If you set “include_self_id” to false, both will get “myteam/mymanager/ourboss” as their hierarchy path and are thereby eligible to be shared among all members of “myteam”.
  • prediction allows you to run a prediction from a Discovery model on your dataset’s rows (only available for Einstein Analytics Plus).
  • filter does what the name says: it filters records that either match or don’t match a criterion.
  • sliceDataset acts like a filter, but for columns. You can choose whether you want to specify the columns/fields to drop or keep.
  • digest reads from any connected source and object (read: synced external data)
  • update does what the name says: it updates a dataset with the changes you made. It’s basically a digest node that writes to the same dataset.
  • export WAS used to push a dataset to Discovery. Nowadays you can do that in the UI with a button on the dataset. By default, it only works with Discovery, and Discovery is only available with Einstein Analytics Plus licenses.
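To make the node list concrete, here is a minimal, hand-written dataflow definition. All object, field and node names are made up for illustration, so treat this as a sketch of the JSON syntax rather than a ready-to-run flow:

```json
{
  "Extract_Opportunities": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Opportunity",
      "fields": [
        { "name": "Id" },
        { "name": "Name" },
        { "name": "StageName" },
        { "name": "Amount" }
      ]
    }
  },
  "Extract_LineItems": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "OpportunityLineItem",
      "fields": [
        { "name": "Id" },
        { "name": "OpportunityId" },
        { "name": "Quantity" },
        { "name": "ProductName" }
      ]
    }
  },
  "Augment_Opp_Data": {
    "action": "augment",
    "parameters": {
      "left": "Extract_LineItems",
      "left_key": [ "OpportunityId" ],
      "right": "Extract_Opportunities",
      "right_key": [ "Id" ],
      "relationship": "Opportunity",
      "right_select": [ "Name", "StageName", "Amount" ],
      "operation": "LookupSingleValue"
    }
  },
  "Compute_Label": {
    "action": "computeExpression",
    "parameters": {
      "source": "Augment_Opp_Data",
      "mergeWithSource": true,
      "computedFields": [
        {
          "name": "LineItemLabel",
          "type": "Text",
          "saqlExpression": "number_to_string('Quantity', \"0\") + \" \" + 'ProductName' + \"s\""
        }
      ]
    }
  },
  "Register_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Compute_Label",
      "alias": "OppLineItems",
      "name": "Opportunity Line Items"
    }
  }
}
```

Note how the nodes reference each other by name via "source", "left" and "right" – that chain, not the order in the file, defines the flow.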
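And since flatten (and include_self_id) tends to confuse people, here is what a flatten node over the user/manager hierarchy could look like – again a sketch with assumed node and field names:

```json
{
  "Flatten_UserHierarchy": {
    "action": "flatten",
    "parameters": {
      "source": "Extract_Users",
      "self_field": "Id",
      "parent_field": "ManagerId",
      "multi_field": "Managers",
      "path_field": "ManagerPath",
      "include_self_id": false
    }
  }
}
```

With "include_self_id": false, the generated path contains only the hierarchy ABOVE each row; set it to true and the row’s own Id is part of the path as well.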

Know that the finest grain of your dataset is always determined by what you want to analyse. If your grain is not fine enough (let’s say you only loaded Opportunities, but not Opportunity Line Items), there’s no way to get to the product level with this dataset. You can load the Line Items into a separate dataset and augment it with the existing Opportunity data, but in this case, rebuilding the dataset from scratch would be better.

On the other hand, you can’t run aggregations in dataflows, so you can’t reduce the grain either; groupings at query time will help you there.

Exploration, Visualization & Dashboard Design

The exam parts that focus on Exploration and Visualization seem to be quite straightforward. If you know how to navigate the application, know key principles (progressive disclosure) for Dashboard design and know how to review (Dashboard inspector) and improve dashboard performance (e.g. pages, global filters, combine steps and such), you should be able to ace this section. Don’t forget to look into Actions and remember the C-A-S-E-S formula for good data analysis!

A particular focus should be on bindings – there are only a few questions on bindings, but you really need to know them to score these points. Consider building each binding type at least once and make sure that you understand what “results binding” vs. “selection binding” means. Look up what a “nested binding” is (not a separate type but a specific way to use a binding), and make sure you understand the functional blocks of binding syntax. One top resource for that is Rikke Hovgaard’s blog (start here) – hint, hint: Rikke authored *some* questions for the exam (guess which ones…).
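To make that concrete, here is roughly what a selection binding looks like inside a dashboard step’s compact-form query. All step, dataset and field names are made up, so treat this as a syntax sketch, not a copy-paste solution: the filter of one step reads the current selection of another step.

```json
{
  "query": {
    "query": "q = load \"OppsDS\"; q = filter q by 'StageName' == \"{{cell(Stage_Step_1.selection, 0, \"StageName\").asString()}}\"; q = group q by 'Owner'; q = foreach q generate 'Owner' as 'Owner', sum('Amount') as 'Amount';"
  }
}
```

A results binding uses the same functional blocks – a data source (here the step’s selection, alternatively its result), a row/column selector (cell), and a serialization function (asString) – which is exactly the anatomy you should be able to take apart in the exam.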

Security and Access

Another topic that is both straightforward and tricky at the same time.

  • Review how to get people access to both Einstein Analytics and Apps that you’ve built.
  • Understand the roles (they’re different from “Roles” in Salesforce).
  • Again, “Inherited Sharing” vs. “Security Predicate” is a marginal topic, but you can score some precious points there. Make sure you know the limitations of inherited sharing, and how you can leverage security predicates for cases where you hit a wall with inherited sharing.
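Having seen a security predicate once makes these questions much easier. Here is a sketch (dataset and node names assumed) of an sfdcRegister node with a rowLevelSecurityFilter that limits rows to the record owner:

```json
{
  "Register_Secure": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Compute_Label",
      "alias": "SecureOpps",
      "name": "Secure Opportunities",
      "rowLevelSecurityFilter": "'OwnerId' == \"$User.Id\""
    }
  }
}
```

The predicate compares a dataset field (in single quotes) against a value – here the $User.Id of the user running the query – and only matching rows are visible to that user.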

Einstein Discovery

For Einstein Discovery, it’s crucial to know a bit about how data gets into Discovery, and how to analyze and improve the model quality. The discovery part of the exam is too large to be neglected, but still small enough that it won’t blow up your test immediately if you fail some questions here.

Data can be pushed from Einstein Analytics and other sources, including CSV. Click through the import path for both EA and CSV, review the imported data and select the data types, review the columns, the outcome variable (a single one) and the predictors / actionable variables (up to three). You will see that some columns are closely related, and Discovery can prompt you to review whether they really represent the same thing (such as Product Number and Product Name) – or whether there is just a very high correlation. You typically want to drop data only if you really know that they mean the same thing; when in doubt, don’t make assumptions.

Understand the impact of outliers / extreme values: Typically these should NOT be in your analysis because you don’t want edge cases to drive your prediction. Don’t be shy to trim at least everything beyond the 6th standard deviation.

Finally, you should know how to read and understand the charts used by a Discovery story and the quality metrics. While everyone knows bar charts, Waterfall charts are lesser known, so it might be a good idea to review if you really understand how Discovery uses both types to present data to you.

At the time of writing, there are just a few flashcard sets available to memorize the material. You can find the handful of them by searching for Einstein Analytics combined with any EA-specific term. While memorizing terms, limits etc. helps massively, the one thing that will drastically improve your chances is reading the exam guide closely, getting hands-on experience and/or actively following the academy training videos. You can still use the old Advanced Accreditation form to test your knowledge. It will give you an idea what the Analytics team thinks you should focus on, even though it is only for you to test your knowledge and will neither be scored nor give you an accreditation.

General Guidance

The general tips for all Salesforce exams apply here as well:

  • know the pass score and know what it means in numbers of questions. There will be 60 questions and the pass score is 68%. So 41 correct answers will let you narrowly pass, and there are up to 19 questions that you can miss.
  • use the “mark for review” checkbox whenever you’re not 100% sure about your answer (it will give you a good overview later). Immediately after the last question, you will get the chance to review your checked questions – if your number is 15 or above, it’s a good idea to review all checked questions. Remember that there are probably some wrong answers among those questions that you DIDN’T check for review.
  • Read questions AND answers closely. Really, really! There’s a lot of information that you will only recognize on the second or third read. And you will be more successful to separate bogus answers from the correct ones if you scrutinize every single word.
  • There aren’t just “correct” and “wrong” answers – there are also items that are called “distractors” that could be correct… or almost correct. Scan each question and answers thoroughly for tiny deviations from Salesforce terminology, such as “computeField” (the real term for a dataflow function to compute a field is “computeExpression”). Scan for plural vs. singular, scan for the wrong order of steps.
  • If you don’t know the answer, try to rule out wrong answers.
  • If you still have no clue, check the “mark for review” checkbox and don’t waste more time on this item.

I hope this helped you a bit. Good luck with the exam, and let me know how you did!

11 Replies to “Analytics certified… finally!”

  1. Hi
    I failed in my first attempt so I need your help on a few questions. I was just short of 3 questions; 38 questions were correct in my first attempt.

  2. From where did you prepare for this? I am planning to take this exam.
    Can you please share the material and references for study?

  3. Hello,
    I have been looking for documentation about “include_self_id” and found some in your blog. From your explanation, it says to set “include_self_id” to false; otherwise the users will not be able to see their respective records.

    I’m wondering if you can point me to doc that explains what use case would require a setting of “include_self_id” to false; or to true.

    Thank you.

    1. Imagine a user hierarchy: Linda is the CEO, and the manager of Ron who is the manager of Anna and Ben. If Anna and Ben own opportunities, a flatten operation would result in Linda/Ron/Anna and Linda/Ron/Ben respectively. If you want to see all opportunities of Ron’s team, you’ll need to set “include_self_id” to false so the flatten operation will result in “Linda/Ron” for both Anna and Ben, and you can filter for all opportunities Ron’s team owns. Yet you won’t see Ron’s opportunities in this setup (because his hierarchy path is “Linda”). Consider what the outcome of the flatten operation should be, and what you want to do with the result – and it always helps to actually flatten a hierarchy and see the result. You’ll immediately see if “include_self_id” helps you. If false, you’ll only get the hierarchy ABOVE the current record; if set to true, you see the hierarchy INCLUDING the current record.

  4. Thank you for your quick response. I’m glad I asked since your initial post of:

    “Imagine two records that include self_ids,
    one has “me/myteam/mymanager/ourboss”,
    the other one has “mycolleague/myteam/mymanager/ourboss”.
    They won’t be able to see their records respectively.”

    confused me.

    From your last response, what I gather is that
    if include_self_id = true ; then Anna or Ben will be able to see their records as well as opportunities owned by Ron (their manager) and other opportunities owned by other persons that are ‘under’ Ron.

    1. I just double checked: the order in the path is correct in the blog post and wrong in my comment: Anna/Ron/Linda would be correct.
      Now, forget row visibility for a moment – it’s a use case but not the only one.

      If you filter for “contains: Ron” on the flattened path, you will get (a) Ron’s subordinates if you set “include_self_id” = false and (b) Ron and his subordinates if you set “include_self_id” = true.

      It depends on how you want to use the flattened path. Let’s say you want only Ron’s team, then you would need Equals:”Ron/Linda” as the flattened path to get Anna’s and Ben’s records but not Ron’s, and that would be a use case to flatten without the self_id, because “Anna/Ron/Linda” can never equal “Ben/Ron/Linda”. If you leave the self_id out, “Ron/Linda” equals “Ron/Linda”, and that’s obviously a match.

  5. Thank you. I will test this include_self_id to strengthen my understanding.

    On another topic, I was watching the videos on Advanced Techniques in using ED. (Last video: 40 seconds into it)

    I don’t think I can attach a snapshot here but it showed a graph of ‘Cumulative Capture Rate Chart’. I thought that at first glance – that the model is good because of the high accuracy rating of 0.7434. I listened and watched this video a number of times and I still cannot figure out if he confirms that the model is good or if it is weak? Or do we need to look at other variables under ‘Scoring Metrics’ such as Precision and Recall before we can judge that it’s a good model? Thank you.

    1. That’s tough to say. R squared is one of the key values, and 0.74 looks pretty high indeed, but what it means depends a lot on the choice of variables and the variance. R squared is the amount of variance in the dependent variable explained. So it can’t be used as a quality indicator per se. In some scenarios, you can foresee that R2 of .25 is already a success, but for what Einstein Analytics does, a number of .80 and more can still be too small for a reliable prediction.

      What I would try is thinking about the outliers in the training data first and remove them (which will trim a bit of your variance, so the same model could get a bit higher on R squared without changes). You could then scrutinize all cards in your story for relevance, make some assumptions and see how they fare. The goal is to find the ones that are relevant and remove the less relevant ones.

      Hope that helps, my last theory lesson is almost 25 years back in university, so I’m probably talking semi-nonsense. Regarding the exam: model quality is just one question, plus a few about the different charts. You won’t fail the exam because of the quality metrics.

  6. I took the test just an hour ago and I passed the certification exam!

    I have 2 complaints. 1) I finished the 65 questions in 45 minutes; quickly reviewed my answers in 15 minutes. I wanted to go through each one very carefully but I d e s p e r a t e l y needed to pee after half an hour, so I had no choice but to submit the entire exam as I could no longer hold it. 2) It is my opinion that the answers for the questions on Einstein Discovery were quite subjective and ran more along the lines of what a business consultant should say to the client – therefore it was difficult to gauge what the correct answer is. Most of the Einstein Discovery questions had 2 plausible answers each and there is no material that teaches which answer Salesforce deems better than the other.
