Microsoft Fabric Key Concepts: Your Questions Answered!


Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags: Microsoft Fabric, OneLake, Direct Lake, Data Mesh

This series of blog posts covers the LDI Microsoft Fabric Syllabus and is also available in e-book format under LDI Content

The series is split up into 4 blog posts:

  • Microsoft Fabric Key Concepts (this post)
  • Working with the Lakehouse and Data Warehouse (Coming soon!)
  • Data Science and real-time analytics (Coming soon!)
  • Building a semantic model and visualising data (Coming soon!)
There is an interactive exercise as part of Q4! Let us know if you like this style of learning by filling in the feedback form at the bottom of the article!

Microsoft Fabric combines Power BI with all the core functionality and tooling of a traditional Azure data platform into a single Software-as-a-Service (SaaS) environment that caters for both low-code self-service and code-first pro development.

With Fabric enabled on the Power BI tenant, a Power BI workspace is no longer just somewhere to host and manage reports and datasets/dataflows. You can now also use it to create lakehouses, data warehouses, pipelines and notebooks to ingest, clean, model and analyse data all in one place.

How we fundamentally store data has also been re-engineered from the ground up. With Fabric, we’re introduced to “OneLake”, a new data lake solution built with analytics in mind. Through the addition of OneLake we can now query our data lake directly from Power BI (Direct Lake), providing the data-freshness benefits of DirectQuery without sacrificing performance.

It’s worth noting that Fabric isn’t a replacement for Synapse – rather, just like Power BI, a new generation of Synapse is part of the Fabric ecosystem (in the same way as Word and Excel are part of MS Office!).

Watch the video below to get an overview of Microsoft Fabric in less than 2 minutes!

The simple answer is that it resolves the traditional compromise between efficiency, performance and DataOps by converging the storage and compute platforms. Every data platform can be thought of with these two fundamental concepts in mind: where and in what format do I want to save my data so that it is accessible to the right people at the right time (storage) versus what languages and tools will I use to query, read and write to it (compute).

There’s no need to re-invent the wheel here and Microsoft haven’t – open-source Delta Lake (Parquet) for storage, and Spark or SQL for compute.

Watch the Kasper On BI episode below to learn more about what makes the Microsoft Fabric platform unique!

 

The key advantage of this is simplification. Everything is offered as a fully managed Software as a Service (SaaS) which means that there is no infrastructure to set up or manage, allowing organisations to implement a “data mesh” style architecture with ease using just Power BI workspaces.

A data mesh is “a domain-driven analytical data architecture where data is treated as a product and owned by teams that most intimately know and consume the data.” – Zhamak Dehghani (founder of the data mesh concept)

OneLake’s core principles are analogous to those of Microsoft’s OneDrive. With OneDrive, files are stored in a single storage account across the organisation, with permissions granted on individual files, folders or through sharing links. Similarly, with OneLake, all Fabric data items output to this single storage account and items can be individually shared or permissions granted at the workspace level.

In fact, OneLake is the underlying storage engine for all the Fabric workloads:

Just as OneDrive has a Windows application to sync files between your machine and the cloud, Microsoft Fabric has the OneLake File Explorer application, which allows you to do the same: upload files into a lakehouse item that is backed by OneLake.

Whilst permissions can be assigned at a workspace level, any tenant-level settings and features – including security, governance and data lineage – apply to all workspaces and lakehouses in the tenant and all files/items within them.

The set of items contained in a workspace is a “data product” and data products are encapsulated in a group of workspaces called a domain.

In the near future, it will also be possible to assign individual controls and permissions on domains. So, for example, if you wanted to restrict any Power BI reports within the Finance domain (group of workspaces) from being able to use the “Publish to Web” feature, you could set this control at the domain level and ensure that all workspaces with finance data/reports are assigned to this domain. With this level of fine-grained control over our data domains, previously complex architectural patterns such as data mesh become vastly more accessible.

There are multiple ways to get data into OneLake depending on your preference/skillsets and the type of data you are working with.

You can use Dataflows Gen 2 to connect to any of the available sources in Power BI and use the same standard Power Query Online interface to ingest/shape data as required. The difference with Dataflows Gen2 is that there is an option to write the results of the dataflow each time it is refreshed to either a SQL database (like datamarts did automatically) or to a Fabric Lakehouse:

This will automatically convert the data into the delta lake format required to be able to use it with the other Fabric engines such as SQL. You can choose whether to append or replace the data in the lakehouse each time the dataflow is refreshed.

It is also possible to use a Synapse Data Factory pipeline where one of the outputs is a Fabric lakehouse.

You can even upload files directly into the Files layer of the lakehouse using the OneLake file explorer or the Upload Files/Folder feature within a lakehouse item. From here the file can be accessed via a Spark notebook, or you can have Fabric automatically attempt to convert it into delta lake format using the “Convert to Table” feature.
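For those who prefer code, here is a minimal sketch of roughly what that conversion looks like when done manually from a notebook attached to the lakehouse (the file name and table name are purely illustrative, and `spark` is the SparkSession that Fabric notebooks pre-create):

```python
# A minimal sketch, assuming a notebook attached to this lakehouse.
# The file name (Files/sales.csv) and table name (sales) are illustrative.
df = (
    spark.read
    .option("header", "true")       # first row contains column names
    .option("inferSchema", "true")  # let Spark infer column types
    .csv("Files/sales.csv")
)

# Save in delta lake format so the other Fabric engines (SQL endpoint,
# Direct Lake, etc.) can query it as a table
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```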

Try the “Practice the clicks” tutorial below to try this out! (Click on the dotted square to open in full screen!)

24 STEPS

1. We start with the Microsoft Fabric portal available at fabric.microsoft.com

From here we choose the product experience we want – in our case we want to create a lakehouse and so we’ll choose Synapse Data Engineering

Step 1 image

2. First, we’ll need to create a workspace for this Lakehouse item to sit in! Click on Workspaces in the left-hand menu bar

Step 2 image

3. Select New workspace

Step 3 image

4. Name this workspace

Step 4 image

5. We want to place this workspace into the Sales domain.

Under Assign to a domain
Select Sales

Step 5 image

6. Click Apply

Step 6 image

7. Now in this workspace view, click New

Step 7 image

8. And choose Lakehouse (Preview)

Step 8 image

9. Give a name for the new lakehouse

Step 9 image

10. Click Create

Step 10 image

11. Click on the 3 dots next to the files layer of the Lakehouse

Step 11 image

12. Choose Upload

Step 12 image

13. And choose Upload Files

Step 13 image

14. Click on the folder icon to navigate to file explorer

Step 14 image

15. Once you have selected a file, choose Upload

Step 15 image

16. Click on the Files layer and you should see the file that has been uploaded

Step 16 image

17. Click on the three dots next to the file that has been uploaded

Step 17 image

18. And choose Load to Tables

Step 18 image

19. Give the table a name and click Confirm

Step 19 image

20. You should now be able to see the table with the data in the Tables layer of the lakehouse. Click on the table name to see the data (note the triangle in the bottom right hand corner of the table icon which indicates that it is delta lake!)

Step 20 image

21. Go back to workspaces, and select the workspace to navigate back to the workspace view

Step 21 image

22. Click on the automatically created SQL Endpoint item of the lakehouse.

Step 22 image

23. You should now be able to see the data under the tables section of the Explorer.

From here you could write a SQL query against this data you uploaded by clicking on New SQL Query in the Home ribbon

Step 23 image

24. That’s it – you’re done! Click on the Cross in the top right hand corner to close the tutorial!

Step 24 image

Here’s an interactive tutorial

** Best experienced in Full Screen (click the icon in the top right corner before you begin) **

https://www.iorad.com/player/2198952/MS-Fabric—Upload-files-to-lakehouse

The basic guidance for choosing a tool/compute engine in Microsoft Fabric is simply to use whichever one the individual analyst working with the data feels most comfortable in.

As a low-code (completely UI-based) approach, you can use Dataflows Gen2 as the primary method to apply an ingestion and compute layer on top of your data. If your analysts are more comfortable writing Python (PySpark), SQL (Spark SQL), R or Scala to query and transform data, then notebooks are the way forward. This approach offers a lot more flexibility in the operations you can apply to that data (more so than dataflows – regular expressions or regression with a single line of code, for example!).

All of these languages – Python, R, SQL or Scala – can be used interchangeably within the same notebook (each cell can be in a different language!) to interrogate your data using the Spark engine.
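As a hedged sketch of what this mixing looks like in practice, the snippet below runs a Spark SQL query from a Python cell against a lakehouse table and then continues with the PySpark dataframe API; in a Fabric notebook the same query could equally sit in its own SQL cell. The table and column names are illustrative.

```python
# Sketch only: query a lakehouse table with Spark SQL from Python, then keep
# working on the result with the PySpark dataframe API.
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM sales
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""")

top_customers.show()  # display the ten highest-spending customers
```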


Apache Spark is an open-source Massively Parallel Processing (MPP) engine that runs on a group (“cluster”) of machines where the data processing operations are divided between the machines as jobs and then collated together to provide the end result.
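The toy snippet below illustrates the idea: Spark splits a dataframe into partitions, processes each partition in parallel across the cluster, and then collates the partial results (the numbers are arbitrary and `spark` is the notebook’s pre-created SparkSession).

```python
# Toy illustration of massively parallel processing in Spark.
df = spark.range(0, 1_000_000)  # a one-million-row dataframe of ids

print(df.rdd.getNumPartitions())           # how many chunks the work is split into
print(df.selectExpr("sum(id)").collect())  # each partition is summed, then combined
```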

With most Spark engines (including Databricks and Synapse) you have to configure and spin up this cluster before you can run your jobs, but with live pools in Fabric the Spark experience is serverless – this happens automatically in seconds behind the scenes as soon as you first run a notebook cell against your lakehouse or execute a dataflow.

Whilst dataflows in Power BI have traditionally just utilised the Power Query engine, Dataflows Gen 2 in Microsoft Fabric utilise Spark behind the scenes to make that processing more efficient and allow them to write back to delta lake (rather than the CSV – or more accurately CDM – format that dataflow data was traditionally stored in).

You can create notebooks automatically connected to any lakehouse in Microsoft Fabric with no need to mount it as a data source first. In your lakehouse-connected notebook, you can write in Python (PySpark), SQL (Spark SQL) or Scala in order to apply any data transformations.
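For example, a minimal transformation in such a notebook might read an existing delta table, aggregate it and write the result back as a new table – roughly as sketched below (table and column names are illustrative):

```python
# Sketch of a read-transform-write step in a lakehouse-attached notebook.
sales = spark.read.table("sales")

daily_totals = (
    sales
    .groupBy("order_date")
    .sum("amount")
    .withColumnRenamed("sum(amount)", "total_amount")
)

# Write the aggregate back to the lakehouse as its own delta table
daily_totals.write.format("delta").mode("overwrite").saveAsTable("sales_daily")
```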

Watch the video below to have a look at how the Spark engine works with the Microsoft Fabric data engineering product experiences.


 

Whilst eliminating data transfer and duplication between data lakes and data warehouses is great, the other area of data duplication has always been the tabular models (import mode) used in Power BI.

Whilst many SQL/relational database systems have allowed you to use DirectQuery in Power BI – thereby eliminating data transfer and allowing you to pass queries back as the logged-in user – performance has generally suffered as a result.

This is because, traditionally with DirectQuery, each DAX query needed to generate a visual or apply a filter in a Power BI report had to be converted into SQL and sent to the source, with the result passed back. If the underlying data structure is more normalised (e.g. third normal form) than the star schema that Power BI works well with, then all the required table joins are also being done on the fly and the queries can be quite expensive as a result, especially with more complex DAX expressions!

So the optimal, performant way of working with data in Power BI has been to bring it into its own Analysis Services Tabular model – powered by the Vertipaq storage engine – where data is compressed and held in a format optimised for querying and aggregating at scale using DAX. (Note that this contrasts with operational systems/relational databases where data is held in a normalised format optimised for working with individual records).

But remember we said that Parquet was also a compressed data format? What if Microsoft could leverage this to build a tabular model engine on top of it so that it could be queried using DAX and still be performant? This is what they’ve done with Microsoft Fabric!

It allows for what is called Direct Lake mode with Power BI – the ability for the Power BI data model to be stored as parquet files in the lakehouse rather than in its own Vertipaq/Analysis Services storage engine.

Therefore with Direct Lake mode you can query delta lake directly from Power BI. Your DAX measures will evaluate directly against the lake with the same kind of performance you’re used to with import mode, but without the need to refresh your models. This means no data duplication or latency, and the credentials of the user viewing the report are passed back to respect the security/access permissions set at the delta lake level!

Note that initially Microsoft Fabric will only have security set at the artifact or workspace level. Row/object-level security has to be set using the SQL Data Warehouse, which then uses DirectQuery over SQL rather than Direct Lake. For the General Availability of Microsoft Fabric, Microsoft are planning to have full row/object-level security applied directly to OneLake, even when the parquet files are accessed directly!


Every area of the Microsoft technology stack is being infused with AI and Microsoft Fabric is no different. Just as Office is harnessing the power of generative AI with Copilot, with Microsoft Fabric you will soon be able to use Copilot within notebooks to help write the code needed to perform certain tasks, and have Power BI leverage Copilot to automatically produce reports and logic.

I would expect much more from this in the future. Generative AI needs a solid foundation of well-modelled data which many organisations currently lack. Having data in a common format stored in a single cloud-based location (OneLake) makes Microsoft Fabric the perfect enabler for AI and very timely for where technology is heading.

See the video for a quick overview of Copilot in Power BI!

 


The pricing model for Microsoft Fabric works in the same way as, and is part of, Power BI Premium Per Capacity. In fact, if you already have Power BI Premium Per Capacity, this can just be automatically upgraded to enable Fabric. A Power BI Premium P1 SKU relates to an F64 SKU of Fabric – 64 units of compute to use across Power BI and other Fabric workloads.

The utilisation of these compute units works in an almost identical fashion to Power BI Premium Gen 2. The initial version of Power BI Premium Per Capacity (Gen 1) was based largely on memory – Power BI datasets have a certain size based on the data volume and levels of compression achieved; the Premium Capacity has a memory allocation and the datasets that you have loaded (or pinned) into memory have to fit into the memory allocated.

With Premium Gen 2, and now Fabric, memory is much less of a constraint. This is because Microsoft have been ruthlessly efficient with how memory is used – rather than loading entire datasets into memory, data (and now just the specific parquet files needed) is loaded into memory dynamically, in real time, as reports are being consumed or data is being processed through a compute engine.

So the only real constraint is then the volume of usage – measured by CPU rather than memory. But Microsoft recognise that every organisation has peaks and troughs even throughout a single day. Perhaps reports have heavy usage first thing in the morning but then consumption dies down later in the day. To accommodate this, the capacities have built-in smoothing and bursting. Bursting provides a temporary boost of CPU power to utilise more than is available through the capacity SKU during periods of heavy usage.

 

Smoothing allows the CPU load to be balanced over each 24-hour period following a spike, so that the capacity is not impacted by those relatively short bursts of intense usage. If you do consistently exceed your CPU allowance, your reports/workloads don’t generate errors but rather just slow down.
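As a simplified, purely illustrative back-of-the-envelope calculation (not Microsoft’s exact accounting), smoothing turns a short burst into a much smaller steady draw spread over the following 24-hour window:

```python
# Toy arithmetic only – the real smoothing algorithm is Microsoft's own.
burst_cu_seconds = 120_000          # hypothetical capacity-unit-seconds from a heavy refresh
smoothing_window_s = 24 * 60 * 60   # the 24-hour smoothing window, in seconds

smoothed_draw = burst_cu_seconds / smoothing_window_s
print(f"Accounted for as roughly {smoothed_draw:.2f} capacity units of constant draw")
```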


It remains to be seen exactly how intensive the other workloads will be and what SKUs will be needed for what levels of activity. Note that Premium Per User (PPU) doesn’t provide access to Fabric – the Power BI licensing model remains exactly the same and still relates to all Power BI items (i.e. you can create Power BI artifacts in Pro/PPU workspaces but not Fabric items, unless the workspace is backed by Premium capacity).

There is also a more flexible option for Fabric pricing than a Premium capacity SKU. A Premium capacity SKU requires an up-front annual commitment, and the starting point of around $5k/month may be too much for smaller organisations. So, like the Power BI Embedded Azure SKUs (which are essentially an on-demand Premium capacity), Microsoft also offer Azure-based Fabric SKUs. These are priced either on an hourly basis from the Azure portal, or as a reserved capacity where blocks of hours can be purchased to save on costs. They can be paused, stopped/started and scaled up or down as required, either manually through the Azure portal or programmatically.
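As a hedged sketch of the programmatic route, the snippet below calls the Azure Resource Manager REST API to suspend a Fabric capacity. The resource provider path, action name and api-version shown are assumptions to verify against the current Azure documentation, and all the placeholder values are hypothetical.

```python
# Assumed sketch only: pause (suspend) an Azure Fabric capacity via ARM.
# Verify the resource type, action and api-version against current docs.
import requests

subscription_id = "<subscription-id>"   # hypothetical placeholders
resource_group = "<resource-group>"
capacity_name = "<capacity-name>"
token = "<azure-ad-bearer-token>"       # e.g. obtained via the azure-identity library

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    "/providers/Microsoft.Fabric/capacities"
    f"/{capacity_name}/suspend"          # use 'resume' to start it back up
    "?api-version=2023-11-01"            # assumed version – check before use
)

response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()              # raises if the call was rejected
```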

These can be used instead of or in addition to Power BI Premium capacity for Fabric. You can have certain teams use their own capacities which can be started and stopped as required through Azure Fabric SKUs, or use the Azure SKUs for Development/Test workloads and Premium Capacity for Production.

Or you can just use Azure SKUs for Fabric if you don’t already have Power BI Premium Per Capacity. The lowest F SKU will be available for less than $300/month on a pay-as-you-go basis!

This will make Fabric accessible to a very broad set of organisations.

You can get started with Microsoft Fabric today by signing up for a 60-day trial at aka.ms/try-fabric or by enabling the trial on your Power BI tenant. Watch the video below to learn more!

 

In most cases, the best way to get started with Microsoft Fabric is with specific, small use cases and an engaged, multi-disciplinary team of individuals who can make the magic happen. Microsoft Fabric doesn’t have to be turned on for everyone in your organisation at once – you can choose to enable it only for one or more Azure Active Directory (AAD) groups in the Power BI tenant settings.

So start with the requirements for the use case from your Power BI reporting layer and work backwards from there, designing and curating the data structures you need at each stage and identifying the individuals who can process the data with their tools of choice at each stage. (It doesn’t matter if these are people across different teams – the handover between, for example, data engineering and Power BI teams is now much more seamless than it has ever been!) Put those individuals in a security group, add that group to your tenant settings, allow them to create the workspaces/artifacts they need and let them bring Microsoft Fabric to life for your organisation!

 

