The bug reporting system correlates server and client logs when an error or exception happens on either side, and stores those logs in a container that is accessible to us developers so that we can debug and fix the errors as fast as possible.
The system reports errors automatically as soon as they happen, as long as they meet the configured report requirements and thresholds (especially for duplicate errors). Users can also report a specific bug manually through the UI.
The reporting system consists of 4 parts: the DAL, the API, the server and the client.
The DAL responsible for bug reporting is divided into 3 parts: the container DAL, the bug reports DAL and the metadata (log entries) DAL.
- Container DAL: responsible for creating the blob container and getting a SAS (shared access signature) token for the container based on the authorization type.
  - Admin permission: Read, Write, List
  - Guest/Public permission: Write only
- Bug reports DAL: responsible for creating a bug report based on the supplied bug report info.
- Metadata (log entries) DAL: responsible for mapping reported bugs to uploaded logs, and for recording the timestamp and the log owner.
You can find more detail about the tables here: War to the core Tables
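The permission split in the container DAL can be sketched as a simple lookup. This is only an illustration: the names `SasPermission`, `AuthorizationType` and `permissions_for` are hypothetical and do not come from the real DAL, and no actual SAS token is generated here.

```python
from enum import Enum, Flag, auto


class SasPermission(Flag):
    """Hypothetical permission flags mirroring the ones listed above."""
    READ = auto()
    WRITE = auto()
    LIST = auto()


class AuthorizationType(Enum):
    """Hypothetical authorization types (assumed names)."""
    ADMIN = "admin"
    GUEST = "guest"


# Admins can read, write and list; guests (public callers) can only write,
# so they can upload logs but never read what others uploaded.
PERMISSIONS_BY_AUTH = {
    AuthorizationType.ADMIN: SasPermission.READ | SasPermission.WRITE | SasPermission.LIST,
    AuthorizationType.GUEST: SasPermission.WRITE,
}


def permissions_for(auth: AuthorizationType) -> SasPermission:
    """Return the permission set a SAS token should carry for this caller."""
    return PERMISSIONS_BY_AUTH[auth]
```

Keeping guests write-only means a leaked guest token cannot be used to download other players' logs.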
We have 2 API controllers that control the bug report DALs and allow servers and clients to store and get the data (tokens) they need when reporting bugs. To avoid security exploits, all operations must be authorized except getting the response token and adding log entries.
- Bug Reports controller: responsible for creating bug reports, deleting them and getting their info, as well as for getting the response token, which includes the base container URI, the SAS token and the log ID that the logs should be uploaded under.
- Log Entries controller: responsible for adding log entries for the specified report IDs and log owner, and linking them correctly to the log ID in the container.
The server is the core of the system. Its core components are:
- Bug Reports DAL: responsible for the web API operations that GET and POST bug report info, get the response token and add log entries.
- Bug Reports Manager: responsible for creating bug reports, checking whether they can be reported according to the thresholds and cooldown supplied via the global properties, and uploading the server logs when needed (depending on the freshness of the last uploaded logs, if any).
- Bug Reports Cache Manager: responsible for caching bug reports and managing duplicate reports, keeping track of their count and the last time they occurred.
- Report Bug Server Action: an action used to create a bug report, upload the server logs (if the report is eligible) and then notify related clients (if applicable).
- Add Bug Report Entry Server Action: an action used to add log entries for both client and server. This action is invoked after the log upload is completed.
- Server ETW Listener: responsible for listening to ETW events and reporting bugs on error and critical events.
The client uses the OnMessageLogged event to report bugs via the server. We will dive into the whole operation in a bit, but for now the components on the client are:
- Bug Reports Manager: responsible for reporting bugs to the server.
- Bug Reports DAL: responsible for getting the response token from the API and uploading logs to the Azure container.
- Bug Report Response Token (shared with DAL, API and server): the response object that contains the container URI, the log ID (blob name) and the SAS token.
- Bug Report Cached Entry (shared with DAL, API and server): responsible for caching a bug report and determining whether it is eligible to be reported, that is, whether it passes the threshold and cooldown checks.
- Bug Reported Event: an event invoked in response to the Report Bug server action. It notifies the client of the report ID and tells it to start uploading its own logs.
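As a rough illustration of the response token, the sketch below models its three fields and shows how an upload URL could be assembled from them. The class and field names are assumptions, and the URL layout assumes the usual Azure blob form of container URI, blob name, then the SAS as the query string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReportResponseToken:
    """Hypothetical shape of the response token (field names are assumed)."""
    container_uri: str  # base URI of the blob container
    log_id: str         # blob name the log must be uploaded under
    sas_token: str      # SAS query string, without a leading "?"

    def upload_url(self) -> str:
        """Build the blob URL: <container>/<blob name>?<sas>."""
        return f"{self.container_uri.rstrip('/')}/{self.log_id}?{self.sas_token}"
```

Because the log ID is chosen by the API and baked into the token, the uploader never picks its own blob name.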
Now that we have covered the basic components, let's dive into the operation. Before that, note that we wanted to ensure the reporting feature does not spam the server or client, and to avoid any exploits that may harm the server in any way. That is why a repeated report can only be reported a maximum number of times within a specified duration, and why there is a direct cooldown to avoid consecutive spamming. The client does not report a bug to the API directly: it sends a request to the server and, once it gets permission, only uploads its logs as instructed.
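The two anti-spam rules (a maximum count per duration plus a cooldown between consecutive reports) could look roughly like this. It is a minimal sketch, not the actual cached entry: the class name, field names and the sliding-window interpretation of "maximum count for a specified duration" are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedBugReport:
    """Hypothetical cached entry for one distinct (deduplicated) error."""
    max_reports: int          # assumed: max reports allowed inside the window
    window_seconds: float     # assumed: length of the counting window
    cooldown_seconds: float   # assumed: minimum gap between two reports
    report_times: List[float] = field(default_factory=list)

    def try_report(self, now: float) -> bool:
        """Return True and record the report if it passes both checks."""
        # Forget reports that fell out of the counting window.
        self.report_times = [t for t in self.report_times if now - t < self.window_seconds]
        # Cooldown: reject if the previous report was too recent.
        if self.report_times and now - self.report_times[-1] < self.cooldown_seconds:
            return False
        # Threshold: reject once the window already holds the maximum count.
        if len(self.report_times) >= self.max_reports:
            return False
        self.report_times.append(now)
        return True
```

In the real system these three values would come from the global properties rather than being passed in by hand.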
1. On startup, the server starts listening to ETW events. When an event is an error or critical, it reports a bug by invoking the report bug server action.
2. The report bug server action requests the bug reports manager to create a bug report for the error that happened.
3. The bug reports manager creates a bug report info and caches an entry for it (if an entry already exists, it proceeds to the next step without caching).
4. The bug reports manager checks whether the cached report can be reported by checking the threshold and cooldown. (The threshold, cooldown and maximum count are supplied via the global properties.)
5. Once all the checks pass, the bug reports manager requests the server DAL to send the bug report info to the API.
6. The API receives the report info, creates a new bug report record, generates a report ID and responds to the server with that ID.
7. The server confirms via the API that the received ID exists, then checks whether there are previously uploaded logs.
8. If there are previously uploaded logs, the server checks the freshness of the latest one based on its timestamp. (The freshness threshold is supplied via the global properties.)
9. If the previously uploaded logs are fresh enough, go to step 12; otherwise go to step 10.
10. The server requests a response token from the API for the report ID, with a new log ID.
11. The server uploads its logs to the container using the response token.
12. The server requests the API to add a log entry with the specified log ID and associate it with the specified report ID.
13. The API checks whether the specified log ID has been uploaded (exists) and, if so, adds the log entry. Otherwise it refuses, which should allow the server to re-upload or take a countermeasure.
14. The server notifies related clients that a bug has been reported.
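The core of the server flow above (create the report, reuse a fresh log or upload a new one, then add the log entry) can be sketched with an in-memory stand-in for the API. Everything here is hypothetical: `InMemoryApi`, its methods and `report_bug_on_server` are illustrative names, the upload is simulated by writing to a dictionary, and the threshold and cooldown checks and the client notification are left out.

```python
from typing import Dict, List, Optional, Tuple


def is_fresh(uploaded_at: float, now: float, threshold_seconds: float) -> bool:
    """A previous upload counts as fresh if it is no older than the threshold."""
    return now - uploaded_at <= threshold_seconds


class InMemoryApi:
    """Hypothetical stand-in for the bug reports web API and blob container."""

    def __init__(self) -> None:
        self.reports: Dict[str, dict] = {}                # report_id -> info
        self.uploaded: Dict[str, Tuple[str, float]] = {}  # log_id -> (owner, time)
        self.entries: List[Tuple[str, str, str]] = []     # (report_id, log_id, owner)

    def create_report(self, info: dict) -> str:
        report_id = f"report-{len(self.reports) + 1}"
        self.reports[report_id] = info
        return report_id

    def latest_log(self, owner: str) -> Optional[Tuple[str, float]]:
        logs = [(ts, log_id) for log_id, (o, ts) in self.uploaded.items() if o == owner]
        if not logs:
            return None
        ts, log_id = max(logs)
        return log_id, ts

    def new_log_id(self) -> str:
        return f"log-{len(self.uploaded) + 1}"

    def add_log_entry(self, report_id: str, log_id: str, owner: str) -> bool:
        # Refuse entries for unknown reports or for logs that were never uploaded.
        if report_id not in self.reports or log_id not in self.uploaded:
            return False
        self.entries.append((report_id, log_id, owner))
        return True


def report_bug_on_server(api: InMemoryApi, info: dict, now: float,
                         freshness_seconds: float, owner: str = "server") -> str:
    report_id = api.create_report(info)
    if report_id not in api.reports:  # confirm the ID exists before using it
        raise RuntimeError("API did not persist the report")
    latest = api.latest_log(owner)
    if latest is not None and is_fresh(latest[1], now, freshness_seconds):
        log_id = latest[0]                   # reuse the fresh log, skip the upload
    else:
        log_id = api.new_log_id()            # stands in for requesting a response token
        api.uploaded[log_id] = (owner, now)  # stands in for the blob upload
    if not api.add_log_entry(report_id, log_id, owner):
        raise RuntimeError("log entry refused; re-upload or take a countermeasure")
    return report_id  # the caller would now notify related clients
```

Reusing a fresh log means several reports raised in a short burst all point at one uploaded blob instead of triggering one upload each.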
1. On client start, the Bug Reports Manager is initialized. It registers to the OnMessageLogged event and the Application.MessageLogged event, and when an error is logged it reports a bug by invoking the report bug client action, which starts the server operation.
2. When the server operation finishes, the server notifies related clients that a bug report has been created and supplies them with the report ID.
3. The client then requests its previously uploaded logs from the API using its own unique ID and checks the latest one: if it is fresh, go to step 6; if not, go to step 4. (The freshness threshold is supplied via the global properties.)
4. The client uses the supplied report ID to get a response token from the API with a new log ID.
5. The client uploads its own log using the response token.
6. The client notifies the server through the AddBugReportLogEntry client action, passing the log ID that will be associated with the report ID.
7. The server then requests the API to add the log entry and associate it with the report ID, just as it does for its own logs.
8. The client also lets the player report a bug manually by invoking the ReportBugClientAction and supplying the info as needed, except that the isAutomatic property will be false.
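The difference between automatic and manual reports on the client can be sketched as follows. This is an assumption-heavy illustration: `ClientBugReportsManager`, `ReportBugRequest` and the `"error"` level string are made-up names, and the callback stands in for invoking the report bug client action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReportBugRequest:
    """Hypothetical payload of the report bug client action."""
    message: str
    is_automatic: bool


class ClientBugReportsManager:
    """Hypothetical client-side manager; it only forwards requests to the server."""

    def __init__(self, send_to_server: Callable[[ReportBugRequest], None]) -> None:
        self._send = send_to_server

    def on_message_logged(self, level: str, message: str) -> None:
        # Only logged errors trigger an automatic report; other levels are ignored.
        if level == "error":
            self._send(ReportBugRequest(message, is_automatic=True))

    def report_manually(self, message: str) -> None:
        # Manual reports use the same action, with is_automatic set to False.
        self._send(ReportBugRequest(message, is_automatic=False))
```

Both paths go through the server, so the threshold and cooldown checks still apply before anything reaches the API.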
Here is a diagram to help visualize the operation for both client and server.
- The server relies on the client to know whether an error is a battle error, since the server has no way of knowing whether a logged error or exception occurred during a battle.
- The client cannot report bugs when it is not connected to a server, so errors that happen in the Game Initialization and Login scenes are not reported automatically. (We could force a connection to the primary server while in those 2 scenes, or alternatively rely fully on the API for reporting, which is not safe.)
- Both clients and servers upload the whole log file for automatic reports. We need a shared ring buffer that stores the logs in the same format as the log file, and automatic reports should upload only that buffer. The whole log should only be uploaded for manual reports by players.
- Manual reporting is yet to be implemented (the UI that will call the bug reports manager), as is the upload feedback for manual reports.
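The ring buffer mentioned in the list above does not exist yet. One possible shape for it, assuming it stores already formatted log lines and keeps only the most recent ones, is sketched below. `LogRingBuffer` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque


class LogRingBuffer:
    """Hypothetical fixed-size buffer holding the most recent formatted log lines."""

    def __init__(self, max_lines: int) -> None:
        # A deque with maxlen drops the oldest line automatically when full.
        self._lines: Deque[str] = deque(maxlen=max_lines)

    def append(self, formatted_line: str) -> None:
        """Store a line exactly as it would be written to the log file."""
        self._lines.append(formatted_line)

    def snapshot(self) -> str:
        """Return the buffered lines as text ready to upload for an automatic report."""
        return "\n".join(self._lines)
```

Storing lines after formatting keeps the uploaded snippet identical in layout to the full log file, which is the requirement stated above.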