1. Problem or Opportunity Statement:
The government spends millions of dollars and hires thousands of individuals to analyze data on behalf of the American people. Although new tools become available each year that analyze data more quickly and cost-effectively, agencies have been reluctant to adopt these new technologies.
We aim to create an informational resource on analytical tools that could be used by various groups in the Federal Government. For example, a procurement officer conducting market research would find a central repository of data analytic tools valuable for evaluating the breadth of tools available for conducting similar types of data analysis or displaying similar types of information. Data analysts who encounter limits in their current tools (e.g., Excel is limited to about a million observations) would be able to use the central repository to understand what additional tools might be available to them. A section chief or the head of an agency might be unfamiliar with new open-source software preferred by newly minted data scientists, or uncertain about the risks that the open-source software poses to their agency. These policymakers may benefit from knowing that other agencies in the Federal Government are already using the tool.
2. Identify the root cause:
Our two initial focus groups, held to understand the types of software programs being used at different agencies in the Federal Government to analyze data, suggest an increasing consensus among data scientists around the use of open-source software (e.g., Python and R) to analyze large datasets. We also found preliminary evidence that the adoption of these platforms in the Federal Government has been slow for three reasons: (1) unfamiliarity with new tools, (2) uncertainty about the risks, and (3) the proficiency of current staff on legacy systems.
We need to further understand the issues around data analysis and the adoption of new technologies in the Federal Government. We propose to create an Interagency Survey of Data Analytical Tools (ISDAT) to better understand what is currently in use and what software data analysts would like to learn in the future. The data from the ISDAT will be organized into a central repository. We believe making this information available to everyone in the Federal Government will have a positive impact.
3. Benchmarking and Market Research Plan:
We propose first sending the survey to a small group of individuals to determine whether the questions are understandable and whether the responses give us useful information. After making any necessary adjustments to the survey instrument, we will send the survey to everyone who might conduct data analysis in each of the five agencies in the ISDAT pilot.
Our objective is to develop the ISDAT survey and to make the information it collects available in a searchable database of the best tools for analyzing data across five government agencies by August 2020. We have three goals for this project:
- (1) Addressing Unfamiliarity - ISDAT will provide information about the analytical tools that are available, along with a discussion of the pros and cons of each tool.
- (2) Resolving Uncertainty - ISDAT will also provide clarity through an informational database showing which agencies have already adopted these technologies, along with potential points of contact. This could provide decision-makers with "cover" for their decisions by showing that they are not the first agency to adopt the new technology.
- (3) Evaluating Legacy - We recognize that there are legacy systems in the Federal Government where analysts have spent thousands of hours becoming proficient in the use of a particular system or coding language. While these analysts' investment in legacy systems will continue to be a significant barrier to the adoption of new technologies, the ISDAT will provide a resource for agencies to begin to think about when and how they should address these issues.
5. Project Description:
We propose to develop a pilot project for the ISDAT to identify the tools being used to analyze data by five agencies: the Department of State, the Department of Health and Human Services, the Department of Education, the US Patent and Trademark Office, and the National Credit Union Administration. The pilot project could be expanded by future EIG Results Groups to include other agencies. We would gather the information collected from our surveys of the five agencies into a searchable database. We propose that the content of our project be housed on the website of the Partnership for Public Service.
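As a rough illustration of what "searchable" could mean in practice, the sketch below loads survey answers into an in-memory SQLite table and runs two lookups. The table name, columns, agency labels, and rows are all hypothetical placeholders invented for this example; they are not ISDAT results, and the real repository may be built quite differently.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up survey rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (agency, tool) pair reported in the survey.
    conn.execute(
        "CREATE TABLE tool_usage ("
        "agency TEXT, tool TEXT, open_source INTEGER, use_case TEXT)"
    )
    rows = [
        ("Agency A", "Python", 1, "large-dataset analysis"),
        ("Agency A", "Excel", 0, "financial reporting"),
        ("Agency B", "R", 1, "statistical modeling"),
        ("Agency B", "Python", 1, "data visualization"),
        ("Agency C", "Excel", 0, "budget tracking"),
    ]
    conn.executemany("INSERT INTO tool_usage VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def agencies_using(conn, tool):
    """Return the agencies that report using a tool (case-insensitive match)."""
    cur = conn.execute(
        "SELECT DISTINCT agency FROM tool_usage "
        "WHERE lower(tool) = lower(?) ORDER BY agency",
        (tool,),
    )
    return [row[0] for row in cur.fetchall()]


def search_use_cases(conn, keyword):
    """Return (agency, tool, use_case) rows whose use case contains the keyword."""
    cur = conn.execute(
        "SELECT agency, tool, use_case FROM tool_usage "
        "WHERE use_case LIKE ? ORDER BY agency, tool",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(agencies_using(conn, "python"))
    print(search_use_cases(conn, "visual"))
```

A lookup like `agencies_using` is the kind of query a decision-maker might run to see who has already adopted a tool, while `search_use_cases` is closer to what a procurement officer might use during market research.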
6. Impact and results:
We aim to make it easier for our stakeholders to understand what tools are available to them to analyze data. Ultimately, the success of what we are proposing can only be fully measured after our EIG program concludes. Ideally, we could also create a roadmap or some type of user-friendly template for future cohorts to build on or for other agencies to begin using.
In the short term, success will be measured by our ability to create the survey instrument, the number of analysts who respond to our survey, the number of agencies that respond, our tally of the types of data respondents analyze (statistical, financial, economic), and how well we are able to organize the information collected from the survey. Depending on the number of responses, we will explore our data through data visualization (creating graphs) to better understand the results and trends we are seeing.
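The short-term measures above could be tallied with a few lines of code once responses are collected. The sketch below is only one possible approach; the record fields (`agency`, `data_type`) and the sample responses are assumptions made up for illustration, not the actual survey format or results.

```python
from collections import Counter


def summarize_responses(responses):
    """Compute simple success measures from a list of survey response records.

    Each record is assumed (hypothetically) to be a dict with the keys
    'agency' and 'data_type'.
    """
    agencies = {r["agency"] for r in responses}
    data_types = Counter(r["data_type"] for r in responses)
    return {
        "respondents": len(responses),           # how many analysts responded
        "agencies_responding": len(agencies),    # how many distinct agencies
        "data_type_tally": dict(data_types),     # counts per type of data
    }


if __name__ == "__main__":
    # Made-up responses, for illustration only.
    sample = [
        {"agency": "Agency A", "data_type": "statistical"},
        {"agency": "Agency A", "data_type": "financial"},
        {"agency": "Agency B", "data_type": "statistical"},
        {"agency": "Agency C", "data_type": "economic"},
    ]
    print(summarize_responses(sample))
```

The resulting tally could feed directly into a bar chart when the responses are visualized.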
7. Stakeholder Engagement:
- Our Executive Sponsor is the Chief Financial Officer of HHS. The role of the Executive Sponsor will be to provide us with guidance, insight, and knowledge about what is needed when certain types of data are essential to key management decisions.
- Other key stakeholders:
  - Technology Procurement Officer: We plan to conduct more extensive interviews with some technology procurement officers in our respective agencies to better understand what type of information would be useful in their jobs.
  - Chief Economist/Quantitative Analysis Division Chief: We plan to conduct interviews with directors of offices whose primary responsibility is to conduct data analysis, to better understand the challenges that these offices encounter around data analytics.
  - Chief Information Officer: We plan to reach out to CIOs or other policymakers to understand any concerns that they might have about the new analytical tools being used to conduct data analysis.
8. Risks and Constraints:
One risk is that not enough people will respond to the survey. A high nonresponse rate would impede our ability to achieve our project goal. Additionally, we could have sampling issues, in that the people we survey may not accurately represent the audience we are trying to reach.
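One way to monitor both risks is to compare, per agency, how many people were invited with how many responded. The sketch below computes per-agency response rates, flags agencies below a cutoff, and reports how far each agency's share of respondents departs from its share of invitees. The 30% cutoff, the agency labels, and the counts are arbitrary assumptions chosen for illustration.

```python
def response_rates(invited, responded):
    """Per-agency response rate: respondents divided by invitees."""
    return {
        agency: responded.get(agency, 0) / count
        for agency, count in invited.items()
        if count > 0
    }


def flag_low_response(invited, responded, threshold=0.3):
    """Agencies whose response rate falls below the (assumed) threshold."""
    rates = response_rates(invited, responded)
    return sorted(agency for agency, rate in rates.items() if rate < threshold)


def representation_gap(invited, responded):
    """Share of respondents minus share of invitees, per agency.

    A positive value means the agency is over-represented among respondents.
    """
    total_invited = sum(invited.values())
    total_responded = sum(responded.get(a, 0) for a in invited)
    if total_invited == 0 or total_responded == 0:
        return {}
    return {
        agency: responded.get(agency, 0) / total_responded - count / total_invited
        for agency, count in invited.items()
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    invited = {"Agency A": 50, "Agency B": 40, "Agency C": 10}
    responded = {"Agency A": 25, "Agency B": 8}
    print(response_rates(invited, responded))
    print(flag_low_response(invited, responded))
    print(representation_gap(invited, responded))
```

Large gaps would suggest that follow-up reminders or weighting may be needed before drawing conclusions from the pilot.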
We are also assuming we have a clear path to house the data. At this stage in the project planning, it is unclear whether the data will be made available for wider use after we have gathered it.
Finally, it is not yet clear how the data will be kept up to date. We will need to build a mechanism for expanding or updating our platform for future use.