Currently, this project contains empirical evaluation to support a restriction on Julia types to provide for decidable subtyping for the Julia language.
Thus, we need to analyze type annotations used in Julia programs to see if they satisfy a wildcards-like restriction (there is also an analysis of scoping, lower bounds, lower+upper bounds).
Note. Some annotations that do not literally correspond to the restriction
from JB's thesis proposal/paper on decidable subtyping are not reported.
In particular, cases like Tuple{Ref{T}} where T are not reported because
they are trivially equivalent to Tuple{Ref{T} where T}.
Analysis results for the May 2023 corpus used for the thesis/paper can be found here.
Usually, type annotations appear after :: in the code:
- as a part of the method signature (either as an argument or return type annotation),
- as a type assertion in the method body.
When methods are generic, there can be an extra where sequence
in the method signature, outside of the argument list.
Examples:
foo(x :: Int) :: Boolfoo(x :: Vector{T} where T)foo(x :: T, xs :: Vector{T}) where Tx :: Int
We collect method type signatures and all other type annotations to the right
of ::, which includes type assertions and types of fields.
Note. MacroTools.jl package has a handy function isdef to check
for function definitions, but it seems to
always return true.
We can use that to process method signatures.
longdefturns all functions into long forms, including nested expressionssplitdefconveniently processes any function definition form (short, long, anonymous) except for the do-notation
But we also want to collect information from nested function definitions
and stand-alone type assertions.
This is done manually with @capture.
We collect all user-defined type declarations and record the declaration
itself and its declared supertype.
Then, we check whether they satisfy use-site variance when treated as complete
types. For example, Foo{X, Y<:Ref{X}} is analyzed as
Foo{X, Y} where Y<:Ref{X} where X.
Decidable subtyping doesn't require type declarations to satisfy use-site
variance, but it is a nice indicator of the complexity.
-
README.mdthis file -
init-script.jlto install dependencies into the global Julia environment -
types-extract.jlscript for extracting type annotations from source code of packages -
types-analyze.jlscript for analyzing extracted type annotations -
run-tests.jlconvenience script for running the tests ($ julia run-tests.jlor$ ./run-tests.jl) -
srcsource codeJuliaSub.jlmain modulelb-analysisanalysis of lower boundslib.jlmain file combining everything related to the analysisdata.jldata types used for the analysisprocess-code.jlextraction and counting lower bounds in Julia expressionsprocess-text.jltextual and parse-based analysis of lower bounds in textprocess-pkgs.jllower-bounds analysis of files, packages, and folders with packages
types-analysisanalysis of type annotationslib.jlmain file combining everything related to the analysisdata.jldata types used for the analysistypes-extract.jlextraction of type annotationstypedecls-extract.jlextraction of type declarationstype-analyze.jlanalysis of type annotationstypedecl-analyze.jlanalysis of type declarationspkg-process.jlprocessing of packages: extraction of type annotations and declarations into a CSV, an analysis of type annotations and declarations read from a CSV and saving interesting results into another CSV
utilsauxiliarylib.jlmain file combining all utilitiesequality.jlgeneric definition of structural equalityfile-system.jlfile system helpersmultiset.jlmultiset merging via adding frequencies (instead of default max)parsing.jlhelpers for parsing Julia filesstatus-info.jlcustom logging
-
lb-analysis.jlscript that performs a complete analysis of lower bounds in the given folder with Julia packages -
Project.tomldependencies
-
Julia with the following packages:
MacroToolsfor working with Julia AST
Note. Another package that could have been useful isMatchMultisetsfor counting frequencies of lower boundsDataStructuresfor linked lists, to efficiently collect annotationsCSV.jlDataFrames.jlDistributed.jlArgParse
Assumes ../utils/JuliaPkgsList.jl and ../utils/JuliaPkgDownloader.jl.
- For both packages, run
init-script.jlfirst. JuliaPkgsList.jlshould be "patched" with an emptydata/excluded.txtfile to make it easier to track which entries are invalid (since the file is outdated now anyway).
Note. Sometimes because of network issues, some packages are not downloaded. If in the case of all packages, the number of failed packages is > 50, run downloading again. Several dozen packages will remain broken for other reasons.
$ ../utils/JuliaPkgsList.jl/gen-pkgs-list.jl 100 -p data/julia-pkgs-info.json --name --includeversion --includeuuid -o data/pkgs-list/top-pkgs-list.txt
$ ../utils/JuliaPkgsList.jl/gen-pkgs-list.jl 0 -p data/julia-pkgs-info.json --name --includeversion --includeuuid -o data/pkgs-list/all-pkgs-list.txt
$ julia -p 32 ../utils/JuliaPkgDownloader.jl/download-pkgs.jl -s data/pkgs-list/100-top-pkgs-list.txt -d data/100
$ julia -p 32 ../utils/JuliaPkgDownloader.jl/download-pkgs.jl -s data/pkgs-list/all-pkgs-list.txt -d data/all
Run init-script.jl first.
Note. Output and error streams are redirected to a file.
To print to the terminal, remove > data...
To extract type annotations:
$ julia -p 32 types-extract.jl data/100 data/ta-info/100 > data/ta-info/log-extract-100.txt 2>&1
$ julia -p 32 types-extract.jl data/all data/ta-info/all > data/ta-info/log-extract-all.txt 2>&1
To analyze type annotations:
$ julia -p 32 types-analyze.jl data/ta-info/100 > data/ta-info/log-analysis-100.txt 2>&1
$ julia -p 32 types-analyze.jl data/ta-info/all > data/ta-info/log-analysis-all.txt 2>&1
To extend the output CSV of the analysis and have a new CSV with types of interest:
-
In
src/types-analysis/pkg-process.jl:- extend
ANALYSIS_COLS_ANNS_NOERR - in
analyzePkgTypeAnns,- extend
failedResult - extend
dfta - add a
df*var and extend the for-loop right after - extend the resulting
Dict
- extend
- in
getTypeAnnsAnalyses, extendmapinvarsAnalyses - extend
ANALYSIS_COLS_DECLS - in
analyzePkgTypeDecls,- extend
failedResult - extend
dftd - add a
df*var and extend the for-loop right after - extend the resulting
Dict
- extend
- in
addTypeDeclsAnalysis!, extendnewCols - in
analyzeTypeDecl, extend the resulting array and increment infill - in
analyzePkgTypesAndSave2CSV,- extend both
combineVCat! - add a
CSV.write
- extend both
- extend
-
In tests, make sure to add
isfileinpkg-process.jlfor new CSV files. Furthermore, manually check that necessary annotations/declarations are reported, as it is easy to make mistakes when copying stuff in dataframe-related code...