Title: ANALYSIS OF MASTECTOMY IN BREAST CANCER TREATMENT
Beatrice Ugiliweneza, Patricia Cerrito,
University of Louisville, Louisville, KY
ABSTRACT
Objective: Surgery is the primary treatment in
most early breast cancer cases; its aim is to
remove as many cancer cells as possible. There
are two types of surgery: the mastectomy and
the lumpectomy. Studies have shown that
mastectomy is performed more often than
lumpectomy. We analyze in detail the information
on mastectomy as a treatment option for breast
cancer, to compare the use of the different types
of mastectomy and to study the cost of the most
frequent types.
Method: We use the 2005 data from
the Nationwide Inpatient Sample (NIS). We first
filter the data with respect to the main types of
mastectomy: radical mastectomy, modified radical
mastectomy, simple (total) mastectomy, and
subcutaneous mastectomy. Second, we analyze them
with summary statistics using SAS. Finally, we
examine the cost of the most frequent type
by plotting the actual values and the future
trend. We use SAS Text Miner to compress patient
diagnoses and compare them to the types of
treatment.
Results: The unilateral extended
mastectomy is the most frequently used in
treating breast cancer, with 56% of the total
mastectomies. The simple (total) mastectomy
follows with 27%, then the bilateral mastectomy
(0.9%) and the radical mastectomy. The remaining
percentage contains the less frequent cases and
different breast reconstructions. The cost of the
modified radical mastectomy has a constant trend
of around 20,000 dollars.
Conclusion: The modified radical
mastectomy is the most frequently performed
mastectomy in the treatment of breast cancer. SAS
can be used to study healthcare cases.
METHOD
The
data were collected by the Nationwide Inpatient
Sample (NIS). The NIS is a database collected by
the Healthcare Cost and Utilization Project, a
family of health care databases and related
software tools and products developed through a
Federal-State-Industry partnership and sponsored
by the Agency for Healthcare Research and
Quality. The data obtained from US
hospitals are made available via the HCUP website
(http://www.hcup-us.ahrq.gov/home.jsp). Table 1
gives a summary of the mastectomy procedures from
2005.
Table 1. Mastectomy Procedures
In contrast, there were 1697 lumpectomies. We
want to determine whether the ratio of
lumpectomies to mastectomies is increasing or
decreasing over time. We also want to investigate
the adverse effects and recurrence resulting from
mastectomy versus lumpectomy. Because the NIS is
not longitudinal, it is not possible to
investigate these issues using this dataset alone.
To investigate the adverse effects, we use SAS
Text Miner on procedures PR2-PR15 in the NIS
database. We cluster the procedures to determine
additional needs of the patients. To investigate
longitudinal effects, we use the NIS to score the
Medical Expenditure Panel Survey (MEPS), which
has longitudinal information but records only 2
digits of the ICD-9 procedure codes instead of 4,
so a lumpectomy cannot be told apart from a
mastectomy. Scoring will allow us to distinguish
between lumpectomy and mastectomy in the MEPS.
CONTACT INFORMATION
Beatrice Ugiliweneza
University of Louisville
Louisville, KY 40292
Work Phone: 502 852-6022
E-mail: b0ugil01_at_louisville.edu
RESULTS
Scoring the MEPS
The NIS data contain various surgical treatment
procedures for breast cancer. After filtering for
cases of mastectomy and lumpectomy, the number of
observations was considerably reduced. The
analysis was performed on 315 observations for
the variable LOS (Length of Stay) and 301
observations for Total Charges. Kernel density
estimation helps visualize the density function
and assess normality. We use PROC KDE on Length
of Stay to examine the procedures in detail.
Figure 1: Kernel Density Estimation for LOS for
Mastectomy and Lumpectomy in the NIS data
Figure 1 shows that the LOS is
normally distributed for both mastectomy and
lumpectomy. This graph shows that patients
having a mastectomy stay longer than those having
a lumpectomy. Figure 2 gives the kernel density
for Total Charges.
Figure 2: Kernel Density Estimation for Total
Charges for Mastectomy and Lumpectomy in the NIS
data
The total charges variable is also normally
distributed for both mastectomy and lumpectomy.
This graph indicates that mastectomy is more
likely than lumpectomy to incur high total
charges. The MEPS data are not precise on
different treatments, especially on surgical
treatments of breast cancer. To obtain a
complete data set, the previous results were
used to score this data set. Logistic regression,
a predictive modeling procedure, is fit to the
NIS result with PROC GENMOD in SAS Enterprise
Guide 4. We then use this model of the NIS
procedures to score the MEPS procedures. After
this step, we separated the MEPS data from the
NIS data. This is one of the first steps in
preprocessing the MEPS data for further analysis.
Figure 3 compares the MEPS to the NIS for total
charges; Figure 4 compares the length of stay.
Figure 3: Kernel Density Estimation for Total
Charges for Mastectomy in MEPS inpatient data set
compared to NIS dataset
Figure 4: Kernel Density Estimation for Total
Charges for Mastectomy in MEPS physician visits
data set compared to NIS dataset
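The density curves in the figures above come from PROC KDE. As a rough illustration of the technique only, the following Python sketch (not the authors' SAS code) evaluates a Gaussian kernel density estimate by hand. The function name gaussian_kde, the normal-reference bandwidth rule, and the sample values are all assumptions made for the example; PROC KDE's own bandwidth selection may differ.

```python
import math


def gaussian_kde(sample, bandwidth=None):
    """Return a function that estimates the density of `sample`
    using a Gaussian kernel (hypothetical helper for illustration)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if bandwidth is None:
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        # Normal-reference rule of thumb; an assumption, not PROC KDE's default.
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical lengths of stay in days (illustrative values, not NIS data).
los_mastectomy = [1, 2, 2, 3, 3, 3, 4, 5]
los_lumpectomy = [1, 1, 1, 2, 2, 3]

f_mast = gaussian_kde(los_mastectomy)
f_lump = gaussian_kde(los_lumpectomy)

# Print the two estimated densities side by side on a small grid of days.
for day in range(0, 7):
    print(day, round(f_mast(day), 3), round(f_lump(day), 3))
```

Plotting the two curves on the same axes gives the kind of comparison shown in the figures: a curve shifted to the right indicates longer stays or higher charges for that group.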
Figure 3 and Figure 4 show that the resulting
Total Charges for Mastectomy in the MEPS data is
skewed and not normally distributed, compared to
the Mastectomy in the NIS, which is fairly
normally distributed. For this reason, after
merging the physician visit data and the
inpatient data, minor changes are needed for this
variable before proceeding with the analysis.
From the two incomplete datasets, NIS and MEPS,
we are able to construct a complete MEPS dataset.
The diagnosis codes in the MEPS are now complete
and we can differentiate mastectomy from
lumpectomy. The dataset is ready to be used for
longitudinal analysis. In the treatment of breast
cancer, the chance of having a mastectomy is
significantly higher than that of having a
lumpectomy. The cost of mastectomy is also
higher, but the length of stay is similar for each
procedure.
Clusters of Procedures
To examine clusters of procedures, we first use
the code

    data nis.mastectomy3;
      set nis.mrsa;
      /* keep lumpectomy (8521) and mastectomy (8541-8548) as primary procedure */
      where rxmatch('8521',pr1) gt 0 or rxmatch('8541',pr1) gt 0 or
            rxmatch('8542',pr1) gt 0 or rxmatch('8543',pr1) gt 0 or
            rxmatch('8544',pr1) gt 0 or rxmatch('8545',pr1) gt 0 or
            rxmatch('8546',pr1) gt 0 or rxmatch('8547',pr1) gt 0 or
            rxmatch('8548',pr1) gt 0;
      /* lump = 1 flags a lumpectomy, 0 a mastectomy */
      if (rxmatch('8521',pr1) gt 0) then lump=1; else lump=0;
      /* concatenate the secondary procedures into one text field */
      remainingprocedures=catx(' ',pr2,pr3,pr4,pr5,pr6,pr7,pr8,pr9,
                               pr10,pr11,pr12,pr13,pr14,pr15);
    run;

Then we use SAS Text Miner on the defined field,
remainingprocedures.
Procedure 85.41 (unilateral simple mastectomy) is
listed as a secondary procedure 711 times after
it was previously listed as a primary procedure.
The most common secondary procedure, occurring
2836 times, is excision of axillary lymph node.
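Counts like those above can be read off a simple tally of the codes in the concatenated field. The Python sketch below is an illustration only and is not what SAS Text Miner does internally. The function name tally_codes and the sample records are hypothetical, and the codes other than 8541 are placeholders.

```python
from collections import Counter

# Hypothetical records: each string mimics the space-separated field built
# from PR2-PR15 (illustrative values, not rows from the NIS).
remaining_procedures = [
    "4023 8541",
    "4023",
    "4023 8595",
    "",
]


def tally_codes(records):
    """Count how often each procedure code appears across all records."""
    counts = Counter()
    for record in records:
        # split() with no argument ignores repeated and leading blanks.
        counts.update(record.split())
    return counts


counts = tally_codes(remaining_procedures)

# List the codes from most to least frequent.
for code, n in counts.most_common():
    print(code, n)
```

A text-mining tool goes further by grouping records whose codes tend to occur together, but the term frequencies it starts from are of this form.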
The main clusters of procedures are listed in
Table 2.
Table 2. Clusters of Secondary Procedures
In particular, the last cluster of secondary
procedures defines patients who have infections
while undergoing the primary procedure of
mastectomy or lumpectomy. This is a substantial
number of patients.
CONCLUSION
We can use SAS Text Miner to classify groups of
patients by their conditions and secondary
procedures. Once identified, the clusters can be
investigated in more detail. This
research shows that data mining can be used to
complete one dataset using another one that also
has incomplete information. The MEPS dataset,
which is incomplete on the procedures because of
the HIPAA de-identification, is completed by the
NIS dataset using predictive modeling and
scoring. We found that the variable Total Charges
is normally distributed and that the LOS (Length
of Stay) is mostly one day. All this helped us
with the first preparation of the MEPS data.
Further analysis will be done with an ARIMA
(Autoregressive Integrated Moving Average) model.
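The scoring step described above (fit a logistic model where the procedure is known, then apply it where it is not) can be sketched as follows. This is a minimal Python illustration using plain gradient ascent, not the authors' PROC GENMOD workflow. The predictors (length of stay and total charges), the function names, and every data value are assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, x):
    """Predicted probability of mastectomy for one record."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights (intercept first) by batch gradient ascent
    on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - score(weights, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical stand-in for NIS records, where the procedure is known:
# (length of stay in days, total charges in units of $10,000).
nis_rows = [(1, 0.8), (1, 1.0), (2, 1.1), (1, 0.9), (2, 1.8),
            (2, 2.0), (3, 2.2), (2, 1.9), (3, 2.5), (1, 1.2)]
nis_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = mastectomy, 0 = lumpectomy

weights = fit_logistic(nis_rows, nis_labels)

# Hypothetical stand-in for MEPS records, where the procedure is unknown.
meps_rows = [(1, 0.8), (3, 2.6)]
for row in meps_rows:
    p = score(weights, row)
    label = "mastectomy" if p >= 0.5 else "lumpectomy"
    print(row, round(p, 3), label)
```

The 0.5 cutoff is an arbitrary choice for the sketch; in practice the threshold and the set of predictors would be chosen from the data shared by both sources.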